wmkolcz
May 31, 2011
Question

cfhttp screen scrape. How to get information 'between'

I am scraping one of our sites for information to display on another site. I can trigger the page to display the correct information, but the returned content needs to be parsed so that only some of the text is shown. I can narrow the information down to the text between a span tag and a horizontal rule. Is there a way to grab just the information between '<span class="sml">' and '<hr />' and render it to the screen?

Sample cfhttp.fileContent:

<span class="sml">Administration &gt; Business Administration &gt; Managerial</span><br /> 100116 - <strong>Academic And/Or Research Program Officer Intermediate</strong> - AD220 - Independently manages a large academic or research program. Designs and develops major program components, develops and maintains curricula, develops research, leads professional conferences and provides public relations support. Develops ideas and options for faculty review and decision, and develops and implements instruction and research programs that reflect faculty interests. Evaluates effectiveness of curriculum and effectiveness of program in meeting goals. May teach seminars and workshops and participate with faculty on research. Plans, directs and controls program budget. Supervises program staff. Education and Experience: Academic background and experience in selected subject area. Requires advanced degree, preferably Ph.D. in selected subject area. Requires several years experience in academic work related to particular area of research. The primary duty of employees in this classification is the management of a customarily recognized department or subdivision, including the supervision of three or more full-time equivalent employees every week. Direction is over a permanent status-continuing function, not a collection of employees assigned to complete a project. Management duties include interviewing, selecting and training of employees; setting and adjusting their rates of pay and hours of work; planning and directing their work; appraising their productivity and efficiency for the purpose of recommending promotions or other changes in their status; handling their complaints and grievances and disciplining them when necessary. Management responsibilities include the authority to hire, fire, or promote assigned employees or make recommendations that are given particular weight. Employees have impact on budgeting, controlling costs, planning, scheduling, and procedural change. 
Under FLSA, incumbents in this position meet the criteria for exempt status.<hr />

Thanks!

2 replies

June 2, 2011

@wmkolcz

Something simple like the following "could" work:

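<!--- Text to search and the two delimiters --->
<cfset content = cfhttp.fileContent />
<cfset startText = '<span class="sml">' />
<cfset endText = '<hr />' />
<cfset parse = "" />

<!--- FindNoCase returns 0 when the text is not found --->
<cfset startPos = FindNoCase(startText, content, 1) />
<cfif startPos GT 0>
    <cfset contentStart = startPos + Len(startText) />
    <cfset endPos = FindNoCase(endText, content, contentStart) />
    <!--- Extract only when the end delimiter follows some content; otherwise Mid would get an invalid count --->
    <cfif endPos GT contentStart>
        <cfset parse = Trim(Mid(content, contentStart, endPos - contentStart)) />
    </cfif>
</cfif>

<cfoutput>#parse#</cfoutput>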

Good Luck!

<cfwild />

October 13, 2012

Thanks for this example, just what i needed!

May 31, 2011

Yep, you can do a regex find to get the start position and length of the match, and then extract that from the string.

Have a read up on reFind():

http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-7e9a.html

And there's a link from there through to CF's regex support:

http://help.adobe.com/en_US/ColdFusion/9.0/Developing/WSc3ff6d0ea77859461172e0811cbec0a38f-7fff.html

Give that a blast...
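
A minimal sketch of the reFind() approach, assuming the same two delimiters as in the question. Passing true as the fourth argument (returnsubexpressions) makes the function return a struct with pos and len arrays: element 1 describes the whole match and element 2 the first parenthesised subexpression. The variable names (content, match, parse) are illustrative only. The pattern assumes the engine supports the lazy quantifier +? and that . also matches newlines, as Adobe's documentation describes for ColdFusion; neither has been checked against BlueDragon 7.

```cfm
<!--- Illustrative names; cfhttp.fileContent is the scraped page from the question --->
<cfset content = cfhttp.fileContent />
<cfset parse = "" />

<!--- Capture everything between the span tag and the first following hr (lazy match) --->
<cfset match = REFindNoCase('<span class="sml">(.+?)<hr />', content, 1, true) />

<!--- pos[1] is 0 when nothing matched; pos[2]/len[2] locate the captured text --->
<cfif match.pos[1] GT 0>
    <cfset parse = Trim(Mid(content, match.pos[2], match.len[2])) />
</cfif>

<cfoutput>#parse#</cfoutput>
```

If the pattern does not match, parse stays empty and nothing is rendered, so the page does not throw when the source markup changes.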

--

Adam

wmkolcz (Author)
May 31, 2011

Unfortunately, I am stuck with Blue Disaster/Dragon 7. Will that work with that too? Most of the answers I found online call for one of the two (8 or 9).

wmkolcz (Author)
May 31, 2011

Oops