
Using cfhttp to get data off a page.

Contributor, Mar 01, 2022


Hello,

I'd like to pull specific data off a page using cfhttp if it's possible. I start by getting the entire page.

<cfhttp url="https://somesite.com/stream/" result="test">
<cfdump var="#test#">

In a perfect world, I'd like to extract just the first occurrence of the title and artist from the results into two variables.

<li><span class="song-time">1:49 PM</span>&nbsp;<span class="song-title">Wanted Dead Or Alive</span>—<span class="song-artist">Bon Jovi</span></li>

Is there a way to capture everything between the two span classes into two variables? I don't need the time span.

 

Thanks!

Gary

Views

353

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Community Expert, Mar 01, 2022


Yes, Gary, there are many ways.

 

One (which is rather tedious, but totally doable) is to simply treat each line as a string, and use cfml string functions to find and extract what you want, optionally using regular expressions. The cfml reference manual has a section on string functions:

https://helpx.adobe.com/coldfusion/cfml-reference/coldfusion-functions/functions-by-category/string-...
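As a rough sketch of the string-function route, the snippet below uses reFindNoCase with subexpressions to pull out the first song-title and song-artist spans. The helper name firstSpanText and the variable names are made up for illustration, and the markup is the sample line from the question rather than a live cfhttp result; in practice the input would be something like test.fileContent. It also assumes the span text contains no nested tags.

```cfml
<!--- Sample markup copied from the question; with cfhttp this would be test.fileContent --->
<cfset html = '<li><span class="song-time">1:49 PM</span>&nbsp;<span class="song-title">Wanted Dead Or Alive</span>—<span class="song-artist">Bon Jovi</span></li>'>

<cfscript>
// Hypothetical helper: returns the text of the first <span> with the given class,
// or an empty string when no such span is found.
function firstSpanText(required string html, required string className) {
    var pattern = '<span class="' & className & '">([^<]+)</span>';
    // With returnSubExpressions=true the result holds pos and len arrays:
    // element 1 is the whole match, element 2 is the captured group.
    var m = reFindNoCase(pattern, html, 1, true);
    if (m.pos[1] == 0) {
        return "";
    }
    return mid(html, m.pos[2], m.len[2]);
}

songTitle  = firstSpanText(html, "song-title");
songArtist = firstSpanText(html, "song-artist");

writeOutput("Title: " & encodeForHTML(songTitle) & "<br>Artist: " & encodeForHTML(songArtist));
</cfscript>
```

Because reFindNoCase stops at the first match, later songs in the list are ignored, which matches the "first occurrence only" requirement in the question.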

 

A far more powerful way is to treat each line as the html that it is, and use the free jsoup Java library to easily pull out the values within those spans. There have been over the years various resources showing how to use jsoup within cfml, such as:

 

While some of those may be quite dated, the concepts and approach remain the same, as does even some of the code.

 

And sometimes they may show the html being obtained via a cfhttp like you're doing, or via a jsoup method call. Either way, the processing of the result is what matters and is shown being done in cfml--which may look a lot like Java but is really just cfml calling on Java methods and properties of the jsoup library.  (In cf2021, you can even just run Java within cfml.) 

 

With those examples, I think you could readily get what you want. Someone else may chime in with the EXACT cfml to call the SPECIFIC methods to get the results you need for your PARTICULAR example. I've opted here to go for the "teach a man to fish" approach, mostly because I'm writing this on a phone, but just as much because someone else may find this and I wanted to share the resources and approach. Hope it helps. 🙂 
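For anyone landing here later, a minimal sketch of the jsoup route might look like the following. It assumes the jsoup jar is already on ColdFusion's classpath, uses the sample line from the question instead of a live cfhttp result, and the variable names are only illustrative.

```cfml
<cfscript>
// Sample markup from the question; with cfhttp this would be test.fileContent.
html = '<ul><li><span class="song-time">1:49 PM</span>&nbsp;<span class="song-title">Wanted Dead Or Alive</span>—<span class="song-artist">Bon Jovi</span></li></ul>';

// Assumption: the jsoup jar has been added to ColdFusion's classpath.
jsoup = createObject("java", "org.jsoup.Jsoup");
doc   = jsoup.parse(html);

// select() takes a CSS selector; first() returns the first match, or null if there is none.
titleEl  = doc.select("span.song-title").first();
artistEl = doc.select("span.song-artist").first();

songTitle  = isNull(titleEl)  ? "" : titleEl.text();
songArtist = isNull(artistEl) ? "" : artistEl.text();

// Output only the extracted strings; dumping the jsoup objects themselves with cfdump may be very slow.
writeOutput("Title: " & encodeForHTML(songTitle) & "<br>Artist: " & encodeForHTML(songArtist));
</cfscript>
```

The selector approach keeps working if the spans gain extra attributes or the separators between them change, which is where plain string matching tends to break.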

 


/Charlie (troubleshooter, carehart.org)

Contributor, Mar 01, 2022


Thanks Charlie. I'm going to play with jsoup. Looks like a fun approach.

Community Expert, Mar 01, 2022


Great to hear. Would you feel comfortable marking my first reply as the answer, or would you prefer to wait and see whether anyone offers other options to consider?

 

I only ask now as it's easy to lose track of such things and leave the post not marked as having any answer, which some folks pay attention to. 🙂 


/Charlie (troubleshooter, carehart.org)

Contributor, Mar 01, 2022


I'm going to try it tomorrow and I'll report back how it goes and mark the answer then.

Contributor, Mar 09, 2022


Charlie,

I downloaded jsoup and tried the sample code from the example, but it just runs until it eventually times out.

<!--- cache it to speed it up --->
<cfif not cacheIdExists("cnnhtml")>
	<cfhttp url="http://www.cnn.com">
	<cfset cnnhtml = cfhttp.filecontent>
	<cfset cachePut("cnnhtml",cnnhtml)>
<cfelse>
	<cfset cnnhtml = cacheGet("cnnhtml")>
</cfif>

<cfscript>
jsoup = createObject("java", "org.jsoup.Jsoup");
doc = jsoup.parse(cnnhtml);
links = doc.select("img"); 
</cfscript>

<cfdump var="#variables#" abort />

<cfloop index="e" array="#links#">
	<cfoutput>
		#e.attr("src")# --- Title: #e.attr("title")# --- Alt: #e.attr("alt")#<br/>
	</cfoutput>
</cfloop>

Did I miss something?


Gary 

Community Expert, Mar 02, 2022


Is there a way to capture everything between the two span classes into two variables? I don't need the time span.

 


By @ghanna1

 

Sure.

I would do it simply, using XML. It would go like this fully worked-out example:

 

<cfsavecontent variable="songsList">
	<li><span class="song-time">1:49 PM</span>&nbsp;<span class="song-title">Wanted Dead Or Alive</span>—<span class="song-artist">Bon Jovi</span></li>
	<li><span class="song-time">2:48 PM</span>&nbsp;<span class="song-title">Yesterday</span>—<span class="song-artist">Beatles</span></li>
	<li><span class="song-time">3:47 PM</span>&nbsp;<span class="song-title">Dr. Beat</span>—<span class="song-artist">Miami Sound Machine</span></li>
	<li><span class="song-time">4:46 PM</span>&nbsp;<span class="song-title">Down Under</span>—<span class="song-artist">Men at Work</span></li>
</cfsavecontent>

<!--- Replace the characters &nbsp; and — occurring between </span> and <span --->
<cfset songsList=REreplaceNoCase(songsList, "</span>&nbsp;<span|</span>—<span","</span><span","all")>

<!--- Create an XML from the list --->
<cfxml variable="songsXML" >
	<songs>
		<cfoutput>#songsList#</cfoutput>
	</songs>
</cfxml>

<!--- Loop through the li elements, picking out the respective span xmlTexts --->
<cfloop from="1" to="#arrayLen(songsXML.songs.xmlChildren)#" index="songNumber">
	<p>
	<cfoutput>
		Song Number: #songNumber# <br>
		Song Title: #songsxml.songs.li[songNumber].span[2].xmlText# <br>
		Song Artist: #songsxml.songs.li[songNumber].span[3].xmlText# <br>
	</cfoutput>
	</p>
</cfloop>
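If only the first title and artist are needed, as in the original question, an XPath lookup with xmlSearch is one possible alternative to looping. The sketch below is self-contained: it parses a small hand-cleaned sample (the &nbsp; and dash separators already removed, since they would not parse as XML here), and the variable names are illustrative only.

```cfml
<cfscript>
// Hand-cleaned sample: well-formed XML with the separators between spans removed.
songsXML = xmlParse('<songs>'
    & '<li><span class="song-time">1:49 PM</span><span class="song-title">Wanted Dead Or Alive</span><span class="song-artist">Bon Jovi</span></li>'
    & '<li><span class="song-time">2:48 PM</span><span class="song-title">Yesterday</span><span class="song-artist">Beatles</span></li>'
    & '</songs>');

// xmlSearch returns an array of matching nodes in document order.
titles  = xmlSearch(songsXML, "//span[@class='song-title']");
artists = xmlSearch(songsXML, "//span[@class='song-artist']");

// Take the first match of each, falling back to an empty string when nothing matches.
songTitle  = arrayLen(titles)  gt 0 ? titles[1].xmlText  : "";
songArtist = arrayLen(artists) gt 0 ? artists[1].xmlText : "";

writeOutput("Title: " & encodeForHTML(songTitle) & "<br>Artist: " & encodeForHTML(songArtist));
</cfscript>
```

Selecting by class attribute avoids relying on the title and artist always being the second and third span in each list item.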

Contributor, Mar 09, 2022


Just saw this response. Thanks! I will try it.

Community Expert, Mar 09, 2022

 

Just saw this response. Thanks! I will try it.


By @ghanna1

 

You only have to save the code as a CFM file, then run it. In other words, just put it in yer pipe and smoke. 🙂
