Help with getting href values of a list of <a> tags returned by ColdFusion via AJaX

LEGEND ,
Oct 16, 2018

Hello, all,

I'm working on a project whereby we can run a script that will check all the links of a generated page to make sure that each link (local and off-site) is valid.

As of now (and I'm open to change), I'm using AJaX to call a CFC function that uses CFDIRECTORY to recursively get all .cfm and .htm pages of a site, swaps out the physical drive path of each entry for its FQDN address, loops through the results using CFHTTP to fetch each page, then uses REMatch to get an array of all anchor tags in the page. I am currently looping through the arrays and adding the values as keys to a struct to eliminate duplicates, then CFRETURN the ArrayToList(StructKeyArray(struct),"|"). CF is returning something like:

<a href="https://www.domain.com/index.cfm">Link One</a>|<a id="newlink" href="https://www.google.com">Link Two</a>|<a alt="test" id="testing" href="https://www.another.com/index.asp">Link Three</a>

So how can I get just the value of the href attributes??

V/r,

^ _ ^

LEGEND ,
Oct 16, 2018

Never mind. I managed to get CF to extract the attribute values and send those as a list.

After inserting the anchor tags into an associative struct with the tags as the key to remove duplicates, I grabbed the StructKeyArray() of that and looped over it using REFindNoCase(), grabbed the position and length of the href attribute, and used MID() to get only the URL.

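<!--- ska holds the de-duplicated anchor tags (the struct keys) --->
<cfset ska = StructKeyArray(variables.remDupes) />
<cfloop index="itm" from="1" to="#ArrayLen(variables.ska)#">
    <!--- Locate the href attribute up to, but not including, its closing quote --->
    <cfset idx = REFindNoCase("href\s*=\s*['""][^'""]+", variables.ska[itm], 1, true) />
    <!--- Skip the 6 leading characters (href= plus the opening quote) to keep only the URL --->
    <cfset variables.ska[itm] = mid(variables.ska[itm], idx.pos[1] + 6, idx.len[1] - 6) />
</cfloop>
<cfreturn ArrayToList(variables.ska, "|") />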

HTH,

^ _ ^

LEGEND ,
Oct 16, 2018

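Although, I just realized that this won't work if there are any spaces around the equals sign, like:

href = "https://www.domain.com"

The regex still matches, but the fixed offset of 6 in MID() assumes exactly href=" before the URL, so I have to tweak this a bit.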
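One possible tweak, sketched here rather than taken from the thread, is to wrap the URL in a capturing group and read its position from the second entry of the pos and len arrays, so the amount of whitespace around the equals sign no longer matters. The sample tag and the hrefValue variable are hypothetical.

```cfml
<!--- Hypothetical sample input; in the thread this would be one key of the de-duplication struct --->
<cfset tag = '<a id="newlink" href = "https://www.google.com">Link Two</a>' />

<!--- The parentheses capture only the URL; entry 1 of pos/len is the whole match, entry 2 is the group --->
<cfset idx = REFindNoCase("href\s*=\s*['""]([^'""]+)", tag, 1, true) />

<cfif arrayLen(idx.pos) GTE 2 AND idx.pos[2] GT 0>
    <cfset hrefValue = mid(tag, idx.pos[2], idx.len[2]) />
<cfelse>
    <!--- No quoted href attribute was found in this tag --->
    <cfset hrefValue = "" />
</cfif>

<cfoutput>#hrefValue#</cfoutput>
```

The guard on arrayLen() is there because a failed match returns single-element pos and len arrays containing 0.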

V/r,

^ _ ^

LEGEND ,
Oct 16, 2018

Aaaaaaaaaaaand.. now I learn that there was an easier way to do this.  Sigh.

https://www.bennadel.com/blog/1921-remultimatch---extracting-iterative-regular-expression-patterns-i...

V/r,

^ _ ^

LEGEND ,
Oct 18, 2018

If anyone ever tries something like what I'm doing, save yourself the trouble and just use Ben Nadel's solution.  I copied/pasted his CFFUNCTION and placed it in my .cfc, and it works WONDERFULLY.  So simple.

HTH,

^ _ ^

Participant ,
Oct 19, 2018

I don't have the code at hand at this moment but there's a library called JSOUP. You feed it an HTML String (document, whatever) and can query tags like a DOM. No need to care for RegExp at all.

LEGEND ,
Oct 19, 2018

Bardnet  wrote

No need to care for RegExp at all.

But I _love_ RegEx.  I know, there are a lot of people who are all like "RegEx doesn't work well with HTML because it's.. " whatever.  IDC.  I use RegEx whenever possible.  And I don't even fully understand it, but I still love it and use it whenever I can.  I love REMatch(), and REreplaceNoCase(), and REFind().  I wish Adobe would take Ben Nadel's idea and make an actual REMultiMatch() native to the server.  (Wish in one hand, spit in the other.. y'know.)

V/r,

^ _ ^

Bardnet
Participant ,
Oct 19, 2018

For completeness' sake

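<!--- Assumes the jsoup JAR is available on the ColdFusion class path --->
<cfset oJsoup = createObject( "java", "org.jsoup.Jsoup" )>
<cfset oFile = createObject( "java", "java.io.File" ).init( expandPath( "./file001.html" ) )>
<cfset oDoc = oJsoup.parse( oFile, "UTF-8" )>
<cfset arrLinks = oDoc.select( "a" )>
<cfloop array="#arrLinks#" index="oEl">
    <!--- cfoutput is needed so the expression is evaluated rather than printed literally --->
    <cfoutput>#oEl.attr( "href" )#<br></cfoutput>
</cfloop>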

JSOUP has plenty of parse methods, so the input does not need to be a local file. See the Jsoup class in the jsoup Java HTML Parser 1.11.3 API documentation.
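As a rough sketch of the non-file case, the HTML that CFHTTP returns can be handed to jsoup's parse( html, baseUri ) overload directly, which would fit the link checker described at the top of the thread. The page URL and the variable names (pageUrl, httpResult, uniqueLinks, linkList) are made up for illustration, and the jsoup JAR is again assumed to be on the class path.

```cfml
<!--- Hypothetical page URL; substitute one of the addresses built from the CFDIRECTORY listing --->
<cfset pageUrl = "https://www.domain.com/index.cfm" />
<cfhttp url="#pageUrl#" method="get" result="httpResult" />

<cfset oJsoup = createObject( "java", "org.jsoup.Jsoup" ) />
<!--- parse( html, baseUri ): the base URI lets jsoup resolve relative links --->
<cfset oDoc = oJsoup.parse( javaCast( "string", httpResult.fileContent ), javaCast( "string", pageUrl ) ) />
<cfset arrLinks = oDoc.select( "a[href]" ) />

<!--- A struct keyed by URL removes duplicates, as in the original approach --->
<cfset uniqueLinks = {} />
<cfloop array="#arrLinks#" index="oEl">
    <!--- absUrl() returns an empty string when the link cannot be made absolute --->
    <cfset href = oEl.absUrl( "href" ) />
    <cfif len( href )>
        <cfset uniqueLinks[ href ] = true />
    </cfif>
</cfloop>

<cfset linkList = arrayToList( structKeyArray( uniqueLinks ), "|" ) />
```

One caveat with the struct trick: ColdFusion struct keys are case-insensitive, so two URLs that differ only in letter case would collapse into one entry.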
