Help with getting href values of a list of <a> tags returned by ColdFusion via AJaX

Report · Oct 16, 2018

Hello, all,

I'm working on a project whereby we can run a script that will check all the links of a generated page to make sure that each link (local and off-site) is valid.

As of now (and I'm open to change), I'm using AJaX to call a CFC function that uses CFDIRECTORY to recursively get all .cfm and .htm pages of a site, swaps out the physical drive path for each entry with its FQDN address, loops through the results using CFHTTP to fetch each page, then uses REMatch to get an array of all anchor tags in the page. I am currently looping through the arrays and adding the values as keys to a struct to eliminate duplicates, then CFRETURN the ArrayToList(StructKeyArray(struct),"|"). CF is returning something like:

<a href="https://www.domain.com/index.cfm">Link One</a>|<a id="newlink" href="https://www.google.com">Link Two</a>|<a alt="test" id="testing" href="https://www.another.com/index.asp">Link Three</a>

So how can I get just the value of the href attributes??

V/r,

^ _ ^

Report · Oct 16, 2018

Never mind. I managed to get CF to extract the attribute values and send them as a list.

After inserting the anchor tags into an associative struct with the tags as the key to remove duplicates, I grabbed the StructKeyArray() of that and looped over it using REFindNoCase(), grabbed the position and length of the href attribute, and used MID() to get only the URL.

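<!--- The struct keys are the unique anchor tags --->
<cfset variables.ska = StructKeyArray(variables.remDupes) />
<cfloop index="itm" from="1" to="#ArrayLen(variables.ska)#">
    <!--- Locate the href attribute; pos[1]/len[1] describe the whole match --->
    <cfset idx = REFindNoCase("href\s*=\s*['""][^'""]+", variables.ska[itm], 1, true) />
    <!--- Skip the 6 characters of href=" so that only the URL remains --->
    <cfset variables.ska[itm] = mid(variables.ska[itm], idx.pos[1] + 6, idx.len[1] - 6) />
</cfloop>
<cfreturn ArrayToList(variables.ska, "|") />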

HTH,

^ _ ^

Report · Oct 16, 2018

Although, I just realized that this won't work if there are any spaces around the equals sign, like:

href = "https://www.domain.com"

The fixed offset of 6 assumes the match starts with exactly href=", so I have to tweak this a bit.

V/r,

^ _ ^
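
One possible tweak, offered as a sketch rather than the poster's actual fix: wrap the URL part of the pattern in parentheses and read the captured group's position and length, so no fixed offset is needed. The sample tag and the variable names tag and hrefValue are made up for illustration.

```cfml
<!--- Hypothetical test data with spaces around the equals sign --->
<cfset tag = '<a id="newlink" href = "https://www.google.com">Link Two</a>' />
<!--- The parentheses capture only the URL, whatever the spacing is --->
<cfset idx = REFindNoCase("href\s*=\s*['""]([^'""]+)", tag, 1, true) />
<cfif idx.pos[1] GT 0>
    <!--- pos[2]/len[2] describe the first captured group --->
    <cfset hrefValue = mid(tag, idx.pos[2], idx.len[2]) />
<cfelse>
    <!--- No href attribute found in this tag --->
    <cfset hrefValue = "" />
</cfif>
<cfoutput>#hrefValue#</cfoutput>
```

Checking idx.pos[1] first also avoids calling mid() with a zero position when a tag has no href at all.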

Report · Oct 16, 2018

Aaaaaaaaaaaand.. now I learn that there was an easier way to do this. Sigh.

https://www.bennadel.com/blog/1921-remultimatch---extracting-iterative-regular-expression-patterns-i...

V/r,

^ _ ^

Report · Oct 18, 2018

If anyone ever tries something like what I'm doing, save yourself the trouble and just use Ben Nadel's solution. I copied/pasted his CFFUNCTION and placed it in my .cfc, and it works WONDERFULLY. So simple.

HTH,

^ _ ^
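
For readers who cannot follow the truncated link: the snippet below is not Ben Nadel's function, only a minimal sketch of the general idea of iterating over every match and pulling out a captured group, done here with Java's java.util.regex classes from CFML. The sample string and the variable names html, pattern, matcher and hrefs are assumptions for illustration.

```cfml
<!--- Sample input based on the first post; the second tag has spaces around "=" --->
<cfset html = '<a href="https://www.domain.com/index.cfm">Link One</a>|<a id="newlink" href = "https://www.google.com">Link Two</a>' />
<!--- (?i) makes the match case-insensitive; group 1 captures only the URL --->
<cfset pattern = createObject("java", "java.util.regex.Pattern").compile("(?i)href\s*=\s*['""]([^'""]+)") />
<cfset matcher = pattern.matcher(javaCast("string", html)) />
<cfset hrefs = [] />
<!--- find() advances to the next match on each pass --->
<cfloop condition="matcher.find()">
    <cfset arrayAppend(hrefs, matcher.group(javaCast("int", 1))) />
</cfloop>
<cfoutput>#ArrayToList(hrefs, "|")#</cfoutput>
```

The javaCast() calls are there so ColdFusion picks the String and int overloads of the Java methods.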

Report · Oct 19, 2018

I don't have the code at hand at this moment but there's a library called JSOUP. You feed it an HTML String (document, whatever) and can query tags like a DOM. No need to care for RegExp at all.

Report · Oct 19, 2018

Bardnet wrote
No need to care for RegExp at all.

But I _love_ RegEx. I know, there are a lot of people who are all like "RegEx doesn't work well with HTML because it's.. " whatever. IDC. I use RegEx whenever possible. And I don't even fully understand it, but I still love it and use it whenever I can. I love REMatch(), and REreplaceNoCase(), and REFind(). I wish Adobe would take Ben Nadel's idea and make an actual REMultiMatch() native to the server. (Wish in one hand, spit in the other.. y'know.)

V/r,

^ _ ^

Report · Oct 19, 2018

For completeness' sake

#oEl.attr( "href" )#

</cfloop>

JSOUP has plenty of parse methods, and the input does not need to be a local file. See: Jsoup (jsoup Java HTML Parser 1.11.3 API)
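
Most of the code in the post above appears to have been lost; only the output expression and the closing loop tag survive. The following is a hedged sketch of what a jsoup-based loop might look like, not a reconstruction of the original. It assumes the jsoup JAR is on the ColdFusion class path, and the sample HTML and the names html, doc and links are made up; only oEl comes from the surviving fragment.

```cfml
<!--- Sample input taken from the first post in this thread --->
<cfset html = '<a href="https://www.domain.com/index.cfm">Link One</a><a id="newlink" href="https://www.google.com">Link Two</a>' />
<!--- Assumption: the jsoup JAR is available to ColdFusion --->
<cfset doc = createObject("java", "org.jsoup.Jsoup").parse(html) />
<!--- Select every anchor that carries an href attribute --->
<cfset links = doc.select("a[href]") />
<cfoutput>
<cfloop array="#links#" index="oEl">
    #oEl.attr( "href" )#<br />
</cfloop>
</cfoutput>
```

Because select() returns a Java list of elements, it can be looped over like a ColdFusion array.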
