WolfShade
Legend
October 16, 2018
Answered

Help with getting href values of a list of <a> tags returned by ColdFusion via AJaX

Hello, all,

I'm working on a project in which we run a script that checks all the links on a generated page to make sure that each link (local and off-site) is valid.

As of now, and I'm open to change, I'm using AJaX to call a CFC function that uses CFDIRECTORY to recursively get all .cfm and .htm pages of a site, swaps out the physical drive path of each entry for its FQDN address, loops through the results using CFHTTP to fetch each page, then uses REMatch to get an array of all anchor tags in the page.  I am currently looping through the arrays and adding the values as keys to a struct to eliminate duplicates, then using CFRETURN to return ArrayToList(StructKeyArray(struct),"|").  CF is returning something like:

<a href="https://www.domain.com/index.cfm">Link One</a>|<a id="newlink" href="https://www.google.com">Link Two</a>|<a alt="test" id="testing" href="https://www.another.com/index.asp">Link Three</a>
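
For reference, the collect-and-de-duplicate step described above might look roughly like the sketch below. It is not the original code: pageUrl and httpResult are hypothetical names, the anchor-tag pattern is an assumption, and only variables.remDupes matches a name that appears in the code posted further down the thread.

```cfml
<!--- Sketch only: pageUrl and httpResult are hypothetical names --->
<cfset variables.remDupes = StructNew() />

<!--- Fetch one page; the real function would do this for every file CFDIRECTORY found --->
<cfhttp url="#pageUrl#" method="get" result="httpResult" />

<!--- Assumed pattern: every <a ...>...</a> pair in the response body --->
<cfset anchors = REMatchNoCase("<a\s[^>]*>.*?</a>", httpResult.fileContent) />

<!--- Struct keys are unique, so repeated tags collapse into a single entry --->
<cfloop array="#anchors#" index="tag">
    <cfset variables.remDupes[tag] = "" />
</cfloop>
```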

So how can I get just the value of the href attributes??

V/r,

^ _ ^

    This topic has been closed for replies.

    2 replies

    Inspiring
    October 19, 2018

    I don't have the code at hand at the moment, but there's a Java library called jsoup. You feed it an HTML string (document, whatever) and can query tags like a DOM. No need to care for RegExp at all.
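
    A rough sketch of what that could look like from ColdFusion, assuming the jsoup JAR has been added to the server's class path. pageHtml is a hypothetical variable holding the fetched HTML (for example, the fileContent returned by CFHTTP); the jsoup calls used are Jsoup.parse(), select(), and attr().

    ```cfml
    <cfscript>
    // Assumption: the jsoup JAR is on the ColdFusion class path
    jsoup = createObject("java", "org.jsoup.Jsoup");

    // pageHtml is a hypothetical variable containing the page source
    doc = jsoup.parse(javaCast("string", pageHtml));

    // CSS-style selector: every <a> element that has an href attribute
    links = doc.select("a[href]");

    // Use struct keys to drop duplicate URLs, as in the original approach
    hrefs = {};
    it = links.iterator();
    while (it.hasNext()) {
        hrefs[it.next().attr("href")] = "";
    }

    hrefList = ArrayToList(StructKeyArray(hrefs), "|");
    </cfscript>
    ```

    Because the parser handles attribute order, quoting, and spacing, no position arithmetic on the tag text is needed here.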

    WolfShade (Author)
    Legend
    October 19, 2018

    Bardnet wrote:

    No need to care for RegExp at all.

    But I _love_ RegEx.  I know, there are a lot of people who are all like "RegEx doesn't work well with HTML because it's.. " whatever.  IDC.  I use RegEx whenever possible.  And I don't even fully understand it, but I still love it and use it whenever I can.  I love REMatch(), and REreplaceNoCase(), and REFind().  I wish Adobe would take Ben Nadel's idea and make an actual REMultiMatch() native to the server.  (Wish in one hand, spit in the other.. y'know.)

    V/r,

    ^ _ ^

    WolfShade (Author)
    Legend
    October 16, 2018

    Never mind.. I managed to get CF to sort out the attribute values and send them as a list.

    After inserting the anchor tags into an associative struct with the tags as the key to remove duplicates, I grabbed the StructKeyArray() of that and looped over it using REFindNoCase(), grabbed the position and length of the href attribute, and used MID() to get only the URL.

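    <!--- De-duplicated anchor tags are the keys of variables.remDupes --->
    <cfset variables.ska = StructKeyArray(variables.remDupes) />
    <cfloop index="itm" from="1" to="#ArrayLen(variables.ska)#">
        <!--- Locate the href attribute up to the end of its value (closing quote excluded) --->
        <cfset idx = REFindNoCase("href\s*=\s*['""][^'""]+", variables.ska[itm], 1, true) />
        <!--- Skip the 6 characters of href plus = and the opening quote; assumes no spaces around the = --->
        <cfset variables.ska[itm] = Mid(variables.ska[itm], idx.pos[1] + 6, idx.len[1] - 6) />
    </cfloop>
    <cfreturn ArrayToList(variables.ska, "|") />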

    HTH,

    ^ _ ^

    WolfShade (Author)
    Legend
    October 16, 2018

    Although, I just realized that if there are any spaces around the equals sign, like:

    href = "https://www.domain.com"

    this won't work, because the fixed offset of 6 no longer lines up with the start of the URL, so I have to tweak this a bit.
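
    One possible tweak, offered as a sketch and not necessarily the fix the author ended up using: wrap the URL in a capture group and read its position and length from the second element of the pos and len arrays, so the width of the href prefix no longer matters. The variable names are the ones from the code posted above; the cfif guard is an addition, and the fragment is assumed to sit inside the same CFC function.

    ```cfml
    <cfset variables.ska = StructKeyArray(variables.remDupes) />
    <cfloop index="itm" from="1" to="#ArrayLen(variables.ska)#">
        <!--- The parentheses capture only the URL, whatever spacing surrounds the = --->
        <cfset idx = REFindNoCase("href\s*=\s*['""]([^'""]+)", variables.ska[itm], 1, true) />
        <!--- pos[2]/len[2] describe the capture group; the tag is left unchanged if nothing matched --->
        <cfif ArrayLen(idx.pos) gte 2 and idx.pos[2] gt 0>
            <cfset variables.ska[itm] = Mid(variables.ska[itm], idx.pos[2], idx.len[2]) />
        </cfif>
    </cfloop>
    <cfreturn ArrayToList(variables.ska, "|") />
    ```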

    V/r,

    ^ _ ^

    WolfShade (Author, Correct answer)
    Legend
    October 16, 2018

    Aaaaaaaaaaaand.. now I learn that there was an easier way to do this.  Sigh.

    https://www.bennadel.com/blog/1921-remultimatch---extracting-iterative-regular-expression-patterns-in-coldfusion.htm

    V/r,

    ^ _ ^