Help with getting the href values from a list of <a> tags returned by ColdFusion via AJAX

LEGEND, Oct 16, 2018

Hello, all,

I'm working on a project whereby we can run a script that will check all the links of a generated page to make sure that each link (local and off-site) is valid.

As of now (and I'm open to change), I'm using AJAX to call a CFC function that uses CFDIRECTORY to recursively get all the .cfm and .htm pages of a site and swaps out the physical drive path of each entry for its FQDN address. It then loops through those URLs, using CFHTTP to fetch each page, and uses REMatch to get an array of all the anchor tags in the page. I am currently looping through the arrays and adding each tag as a KEY in a struct to eliminate duplicates, then I CFRETURN ArrayToList(StructKeyArray(struct),"|").  CF is returning something like:

<a href="https://www.domain.com/index.cfm">Link One</a>|<a id="newlink" href="https://www.google.com">Link Two</a>|<a alt="test" id="testing" href="https://www.another.com/index.asp">Link Three</a>

So how can I get just the value of the href attributes??
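The REMatch-then-struct-key de-duplication described above could look roughly like the sketch below for a single page's HTML. The function name `getUniqueAnchors` and the anchor-matching pattern are assumptions for illustration, not the poster's actual code.

```cfml
<!--- Hypothetical sketch: unique anchor tags from one page's HTML, returned pipe-delimited --->
<cffunction name="getUniqueAnchors" returntype="string" output="false">
    <cfargument name="html" type="string" required="true">
    <!--- [\s\S]*? lazily matches the link text, including line breaks --->
    <cfset var anchors = REMatchNoCase("<a\b[^>]*>[\s\S]*?</a>", arguments.html)>
    <!--- Struct keys act as a set; note that ColdFusion struct keys are case-insensitive --->
    <cfset var seen = StructNew()>
    <cfset var tag = "">
    <cfloop array="#anchors#" index="tag">
        <cfset seen[tag] = true>
    </cfloop>
    <!--- Key order is not guaranteed, and a "|" inside a tag would break the list --->
    <cfreturn ArrayToList(StructKeyArray(seen), "|")>
</cffunction>
```

Because struct keys ignore case, two tags that differ only in letter case collapse into one entry; that is usually harmless for link checking but worth knowing.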

V/r,

^ _ ^

Correct answer

LEGEND, Oct 16, 2018

Aaaaaaaaaaaand.. now I learn that there was an easier way to do this.  Sigh.

https://www.bennadel.com/blog/1921-remultimatch---extracting-iterative-regular-expression-patterns-in-coldfusion.htm
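As a rough, hypothetical illustration of extracting every match together with its capture group (this is not the code from the linked article, whose details may differ), ColdFusion's REFindNoCase() can be called repeatedly with returnsubexpressions set to true, resuming after each match. The function name `reMatchGroup1` is made up for this sketch.

```cfml
<!--- Hypothetical helper: capture group 1 of every match of pattern in text --->
<cffunction name="reMatchGroup1" returntype="array" output="false">
    <cfargument name="pattern" type="string" required="true">
    <cfargument name="text" type="string" required="true">
    <cfset var results = ArrayNew(1)>
    <cfset var start = 1>
    <cfset var found = "">
    <cfloop condition="start LTE Len(arguments.text)">
        <!--- returnsubexpressions=true: pos[1]/len[1] = whole match, pos[2]/len[2] = group 1 --->
        <cfset found = REFindNoCase(arguments.pattern, arguments.text, start, true)>
        <!--- pos[1] is 0 when nothing else matches --->
        <cfif found.pos[1] EQ 0>
            <cfbreak>
        </cfif>
        <!--- Skip empty or non-participating groups --->
        <cfif ArrayLen(found.len) GTE 2 AND found.len[2] GT 0>
            <cfset ArrayAppend(results, Mid(arguments.text, found.pos[2], found.len[2]))>
        </cfif>
        <!--- Resume after this match, always advancing at least one character --->
        <cfset start = found.pos[1] + Max(found.len[1], 1)>
    </cfloop>
    <cfreturn results>
</cffunction>
```

Called as `reMatchGroup1("href\s*=\s*['""]([^'""]+)", pageHtml)`, it would return just the href values from a whole page in one pass.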

V/r,

^ _ ^

LEGEND, Oct 16, 2018

Never mind. I managed to get CF to extract the attribute values and send them back as a list.

After inserting the anchor tags into an associative struct with the tags as the key to remove duplicates, I grabbed the StructKeyArray() of that and looped over it using REFindNoCase(), grabbed the position and length of the href attribute, and used MID() to get only the URL.

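<cfset ska = StructKeyArray(variables.remDupes) />

<cfloop index="itm" from="1" to="#ArrayLen(variables.ska)#">
    <!--- returnsubexpressions=true returns the match position (pos) and length (len) --->
    <cfset idx = REFindNoCase("href\s*=\s*['""][^'""]+", variables.ska[itm], 1, true) />
    <cfif idx.pos[1] GT 0>
        <!--- Skip the first 6 characters; this assumes the match starts with exactly href=" --->
        <cfset variables.ska[itm] = Mid(variables.ska[itm], idx.pos[1] + 6, idx.len[1] - 6) />
    </cfif>
</cfloop>

<cfreturn ArrayToList(variables.ska, "|") />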

HTH,

^ _ ^

LEGEND, Oct 16, 2018

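Although, I just realized that if there is any whitespace around the equals sign, like:

href = "https://www.domain.com"

then this won't work, because the +6 offset assumes the match starts with exactly href=" (six characters). So I have to tweak this a bit.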
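One way to make the loop above tolerant of whitespace is to wrap the URL part of the pattern in a capture group and read its position and length from pos[2]/len[2], instead of skipping a fixed 6 characters. Below is a sketch wrapped in a hypothetical `extractHrefs` function so it can be tested on its own; tags without a quoted href are simply skipped.

```cfml
<!--- Hypothetical sketch: read the URL from capture group 1 instead of a fixed offset --->
<cffunction name="extractHrefs" returntype="string" output="false">
    <cfargument name="anchorTags" type="array" required="true">
    <cfset var hrefs = ArrayNew(1)>
    <cfset var i = 0>
    <cfset var found = "">
    <cfloop index="i" from="1" to="#ArrayLen(arguments.anchorTags)#">
        <!--- \s* allows whitespace around "="; the parentheses capture only the URL --->
        <cfset found = REFindNoCase("href\s*=\s*['""]([^'""]+)", arguments.anchorTags[i], 1, true)>
        <!--- pos[2]/len[2] describe the captured URL; no match leaves a single-element array --->
        <cfif ArrayLen(found.len) GTE 2 AND found.len[2] GT 0>
            <cfset ArrayAppend(hrefs, Mid(arguments.anchorTags[i], found.pos[2], found.len[2]))>
        </cfif>
    </cfloop>
    <cfreturn ArrayToList(hrefs, "|")>
</cffunction>
```

In the CFC this would be called as `extractHrefs(StructKeyArray(variables.remDupes))`.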

V/r,

^ _ ^

LEGEND, Oct 18, 2018

If anyone ever tries something like what I'm doing, save yourself the trouble and just use Ben Nadel's solution.  I copied/pasted his CFFUNCTION and placed it in my .cfc, and it works WONDERFULLY.  So simple.

HTH,

^ _ ^

Participant, Oct 19, 2018

I don't have the code at hand at the moment, but there's a Java library called JSOUP. You feed it an HTML string (document, whatever) and can query the tags like a DOM. No need to care for RegExp at all.

LEGEND, Oct 19, 2018

Bardnet wrote:

No need to care for RegExp at all.

But I _love_ RegEx.  I know, there are a lot of people who are all like "RegEx doesn't work well with HTML because it's.. " whatever.  IDC.  I use RegEx whenever possible.  And I don't even fully understand it, but I still love it and use it whenever I can.  I love REMatch(), and REreplaceNoCase(), and REFind().  I wish Adobe would take Ben Nadel's idea and make an actual REMultiMatch() native to the server.  (Wish in one hand, spit in the other.. y'know.)

V/r,

^ _ ^

Participant, Oct 19, 2018

For completeness' sake:

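<!--- Assumes the jsoup JAR is on ColdFusion's classpath --->
<cfset oJsoup = createObject( "java", "org.jsoup.Jsoup" )>
<cfset oFile = createObject( "java", "java.io.File" ).init( ExpandPath( "./file001.html" ) )>
<!--- Parse the file as UTF-8 into a queryable document --->
<cfset oDoc = oJsoup.parse( oFile, "UTF-8" )>
<!--- CSS selector: only <a> elements that actually have an href attribute --->
<cfset arrLinks = oDoc.select( "a[href]" )>
<!--- cfoutput is needed for #oEl.attr(...)# to be evaluated --->
<cfoutput>
<cfloop array="#arrLinks#" index="oEl">
    #oEl.attr( "href" )#<br>
</cfloop>
</cfoutput>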

JSOUP has plenty of parse methods, so the input does not need to be a local file: Jsoup (jsoup Java HTML Parser 1.11.3 API)
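In the original setup the HTML already arrives as a string from CFHTTP, so a file isn't needed. The sketch below assumes the jsoup JAR is on ColdFusion's classpath, and `jsoupHrefs` is a hypothetical name. It uses `Jsoup.parse(html, baseUri)` together with the `abs:href` attribute key, so relative links come back as absolute URLs ready for checking.

```cfml
<!--- Hypothetical sketch: parse an HTML string (e.g. cfhttp.fileContent) instead of a file --->
<cffunction name="jsoupHrefs" returntype="array" output="false">
    <cfargument name="html" type="string" required="true">
    <!--- The page's own URL, used to resolve relative links --->
    <cfargument name="baseUri" type="string" required="true">
    <!--- JavaCast picks the parse(String html, String baseUri) overload explicitly --->
    <cfset var oDoc = createObject("java", "org.jsoup.Jsoup").parse(JavaCast("string", arguments.html), JavaCast("string", arguments.baseUri))>
    <cfset var hrefs = ArrayNew(1)>
    <cfset var oEl = "">
    <cfloop array="#oDoc.select('a[href]')#" index="oEl">
        <!--- "abs:href" returns the href resolved against baseUri --->
        <cfset ArrayAppend(hrefs, oEl.attr("abs:href"))>
    </cfloop>
    <cfreturn hrefs>
</cffunction>
```

After a CFHTTP call this would be used as `jsoupHrefs(cfhttp.fileContent, pageUrl)`, where `pageUrl` is the address that was fetched.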
