I used screen scraping to get the HTML from a number of our websites that we are converting to a CMS. I was asked to grab one block of code that contains links to individuals, using a numeric ID as their primary key. This is where my 'expertise' ends...lol. Let's say you have a list block like:
<ul>
<li><a href="details.cfm?id=20>Steve String</a></li>
<li><a href="details.cfm?id=50>Mary String</a></li>
<li><a href="details.cfm?id=120>Jerry String</a></li>
</ul>
I want to extract the IDs so that I get only: 20 50 120, or (ideally) 20,50,120. How can I use a regex, probably, to remove all non-numeric characters from a block of text?
OK, I used <cfset cleanDocs = reReplace( theDocs, "[^[:digit:]]", ' ', "all") />, which did in fact remove all the non-numeric characters, but it leaves me with duplicates: there is a weird escape(url) mixed in with the hyperlinks, so each ID appears twice. Does anyone know how to remove duplicates from a string?
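A possible refinement, sketched under the assumption that the IDs are the only digits in the block: adding + to the pattern replaces each run of non-digits with a single comma, so the result is already a comma-separated list rather than numbers separated by spaces. It does not remove the duplicates caused by the escaped URLs, and any other digits in the HTML would be picked up as well.

```cfml
<!--- Sample input standing in for the scraped HTML in theDocs. --->
<cfsavecontent variable="theDocs"><ul>
<li><a href="details.cfm?id=20>Steve String</a></li>
<li><a href="details.cfm?id=50>Mary String</a></li>
<li><a href="details.cfm?id=120>Jerry String</a></li>
</ul></cfsavecontent>

<!--- Replace each RUN of non-digits (note the +) with one comma. --->
<cfset idList = reReplace(theDocs, "[^[:digit:]]+", ",", "all")>

<!--- Drop the commas left at the start and end by the text around the first and last number. --->
<cfset idList = reReplace(idList, "^,|,$", "", "all")>

<cfoutput>#idList#</cfoutput>
```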
cflib.org has a function called listdistinct that would probably help you out.
Thanks, Dan. I ended up converting the whitespace into commas, turning the result into a list, and then looping over that list to remove the duplicates. It seems to work. The catch was that I couldn't see the whitespace between the numbers: it looked like a single space each time, but the number of spaces actually varied.
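A minimal sketch of that approach, assuming cleanDocs holds only digits separated by uneven runs of whitespace (the sample string below is made up for illustration): [[:space:]]+ collapses each run, whatever its length, into a single comma, and a struct records which IDs have already been seen, so duplicates are dropped while the original order is kept. On ColdFusion 10 and later, the built-in listRemoveDuplicates() should be able to replace the loop.

```cfml
<!--- Hypothetical digits-only string with uneven spacing, standing in for cleanDocs. --->
<cfset cleanDocs = "  20 20    50  50 120   120 ">

<!--- Collapse every run of whitespace into one comma. --->
<cfset rawList = reReplace(trim(cleanDocs), "[[:space:]]+", ",", "all")>

<!--- Keep only the first occurrence of each ID, preserving order. --->
<cfset seen = structNew()>
<cfset idList = "">
<cfloop list="#rawList#" index="id">
    <cfif NOT structKeyExists(seen, id)>
        <cfset seen[id] = true>
        <cfset idList = listAppend(idList, id)>
    </cfif>
</cfloop>

<cfoutput>#idList#</cfoutput>
```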
In reply to wmkolcz:
The following assumes ColdFusion 8 or newer (for REMatch). It also assumes that the IDs are 2 or 3 digits long; you can easily adapt the pattern if they are not.
<cfsavecontent variable="block"><ul>
<li><a href="details.cfm?id=20>Steve String</a></li>
<li><a href="details.cfm?id=50>Mary String</a></li>
<li><a href="details.cfm?id=120>Jerry String</a></li>
</ul>
</cfsavecontent>
<!--- Raw list is of the form =20>, =100>, etc. --->
<cfset rawList = arrayToList(REMatch("(=[0-9]{2,3}>)",block))>
<!--- replaceList deletes the "=" and ">" characters, leaving 20,50,120 --->
<cfset numberList = replaceList(rawList,"=,>",",")>
<cfoutput>#numberList#</cfoutput>
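A variation on the above, sketched under the assumption that every ID appears in the HTML as ?id=<digits> or &id=<digits>: it accepts IDs of any length and skips repeats, so if the same id= link occurs more than once, its ID is listed only once. REMatchNoCase and looping over an array with cfloop both need ColdFusion 8 or newer.

```cfml
<cfsavecontent variable="block"><ul>
<li><a href="details.cfm?id=20>Steve String</a></li>
<li><a href="details.cfm?id=50>Mary String</a></li>
<li><a href="details.cfm?id=120>Jerry String</a></li>
</ul>
</cfsavecontent>

<!--- Each match looks like ?id=20, ?id=50, ?id=120 (any number of digits). --->
<cfset matches = REMatchNoCase("[?&]id=[0-9]+", block)>

<!--- Take the digits after "=" and skip IDs that were already collected. --->
<cfset seen = structNew()>
<cfset numberList = "">
<cfloop array="#matches#" index="m">
    <cfset id = listLast(m, "=")>
    <cfif NOT structKeyExists(seen, id)>
        <cfset seen[id] = true>
        <cfset numberList = listAppend(numberList, id)>
    </cfif>
</cfloop>

<cfoutput>#numberList#</cfoutput>
```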