Just getting numbers from a string

Mar 27, 2012

I used screen scraping to get HTML from a number of our web sites that we are converting to a CMS. I was asked to grab one block of code that includes links to individuals using a numeric ID as their primary key. This is where my 'expertise' ends...lol. Let's say you have a list block like:

<ul>

<li><a href="details.cfm?id=20>Steve String</a></li>

<li><a href="details.cfm?id=50>Mary String</a></li>

<li><a href="details.cfm?id=120>Jerry String</a></li>

</ul>

I want to strip out the IDs to produce only: 20 50 120 or, optimally, 20,50,120. How can I use a regex (probably) to remove all non-numeric characters from a block of text?

Mar 27, 2012

OK, I used <cfset cleanDocs = reReplace( theDocs, "[^[:digit:]]", ' ', "all") />, which did in fact remove all the non-numeric characters but leaves me with duplicates (there is a weird escape(url) in with the hyperlinks, so each ID is in there twice). Anyone know how to remove duplicates from a string?

Mar 27, 2012

cflib.org has a function called listdistinct that would probably help you out.

Mar 27, 2012

Thanks, Dan. I ended up turning the white space into commas to get a list, which I then looped over to remove the duplicates. That seems to work. I couldn't tell how much white space was between the numbers: it looked like only one space each, but it turned out to vary.
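
For anyone following along, the approach described above could be sketched roughly like this. It is only a sketch under a few assumptions: the sample HTML is made up (modelled on the list in the first post, with one link repeated to stand in for the duplicated IDs), and the names uniqueList and id are just picked for illustration. Replacing each run of non-digits with a single comma sidesteps the varying white space.

```cfml
<!--- Made-up sample standing in for the scraped HTML; the first link is repeated on purpose --->
<cfsavecontent variable="theDocs">
<ul>
<li><a href="details.cfm?id=20">Steve String</a></li>
<li><a href="details.cfm?id=50">Mary String</a></li>
<li><a href="details.cfm?id=120">Jerry String</a></li>
<li><a href="details.cfm?id=20">Steve String</a></li>
</ul>
</cfsavecontent>

<!--- Replace every run of non-digit characters with one comma --->
<cfset cleanDocs = reReplace(theDocs, "[^[:digit:]]+", ",", "all")>

<!--- Loop over the list (empty elements are skipped) and keep each ID only the first time it appears --->
<cfset uniqueList = "">
<cfloop list="#cleanDocs#" index="id">
    <cfif NOT listFind(uniqueList, id)>
        <cfset uniqueList = listAppend(uniqueList, id)>
    </cfif>
</cfloop>

<!--- Outputs 20,50,120 for the sample above --->
<cfoutput>#uniqueList#</cfoutput>
```

Note that this picks up every number in the block, so it only works if the IDs are the only digits in the scraped HTML.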

Mar 28, 2012

wmkolcz wrote:
I used screen scraping to get HTML from a number of our web sites that we are converting to a CMS. I was asked to grab one block of code that includes links to individuals using a numeric ID as their primary key. This is where my 'expertise' ends...lol. Let's say you have a list block like:
<ul>
<li><a href="details.cfm?id=20>Steve String</a></li>
<li><a href="details.cfm?id=50>Mary String</a></li>
<li><a href="details.cfm?id=120>Jerry String</a></li>
</ul>
I want to strip out the IDs to produce only: 20 50 120 or, optimally, 20,50,120. How can I use a regex (probably) to remove all non-numeric characters from a block of text?

The following assumes ColdFusion 8 or newer (for REMatch). I have also assumed that the IDs are 2 or 3 digits long. You can easily adapt the code as appropriate.

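<cfsavecontent variable="block">
<ul>
<li><a href="details.cfm?id=20>Steve String</a></li>
<li><a href="details.cfm?id=50>Mary String</a></li>
<li><a href="details.cfm?id=120>Jerry String</a></li>
</ul>
</cfsavecontent>

<!--- Each match looks like "=20>": the ID plus the character on either side of it --->
<cfset rawList = arrayToList(REMatch("(=[0-9]{2,3}>)",block))>
<!--- Assumed step: strip the "=" and ">" so that only the digits and commas remain --->
<cfset numberList = REReplace(rawList,"[=>]","","all")>

<cfoutput>#numberList#</cfoutput>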
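
As one possible adaptation for IDs of any length, the pattern could anchor on the "id=" prefix instead of on a fixed digit count. This is a sketch, not a tested drop-in: the sample IDs below are made up to show different lengths, and it assumes every link carries its ID as id= followed by digits.

```cfml
<!--- Made-up sample with IDs of different lengths --->
<cfsavecontent variable="block">
<ul>
<li><a href="details.cfm?id=7">Steve String</a></li>
<li><a href="details.cfm?id=50">Mary String</a></li>
<li><a href="details.cfm?id=12345">Jerry String</a></li>
</ul>
</cfsavecontent>

<!--- Match "id=" followed by one or more digits, whatever character comes next --->
<cfset matches = REMatch("id=[0-9]+", block)>

<!--- Drop the "id=" prefix from each match, leaving only the digits and commas --->
<cfset numberList = replace(arrayToList(matches), "id=", "", "all")>

<!--- Outputs 7,50,12345 for the sample above --->
<cfoutput>#numberList#</cfoutput>
```

Because the match stops at the first non-digit, it does not matter whether the href attribute is closed with a quote. REMatch is case-sensitive, so REMatchNoCase would be the one to use if some links have ID= in upper case.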