wmkolcz
Inspiring
March 27, 2012
Question

Just getting numbers from a string


I used screen scraping to get the HTML from a number of our websites that we are converting to a CMS. I was asked to grab one block of code that includes links to individuals, using a numeric ID as their primary key. This is where my 'expertise' ends...lol. Let's say you have a list block like:

<ul>

<li><a href="details.cfm?id=20>Steve String</a></li>

<li><a href="details.cfm?id=50>Mary String</a></li>

<li><a href="details.cfm?id=120>Jerry String</a></li>

</ul>

I want to extract the IDs to produce either 20 50 120 or, ideally, 20,50,120. How can I use a regex (probably) to remove all non-numeric characters from a block of text?



    BKBK
    Community Expert
    March 28, 2012


    The following assumes ColdFusion 8 or newer (for REMatch). I have also assumed that the IDs are 2 or 3 digits long; you can easily adapt the regex if they are not.

    <cfsavecontent variable="block"><ul>

    <li><a href="details.cfm?id=20>Steve String</a></li>

    <li><a href="details.cfm?id=50>Mary String</a></li>

    <li><a href="details.cfm?id=120>Jerry String</a></li>

    </ul>

    </cfsavecontent>

    <!--- Raw list is of the form =20>, =100>, etc.  --->

    <cfset rawList = arrayToList(REMatch("(=[0-9]{2,3}>)",block))>

    <cfset numberList = replaceList(rawList,"=,>",",")>

    <cfoutput>#numberList#</cfoutput>
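    If the scraped pages quote the href attribute properly (href="details.cfm?id=20"), the =[0-9]{2,3}> pattern above will not match, because a closing quote sits between the digits and the >. A minimal sketch that instead looks for id= followed by any number of digits, still assuming ColdFusion 8 or newer for REMatch and array loops; the sample HTML is the list above with the quotes added:

```cfml
<cfsavecontent variable="block"><ul>
<li><a href="details.cfm?id=20">Steve String</a></li>
<li><a href="details.cfm?id=50">Mary String</a></li>
<li><a href="details.cfm?id=120">Jerry String</a></li>
</ul></cfsavecontent>

<!--- Each match looks like id=20, id=50, id=120 (any number of digits) --->
<cfset matches = REMatch("id=[0-9]+", block)>

<!--- Keep only the part after the "=" of each match --->
<cfset numberList = "">
<cfloop array="#matches#" index="m">
    <cfset numberList = listAppend(numberList, listLast(m, "="))>
</cfloop>

<cfoutput>#numberList#</cfoutput>
```

    Matching on id= rather than on the surrounding characters means the quoting style of the HTML no longer matters. REMatch is case-sensitive, so switch to REMatchNoCase if some pages use ID= instead.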

    wmkolcz
    Author
    Inspiring
    March 27, 2012

    OK, I used <cfset cleanDocs = reReplace( theDocs, "[^[:digit:]]", ' ', "all") />, which did in fact remove all the non-numeric characters, but it left me with duplicates (there is a weird escape(url) in with the hyperlinks, so each ID appears twice). Does anyone know how to remove duplicates from a string?
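    One possible approach, assuming ColdFusion 10 or newer (ListRemoveDuplicates is, as far as I know, not available in earlier versions): trim the result, turn every run of non-digits into a single comma, and then drop the repeated items. The value of cleanDocs below is a made-up sample standing in for the output of the reReplace call above.

```cfml
<!--- Hypothetical value of cleanDocs after the reReplace above:
      each ID appears twice, separated by uneven runs of spaces --->
<cfset cleanDocs = "  20   20 50      50 120  120  ">

<!--- trim() removes the outer spaces; each remaining run of
      non-digits becomes one comma: 20,20,50,50,120,120 --->
<cfset idList = reReplace(trim(cleanDocs), "[^0-9]+", ",", "all")>

<!--- Remove repeated items from the comma-delimited list --->
<cfset idList = listRemoveDuplicates(idList)>

<cfoutput>#idList#</cfoutput>
```

    Using [^0-9]+ instead of a single-character class means a gap of one space and a gap of ten spaces both collapse to exactly one comma.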

    Inspiring
    March 27, 2012

    cflib.org has a function called listdistinct that would probably help you out.

    wmkolcz
    Author
    Inspiring
    March 27, 2012

    Thanks Dan. I ended up converting the whitespace into commas, treating the result as a list, and looping over it to remove the duplicates. It seems to work. The tricky part was that I couldn't see the whitespace between the numbers: it looked like a single space each time, but the number of spaces actually varied.
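    The whitespace-to-commas-then-loop approach described above can be sketched as follows for ColdFusion 8 or 9, where a built-in de-duplicating list function may not be available. This is an illustration, not the exact code from the post: cleanDocs holds a made-up sample value, and a struct is used to remember which IDs have already been seen.

```cfml
<!--- Hypothetical value of cleanDocs: digits separated by uneven whitespace --->
<cfset cleanDocs = "  20   20 50      50 120  120  ">

<!--- trim() first so there is no leading or trailing comma, then
      collapse each run of whitespace (however long) into one comma --->
<cfset rawList = reReplace(trim(cleanDocs), "[[:space:]]+", ",", "all")>

<!--- Keep each ID only the first time it appears, preserving order --->
<cfset seen = structNew()>
<cfset idList = "">
<cfloop list="#rawList#" index="id">
    <cfif NOT structKeyExists(seen, id)>
        <cfset seen[id] = true>
        <cfset idList = listAppend(idList, id)>
    </cfif>
</cfloop>

<cfoutput>#idList#</cfoutput>
```

    Matching [[:space:]]+ rather than a literal single space is what handles the "looked like one space but varied" problem, since tabs, line breaks, and multiple spaces all count as one separator.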