
Just getting numbers from a string

Explorer, Mar 27, 2012

I used screen scraping to get HTML from a number of our web sites that we are converting to a CMS. I was asked to grab one block of code that includes links to individuals, using a numeric ID as their primary key. This is where my 'expertise' ends...lol. Let's say you have a list block like:

<ul>

<li><a href="details.cfm?id=20>Steve String</a></li>

<li><a href="details.cfm?id=50>Mary String</a></li>

<li><a href="details.cfm?id=120>Jerry String</a></li>

</ul>

I want to strip out the IDs to produce only: 20 50 120, or 20,50,120 (which would be optimal). How can I use a regex to remove all non-numeric characters from a block of text?

Explorer, Mar 27, 2012

OK, I used <cfset cleanDocs = reReplace( theDocs, "[^[:digit:]]", ' ', "all") />, which did in fact remove all the non-numeric characters but leaves me with duplicates (there is a weird escape(url) in with the hyperlinks, so the ID is in there twice). Does anyone know how to remove duplicates from a string?
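As a possible refinement of the reReplace call above, the sketch below replaces each run of non-digit characters with a single comma rather than replacing every character with a space, so the result is already a comma-delimited list. The sample value assigned to theDocs and the variable name idList are assumptions made for illustration only.

```cfml
<!--- Hypothetical sample input; theDocs stands in for the scraped HTML --->
<cfset theDocs = '<li><a href="details.cfm?id=20">Steve String</a></li> <li><a href="details.cfm?id=50">Mary String</a></li>'>

<!--- Replace each RUN of non-digits with one comma (note the + quantifier) --->
<cfset idList = reReplace(theDocs, "[^[:digit:]]+", ",", "all")>

<!--- Strip the comma left at the start and at the end of the string --->
<cfset idList = reReplace(idList, "^,|,$", "", "all")>

<!--- For the sample input above this should print 20,50 --->
<cfoutput>#idList#</cfoutput>
```

Note that this still picks up every number in the markup, not only the IDs, so it only suits blocks where the IDs are the only digits present.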

LEGEND, Mar 27, 2012

cflib.org has a function called listdistinct that would probably help you out.

Explorer, Mar 27, 2012

Thanks, Dan. I ended up turning the whitespace into commas to make a list, which I then looped over to remove the duplicates. It seems to work. I couldn't see the whitespace between the numbers: it looked like only one space per gap, but the amount turned out to vary.
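The loop described above might look roughly like the following sketch. It is not the poster's actual code: the sample list and the variable names idList and uniqueIds are assumptions. It keeps the first occurrence of each ID and preserves the original order.

```cfml
<!--- Hypothetical input: a comma-delimited list that contains repeats --->
<cfset idList = "20,20,50,50,120,120">

<!--- Build a new list, appending each ID only the first time it is seen --->
<cfset uniqueIds = "">
<cfloop list="#idList#" index="id">
    <!--- listFind returns 0 when the value is not yet in the list --->
    <cfif NOT listFind(uniqueIds, id)>
        <cfset uniqueIds = listAppend(uniqueIds, id)>
    </cfif>
</cfloop>

<!--- For the sample input above this should print 20,50,120 --->
<cfoutput>#uniqueIds#</cfoutput>
```

Because ColdFusion list functions skip empty elements, stray leading, trailing, or doubled commas in the input should not produce blank entries here.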

Community Expert, Mar 28, 2012

wmkolcz wrote:

I used screen scraping to get HTML from a number of our web sites that we are converting to a CMS. I was asked to grab one block of code that includes links to individuals, using a numeric ID as their primary key. This is where my 'expertise' ends...lol. Let's say you have a list block like:

<ul>

<li><a href="details.cfm?id=20>Steve String</a></li>

<li><a href="details.cfm?id=50>Mary String</a></li>

<li><a href="details.cfm?id=120>Jerry String</a></li>

</ul>

I want to strip out the IDs to produce only: 20 50 120, or 20,50,120 (which would be optimal). How can I use a regex to remove all non-numeric characters from a block of text?

The following assumes ColdFusion 8 or newer (for REMatch). I have also assumed that the IDs are 2 or 3 digits long. You can easily adapt the code as appropriate.

<cfsavecontent variable="block"><ul>

<li><a href="details.cfm?id=20>Steve String</a></li>

<li><a href="details.cfm?id=50>Mary String</a></li>

<li><a href="details.cfm?id=120>Jerry String</a></li>

</ul>

</cfsavecontent>

<!--- Raw list is of the form =20>, =100>, etc.  --->

<cfset rawList = arrayToList(REMatch("(=[0-9]{2,3}>)",block))>

<cfset numberList = replaceList(rawList,"=,>",",")>

<cfoutput>#numberList#</cfoutput>
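One way the code above could be adapted for IDs of any length is to match on the id= prefix instead of a fixed digit count. This is only a sketch: the sample block and the choice to key on "id=" are assumptions, and REMatch is case-sensitive, so REMatchNoCase may be needed if the parameter name varies in case.

```cfml
<!--- Hypothetical input standing in for the scraped block --->
<cfset block = '<li><a href="details.cfm?id=7">A</a></li><li><a href="details.cfm?id=1234">B</a></li>'>

<!--- Match "id=" followed by one or more digits, whatever the length --->
<cfset matches = REMatch("id=[0-9]+", block)>

<!--- Drop the "id=" prefix from the joined list, leaving only the numbers --->
<cfset numberList = replace(arrayToList(matches), "id=", "", "all")>

<!--- For the sample input above this should print 7,1234 --->
<cfoutput>#numberList#</cfoutput>
```

Keying on the parameter name also avoids picking up unrelated numbers elsewhere in the markup.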
