Regex: Strip out everything BUT <a href=""> links

Participant, Aug 19, 2010

How would I strip out all the code and characters except for links using regex?

<font face="Arial" size="2"><a href="/Latitude-Longitude-545478-Louisiana-Achiquot_Bay.html">Achiquot Bay</a></font>

<font face="Arial" size="2">Bay

<font face="Arial" size="2">Avoyelles

<font face="Arial" size="2">LA


<font color="CCDFCA"></font>


<font face="Arial" size="2"><a href="/Latitude-Longitude-1629555-Louisiana-Adolph_Clarks_Pond__historical_.html">Adolph Clarks Pond (historical)</a></font>

<font face="Arial" size="2">Bay

<font face="Arial" size="2">Plaquemines

<font face="Arial" size="2">LA


<font color="CCDFCA"></font>

Enthusiast, Aug 20, 2010

If you want to avoid writing your own regex and use one of the HTML tag-stripping functions at CFLIB.ORG instead, do two replaceNoCase() calls first: one to change "<a href" to something like "$a href", and another to turn "</a>" into "$/a>" (the placeholders must not start with "<", or the stripper will remove them too). Then run the function, and afterwards do the inverse replaceNoCase() calls to get the tag syntax back. You can in fact strip the non-href tags with a regex, but it's going to get messy.

-reed
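
The placeholder idea above can be sketched as follows. This is an illustration in Python rather than CFML, under a few assumptions: the function name `keep_links_via_placeholders` is hypothetical, the markers `$a href` and `$/a>` are assumed never to occur in the input, and a generic `<[^>]*>` substitution stands in for whichever CFLIB tag-stripping function you would actually call.

```python
import re


def keep_links_via_placeholders(html: str) -> str:
    """Hide <a href ...> and </a> behind placeholders, strip tags, then restore."""
    # Step 1: disguise the anchor tags so the stripper no longer sees them as tags.
    protected = re.sub(r"<a href", "$a href", html, flags=re.IGNORECASE)
    protected = re.sub(r"</a>", "$/a>", protected, flags=re.IGNORECASE)

    # Step 2: stand-in for a tag-stripping function (assumption, not CFLIB's code):
    # remove everything that still looks like <...>.
    stripped = re.sub(r"<[^>]*>", "", protected)

    # Step 3: turn the placeholders back into real tag syntax.
    return stripped.replace("$/a>", "</a>").replace("$a href", "<a href")


if __name__ == "__main__":
    row = (
        '<font face="Arial" size="2">'
        '<a href="/Latitude-Longitude-545478-Louisiana-Achiquot_Bay.html">Achiquot Bay</a>'
        "</font>"
    )
    print(keep_links_via_placeholders(row))
```

Note that this only removes tags. Plain text that sits outside the links (such as "Bay" or "LA" in the sample) is left in place, and the restored tags come back in lowercase even if the input used "<A HREF".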

Advocate, Aug 20, 2010

Does this work?

<cfset variables.m=REMatchNoCase("<a\s.*?<\/a>",variables.html) />
<cfdump var="#variables.m#" />
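
For anyone who wants to try the same pattern outside ColdFusion, here is a rough Python equivalent of the extract-the-links approach. The names `ANCHOR`, `extract_links` and `sample` are made up for the example, and the `DOTALL` flag is an added assumption so that a link broken across lines would still match; the sample rows are taken from the question.

```python
import re
from typing import List

# Same pattern as in the REMatchNoCase call; IGNORECASE mirrors the "NoCase" part.
# DOTALL is an assumption added here so "." can also cross line breaks.
ANCHOR = re.compile(r"<a\s.*?</a>", re.IGNORECASE | re.DOTALL)


def extract_links(html: str) -> List[str]:
    """Return every <a ...>...</a> element found in the input, in order."""
    return ANCHOR.findall(html)


sample = (
    '<font face="Arial" size="2"><a href="/Latitude-Longitude-545478-Louisiana-Achiquot_Bay.html">Achiquot Bay</a></font>\n'
    '<font face="Arial" size="2">Bay\n'
    '<font face="Arial" size="2"><a href="/Latitude-Longitude-1629555-Louisiana-Adolph_Clarks_Pond__historical_.html">Adolph Clarks Pond (historical)</a></font>\n'
)

if __name__ == "__main__":
    for link in extract_links(sample):
        print(link)
```

Collecting the matches instead of deleting everything else also discards the text between the links, which is what the original question asks for.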

LEGEND, Aug 21, 2010

I wouldn't normally just dish out an answer on these forums, but that seemed like an interesting regex challenge, so I gave it a bash, and came up with this:

(?!</?\ba\b[^>]*?>)</?[a-z]+?[^>]*?>

I've given it superficial testing, and it seems to do the trick.

--

Adam
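
As a quick way to experiment with the pattern above, here is a small Python sketch that uses it as a "delete every tag that is not an anchor" substitution. The names `NON_ANCHOR_TAG` and `strip_non_anchor_tags` are hypothetical, and Python's `re` module is assumed to treat the negative lookahead the same way ColdFusion's regex engine does.

```python
import re

# The pattern from the post, unchanged. The negative lookahead rejects <a ...>
# and </a>; the rest matches any other opening or closing tag.
NON_ANCHOR_TAG = re.compile(r"(?!</?\ba\b[^>]*?>)</?[a-z]+?[^>]*?>", re.IGNORECASE)


def strip_non_anchor_tags(html: str) -> str:
    """Remove every tag except <a ...> and </a>, leaving all text in place."""
    return NON_ANCHOR_TAG.sub("", html)


if __name__ == "__main__":
    rows = [
        '<font face="Arial" size="2"><a href="/Latitude-Longitude-545478-Louisiana-Achiquot_Bay.html">Achiquot Bay</a></font>',
        '<font face="Arial" size="2">Bay',
        '<font color="CCDFCA"></font>',
    ]
    for row in rows:
        print(repr(strip_non_anchor_tags(row)))
```

The `\b` after `a` matters: it keeps tags such as `<abbr>` from being mistaken for anchors. The substitution removes tags only, so cell text like "Bay" survives and would need a separate step if only the links themselves are wanted.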
