August 19, 2010
Question

Regex: Strip out everything BUT <a href=""> links

  • 3 replies
  • 827 views

How would I strip out all the code and characters except for links using regex?

<font face="Arial" size="2"><a href="/Latitude-Longitude-545478-Louisiana-Achiquot_Bay.html">Achiquot Bay</a></font>

<font face="Arial" size="2">Bay

<font face="Arial" size="2">Avoyelles

<font face="Arial" size="2">LA


<font color="CCDFCA"></font>


<font face="Arial" size="2"><a href="/Latitude-Longitude-1629555-Louisiana-Adolph_Clarks_Pond__historical_.html">Adolph Clarks Pond (historical)</a></font>

<font face="Arial" size="2">Bay

<font face="Arial" size="2">Plaquemines

<font face="Arial" size="2">LA


<font color="CCDFCA"></font>

    This topic has been closed for replies.

    3 replies

    Inspiring
    August 21, 2010

    I wouldn't normally just dish out an answer on these forums, but that seemed like an interesting regex challenge, so I gave it a bash, and came up with this:

    (?!</?\ba\b[^>]*?>)</?[a-z]+?[^>]*?>

    I've given it superficial testing, and it seems to do the trick.

    --

    Adam
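
    For anyone wanting to try the pattern above, here is a minimal sketch of one way it could be applied, assuming the markup sits in a variable called variables.html (an assumed name, borrowed from another reply in this thread) and that REReplaceNoCase() is used so upper-case tags are caught too. Note that the pattern removes tags only: plain text outside the links (such as "Bay" or "LA" in the sample) would stay in the result.

    ```cfml
    <!--- One row from the question's sample markup; the variable name is an assumption --->
    <cfset variables.html = '<font face="Arial" size="2"><a href="/Latitude-Longitude-545478-Louisiana-Achiquot_Bay.html">Achiquot Bay</a></font>' />

    <!--- Delete every tag that is NOT an opening or closing <a> tag.
          The negative lookahead (?!...) refuses to match when the tag name is exactly "a". --->
    <cfset variables.linksOnly = REReplaceNoCase(
        variables.html,
        "(?!</?\ba\b[^>]*?>)</?[a-z]+?[^>]*?>",
        "",
        "all"
    ) />

    <!--- HTMLEditFormat() escapes the result so the tag is displayed rather than rendered --->
    <cfoutput>#HTMLEditFormat(variables.linksOnly)#</cfoutput>
    ```

    On this one-row input the font tags are removed and the complete <a href="...">Achiquot Bay</a> element is left behind.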

    Legend
    August 20, 2010

    Does this work?

    <cfset variables.m=REMatchNoCase("<a\s.*?<\/a>",variables.html) />
    <cfdump var="#variables.m#" />
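
    As a rough illustration of what the REMatchNoCase() call above gives back, the sketch below runs it on a few rows from the question and joins the resulting array into one string. The cfsavecontent block, the variables.links name, and the choice of a newline delimiter are assumptions made for the example, not part of the original reply.

    ```cfml
    <!--- A few rows from the question's sample, captured into variables.html --->
    <cfsavecontent variable="variables.html">
    <font face="Arial" size="2"><a href="/Latitude-Longitude-545478-Louisiana-Achiquot_Bay.html">Achiquot Bay</a></font>
    <font face="Arial" size="2">Bay
    <font face="Arial" size="2"><a href="/Latitude-Longitude-1629555-Louisiana-Adolph_Clarks_Pond__historical_.html">Adolph Clarks Pond (historical)</a></font>
    </cfsavecontent>

    <!--- REMatchNoCase() returns an array with one element per <a ...>...</a> match --->
    <cfset variables.m = REMatchNoCase("<a\s.*?<\/a>", variables.html) />

    <!--- Join the matches into a single string, one link per line --->
    <cfset variables.links = ArrayToList(variables.m, Chr(10)) />

    <cfoutput>
    Links found: #ArrayLen(variables.m)#
    <pre>#HTMLEditFormat(variables.links)#</pre>
    </cfoutput>
    ```

    Because this approach extracts the links instead of deleting everything else, the loose text between them ("Bay" in this sample) is dropped as well, and the array should contain two elements.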

    Inspiring
    August 20, 2010

    If you want to avoid writing your own regex and use one of the HTML tag-stripping functions at CFLIB.ORG instead, do two replaceNoCase() calls first: one to change "<a href" to something like "$a href", and another to turn "</a>" into "$/a>". Then run the function, and then do the inverse replaceNoCase() calls to get the tag syntax back. You can in fact strip the non-href tags with a regex, but it's going to get messy.

    -reed
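
    The mask, strip, and restore steps described above might look roughly like the sketch below. Two things in it are assumptions: the leading "<" of each anchor tag is swapped for "$" so that a tag stripper no longer sees the anchors as tags, and a plain REReplace() call stands in for whichever CFLIB.ORG function you actually pick. It also assumes the "$a href" and "$/a>" markers never occur in the real data.

    ```cfml
    <!--- One row from the question's sample markup; the variable names are assumptions --->
    <cfset variables.html = '<font face="Arial" size="2"><a href="/Latitude-Longitude-545478-Louisiana-Achiquot_Bay.html">Achiquot Bay</a></font>' />

    <!--- 1. Mask the anchor tags by replacing their leading "<" with "$" --->
    <cfset variables.masked = ReplaceNoCase(variables.html, "<a href", "$a href", "all") />
    <cfset variables.masked = ReplaceNoCase(variables.masked, "</a>", "$/a>", "all") />

    <!--- 2. Strip whatever tags remain. This REReplace() is only a stand-in
          for a tag-stripping function from CFLIB.ORG. --->
    <cfset variables.stripped = REReplace(variables.masked, "<[^>]*>", "", "all") />

    <!--- 3. Undo the masking to get the anchor tag syntax back --->
    <cfset variables.linksOnly = ReplaceNoCase(variables.stripped, "$a href", "<a href", "all") />
    <cfset variables.linksOnly = ReplaceNoCase(variables.linksOnly, "$/a>", "</a>", "all") />

    <cfoutput>#HTMLEditFormat(variables.linksOnly)#</cfoutput>
    ```

    One side effect worth knowing about: because replaceNoCase() writes back the literal replacement text, an anchor originally written as "<A HREF" comes out as "<a href" after the round trip.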