Participating Frequently
March 23, 2011
Question

regexp not working

I'm trying to strip strings out of the response to an HTTP request using a regexp.

I'm querying a server for the PDF files in a directory, then parsing a list of them from the raw HTML. Unfortunately, some of the file names contain periods and commas, and the regexp only matches the part of the name after the punctuation character.

E.g. "John M. Smith.pdf" gets matched as "Smith.pdf".


Here's the Regexp:

<cffunction name="parsePDF" access="public" returntype="string">
        <cfargument name="RAWHTML" type="string" required="yes">
        <cfset REGEXPMATCH="^[\s\S][[:punct:]]*\.pdf$">
        <cfreturn REMatchNoCase(REGEXPMATCH,RAWHTML)>
</cffunction>

TIA

Server Product: ColdFusion
Version: 8,0,1,195765
Edition: Standard

IIS 6

Inspiring
March 23, 2011

Your regex seems unnecessarily complex (esp given it ain't doing what you need it to ;-)

What would be wrong with just:

^.*\.pdf$

?

--

Adam

Manweevil (Author)
Participating Frequently
March 23, 2011

Still doesn't work. It seems to be a problem with the periods being interpreted as CR-LF instead of literals

Inspiring
March 24, 2011

Yeah, sorry, I didn't read your requirement properly.

It's easy enough to tell where the PDF file name ends in your mark-up (it'll end with ".pdf" ;-), but how do you tell where the file name starts?  Given a file name can contain most characters (just slash and null are prohibited by NTFS... Windows is slightly more picky, but still), it kinda means you need to have something you can identify as a boundary...?
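
If the listing wraps each file name in a link, one candidate boundary is the mark-up itself: the ">" that closes the opening anchor tag and the "<" that starts the closing one. The thread doesn't show the actual HTML, so the following is only a minimal sketch under that assumption. The rawHtml argument name and the array return type are choices made for this sketch, not taken from the original function (REMatchNoCase returns an array of matches).

```cfml
<cffunction name="parsePDF" access="public" returntype="array" output="false">
    <cfargument name="rawHtml" type="string" required="yes">
    <!--- ASSUMPTION: each file name is link text, i.e. it sits between ">" and "<". --->
    <!--- [^<>]+ cannot cross a tag boundary, so a match can only begin after a ">". --->
    <!--- The lookahead (?=<) requires a tag to follow ".pdf" without consuming it. --->
    <cfset var pattern = "[^<>]+\.pdf(?=<)">
    <cfreturn REMatchNoCase(pattern, arguments.rawHtml)>
</cffunction>
```

With mark-up along the lines of <A HREF="/docs/John%20M.%20Smith.pdf">John M. Smith.pdf</A>, the intent is that the href value is skipped (its ".pdf" is followed by a quote, not a "<") and the link text "John M. Smith.pdf" is returned whole, periods included. If the real listing uses a different structure, a different boundary would be needed.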

--

Adam