Participant

Question

Parsing a text file

Forum|Forum|19 years ago
April 10, 2007
7 replies
992 views

I am trying to parse a long string and get specific values out. The file is actually an email and I need to pull out items from the body to populate a query. The email would look something like

Name: Value, Test Home: (800)555-1212
Address1: 98 Test Center Dr Work: (800)5555-1212

I need to pull the values out to populate firstName, lastName, dayPhone, etc...

I can't get the company to provide this in any other format. I am thinking this should be possible using regular expression to strip all the spaces out, then get the value of firstName by finding Name: and then , and grabbing whats in the middle.

Any ideas on this?

Advanced techniques

This topic has been closed for replies.

T

ThomasWoodAuthor

Participant

PRODUCT TYPE: Homeowners

CONTACT CAN BE REACHED AT:
Name: value, test Home: (555)555-1212
Address1: 123 Test Ln Work: (555)555-1212
Address2:
City/St/Zip: Wilmington, NC, 28412 Email: wesley@mystateinsurance.com
County: New hanover
Contact Time: Anytime Respond Time: Within 24 hours
------------------------------------------------------------------------

This is what it looks like before I run the

<cfset noFormat = ReReplace(emailBody,"[^A-Za-z0-9,:@.]+",'\1','all')>

after this is run the string looks like

PRODUCTTYPE:HomeownersCONTACTCANBEREACHEDAT:Name:value,testHome:5555551212Address1:123TestLnWork:5555551212Address2:City/St/Zip:Wilmington,NC,28412Email:wesley@mystateinsurance.comCounty:Newhanover

Then this is run

<cfset lobType = ReReplace(noFormat,'[A-Za-z0-9,:@.]+PRODUCTTYPE:(.+)CONTACTCANBEREACHEDAT:Name:(.+),(.+)Home:(.+)Address1:(.+)Work:(.+)Address2:[A-Za-z0-9,:]+ityStZip:(.+),(.+),(.+)Email:(.+)County:(.+)Contact','\1|\2|\3|\4|\5|\6|\7|\8|\9|\10|\11|\12','all')>

and the result is

Homeowners|value|test|5555551212|123TestLn|5555551212|Wilmington|NC|28412|wesley@mystateinsurance.com|Newhanover|...rest of the string...

This works fine assuming all values are present. But if the work number, for instance was blank in the initial string then there is no space in Work:Address2: which ends up not running the entire conversion correctly.

T

ThomasWoodAuthor

Participant

So... I've got the above working, but is their a way to insert a default value if it's null. For example...

I'm pulling Work:(.+)Address1: As \6|

but I have fed some values in where the string ends up being Work:Address1: so when I try to put this into an array it throws an exception.

I tried go back at the end and doing ReReplace(myList,'\|\|','| |','all') but it still throws and error because \6 is never defined.

Any ideas

C

cf_dev2

Inspiring

> but I have fed some values in where the string ends up being Work:Address1:

ThomasWood,

Can you post an example of the full values? Maybe someone else can help with the regex.

T

ThomasWoodAuthor

Participant

Well... After some trial and error I got it working. I had some issues with whitespace, tabs, and new lines not being pulled out with \s \t \n or [:space:]

I'm sure this was my error, but here was my solution anyway. Any suggestions on cleaning this up would be appreciated.

<cfif getMail.RecordCount> My cfpop call
<cfsavecontent variable='emailBody'><cfoutput>#getMail.body#</cfoutput></cfsavecontent>
<cfset noFormat = ReReplace(emailBody,"[^A-Za-z0-9,:@.]+",'\1','all')>
<cfset lobType = ReReplace(noFormat,'[A-Za-z0-9,:@.]+PRODUCTTYPE:(.+)CONTACTCANBEREACHEDAT:Name:(.+),(.+)Home:(.+)Address1:(.+)Work:(.+)Address2:[A-Za-z0-9,:]+ityStZip:(.+),(.+),(.+)Email:(.+)County:(.+)Contact','\1|\2|\3|\4|\5|\6|\7|\8|\9|\10|\11|\12','all')>

<cfset myList = ListToArray(lobType,'|')>

It needs some cleaning up from my testing, but it works. Thanks for the help

I

insuractive

Inspiring

cf_dev2,
You could trim the spaces and replace the new line characters after you create the list using Trim() and Replace(). You would have to modify the expression a little in order to accomodate the white space characters (I don't think "." matches against spaces or new lines).

The new line character is a bit of a pain, though - Especially if you forget to remove it and then you have an invisible character at the end of your string in your DB. Plus, using "\n" (regExp newline) is a bit easier than #chr(13)##chr(10)# (ASCII/CF New Line)

I

insuractive

Inspiring

Oh, and I should probably mention - I'm not really a regular expression guru. I am, however, a proud downloader of the RegEx Coach, a piece of shareware that makes testing and building regular expressions super easy. Might be a useful tool for anyone else interested in using regular expressions but not wanting to do trial and error testing in CF.

C

cf_dev2

Inspiring

insuractive,

You're right about the newline character. I thought (.) matched any character (including space and newline). But apparently it doesn't : "A period (.) matches any character except newline". So your original solution is more elegant than a regex + find + replace. Thanks for the info :)

T

ThomasWoodAuthor

Participant

Thanks insuractive.

I'll give it a try and let you know how it turns out.

I

insuractive

Inspiring

<cfsavecontent variable="sTest">
Name: Value, Test Home: (800)555-1212
Address1: 98 Test Center Dr Work: (800)5555-1212
</cfsavecontent>

<cfset sPipeDelimList = ReReplace(sTest, "Name:[ ]?(.+)Home:[ ]?(.+)\nAddress1:[ ]?(.+)Work:[ ]?(.+)", "\1|\2|\3|\4", "all")>

<cfdump var="#sPipeDelimList#"><cfabort>

I

insuractive

Inspiring

A little explaination:

Name: - matches the literal text "Name:"
[ ]? - an optional space character
(.+) - matches 1 or more characters (the parenthesis are used to back reference the value later)
\n - new line character

So the expression:

Name:[ ]?(.+)Home:\n

translates into: match any text that starts with a literal "Name:" then an optional space then any text until the literal text "Home:" followed by a new line character

Once you match the values, you can use back references (e.g. \1,\2,\3,etc) to only return the values in parenthesis. In the example I used, I created a pipe (|) delimited file that I could easily process.

Once you check out the expression, the pattern should become evident and you should be able to use it for the entire email.

Hope that helps!

C

cf_dev2

Inspiring

insuractive,

Nice. I forgot about backreferences :) Question, do you need the spaces/new lines in the regex or can you just trim the list values afterwards?

J

joeDangelo

Participating Frequently

Take a look at the documentation for the ReFind function. Youll probably also need to do a bit of research on regular expressions as well, but this should solve your problem.

Sign up

To post, reply, or follow discussions, please sign in with your Adobe ID.

Sign in to Adobe Community

To post, reply, or follow discussions, please sign in with your Adobe ID.

Scanning file for viruses.

This file cannot be downloaded