Grep expression to include carriage return if it's there

Nov 04, 2016

Hi all,

Is there a grep expression that can handle a line break (carriage return) or other non-text character that may or may not appear between a pair of tabs?

I'm using Grep within BBEdit to convert Excel data into tagged XML, which I then import into InDesign. However, when someone has put a carriage return within a cell in Excel, my find/replace doesn't handle it properly. I can use a workaround, but I'd rather not amend the original data if possible.

My expression looks for 7 columns of text and wraps each one in tags.

Find:

^(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)$

Replace:

</person>

But what if one of those columns contains a line break and all the others don't? Just to be clear, it MIGHT have a carriage return or it MIGHT NOT; that's the tricky bit.

Is there a grep expression that can account for that?

johnsmith.tif | John Smith | yes | DeLoitte | I work at Deloitte

and I'm brilliant. | New York | @myhandle
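A minimal Python sketch of why the original find pattern misses this row, reading each | above as a tab. Python's `re` module stands in for BBEdit's grep here (an assumption); in both, as far as I know, `.` does not match a line break unless told to.

```python
import re

# The original find pattern: seven lazily matched columns separated by tabs.
ORIGINAL = r"^(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)$"

# The sample row, with each | read as a tab; the fifth cell holds a line break.
row = ("johnsmith.tif\tJohn Smith\tyes\tDeLoitte\t"
       "I work at Deloitte\nand I'm brilliant.\tNew York\t@myhandle")

# With MULTILINE, ^ and $ match at every line boundary, much like an editor.
# '.' never matches '\n', so no match can cross the break inside the fifth
# cell, and neither line on its own contains the six tabs the pattern needs.
match = re.search(ORIGINAL, row, re.MULTILINE)
print(match)  # None
```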

If anyone can help, that would be amazing.

Thanks,

Justy

Nov 04, 2016

Hi,

If you want to go with grep, the closest thing I can think of is

\t?[^\t]+\t?

But you would have to do some extra cleaning.
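To see what that pattern picks up, here is a small Python sketch (Python's `re` standing in for BBEdit) run over the sample row from the question, with tabs in place of the | characters:

```python
import re

row = ("johnsmith.tif\tJohn Smith\tyes\tDeLoitte\t"
       "I work at Deloitte\nand I'm brilliant.\tNew York\t@myhandle")

# [^\t] matches any character except a tab, so a line break inside a cell
# is simply part of that cell's match.
pieces = re.findall(r"\t?[^\t]+\t?", row)

for piece in pieces:
    print(repr(piece))
```

It finds seven pieces, but each one except the last still carries its trailing tab: the trailing `\t?` of one piece takes the tab before the leading `\t?` of the next piece gets a chance. That leftover tab is the kind of extra cleaning mentioned above.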

Otherwise, I would consider scripting, given that you can export Excel to CSV.
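A rough sketch of the scripting route, assuming the sheet is exported as CSV and Python is available. CSV files normally wrap a cell that contains a line break in double quotes, and Python's standard `csv` module reads such a cell back as a single value, so no regex is needed. The column tag names are hypothetical; only `</person>` survives from the original replace string.

```python
import csv
import io
from xml.sax.saxutils import escape

# Hypothetical tag names for the seven columns; adjust to your XML structure.
FIELDS = ["image", "name", "flag", "company", "bio", "city", "handle"]


def csv_to_tagged_xml(csv_text: str) -> str:
    """Wrap each CSV record in <person> and each cell in its column's tag."""
    out = ["<root>"]
    # newline="" keeps line breaks inside quoted cells intact for the reader.
    for record in csv.reader(io.StringIO(csv_text, newline="")):
        if not record:
            continue  # skip blank lines
        out.append("<person>")
        for tag, cell in zip(FIELDS, record):
            # escape() turns &, < and > into entities so the XML stays valid.
            out.append(f"<{tag}>{escape(cell)}</{tag}>")
        out.append("</person>")
    out.append("</root>")
    return "\n".join(out)


# The quoted fifth cell contains a line break; it stays in the <bio> value.
sample = ('johnsmith.tif,John Smith,yes,DeLoitte,'
          '"I work at Deloitte\nand I\'m brilliant.",New York,@myhandle\n')
print(csv_to_tagged_xml(sample))
```

Because the parser, not a pattern, decides where a record ends, a line break inside a cell cannot be mistaken for the end of a row.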

Nov 04, 2016

Ooo that's brilliant.

Would you kindly explain what it's doing? If I repeat it 7 times and wrap it in ^ and $, it picks up all the columns I need.

But then I have a problem with my replace string, as I can't work out which bit of the expression is the text and so is meant to be kept.

Here are my actual, original find and replace strings, so you can see what I'm transforming:

Find

^(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)$

Replace

Many thanks
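On the question of which part is the text: one way to mark it is to put parentheses around `[^\t]+` so it becomes a capture group, leaving the optional tabs outside. A minimal Python sketch of that idea, with hypothetical tag names; BBEdit's replace field refers to groups the same way, with \1 to \7.

```python
import re

# Loic's pattern with the cell text in a capture group, repeated seven times
# and anchored. Each ( ) becomes \1 ... \7 in the replacement.
CELL = r"\t?([^\t]+)\t?"
ROW = re.compile("^" + CELL * 7 + "$")

row = ("johnsmith.tif\tJohn Smith\tyes\tDeLoitte\t"
       "I work at Deloitte\nand I'm brilliant.\tNew York\t@myhandle")

# Hypothetical tag names; the tabs sit outside the groups, so they are dropped.
REPLACEMENT = (r"<person><image>\1</image><name>\2</name><flag>\3</flag>"
               r"<company>\4</company><bio>\5</bio><city>\6</city>"
               r"<handle>\7</handle></person>")

print(ROW.sub(REPLACEMENT, row))
```

One caveat worth testing: because `[^\t]+` needs at least one character, an empty cell (two tabs in a row) does not produce an empty group, so a row with a blank cell either fails to match or has its later columns shifted.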

Nov 04, 2016

Hi again,

Sorry to be a pain; I'm trying to understand your grep expression better. Is my breakdown of it correct?

\t?[^\t]+\t?

\t? = \t looks for a tab, ? zero or one time

[^\t]+ = [ start of a pattern, ^ beginning of a line, \t tab, ] end of pattern, + pattern appears one or more times

\t? = (as first line above)

Nov 04, 2016

To elaborate on Loic's explanation just a bit: the ^ immediately after the opening bracket makes the class "negative", so it matches any character except the ones listed between it and the closing bracket.
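A quick Python check of that difference, and of why the negated class copes with the line break where `.` does not (Python's `re` standing in for BBEdit's grep):

```python
import re

# '.' matches any character except a line break.
print(bool(re.fullmatch(r".+", "Deloitte\nand")))      # False
# [^\t] is a negated class: any character except a tab, line breaks included.
print(bool(re.fullmatch(r"[^\t]+", "Deloitte\nand")))  # True
# It still refuses a tab, which is what keeps it inside one column.
print(bool(re.fullmatch(r"[^\t]+", "Deloitte\tand")))  # False
```

Outside brackets, `^` anchors to the start of a line; only right after `[` does it mean "not".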

Nov 04, 2016

Trying to build an XML file from CSV via GREP looks very artistic to me. I'm not sure I can provide much more help here.

Nov 04, 2016

It's not that scary. It's only plain text on either side of the original data. Thanks again.

Nov 04, 2016

Some online tools are designed to do this kind of transformation.

Give this a try:

CSV To XML Converter - BeautifyTools.com

And let us know.

Nov 04, 2016

I've used a similar website to do the same thing. That's when I decided that, should the website ever go down or the job need to be more complex, I'd need a way to do it offline. That's where grep helps. I can certainly get the data into a workable format; I've imported over 300 biographies in a few minutes. I'm just hoping to refine my grep expressions so the original text is tampered with as little as possible.

Any errors are then the client's problem, not mine.

When I get the right expression I'll post back.

Nov 04, 2016

Hi,

Basically, it looks for one or more characters other than a tab, possibly surrounded by tabs.

So it's

\t? = \t looks for a tab, ? zero or one time

[^\t]+ = any character except a tab, one or more times

\t? = (as first line above)
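If empty cells are a concern, one variation worth testing (my own, not Loic's pattern) is to require exactly six tab separators per record and let each cell be empty. Cells 1 to 6 may contain line breaks; the last cell may not, so each record ends at the end of its line. A Python sketch with a made-up first row; in BBEdit you may need `\r` rather than `\n` for the line break, so check its grep reference.

```python
import re

# Exactly six tabs per record. [^\t]* allows empty cells and, in columns 1-6,
# line breaks; column 7 excludes '\n' so the record stops at the line end.
RECORD = re.compile(r"^" + r"([^\t]*)\t" * 6 + r"([^\t\n]*)$", re.MULTILINE)

# A made-up first row, then the sample row with an empty third cell added.
text = ("a.tif\tAnn\tyes\tAcme\tShort bio\tParis\t@ann\n"
        "johnsmith.tif\tJohn Smith\t\tDeLoitte\t"
        "I work at Deloitte\nand I'm brilliant.\tNew York\t@myhandle\n")

for record in RECORD.findall(text):
    print(record)
```

The trade-off: a line break in the last column still defeats it, and a row with the wrong number of tabs may pull in part of the next row.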

Adobe Community
