
Grep expression to include carriage return if it's there

Contributor
Nov 04, 2016

Hi all,

Is there a grep expression that is able to handle a break return or other non-text character that might or might not be between a pair of tabs?

I'm using Grep within BBEdit to convert Excel data into tagged XML, to then import into InDesign. However, when someone has put a carriage return within a cell in Excel, my find/replace won't handle that properly. I can use a workaround, but I didn't want to amend the original data if possible.

My expression looks for 7 columns of text, and takes each piece and puts tags around them.

filename    |    full name    |    star    |    company    |    biog    |    city    |    twitter

johnsmith.tif    |    John Smith    |    yes    |    DeLoitte    |    I work at Deloitte and I'm brilliant.    |    New York    |    @myhandle

Find:

^(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)$

Replace:

<person>

<picbox href="file://Headshots/\1" />

<name>\2</name>

<star>\3</star>

<company>\4</company>

<biog>\5</biog>

<city>\6</city>

<twitter>\7</twitter>

</person>

But what if one of those columns has a break return and all the others don't? Just to be clear, it MIGHT have a carriage return, it MIGHT NOT. This is the tricky bit.

Is there a grep expression that can account for that?

johnsmith.tif    |    John Smith    |    yes    |    DeLoitte    |    I work at Deloitte

and I'm brilliant.    |    New York    |    @myhandle

If anyone can help, that would be amazing.

Thanks,

Justy

Participant
Nov 04, 2016

Hi,

If you want to go with grep, the closest thing I can think of is

\t?[^\t]+\t?

But you would have to do some extra cleaning.

Otherwise, I would consider scripting, given that you can export from Excel to CSV.
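
For anyone taking the scripting route, here is a minimal sketch in Python. It assumes the spreadsheet is exported as comma-separated CSV with a header row matching the column names from the question, and that cells containing a line break are wrapped in double quotes in the export. The sample data and the `<people>` root element are made up for illustration; the other tag names follow the replace string in the original post.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample mimicking a CSV export; the quoted biog cell contains a line break.
SAMPLE = (
    "filename,full name,star,company,biog,city,twitter\r\n"
    'johnsmith.tif,John Smith,yes,DeLoitte,'
    '"I work at Deloitte\nand I\'m brilliant.",New York,@myhandle\r\n'
)

# Maps each CSV column (after filename) to the XML tag used in the thread.
COLUMN_TO_TAG = [
    ("full name", "name"),
    ("star", "star"),
    ("company", "company"),
    ("biog", "biog"),
    ("city", "city"),
    ("twitter", "twitter"),
]

def rows_to_xml(text):
    """Build one <person> element per CSV row under an assumed <people> root."""
    # newline="" leaves line endings alone so the csv module can handle them itself.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    root = ET.Element("people")
    for row in reader:
        person = ET.SubElement(root, "person")
        ET.SubElement(person, "picbox", href="file://Headshots/" + row["filename"])
        for column, tag in COLUMN_TO_TAG:
            ET.SubElement(person, tag).text = row[column]
    return root

root = rows_to_xml(SAMPLE)
print(ET.tostring(root, encoding="unicode"))
```

The csv module keeps the line break inside the quoted cell as part of the field, so the original data does not need to be amended, and ElementTree escapes characters such as & when it writes the XML.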

Contributor
Nov 04, 2016

Ooo that's brilliant.

Would you kindly explain what it's doing? If I repeat it 7 times and wrap it in ^ and $, it picks up all the columns needed.

But I then have a problem with my replace string, as I can't work out which part of the expression matches the text that is meant to be kept.

My actual, original find and replace strings so you can see what I'm transforming:

Find

^(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)$

Replace

<bounding>\n<picbox href="file://Headshots/\1" />\n<biogbox>\n<name>\2</name>\n<star>\3</star>\n<company>\4</company>\n         <bio>\5</bio>\n<citybox>\n<city>\6</city>\n<twitter>\7</twitter>\n</citybox>\n</biogbox>\n</bounding>

Many thanks

Contributor
Nov 04, 2016

Hi again,

Sorry to be a pain, I'm trying to understand your grep expression more. Is my breakdown of it correct?

\t?[^\t]+\t?

\t?      = \t looks for a tab, ? zero or one time

[^\t]+  = [ start of a pattern, ^ beginning of a line, \t tab, ] end of pattern, + pattern appears one or more times

\t?      = (as first line above)

Community Expert
Nov 04, 2016

To elaborate on Loic's explanation just a bit: the ^ immediately after the opening bracket makes the class "negative", so it matches any single character except the ones listed between it and the closing bracket.
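
A quick way to see the difference is to try both forms on a cell that contains a line break. This is a small sketch using Python's re module rather than BBEdit; the sample strings are made up, and other engines may differ in details such as how line breaks are represented.

```python
import re

# Hypothetical cell text with a line break in the middle.
cell = "I work at Deloitte\nand I'm brilliant."
line = "DeLoitte\t" + cell + "\tNew York"

# By default "." does not match a newline, so it cannot cover the whole cell.
dot_match = re.fullmatch(r".+", cell)

# The negated class [^\t] matches anything that is not a tab, newline included.
negated_match = re.fullmatch(r"[^\t]+", cell)

# Splitting the line into tab-delimited fields keeps the broken cell in one piece.
fields = re.findall(r"[^\t]+", line)

print(dot_match)
print(negated_match is not None)
print(fields)
```

This is why swapping .*? for a negated class lets a column span a line break: the class only cares about tabs.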

Participant
Nov 04, 2016

Trying to turn CSV into an XML file via GREP looks very artistic to me. I am not sure I can provide much more help here.

Contributor
Nov 04, 2016

It's not that scary. It's only plain text either side of the original data. Thanks again.

Participant
Nov 04, 2016

Some online tools are designed to do such transformations.

Give this a try:

CSV To XML Converter - BeautifyTools.com

And let us know.

Contributor
Nov 04, 2016

I've used a similar website to do the same thing. That's when I decided that, should the website ever go down or the job become more complex, I need a way to do it offline. That's where grep is helping. I can certainly get the data into a workable format: I've imported over 300 biographies in a few minutes. I'm just hoping to refine my grep expressions so the original text is tampered with as little as possible.

Any errors are then the client's problem, not mine.

When I get the right expression I'll post back.

Participant
Nov 04, 2016

Hi,

Basically, it looks for one or more characters other than a tab, optionally preceded and followed by a tab.

So it's

\t?      = \t looks for a tab, ? zero or one time

[^\t]+  = any character except a tab (line breaks included), one or more times

\t?      = (as first line above)
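
To keep the numbered groups for the replace string, one option is to leave the original seven-group find pattern as it is and only swap each (.*?) for a negated class. The sketch below tries that in Python's re module, not in BBEdit, so treat it as an illustration. It assumes the line break never falls in the first or last column (those two groups also exclude \r and \n so a match cannot run into the next record) and that every record has exactly seven columns. The second sample record is made up.

```python
import re

# Seven tab-separated columns; the middle ones may contain line breaks.
PATTERN = re.compile(
    r"^([^\t\r\n]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t\r\n]*)$",
    re.MULTILINE,
)

# Replace template based on the first replace string in the thread.
TEMPLATE = "\n".join([
    r"<person>",
    r'<picbox href="file://Headshots/\1" />',
    r"<name>\2</name>",
    r"<star>\3</star>",
    r"<company>\4</company>",
    r"<biog>\5</biog>",
    r"<city>\6</city>",
    r"<twitter>\7</twitter>",
    r"</person>",
])

# Hypothetical data: the first record has a line break inside the biog column.
data = (
    "johnsmith.tif\tJohn Smith\tyes\tDeLoitte\t"
    "I work at Deloitte\nand I'm brilliant.\tNew York\t@myhandle\n"
    "janedoe.tif\tJane Doe\tno\tAcme\tShort bio.\tLondon\t@jane\n"
)

result = PATTERN.sub(TEMPLATE, data)
print(result)
```

Each pair of parentheses is one column, so \1 to \7 work exactly as in the original replace string. A record with a missing tab would let a group run on into the next record, so it is worth checking the output count against the number of rows in the spreadsheet.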
