Hi all,
Is there a grep expression that is able to handle a break return or other non-text character that might or might not be between a pair of tabs?
I'm using Grep within BBEdit to convert Excel data into tagged XML, to then import into InDesign. However, when someone has put a carriage return within a cell in Excel, my find/replace won't handle that properly. I can use a workaround, but I didn't want to amend the original data if possible.
My expression looks for 7 columns of text, and takes each piece and puts tags around them.
filename | full name | star | company | biog | city | twitter
johnsmith.tif | John Smith | yes | DeLoitte | I work at Deloitte and I'm brilliant. | New York | @myhandle
Find:
^(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)$
Replace:
<person>
<picbox href="file://Headshots/\1" />
<name>\2</name>
<star>\3</star>
<company>\4</company>
<biog>\5</biog>
<city>\6</city>
<twitter>\7</twitter>
</person>
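For anyone who wants to test this outside BBEdit, here is a rough Python equivalent of the Find/Replace above. It is a sketch that assumes Python's `re` treats these constructs the same way BBEdit's grep does here. It also shows the problem: `.` does not match a line break, so a record whose biog cell contains one is never matched and is left untouched.

```python
import re

# The Find string from above; MULTILINE makes ^ and $ match at every line.
FIND = re.compile(r"^(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)$", re.MULTILINE)

# The Replace string from above, one tag per line; \1 to \7 are the captured cells.
REPLACE = "\n".join([
    "<person>",
    r'<picbox href="file://Headshots/\1" />',
    r"<name>\2</name>",
    r"<star>\3</star>",
    r"<company>\4</company>",
    r"<biog>\5</biog>",
    r"<city>\6</city>",
    r"<twitter>\7</twitter>",
    "</person>",
])


def convert(text: str) -> str:
    """Apply the Find/Replace to tab-delimited text."""
    return FIND.sub(REPLACE, text)


one_line = "johnsmith.tif\tJohn Smith\tyes\tDeLoitte\tI work at Deloitte and I'm brilliant.\tNew York\t@myhandle"
broken = "johnsmith.tif\tJohn Smith\tyes\tDeLoitte\tI work at Deloitte\nand I'm brilliant.\tNew York\t@myhandle"

print(convert(one_line))
# No single line of `broken` contains all six tabs, so nothing matches and the text is unchanged.
print(convert(broken) == broken)
```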
But what if one of those columns has a break return and all the others don't? Just to be clear, it MIGHT have a carriage return or it MIGHT NOT; that's the tricky bit.
Is there a grep expression that can account for that?
johnsmith.tif | John Smith | yes | DeLoitte | I work at Deloitte
and I'm brilliant. | New York | @myhandle
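One possible grep-only approach (not taken from the replies below) is to swap each `.*?` for `[^\t]*`. This is a sketch that assumes every record has exactly seven cells, that only the first six can contain a line break, and that lines end in `\n`. A negated class like `[^\t]` matches any character except a tab, including a line break, so a cell can run over several lines while the tabs still mark the column boundaries. The last cell uses `[^\t\r\n]*` so the match stops at the end of the record instead of running on into the next one. It is checked here with Python's `re` rather than in BBEdit itself.

```python
import re

# Cells 1-6: any run of non-tab characters, line breaks included (possibly empty).
CELL = r"([^\t]*)"
# Cell 7: also stop at a line break, so the record ends at the end of its line.
LAST_CELL = r"([^\t\r\n]*)"

FIND = re.compile("^" + r"\t".join([CELL] * 6 + [LAST_CELL]) + "$", re.MULTILINE)

text = (
    "filename\tfull name\tstar\tcompany\tbiog\tcity\ttwitter\n"
    "johnsmith.tif\tJohn Smith\tyes\tDeLoitte\tI work at Deloitte\n"
    "and I'm brilliant.\tNew York\t@myhandle\n"
)

records = [m.groups() for m in FIND.finditer(text)]
for record in records:
    print(record)
```

The groups are still numbered 1 to 7, so the same Replace string should work unchanged with this Find pattern.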
If anyone can help, that would be amazing.
Thanks,
Justy
Hi,
If you want to stick with grep, the closest thing I can think of is
\t?[^\t]+\t?
But you would have to do some extra cleaning.
Otherwise, I would consider scripting, given that you can export Excel to CSV.
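Following the scripting suggestion, here is a minimal sketch in Python using the standard `csv` module. It assumes the Excel export is comma-separated with a header row in the column order shown at the top, and that cells containing line breaks are wrapped in double quotes (the usual CSV convention), which `csv.reader` handles without any special pattern. The tag names come from the first Replace string; the XML escaping of `&`, `<` and `>` is an addition the grep version doesn't do.

```python
import csv
import io
from xml.sax.saxutils import escape

# Tag names for columns 2-7, following the first Replace string in the question.
TAGS = ["name", "star", "company", "biog", "city", "twitter"]


def rows_to_xml(csv_text: str) -> str:
    """Turn CSV text (header row first) into <person> elements."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    next(reader)  # skip the header row
    people = []
    for filename, *cells in reader:
        # Escape double quotes too, because the filename goes inside an attribute.
        href = escape(filename, {'"': "&quot;"})
        parts = ["<person>", f'<picbox href="file://Headshots/{href}" />']
        for tag, value in zip(TAGS, cells):
            parts.append(f"<{tag}>{escape(value)}</{tag}>")
        parts.append("</person>")
        people.append("\n".join(parts))
    return "\n".join(people)


sample = (
    "filename,full name,star,company,biog,city,twitter\n"
    'johnsmith.tif,John Smith,yes,DeLoitte,"I work at Deloitte\nand I\'m brilliant.",New York,@myhandle\n'
)
print(rows_to_xml(sample))
```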
Ooo that's brilliant.
Would you kindly explain what it's doing? If I repeat it 7 times it picks up all the columns needed if I wrap it in ^ and $.
But I then have a problem with my replace string as I can't work out what bit of the expression is the text and so meant to be kept.
My actual, original find and replace strings so you can see what I'm transforming:
Find
^(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)$
Replace
<bounding>\n<picbox href="file://Headshots/\1" />\n<biogbox>\n<name>\2</name>\n<star>\3</star>\n<company>\4</company>\n <bio>\5</bio>\n<citybox>\n<city>\6</city>\n<twitter>\7</twitter>\n</citybox>\n</biogbox>\n</bounding>
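On the question of which bit is the text: it's the `[^\t]+` in the middle, so wrapping that part in parentheses turns each cell into a numbered group that the Replace string can use as `\1` to `\7`. This is a sketch checked with Python's `re` (assuming it behaves like BBEdit's grep for these constructs), applied to the single record from the question with the Replace string above unchanged. It assumes no cell is empty, since `[^\t]+` needs at least one character.

```python
import re

# \t?([^\t]+)\t? repeated seven times: the parentheses capture the cell text, not the tabs.
FIND = re.compile("^" + r"\t?([^\t]+)\t?" * 7 + "$")

# The Replace string from above, split over several lines; re.sub turns \n into a line break.
REPLACE = (
    r'<bounding>\n<picbox href="file://Headshots/\1" />\n<biogbox>\n'
    r"<name>\2</name>\n<star>\3</star>\n<company>\4</company>\n <bio>\5</bio>\n"
    r"<citybox>\n<city>\6</city>\n<twitter>\7</twitter>\n</citybox>\n"
    r"</biogbox>\n</bounding>"
)

record = "johnsmith.tif\tJohn Smith\tyes\tDeLoitte\tI work at Deloitte\nand I'm brilliant.\tNew York\t@myhandle"
print(FIND.sub(REPLACE, record))
```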
Many thanks
Hi again,
Sorry to be a pain; I'm trying to understand your grep expression better. Is my breakdown of it correct?
\t?[^\t]+\t?
\t? = \t looks for a tab, ? zero or one time
[^\t]+ = [ start of a pattern, ^ beginning of a line, \t tab, ] end of pattern, + pattern appears one or more times
\t? = (as first line above)
To elaborate on Loic's explanation a bit: the ^ just inside the opening bracket makes the class "negative", so it matches any character except the ones listed before the closing bracket.
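A quick way to see the difference, as a sketch using Python's `re` (BBEdit's grep is assumed to treat these the same): as the first character inside brackets, `^` negates the class, while outside brackets it anchors the match to the start.

```python
import re

NOT_TAB = re.compile(r"[^\t]+")  # one or more characters that are not a tab

# A line break is "not a tab", so a multi-line cell still matches as a whole.
print(NOT_TAB.fullmatch("I work at Deloitte\nand I'm brilliant.") is not None)  # True
# Text containing a tab cannot match as a whole; findall splits it at the tab instead.
print(NOT_TAB.fullmatch("New York\t@myhandle"))  # None
print(NOT_TAB.findall("New York\t@myhandle"))    # ['New York', '@myhandle']
# Outside the brackets, ^ is the start-of-text anchor, so only the first cell is found.
print(re.findall(r"^[^\t]+", "New York\t@myhandle"))  # ['New York']
```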
Trying to morph an XML file out of CSV via GREP looks very artistic to me. I'm not sure I can provide much more help here.
It's not that scary. It's only plain text either side of the original data. Thanks again.
Some online tools are designed to do this kind of transformation.
Give this a try:
CSV To XML Converter - BeautifyTools.com
And let us know.
I've used a similar website to do the same thing. That's when I decided that, should the website ever go down or the job need to be more complex, I'd need a way to do it offline. That's where grep is helping: I can certainly get the data into a workable format, and I've already imported over 300 biographies in a few minutes. I'm just hoping to refine my grep expressions so the original text is tampered with as little as possible.
Any errors are then the client's problem, not mine.
When I get the right expression I'll post back.
Hi,
Basically, it looks for one or more characters other than a tab, optionally preceded and followed by a tab.
So it's
\t? = \t looks for a tab, ? zero or one time
[^\t]+ = any possible character but tab one or more times
\t? = (as first line above)
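Checked with Python's `re` (assumed to match BBEdit's behaviour here), the optional tabs mean the same token matches a cell with or without tabs around it. It also shows why some extra cleaning is needed: when the token is run across a row, the trailing tab ends up inside each match.

```python
import re

TOKEN = re.compile(r"\t?[^\t]+\t?")

# Each variant is matched in full: the tabs on either side are optional.
for cell in ["New York", "\tNew York", "New York\t", "\tNew York\t"]:
    print(repr(cell), TOKEN.fullmatch(cell) is not None)

# Scanning a row: the tab after each cell is swallowed by that cell's match.
print(TOKEN.findall("John Smith\tyes\tDeLoitte"))  # ['John Smith\t', 'yes\t', 'DeLoitte']
```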