I have a process where I import a large amount of plain text exported from a database in multiple ways (alphabetically with all contact info, names grouped by city, names grouped by organization, etc.) for a directory. My export from the database includes distinctive bits I can find easily in InDesign with global find/change, so that I can replace them with paragraph styles and character styles. But doing all the find/change operations by hand is slow, so I'm wondering if there is a way to automate it at least a little.
Small example snippet from my geographical listing (names changed for privacy):
#KEN##GEOJ#秋田県#GEONJ# Akita-ken 1
#SHI##GEOJ#横手市#GEONJ# Yokote-shi
SMITH, Cindy
#KEN##GEOJ#青森県#GEONJ# Aomori-ken 20
#SHI##GEOJ#青森市#GEONJ# Aomori-shi
ADAMS, Stephen & Kimiko
JONES, Luke
MCDONALD, Paul & Sara
#KEN# and #SHI# indicate a paragraph style to apply ("Geo prefecture" and "Geo city"), and #GEOJ# and #GEONJ# indicate the beginning and end of a character style ("Geo Japanese").
Yes, I can manually do the necessary operations (text searches for the paragraph style ones and a GREP search for the character style), and I did that for some parts of last year's directory. But the number of needed searches has increased this year, and it will be hard to remember from year to year how to do the searches (especially GREP). Is there a way to automate it? I don't mind scripting, although I'm new to scripting in InDesign.
Where in the process are your pseudo tags (#KEN# and #SHI#) injected into the data? If this happens on the data management side, can the system generate a custom tag instead?
If so, could the system generate a tag like this: <ParaStyle:address> ?
If so, you could use a Tagged Text import, where the formatting is completed on import.
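As a rough illustration of that idea, here is a minimal Python sketch that rewrites the pseudo tags from the question into Tagged Text tags. The style names "Geo prefecture", "Geo city" and "Geo Japanese" come from the question; the default style "Geo listing name" for untagged lines is a hypothetical name, and the sketch assumes every paragraph should carry an explicit paragraph style tag. It does not add the header line or handle file encoding.

```python
import re

# Pseudo tag -> paragraph style, using the style names given in the question.
PARA_STYLES = {"#KEN#": "Geo prefecture", "#SHI#": "Geo city"}
# Hypothetical style for untagged lines (the names); not mentioned in the thread.
DEFAULT_PARA_STYLE = "Geo listing name"
CHAR_STYLE = "Geo Japanese"


def to_tagged_line(line: str) -> str:
    """Convert one exported line with pseudo tags into a Tagged Text line."""
    style = DEFAULT_PARA_STYLE
    for pseudo, name in PARA_STYLES.items():
        if line.startswith(pseudo):
            style = name
            line = line[len(pseudo):]  # drop the leading pseudo tag
            break
    # Wrap each #GEOJ#...#GEONJ# span in a character style start/end pair.
    line = re.sub(
        r"#GEOJ#(.*?)#GEONJ#",
        lambda m: f"<CharStyle:{CHAR_STYLE}>{m.group(1)}<CharStyle:>",
        line,
    )
    return f"<ParaStyle:{style}>{line}"


if __name__ == "__main__":
    for raw in ["#KEN##GEOJ#秋田県#GEONJ# Akita-ken 1", "SMITH, Cindy"]:
        print(to_tagged_line(raw))
```

In practice it would probably be simpler to have the database export emit these tags directly, as suggested above, rather than converting afterwards.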
Yes, I have complete control over the pseudo tags - I'm the one writing the code that exports the data from the database. Searching the web for "InDesign Tagged Text Import", the newest relevant thing I found was a PDF for InDesign CS4 that was linked in an answer in this forum conversation, which claims the tag format was still the same as of 2018. So as a first test, I tried it with my simplest section, which uses two paragraph styles and no character styles. My text looks like this (names obfuscated for privacy):
<ParaStyle:Org heading>A3.MP
<ParaStyle:Org listing name>ADAMS, Robert & Linda
<ParaStyle:Org listing name>BARTMAN, George & Barb
<ParaStyle:Org heading>AABFG
<ParaStyle:Org listing name>DAVIS, Bernard
The PDF's instructions for importing the tagged text are the same as any other Place:
But no processing of the tags happens. What step am I missing?
The best way is to style some sample text, export it as Tagged Text, and then analyze what the exported file looks like.
Tagged Text needs the proper encoding:
and a header:
<ASCII-WIN>
<Version:19.3><FeatureSet:InDesign-Roman><ColorTable:=<Black:COLOR:CMYK:Process:0,0,0,1>>
<DefineCharStyle:Character Style 1=<Nextstyle:Character Style 1>>
<DefineParaStyle:NormalParagraphStyle=<Nextstyle:NormalParagraphStyle>>
<DefineParaStyle:Paragraph Style 1=<Nextstyle:Paragraph Style 1>>
<ParaStyle:NormalParagraphStyle>Optati quam sitas simus vollatur a enihill itatquia sit la nus modit, aliquae. <cTypeface:Bold>Nam nobis dolec<cTypeface:>ae paribus ea ent etur? Non repreptatque conecta sperum facessuntus, con numque venem ium auta qui tem latem rae dit, officae. Ut enem. Liae. Ditat.
<ParaStyle:NormalParagraphStyle>Re, ut <CharStyle:Character Style 1>experferist<CharStyle:>, ut eost, explam id qui verunda id ut abor sit landi autessi aut enia ventia nam, omnissit ma porerum duciatem ut eveligenime reptas
<ParaStyle:Paragraph Style 1>ipsapicae. Nequam quo blab inctemposam ex explitia quia doluptatem iusam fugia volut laturit audigendi dollabo remped eveliquae eicia verrum volesserum ut labor abo. Fugiassimet magnis ipsusandi dolo modi
Wow, InDesign is picky! After exporting a sample from InDesign as you suggested, I trimmed down the definitions at the beginning to only what InDesign actually needed, which turned out to be just the first line "<UNICODE-WIN>" (I have Japanese in my content, so I can't just use ASCII):
<UNICODE-WIN>
<pstyle:Org heading>AnOrgName
<pstyle:Org listing name>Person
<pstyle:Org heading>AnotherOrg
<pstyle:Org listing name>Person...
But that wasn't enough for InDesign to accept a file generated by my code. Upon closer inspection, I first noticed that the line endings were different - I was just providing LF (simple style, standard in Unix-based OSs), but InDesign's exported file contained CRLF (Windows style, no surprise, since I'm using Windows). I tried changing the "WIN" in the first line to "UNIX", but that had no effect, so I changed my code to use CRLF everywhere. But that still wasn't enough! The last stubborn problem wasn't obvious until I compared the files with more powerful tools, which revealed that the character encoding was different - my database (and therefore the resulting text file by default) is UTF-8, but apparently InDesign won't consider a placed file as Tagged Text unless the encoding is UTF-16 LE with BOM. I had to do a lot of tricky things in my export code (written in PHP, in case anyone reading this is curious) to get it all to play together nicely, but I finally got there.
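For anyone who wants the gist without the PHP, here is a minimal Python sketch of the two fixes described above: CRLF line endings and UTF-16 LE with a BOM. The function name `encode_for_indesign` is made up for illustration, and this is not the actual export code from this post.

```python
import codecs


def encode_for_indesign(text: str) -> bytes:
    """Return text as UTF-16 LE bytes with a BOM and CRLF line endings."""
    # Collapse any mix of CRLF / CR / LF to LF first, then expand to CRLF,
    # so existing CRLF pairs are not doubled.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r\n")
    # "utf-16-le" does not emit a BOM by itself, so prepend it explicitly.
    return codecs.BOM_UTF16_LE + normalized.encode("utf-16-le")


if __name__ == "__main__":
    sample = "<UNICODE-WIN>\n<pstyle:Org heading>AnOrgName\n<pstyle:Org listing name>Person\n"
    data = encode_for_indesign(sample)
    print(data[:2].hex())  # BOM bytes
```

The result should be written in binary mode (for example `open(path, "wb")`, where `path` is whatever output file you choose) so that nothing re-translates the line endings on the way out.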
Great.
I warned you at the beginning - proper encoding.
But once you get it right, you can do A LOT.
And it imports pretty quickly as well - much faster than running a lot of Find/Change operations.
Yes, you mentioned encoding, but the pulldown in your screenshot (and the resulting first line of exported files) only says "Unicode" vs. other things that are drastically different. UTF-8 is so normal everywhere as a synonym for Unicode that I almost forget there are other flavors: UTF-16 (and even UTF-32), Big Endian vs. Little Endian, and with BOM (not even needed for UTF-8). Anyway, I've now made a little function for outputting text with correctly converted line endings and encoding.
I'm sorry if I wasn't precise - I meant checking how the file is written by InDesign: with BOM, etc.
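One way to do that kind of check without special tools is a small script that looks at the first bytes and the line endings of a file exported by InDesign and of your own file. The following is only a sketch: the function name `describe_text_file` is hypothetical, and when no BOM is present it simply assumes UTF-8 in order to count line endings.

```python
import codecs


def describe_text_file(data: bytes) -> dict:
    """Report which BOM (if any) starts the data and which line endings it uses."""
    # Check the longer BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    encoding, bom_len = None, 0
    for bom, name in boms:
        if data.startswith(bom):
            encoding, bom_len = name, len(bom)
            break
    # Assumption: a file without a BOM is UTF-8.
    text = data[bom_len:].decode(encoding or "utf-8")
    crlf = text.count("\r\n")
    return {"bom": encoding, "crlf": crlf, "bare_lf": text.count("\n") - crlf}


if __name__ == "__main__":
    exported = codecs.BOM_UTF16_LE + "<UNICODE-WIN>\r\n".encode("utf-16-le")
    print(describe_text_file(exported))
```

Running it on both files (read with `open(path, "rb").read()`) and comparing the two results should show differences in BOM or line endings at a glance.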
The two options I see, depending on the design you are creating:
The idea is to bring the text into your design and let InDesign handle the formatting, because you have set it up to do so correctly.
CSV would be overkill, and it would be challenging to write additions to my export code to create all the right delimiters and escape everything that shouldn't be regarded as a delimiter (commas, quotes, line feeds, etc.).
XML appears similar to the "tagged text" concept @Jeffrey_Smith suggested but more complex. If that's necessary, I'll try it, but I looked through the help page you linked and felt overwhelmed - in fact, I was already lost at the first sentence: "After you import XML data, the imported content appears as elements (the basic building blocks of XML) in the Structure pane." I've never seen a Structure pane (and don't see anything like that in the Window menu) and don't know how I would place a file so that it goes there.
It's in VIEW:
But XML also would be overkill.
Try styling your text in InDesign, then export it as Tagged Text and check the contents of the exported file.