After exporting InDesign file to XML, text duplication is found

Explorer, Nov 25, 2022

Hello Support Community,

 

I opened an InDesign book and exported each of its files as XML to send to a vendor for translation. However, duplication of text was found in each XML.

 

I opened the InDesign XML files in MS Word, removed the duplications, saved the docs as XML, and resent them to the vendor. The vendor returned the translated copies as MS Word docs, which I opened and saved as XML so that I could import them into InDesign. However, no translation appeared in the InDesign file upon import.

 

I have 2 questions:

 

(1) Is there a way to import the vendor's doc into InDesign for translation to appear?

(2) How can I remove the duplications that appear in each of the InDesign book's files when exported as XML?

 

I've attached one of the InDesign files for your reference. Your help above is appreciated.

 

Ariz

 

TOPICS
How to , Import and export

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines

Correct answer: the Community Expert reply of Nov 28, 2022.

Community Expert, Nov 25, 2022

I have a few questions here. I will try to answer yours after I've asked mine.

 

1) You exported XML for translation? Is there a reason why you didn't export IDML? Most translation environments consume IDML without issue.

 

2) I opened up your sample INDD & exported XML, and found no text duplication. I only did an eyeball spot check, though. Can you offer an example string for me to search for?

 

3) So, your post implies that you exported from InDesign and sent the file to your language services provider, who then sent it back complaining of duplicated strings. You then re-sent it to them after editing the XML to manually delete dupes in... MS Word? Did I get that right? 

 

This is a sub-optimal workflow, so far. If I were the LSP, I'd ask for the source files. 

 

4) This happened?!??!?!

"When the vendor returned the translated copies as MS word docs, I opened and saved them as XML."

You need a new LSP, yesterday. Supplied with XML and returning MS Word docs to the customer? I mean, maybe you don't have a choice regarding where you send your files for translation, but if you do, your current supplier is a poor choice. 

 

So, on to your questions:

 

(1) Is there a way to import the vendor's doc into InDesign for translation to appear?

Mmmmaybe. Can you post the supplier's Word doc? If your story is accurate, I would imagine that the answer is "no" but I can't say for certain until I've seen what they've supplied to you. 

(2) How can I remove the duplications that appear in each of the InDesign book's files when exported as XML?

I have never seen such duplications, so I can't tell if you'd be better served by changing what you are exporting, or by post-processing whatever it is that you're exporting. My gut feeling is that you'd be better off with an entirely different translation workflow; probably upstream changes to how you are working would serve you better than trying to fix this thing with post-processing. However, if I'm wrong about that, I am going to guess that the answer to your question number two would be "By post-processing with an XSL transform." But I don't suggest that as a solution for your difficulties, because your whole translation workflow needs to be rebuilt, not just this one little fragment. 
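For readers who do end up post-processing, here is a minimal sketch of the idea in Python rather than XSLT (a swapped-in technique, and not something the reply above recommends for this case). It assumes the duplicates are sibling elements with identical tags, attributes, and text; the element names `Root`, `Story`, and `para` and the sample text are invented, since the real export's structure isn't shown in this thread.

```python
import xml.etree.ElementTree as ET


def drop_duplicate_siblings(parent):
    """Remove later siblings that repeat an earlier sibling exactly."""
    seen = set()
    for child in list(parent):
        drop_duplicate_siblings(child)  # clean nested levels first
        if not "".join(child.itertext()).strip():
            continue  # keep elements that carry no text
        tail, child.tail = child.tail, None  # compare without trailing whitespace
        key = ET.tostring(child, encoding="unicode")
        child.tail = tail
        if key in seen:
            parent.remove(child)
        else:
            seen.add(key)


# Invented sample; a real export would be loaded with ET.parse(path).
SAMPLE = (
    "<Root>"
    "<Story><para>Sample notice text.</para></Story>"
    "<Story><para>Sample notice text.</para></Story>"
    "<Story><para>Body text that appears once.</para></Story>"
    "</Root>"
)

root = ET.fromstring(SAMPLE)
drop_duplicate_siblings(root)
print(ET.tostring(root, encoding="unicode"))
```

This only catches exact repeats under the same parent. Anything looser (near-duplicates, or repeats in different branches of the tree) would need a rule that fits the actual document.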

Community Expert, Nov 26, 2022

If the service requested XML, I would suspect they rely on some translation software that requires it for input. Not good.

 


Explorer, Nov 28, 2022

Thanks so much, Joel, for your response.

 

Here are my answers to your questions:

  • You exported XML for translation? Is there a reason why you didn't export IDML? Most translation environments consume IDML without issue.

Answer: I took over the process of exporting XML for translation, which was already in place. I am not familiar with IDML.

 

  • I opened up your sample INDD & exported XML, and found no text duplication. I only did an eyeball spot check, though. Can you offer an example string for me to search for?

Answer: Yes, the strings that get duplicated are “The copyright of this operations manual is reserved by…….” and “this operations manual contains information and technical drawings, which may…..”. Please see attached.

 

  • So, your post implies that you exported from InDesign and sent the file to your language services provider, who then sent it back complaining of duplicated strings. You then re-sent it to them after editing the XML to manually delete dupes in... MS Word? Did I get that right? 

Answer: Yes, that’s correct.

 

When the vendor returned the translated copies as MS word docs, I opened and saved them as XML.

You need a new LSP, yesterday. Supplied with XML and returning MS Word docs to the customer? I mean, maybe you don't have a choice regarding where you send your files for translation, but if you do, your current supplier is a poor choice. 

 

Answer: I think that because I sent the XML saved as a Word doc, the vendor returned the translated copies as Word docs. Please see attached.

 

Hoping to resolve the issue. Thanks again for your attention.

 

Ariz

Community Expert, Nov 28, 2022

Thanks for your responses; your issue makes a lot more sense to me now. The short answer is "the reason there are dupes in your XML export is because there are dupes in the source English XML that I believe was exported from your content management system." Here's a screenshot:

 

dupes.png

 

The only way to get those translations back into your INDD, right now, is by getting a good translation of the XML, with or without dupes, or by manually copying and pasting each string from the Word doc to the InDesign spreads. 

 

Here's the long version:

 

Your organization already had a translation workflow in place, and so I think you were following instructions when you exported XML instead of IDML. That makes sense, especially since it's clear that the EN doc is all tagged up for an XML workflow. If that's the case, is your vendor a firm (or person) who has done work for your organization before? If so, they're probably used to working with your XML output. So it'd be weird, for them, to get Word instead of XML. They must have seen the dupes in your XML output and said to you "Hey, there are dupes in here! Please fix and resend." At which point the best thing to do would have been to figure out the cause of the dupes and fix them, in order to send them correct XML.

 

Those duplicated strings are clearly in the English XML. I am not sure how your document came to be, but it sure looks like someone (a tech writer) exported an XML file from some sort of content management system, which was then imported into an InDesign Template, whereupon someone (you?) populated the document with the XML English content. The massive number of unused elements most likely results from the XML-creation stage, long before said XML was imported into InDesign.

 

I suspect that you have instructions on how to get that translated XML back into your English source document, yes? It'd be something like:

a) Click File -> Import XML

b) Choose... I dunno? I suspect that you're supposed to see this menu, which you get by clicking Show XML Import Options at the Import XML file picker dialog. 

JoelCherney_0-1669656783971.png

 

I withdraw my advice that you should find a new provider right away. If you sent them Word docs, then they should reasonably return Word docs. If they've worked with your org before and expect XML, I have an idea about what might have happened on their side; I've been in their shoes before.

 

Me: "Um, hey boss? This is weird. Client sent Word instead of XML this time."

Boss: "Must be the new guy. Did he say anything about it?"

Me: "No, just sent me Word docs without comment."

Boss: "Okay, process 'em and send 'em back as Word docs."

Me: "Isn't that going to break their workflow?"

Boss: "Yup."

Me: "New guy is going to have to copy and paste each string by hand?"

Boss: "Yup."

Me: ".... okay, boss, will do."

 

So, I think that is where you are. You have all the strings in Word, and would have to copy in bits by hand. That will be extremely time-consuming, and error-prone. I imagine you'd have to be very careful to leave each XML tag as-is. That is because I suspect that there is need for that document to have properly tagged XML downstream from you. If I'm wrong, and it's going to print only, then you don't technically need all of those tags. I can't guess, though.

 

I do think it's clear that you don't need those dupes to be translated. What I'd do is delete the unused content from the Structure panel, re-export XML, and re-send to the vendor. Here's a little GIF showing what I'd do; I would shift-select the dupes and trash 'em all at once. 

 

trash.gif
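Once the unused elements are trashed and the XML is re-exported, a small script can confirm that nothing still repeats, which beats an eyeball spot check. This is only a sketch: the function name `repeated_strings`, the `min_length` cutoff, and the sample XML are all invented for illustration, and it only looks at the text directly inside each element (not text that follows a nested tag).

```python
import xml.etree.ElementTree as ET
from collections import Counter


def repeated_strings(xml_text, min_length=20):
    """Return {string: count} for element text that occurs more than once."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for element in root.iter():
        text = " ".join((element.text or "").split())  # normalize whitespace
        if len(text) >= min_length:  # skip short labels that repeat legitimately
            counts[text] += 1
    return {text: n for text, n in counts.items() if n > 1}


# Invented sample standing in for a re-exported file.
EXPORT = (
    "<Root>"
    "<para>This sentence appears twice in the export.</para>"
    "<para>This sentence appears once only, so it is fine.</para>"
    "<para>This sentence appears twice in the export.</para>"
    "</Root>"
)

for text, count in repeated_strings(EXPORT).items():
    print(f"{count}x  {text}")
```

An empty result suggests the export is clean. Some repeats (headings, warnings) may be intentional, so anything reported still needs a human look.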

 

It'd be better if you could go back to the vendor and say "Can you re-apply your translation memory to the fixed XML I've attached here, and deliver the translated XML back to me?" They'd most likely say "Sure, that's just a minimum charge." No new translation should be necessary, if they've already translated that content in your Word file. That would be the best-case scenario (it assumes that they're capturing translations in a translation memory). If you can get that file from your vendor (assuming you can get the money for the vendor to do their job again), then it should be simple to Import that XML and Replace the old content with the new content.

 

(There is a third way, but I don't advise it for you. It's what I would do: I'd fix and re-export my XML, then I would take my source English doc and the vendor's Portuguese Word doc and perform an alignment; put each pair of strings into a translation memory database. Then I'd apply that database to my cleaned XML, and re-import the Portuguese XML into InDesign. However, for that, you kinda have to be a localization nerd who already has some kind of translation memory tool, and the capacity to kinda read Portuguese so you can perform a good alignment. I'd do that, because I am already that localization nerd, and already have the translation tech environment, and I read enough Spanish to feel confident in reading technical Portuguese. That work is what I'm speculating your vendor would want to charge you a minimum charge to perform.)
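To make the align-then-apply idea concrete, here is a toy sketch in Python. It is not how a real translation memory tool works: it assumes the English and Portuguese strings have already been paired by position (the hard part, which needs someone who can read both), and it only does exact matches. The function names, element names, and sample strings are all invented.

```python
import xml.etree.ElementTree as ET


def normalize(text):
    """Collapse whitespace so line breaks don't prevent a match."""
    return " ".join(text.split())


def build_memory(source_strings, target_strings):
    """Pair aligned source/target segments into a lookup table."""
    if len(source_strings) != len(target_strings):
        raise ValueError("source and target must have the same number of segments")
    return {normalize(s): t for s, t in zip(source_strings, target_strings)}


def apply_memory(root, memory):
    """Replace element text that has an exact match; return what was not found."""
    missing = []
    for element in root.iter():
        key = normalize(element.text or "")
        if not key:
            continue
        if key in memory:
            element.text = memory[key]
        else:
            missing.append(key)
    return missing


# Invented sample data standing in for the aligned Word content and cleaned XML.
memory = build_memory(
    ["Warning", "Read the manual."],
    ["Aviso", "Leia o manual."],
)
doc = ET.fromstring(
    "<Root><head>Warning</head><para>Read the manual.</para>"
    "<para>Untranslated text</para></Root>"
)
missing = apply_memory(doc, memory)
print(ET.tostring(doc, encoding="unicode"))
print("No match for:", missing)
```

The list of unmatched strings is the useful part: it shows what would still need translating or a manual fix before importing the XML back into InDesign.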

 

 

 

 

Explorer, Nov 29, 2022

Thanks again, Joel, for your prompt response, complete with the details and the solution I sought. The GIF and the vendor conversation that you wrote into your response impressed me.

 

I have asked the vendor if there is a way to convert their .docx file into a format that InDesign can use.


By the way, I am the tech writer, and my system was down when the vendor informed me that the XMLs had duplication the first time I sent them. So, someone without access to InDesign had to open the XML in MS Word, remove the duplicate content, and save the docs as XMLs to send to the vendor. When the vendor replied that those XMLs were empty, the same person saved the XMLs as Word docs and sent them to the vendor, who was happy to receive them and return them translated in MS Word.

 

One of your solutions that works for me is to delete the unused content from the Structure Panel.

 

Another solution I found was plugging my file's source content into a similar InDesign file that could previously export the XML without duplication. I had previously prepared this InDesign file for Spanish translation and could successfully import the XML (as you showed) that the vendor provided then. After I tagged my source content in the new InDesign file above and exported the XML, it had no duplication. By taking this step, instead of giving the vendor the edited docx copies, I think I could have avoided the issue.

 

Again, I appreciate the time, energy, and creativity you spent crafting my solution through this forum. I discovered the depth of InDesign's capabilities through your input. I will keep you posted on the outcome.

 

Thanks again.

Community Expert, Nov 29, 2022

"Another solution I found was plugging my file's source content into a similar InDesign file that could previously export the XML without duplication."

 

That makes sense. That's where you are copying and pasting the frames from your spreads into a fresh empty document or template, right? When you do that, only the XML that has been associated with stories, or other on-the-spread elements, gets moved to the next document. It feels like more clicks to me, but it's still a totally valid way to purge your XML of unused elements. It was easy, in your sample doc, to handle all umpteen unused elements with a shift-click, as they were in one contiguous mass. Might be harder to do that in other circumstances.

 

I am still uncertain how you wound up with so many duplicated unused elements. I wonder if it's an InDesign problem, or if it happened somewhere upstream.

 

Lastly, I appreciate your kudos, but I do have to say one last thing for all of the other people who may eventually find this thread. That story I told? It was a true story, but in my entire career, it's the only time I was the direct report of someone who would just, you know, take money from someone who made that kind of mistake. I didn't like it then, and I don't like it now, especially as it seems in this thread that I'm endorsing that kind of "buyer beware" mentality from LSPs or other service providers. I'm not! For what it's worth, speaking only for myself, I would always inform my requestor "Hey, I don't think you meant to do that. Want to maybe submit some XML instead? Shooting yourself in the foot, if you don't."

 

And that boss? Now a senior VP. Sigh. 

Explorer, Nov 30, 2022

What is working best for me currently is this: ".....to handle all umpteen unused elements with a shift-click, as they were in one contiguous mass." I think this is the most appropriate way, because even "copying and pasting the frames from spreads into a fresh empty document" still produced duplications when exported as XML. I was able to remove them by adapting the step you demonstrated previously 🙂

 

That makes me wonder ".. how I wound up with so many duplicated unused elements." 

 

The LSP is yet to respond to my question. 

 

Thanks again.
