When exporting Lao and Khmer PDFs to Word, the ligatures appear to break and the text becomes unreadable. Here's an example:
PDF:
Exported Word doc:
This happens in all fonts for these languages. I've been able to find virtually nothing about the cause of this online, except perhaps that the ToUnicode map (whatever that is) isn't being embedded in the PDF when it's exported from InDesign. For reasons I won't get into here, I absolutely have to have these documents in Word, as well as being fully accessible PDF forms with matching layouts. I'm grateful to hear from anyone who's experienced anything like this.
@JMHCA the "gibberish" is a direct result of the ToUnicode map being either missing or incomplete. Think of a PDF as a book of pictures, not a book of words. When you save a document as a PDF, the program takes a picture of each letter. It gives each picture a secret code, like "picture-1," "picture-2," and so on. The ToUnicode map is the key that translates these secret codes back into real letters. For a simple language like English, this is easy. "picture-1" is "A," "picture-2" is "B," and so on.
But for complex languages like Lao and Khmer, with their special characters and how letters join together, the program often forgets to include this key. When you try to convert the PDF to a Word document, the converter sees the secret codes but doesn't have the key to translate them. It tries to guess, but because it doesn't know what "picture-107" actually is, it just puts out a bunch of random symbols. That's why your text looks like a mess—the converter is flying blind.
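To make the "secret code" idea concrete, here is a minimal toy sketch in Python. It is not how any real converter is implemented: the glyph codes, the `to_unicode` dictionary, and the `extract_text` function are all made up for illustration, and the fallback to the Unicode replacement character is an assumption (a real converter may guess differently and produce other wrong characters).

```python
# Toy model of PDF text extraction. All names and codes here are hypothetical.

# What a page stores: glyph codes, not characters (codes invented for this example).
page_codes = [17, 42, 107]

# A ToUnicode-style map: glyph code -> the Unicode text that glyph stands for.
# One glyph can stand for more than one code point, as with joined forms.
to_unicode = {
    17: "\u0e81",         # LAO LETTER KO
    42: "\u0eb2",         # LAO VOWEL SIGN AA
    107: "\u0e81\u0eb1",  # a single glyph for LAO LETTER KO + LAO VOWEL SIGN MAI KAN
}


def extract_text(codes, cmap=None):
    """Translate glyph codes to text; emit U+FFFD when no mapping is known."""
    out = []
    for code in codes:
        if cmap is not None and code in cmap:
            out.append(cmap[code])
        else:
            # Assumed fallback: without the map, the real character is unknowable.
            out.append("\ufffd")
    return "".join(out)


with_map = extract_text(page_codes, to_unicode)  # readable Lao text
without_map = extract_text(page_codes)           # three replacement characters
```

With the map, every code resolves to real text, including the glyph that stands for two code points. Without it, the same codes carry no recoverable meaning, which is the situation the converter is in.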
In your case, the PDFs were most likely exported from InDesign without a complete ToUnicode map embedded. Leaving the map out keeps the file a little smaller, but it effectively makes the text "un-copyable" and "un-convertible."
If you still have the InDesign files, re-exporting from them is the easiest fix. When exporting, go to File > Adobe PDF Presets. Choosing a preset like "High Quality Print" or "Press Quality" will almost always embed the necessary font information, including the ToUnicode map, for commercial printing. Also, to make sure the document is accessible and searchable, export it as a PDF/A file. Go to File > Export, and in the dialog box, select the PDF/A standard you want to use (such as PDF/A-1a). That standard requires all fonts to be embedded and character mappings to Unicode to be present, which should prevent the text scrambling issue.
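After re-exporting, a rough way to see whether the new PDF mentions a ToUnicode map at all is to search the raw file for the `/ToUnicode` key. The sketch below is only a crude check under stated assumptions: the function names and the `exported.pdf` file name are hypothetical, and a negative result is inconclusive because font dictionaries can sit inside compressed object streams where the key is not visible as plain bytes. It also cannot tell whether a map that is present is complete.

```python
# Crude check: does the raw PDF data contain a /ToUnicode key anywhere?
# Function names and the file name below are hypothetical.

def mentions_tounicode(pdf_bytes: bytes) -> bool:
    """Return True if the literal key /ToUnicode appears in the raw bytes."""
    return b"/ToUnicode" in pdf_bytes


def file_mentions_tounicode(path: str) -> bool:
    """Read a PDF from disk and run the same byte search on its contents."""
    with open(path, "rb") as f:
        return mentions_tounicode(f.read())


# Example call (assumes a file named exported.pdf exists next to the script):
# print(file_mentions_tounicode("exported.pdf"))
```

A result of True only says that at least one font references a map. Copying a line of Lao or Khmer text out of the PDF and pasting it into a text editor remains the more direct test.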
Thank you so much! If this works, you'll be a lifesaver. One thing: when I go to export, the PDF/A standard is unavailable—I'm only seeing the PDF/X option. Do I need to adjust another setting to get the PDF/A standard?
Learned from Adobe Support that the only way to generate a PDF/A from InDesign is with the Adobe PDF printer. Hopefully that helps anyone else who has this issue!
Hello @JMHCA
Thank you for sharing the steps. For future reference, you can check these Adobe articles for the steps:
PDF/X-, PDF/A-, and PDF/E-compliant files (Acrobat Pro).
How to convert a PDF to a PDF/A.
How to export InDesign Book (indb) to PDF/A.
Thanks,
Anand Sri | Acrobat Community Team