PDF export to Word messing up Lao and Khmer text

Question

When exporting Lao and Khmer PDFs to Word, the ligatures appear to break and the text becomes unreadable. Here's an example:

PDF:

Exported Word doc:

This happens in all fonts for these languages. I've been able to find virtually nothing about the cause of this online, except perhaps that the ToUnicode map (whatever that is) isn't being embedded in the PDF when it's exported from InDesign. For reasons I won't get into here, I absolutely have to have these documents in Word, as well as being fully accessible PDF forms with matching layouts. I'm grateful to hear from anyone who's experienced anything like this.

creative explorer · Accepted Answer

@JMHCA, the gibberish is a direct result of the ToUnicode map being either missing or incomplete. Think of a PDF as a book of pictures, not a book of words. When you save a document as a PDF, the program stores a picture (a glyph) of each letter. It gives each picture a secret code, like "picture-1," "picture-2," and so on. The ToUnicode map is the key that translates these secret codes back into real letters. For a simple script like the one English uses, this is easy. "picture-1" is "A," "picture-2" is "B," and so on.

But for complex scripts like Lao and Khmer, where letters join together and a single picture can stand for several letters at once, the exporting program often leaves this key out or writes only part of it. When you try to convert the PDF to a Word document, the converter sees the secret codes but doesn't have the key to translate them. It tries to guess, but because it doesn't know what "picture-107" actually is, it outputs the wrong characters. That's why your text looks like a mess: the converter is flying blind.
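To make the "secret code" idea concrete, here is a minimal Python sketch of what a converter does with and without the key. The glyph codes (1, 2, 3) and the `TO_UNICODE` table are invented for illustration; a real PDF uses font-specific codes and stores the map as a CMap stream, so treat this as a toy model rather than how any particular converter works.

```python
# Toy model of text extraction from a PDF.
# Assumption: glyph codes 1-3 are made up; real fonts assign their own codes.

REPLACEMENT = "\ufffd"  # Unicode replacement character, shown when a code is unknown

# Hypothetical ToUnicode map: glyph code -> Unicode text.
TO_UNICODE = {
    1: "\u0ea5",  # LAO LETTER LO
    2: "\u0eb2",  # LAO VOWEL SIGN AA
    3: "\u0ea7",  # LAO LETTER WO
}


def extract_text(glyph_codes, to_unicode):
    """Translate glyph codes to text, marking any code missing from the map."""
    return "".join(to_unicode.get(code, REPLACEMENT) for code in glyph_codes)


page_content = [1, 2, 3]  # what the PDF page actually stores

with_map = extract_text(page_content, TO_UNICODE)  # the Lao word "\u0ea5\u0eb2\u0ea7"
without_map = extract_text(page_content, {})       # three replacement characters
```

The page looks identical in both cases because the pictures are still there; only the translation back to letters is lost.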

In your case, the PDFs were most likely exported from InDesign without a complete ToUnicode map embedded. This is a common problem: the pages still display and print correctly, but the text is effectively "un-copyable" and "un-convertible."

If you still have the InDesign files, re-exporting from them is the easiest fix. Go to File > Adobe PDF Presets and choose a preset like "High Quality Print" or "Press Quality." Because these presets are meant for commercial printing, they will almost always embed the necessary font information, including the ToUnicode map. Also, to make the document accessible and searchable, aim for a PDF/A file (such as PDF/A-1a). If your version of InDesign does not list PDF/A under Standard in the File > Export dialog, export the PDF first and then convert it to PDF/A in Acrobat Pro. The "a" conformance level specifically requires that all fonts are embedded and that character mappings to Unicode are present, which will prevent the text scrambling issue.
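If you want to check a re-exported file before sending it through the Word converter, something like the following sketch may help. It assumes the third-party pypdf package is installed, and the function names `fonts_missing_tounicode` and `check_pdf` are my own, not part of any Adobe tool. Note that a font without /ToUnicode is not always a problem (simple fonts with a standard encoding can extract fine without it), so read the result as a hint rather than a verdict.

```python
from typing import List, Mapping


def fonts_missing_tounicode(fonts: Mapping[str, Mapping]) -> List[str]:
    """Return the fonts whose dictionary has no /ToUnicode entry.

    `fonts` maps a resource name (e.g. "/F1") to a font dictionary.
    The font's /BaseFont is reported when present, else the resource name.
    """
    missing = []
    for name, font in fonts.items():
        if "/ToUnicode" not in font:
            missing.append(str(font.get("/BaseFont", name)))
    return sorted(missing)


def check_pdf(path: str) -> List[str]:
    """List fonts without /ToUnicode across all pages of a PDF file."""
    # Assumption: pypdf is installed (pip install pypdf).
    from pypdf import PdfReader

    missing = set()
    for page in PdfReader(path).pages:
        if "/Resources" not in page:
            continue
        resources = page["/Resources"]
        if "/Font" not in resources:
            continue
        font_table = resources["/Font"]
        # Indexing resolves indirect references to the actual font dictionaries.
        resolved = {name: font_table[name] for name in font_table}
        missing.update(fonts_missing_tounicode(resolved))
    return sorted(missing)
```

Calling `check_pdf("export.pdf")` (a placeholder filename) returns an empty list when every font on every page carries a ToUnicode map.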
