Participant
August 27, 2025
Answered

PDF export to Word messing up Lao and Khmer text


When exporting Lao and Khmer PDFs to Word, the ligatures appear to break and the text becomes unreadable. Here's an example:

 

PDF: [image not included]

Exported Word doc: [image not included]

 

This happens in all fonts for these languages. I've been able to find virtually nothing about the cause of this online, except perhaps that the ToUnicode map (whatever that is) isn't being embedded in the PDF when it's exported from InDesign. For reasons I won't get into here, I absolutely have to have these documents in Word, as well as being fully accessible PDF forms with matching layouts. I'm grateful to hear from anyone who's experienced anything like this.

1 reply

creative explorer
Community Expert
Correct answer
August 27, 2025

@JMHCA the gibberish is a direct result of the ToUnicode map being either missing or incomplete. Think of a PDF as a book of pictures, not a book of words. When you save a document as a PDF, the program stores a drawing (a glyph) of each letter and gives each one a code, like "picture-1," "picture-2," and so on. The ToUnicode map is the key that translates these codes back into real letters. For a script like English, where one glyph usually stands for one letter, this is easy: "picture-1" is "A," "picture-2" is "B," and so on.
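To make the "key" idea concrete, here is a minimal toy sketch in Python. It is not how any real converter is implemented, and the glyph codes and the `extract_text` helper are made up for illustration. It only shows what happens when the same glyph codes are translated with and without a mapping.

```python
# Toy model of a ToUnicode map: glyph code (as stored in the PDF) -> Unicode text.
# The codes are invented for illustration; real codes depend on the font subset.
TO_UNICODE = {
    1: "A",
    2: "B",
    3: "fi",  # a single ligature glyph can stand for more than one character
}


def extract_text(glyph_codes, to_unicode):
    """Translate glyph codes to text; codes with no mapping become U+FFFD."""
    return "".join(to_unicode.get(code, "\ufffd") for code in glyph_codes)


codes = [1, 2, 3]
with_map = extract_text(codes, TO_UNICODE)  # mapping present
without_map = extract_text(codes, {})       # mapping missing

print(with_map)
print(without_map)
```

With the map, the three codes come back as readable text. Without it, every code falls through to the replacement character, which is roughly the situation a converter is in when it has to guess.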

But for complex scripts like Lao and Khmer, where letters stack, join, and change shape depending on their neighbors, this key is often missing or incomplete. When you try to convert the PDF to a Word document, the converter sees the codes but doesn't have the key to translate them. It tries to guess, but because it doesn't know what "picture-107" actually is, it outputs the wrong characters. That's why your text looks like a mess: the converter is flying blind.
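Here is a small Python sketch of one reason these scripts are harder, under stated assumptions. In Khmer, the vowel sign E (U+17C1) is typed after the consonant but drawn to its left. The glyph codes 107 and 108 below are hypothetical, and this is only an illustration of the general problem, not a description of what InDesign or Acrobat actually does internally.

```python
# The syllable as typed (logical order): KHMER LETTER KA, then KHMER VOWEL SIGN E.
logical = "\u1780\u17c1"

# Hypothetical per-glyph mapping for a subsetted font (codes are made up).
GLYPH_TO_UNICODE = {
    107: "\u17c1",  # vowel sign E
    108: "\u1780",  # letter KA
}

# On the page, the vowel sign is drawn to the LEFT of the consonant,
# so the glyphs appear in visual order: vowel first, then consonant.
visual_glyph_run = [107, 108]

# A naive converter that maps glyph by glyph in page order
# gets the right characters in the wrong order.
naive = "".join(GLYPH_TO_UNICODE[g] for g in visual_glyph_run)

print([f"U+{ord(ch):04X}" for ch in logical])
print([f"U+{ord(ch):04X}" for ch in naive])
print(naive == logical)
```

So even when each glyph can be identified, the extracted text can still differ from what was typed unless the PDF carries enough information to restore the original order.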

In your case, the PDFs most likely came out of InDesign with the ToUnicode map missing or incomplete for these fonts. The pages still look right, but the text is effectively "un-copyable" and "un-convertible." If you still have the InDesign files, re-exporting is the easiest fix. When exporting, go to File > Adobe PDF Presets. Choosing a preset like "High Quality Print" or "Press Quality" will almost always embed the necessary font information, including the ToUnicode map. Also, to make the document accessible and searchable, aim for a PDF/A file at conformance level A (such as PDF/A-1a). That level requires that all fonts are embedded and that character mappings to Unicode are present, which should prevent the text scrambling issue. If your version offers it, select the PDF/A standard in the dialog box under File > Export.

JMHCAAuthor
Participant
August 27, 2025

Thank you so much! If this works, you'll be a lifesaver. One thing: when I go to export, the PDF/A standard is unavailable—I'm only seeing the PDF/X option. Do I need to adjust another setting to get the PDF/A standard?

 

JMHCAAuthor
Participant
September 4, 2025

Learned from Adobe Support that the only way to generate a PDF/A from InDesign is with the Adobe PDF printer. Hopefully that helps anyone else who has this issue!