
Missing ToUnicode After Distillation

New Here, Jun 14, 2017

Hi,

I have some PDFs with Japanese text. Most of the text is set in either MS-Mincho or MS-Gothic, and the fonts are mostly TrueType (CID) with Identity-H encoding. I have no problem copying Japanese text from Acrobat and pasting it into Notepad. I take this as proof that the PDF contains a proper ToUnicode map.

However, when I print the PDF to the "Adobe PDF" printer driver, copy and paste no longer works in the output PDF. The Japanese text is now pasted into Notepad as the missing-glyph character (a quotation mark in a square). I guess this means the ToUnicode map was not created, most probably during distillation.

On the other hand, I have a PDF with Japanese text whose fonts are "MSゴシック" or "MS明朝" (MS-Gothic and MS-Mincho, respectively) and which, more importantly, use 90ms-RKSJ-H encoding. I can copy and paste Japanese text from the "Adobe PDF" printed output of this PDF. One odd thing is that the 90ms-RKSJ-H encoding has changed to Identity-H in the output PDF. To see whether the encoding is causing the issue, I printed this output PDF to "Adobe PDF" once again, but only got an error log from Distiller.

I checked a few articles, including http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/5411.ToUnicode.pdf and http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/distfont.pdf, but I am still not sure what the problem is. Do I have to configure or create ToUnicode mapping files? Or is this something that cannot be done because of the nature of Identity-H encoding? Thank you.

- eellor
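
For readers following along, here is a rough sketch of what a ToUnicode CMap contributes to copy and paste. It is not Acrobat's or Distiller's implementation. The CMap fragment, the character codes in it, and the helper names `parse_tounicode` and `extract_text` are all made up for illustration, and substituting U+FFFD for unmapped codes is an assumption about how a viewer might behave. The sketch handles only `bfchar` entries and the simple three-operand form of `bfrange`.

```python
import re

# Hypothetical ToUnicode CMap fragment. The source codes are invented;
# the destination values are UTF-16BE code units written as hex.
SAMPLE_TOUNICODE = """
begincmap
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
2 beginbfchar
<0B5C> <65E5>
<0C2A> <672C>
endbfchar
1 beginbfrange
<0020> <0022> <0041>
endbfrange
endcmap
"""

HEX = r"<([0-9A-Fa-f]+)>"


def _hex_to_text(hex_digits):
    # Destination strings in a ToUnicode CMap are UTF-16BE.
    return bytes.fromhex(hex_digits).decode("utf-16-be")


def parse_tounicode(cmap_text):
    """Return {character code: Unicode string} for the entries found."""
    mapping = {}
    # bfchar: one source code maps to one destination string.
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(HEX + r"\s*" + HEX, block):
            mapping[int(src, 16)] = _hex_to_text(dst)
    # bfrange (simple form): consecutive codes map to consecutive values.
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, dst in re.findall(HEX + r"\s*" + HEX + r"\s*" + HEX, block):
            start = int(dst, 16)
            for offset, code in enumerate(range(int(lo, 16), int(hi, 16) + 1)):
                digits = format(start + offset, "0{}X".format(len(dst)))
                mapping[code] = _hex_to_text(digits)
    return mapping


def extract_text(codes, mapping):
    # Assumption: a code with no entry is replaced by U+FFFD.
    return "".join(mapping.get(code, "\ufffd") for code in codes)


mapping = parse_tounicode(SAMPLE_TOUNICODE)
with_map = extract_text([0x0B5C, 0x0C2A], mapping)
without_map = extract_text([0x0B5C, 0x0C2A], {})
```

With the sample map, the two codes resolve to "日本". With an empty map, the same codes resolve to replacement characters only, which resembles the symptom described above.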

TOPICS
Acrobat SDK and JavaScript

Correct answer

LEGEND, Jun 14, 2017

1. The ability to copy the text does not prove there was a ToUnicode map. ISO 32000-1 gives the method for text extraction, and ToUnicode is only part of it. Also, where these methods do not apply, a viewer may use other methods and special knowledge. In fact, the CMap 90ms-RKSJ-H has a well-defined mapping to Unicode.

2. CMaps may not survive redistilling.

3. ToUnicode cannot survive redistilling. When a PDF is printed, the print mechanism is concerned ONLY with visible entities. NOTHING else is designed to be kept, including interactivity and searchability.

4. For these reasons and many others, redistilling (also called "refrying") is considered a very poor workflow indeed. It is not forbidden, but it is certainly not supported by Adobe and is no longer possible on Mac. Best to find an alternative. If you tell us why you do this, we may be able to suggest one.
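
To illustrate point 1, here is a small sketch of why 90ms-RKSJ-H text can be extracted without a ToUnicode map while Identity-H text cannot. It assumes that Python's `cp932` codec is an adequate stand-in for the Microsoft code page 932 byte encoding that 90ms-RKSJ-H uses. The Identity-H codes and the helper names `decode_rksj` and `split_identity_h` are hypothetical.

```python
def decode_rksj(data):
    # Assumption: Python's cp932 codec models the byte encoding behind
    # 90ms-RKSJ-H, so the bytes themselves already identify characters.
    return data.decode("cp932")


def split_identity_h(data):
    # Identity-H treats every 2 bytes as a big-endian CID. The numbers carry
    # no Unicode meaning on their own; a lookup table is needed for that.
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# Shift-JIS bytes for two kanji, as a 90ms-RKSJ-H string would carry them.
rksj_text = decode_rksj(bytes.fromhex("93FA967B"))

# Made-up Identity-H codes for the same two glyphs in some embedded font.
identity_cids = split_identity_h(bytes.fromhex("0B5C0C2A"))
```

The first call yields readable text directly from the encoded bytes. The second yields only numbers, and recovering characters from them depends on a ToUnicode map or on viewer-specific knowledge of the font.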
