Identity-H encoding

Explorer, Mar 20, 2019


Hi,

I have a serious problem encoding Indic scripts. Here is the situation: I work for a French publisher that brings out translations and bilingual editions of Indian literature (Hindi-French, Bengali-French, Tamil-French, for instance, in a facing-page layout, with the Indian text on the left-hand page and the French facing it, on the even pages).

I have subscriptions to Microsoft Office 365 and Adobe Creative Cloud and keep them up to date, so I am using Word 2016, Acrobat DC, and the 2019 releases of InDesign, Illustrator, Dreamweaver, etc., the latest versions of this software as of today. I work on a PC running Windows 10, with all the Indian languages I use installed (the same goes for the Office suite, where the Hindi, Bengali, and Tamil modules are installed in addition to English and French, my native language).

I usually prepare a clean version of the texts in MS Word, using styles to ease the export to InDesign, and I only use OpenType or TrueType fonts from renowned foundries (Adobe, Linotype, Monotype, Microsoft…).

Nevertheless, when I create a PDF version of these documents (which contain French, but also Hindi, Bengali, or Tamil text), the generated PDF contains subsets of OpenType fonts but also Identity-H encoded text. This is very annoying because the PDF can no longer be exported: I tried with a very simple text in Word, exported it to a PDF file, then re-exported it from Acrobat DC to MS Word, and the result was catastrophic.

Also, the PDF files I receive from Indian publishers use this Identity-H encoding too, and I cannot export them to MS Word, which tremendously complicates my work.

Frankly, I don't really understand what this encoding means. I thought it was an old problem from the time when Indic scripts were not standardized, but this is no longer the case: the Unicode Consortium has had a very clear encoding standard for years. So I am very surprised that the problem persists even in modern software and, once again, even when you use OpenType fonts.

I tried different PDFMaker settings for the export, for instance forcing PDFMaker to embed the whole font in the PDF document, but the problem remains (the PDF export module built into MS Word produces the same mess).

Does anybody have an explanation for this problem and a solution to offer?

Thanking you in advance,

Regards,
Pascal Garin

TOPICS
Create PDFs

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
LEGEND, Mar 20, 2019


Identity-H is entirely normal and common. It means that the PDF directly uses codes from the font. To extract text when this encoding is used, the PDF also needs a “ToUnicode CMap”. You cannot see if one of these is present.

Exporting from InDesign or using Acrobat PDFMaker for Word should get this right, unless non-Unicode fonts are used. Don’t use such fonts.
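
To make the role of the ToUnicode CMap concrete, here is a minimal Python sketch. It assumes the common case where Identity-H text is stored as 2-byte codes taken from the font, and it uses a made-up CMap fragment (`SAMPLE_CMAP`) rather than one pulled from a real file. The helper names `parse_tounicode` and `decode_identity_h` are hypothetical, and only the simple `bfchar` and `bfrange` forms are handled.

```python
import re

# Hypothetical ToUnicode CMap fragment, written by hand for illustration only.
SAMPLE_CMAP = """
begincmap
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
2 beginbfchar
<0003> <0020>
<0155> <0915>
endbfchar
1 beginbfrange
<0024> <0026> <0041>
endbfrange
endcmap
"""

HEX = r"<([0-9A-Fa-f]+)>"


def parse_tounicode(cmap_text):
    """Build a {font code: Unicode string} table from bfchar/bfrange sections."""
    table = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(HEX + r"\s*" + HEX, block):
            # Destination values are UTF-16BE and may hold several code points.
            table[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        # Only the "<lo> <hi> <start>" form; the array form is not handled here.
        for lo, hi, start in re.findall(HEX + r"\s*" + HEX + r"\s*" + HEX, block):
            text = bytes.fromhex(start).decode("utf-16-be")
            for offset, code in enumerate(range(int(lo, 16), int(hi, 16) + 1)):
                # Consecutive codes map to consecutive values of the last character.
                table[code] = text[:-1] + chr(ord(text[-1]) + offset)
    return table


def decode_identity_h(hex_string, table):
    """Split the string into 2-byte codes; unmapped codes become U+FFFD."""
    raw = bytes.fromhex(hex_string)
    codes = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]
    return "".join(table.get(code, "\ufffd") for code in codes)


table = parse_tounicode(SAMPLE_CMAP)
print(decode_identity_h("0155000300240025", table))
```

Without the table, the codes `0155 0003 0024 0025` are just positions in one particular font subset, which is why text exported from a PDF that lacks a usable ToUnicode CMap comes out as garbage.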

New Here, Oct 27, 2021

@Test Screen Name wrote:

"To extract text when this encoding is used, the PDF also needs a “ToUnicode CMap”. You cannot see if one of these is present."

Actually, you can if you use this tool: http://brendandahl.github.io/pdf.js.utils/browser/ 
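
For a quick first look without any tool, a rough Python sketch like the one below can count the relevant dictionary keys in the raw bytes of a file. This is only a heuristic under a big assumption: it sees font dictionaries only when they are stored uncompressed, and many PDFs (version 1.5 and later) pack them into compressed object streams, in which case it reports zero even though the entries exist. The function name `scan_font_hints` is made up for this example.

```python
import re


def scan_font_hints(pdf_bytes):
    """Count uncompressed /Identity-H and /ToUnicode references in raw PDF bytes."""
    return {
        "identity_h": len(re.findall(rb"/Encoding\s*/Identity-H", pdf_bytes)),
        "tounicode": len(re.findall(rb"/ToUnicode\s+\d+\s+\d+\s+R", pdf_bytes)),
    }


# Synthetic font dictionary, for illustration only. For a real file, read it
# with open(path, "rb").read() and pass the bytes in.
sample = b"<< /Type /Font /Subtype /Type0 /Encoding /Identity-H /ToUnicode 12 0 R >>"
print(scan_font_hints(sample))
```

If `identity_h` is greater than `tounicode`, at least one Identity-H font probably has no ToUnicode CMap; equal counts do not prove that the CMaps are complete or correct.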

Community Expert, Sep 15, 2019


Hello,

I found the following thread useful:

Community Expert, Sep 15, 2019
