Identity-H encoding

Question

Hi,

I have a severe problem when encoding Indic scripts. Here is the situation: I work for a French publisher who publishes translations and bilingual editions of Indian literature (Hindi-French, Bengali-French or Tamil-French, for instance), laid out face to face, with the Indian text on the left-hand (even) pages and the French translation on the facing pages.

I have subscriptions to Microsoft Office 365 and Adobe Creative Cloud and always keep them updated. I am therefore using Word 2016, Acrobat DC, and InDesign, Illustrator, Dreamweaver, etc. 2019, the latest versions of this software as of today. I work on a PC under Windows 10, with all the Indian languages I use installed (the same goes for the Office suite, where the Hindi, Bengali and Tamil modules are installed in addition to English and French, my native language).

I usually prepare a clean version of the texts in MS Word, using styles to ease the export to InDesign, and I only use OpenType or TrueType fonts from renowned foundries (Adobe, Linotype, Monotype, Microsoft…).

Nevertheless, when I create a PDF version of these documents (which contain French but also Hindi, Bengali or Tamil text), the generated PDF contains subsets of the OpenType fonts but also Identity-H encoded text. This is very annoying, because the text in the PDF can no longer be exported properly: I tried with a very simple text in Word, exported it to PDF, then exported it back from Acrobat DC to MS Word, and the result was catastrophic.

Also, the PDF files I receive from Indian publishers are encoded with this Identity-H encoding too, and I cannot export them to MS Word, which complicates my work tremendously.

Frankly speaking, I don't really understand what this encoding means: I thought it was an old problem from the days when Indic scripts were not standardized, but this is no longer the case, and the Unicode Consortium has provided a very clear encoding standard for years. So I am very surprised that this problem remains even in modern software and, once again, even with OpenType fonts.

I tried different PDFMaker export settings, for instance forcing PDFMaker to embed the whole font in the PDF document, but the problem remains (the built-in PDF export in MS Word produces the same mess).

Does anybody have an explanation for this problem and a solution to offer me?

Thanking you in advance,

Regards,
Pascal Garin

Test Screen Name · Answer

Identity-H is entirely normal and common. It means that the PDF text uses glyph codes taken directly from the font rather than character codes. To extract text when this encoding is used, the PDF also needs a “ToUnicode CMap”. You cannot easily see whether one of these is present.
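
To make the mechanism concrete, here is a minimal Python sketch (standard library only) of what a text extractor does with Identity-H text: each 2-byte code in the page content is a glyph code, and the only way back to characters is the font's ToUnicode CMap. The CMap below is a hand-written, hypothetical example (the glyph codes 0x0010, 0x0102, etc. are invented for illustration), and the parser handles only the simple `bfchar` and `bfrange` forms, not the array form of `bfrange` or a full PostScript CMap parser.

```python
import re

# Hypothetical ToUnicode CMap: glyph codes on the left, UTF-16BE text on the right.
SAMPLE_CMAP = """
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
4 beginbfchar
<0010> <0928>
<0011> <092E>
<0102> <0938094D0924>
<0020> <0947>
endbfchar
1 beginbfrange
<0030> <0032> <0915>
endbfrange
endcmap
"""

HEX_PAIR = r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>"


def hex_to_text(h: str) -> str:
    # Destination values are UTF-16BE, so one glyph may map to several
    # code points (e.g. a conjunct such as स्त).
    return bytes.fromhex(h).decode("utf-16-be")


def parse_tounicode(cmap: str) -> dict[int, str]:
    """Build a glyph-code -> Unicode map (simplified: no array-form bfrange)."""
    mapping: dict[int, str] = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap, re.S):
        for src, dst in re.findall(HEX_PAIR, block):
            mapping[int(src, 16)] = hex_to_text(dst)
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap, re.S):
        for lo, hi, dst in re.findall(HEX_PAIR + r"\s*<([0-9A-Fa-f]+)>", block):
            start = int(dst, 16)
            for offset, code in enumerate(range(int(lo, 16), int(hi, 16) + 1)):
                # Each code in the range maps to the destination value + offset.
                mapping[code] = hex_to_text(format(start + offset, f"0{len(dst)}X"))
    return mapping


def decode_identity_h(hex_string: str, mapping: dict[int, str]) -> str:
    """Identity-H: the string is a sequence of 2-byte codes, big-endian."""
    data = bytes.fromhex(hex_string)
    codes = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    # Without a ToUnicode entry there is nothing to recover: emit U+FFFD.
    return "".join(mapping.get(c, "\ufffd") for c in codes)


if __name__ == "__main__":
    table = parse_tounicode(SAMPLE_CMAP)
    print(decode_identity_h("0010001101020020", table))  # नमस्ते
    print(decode_identity_h("0010001101020020", {}))     # four U+FFFD characters
```

Note how glyph 0x0102 maps to three code points (the conjunct स्त). A per-glyph mapping like this cannot by itself restore logical order: Devanagari and Bengali vowel signs such as ि are drawn to the left of their consonant, so if the page content lists glyphs in visual order, a simple lookup returns the characters in the wrong order. This is a likely reason, though not necessarily the only one, why Indic text extracted from a PDF often comes out garbled even when a ToUnicode CMap is present.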

Exporting from InDesign or using Acrobat PDFMaker for Word should get this right, unless non-Unicode fonts are used. Don’t use such fonts.
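
One way to check for the non-Unicode-font problem is to look at the characters themselves: with a legacy Indic font, the document stores ordinary Latin/ASCII codes that the font happens to draw as Devanagari, Bengali or Tamil shapes, whereas Unicode text uses the script's own Unicode block. The Python sketch below tests a sample of text (for example copied out of Word) against the block ranges for the three scripts in the question; the function name `looks_like_unicode_indic` and the 0.8 threshold are illustrative assumptions, not part of any Adobe or Microsoft tool.

```python
# Unicode block ranges for the scripts mentioned in the question.
INDIC_BLOCKS = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Tamil": (0x0B80, 0x0BFF),
}


def looks_like_unicode_indic(text: str, script: str, threshold: float = 0.8) -> bool:
    """Return True if most non-space characters fall inside the script's block.

    Hypothetical heuristic: text typed with a legacy (non-Unicode) font is
    stored as Latin/ASCII codes, so it scores close to zero here.
    """
    lo, hi = INDIC_BLOCKS[script]
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return False
    inside = sum(lo <= ord(c) <= hi for c in chars)
    return inside / len(chars) >= threshold


if __name__ == "__main__":
    print(looks_like_unicode_indic("नमस्ते", "Devanagari"))  # True
    print(looks_like_unicode_indic("ueLrs", "Devanagari"))   # False
```

A low ratio on text that looks correct on screen suggests a legacy font mapping; punctuation and digits also lower the ratio, which is why the threshold is below 1.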
