Hi,
I have a serious problem with the encoding of Indic scripts. Here is the situation: I work for a French publisher that publishes translations and bilingual editions of Indian literature (Hindi-French, Bengali-French, Tamil-French, for instance) in a facing-page layout, with the Indian text on the left-hand page and the French facing it, on the even pages.
I have subscriptions to Microsoft Office 365 and Adobe Creative Cloud and always keep them up to date. I am thus using Word 2016, Acrobat DC, and the 2019 releases of InDesign, Illustrator, Dreamweaver, etc., which are the latest versions of this software as of today. I work on a PC running Windows 10, with all the Indian languages I use installed (the same goes for the Office suite, where the Hindi, Bengali, and Tamil modules are installed in addition to English and French, my native language).
I usually prepare a clean version of the texts in MS Word, using styles to ease the export to InDesign, and I only use OpenType or TrueType fonts from renowned foundries (Adobe, Linotype, Monotype, Microsoft…).
Nevertheless, when I create a PDF version of these documents (which contain French, but also Hindi, Bengali, or Tamil text), the generated PDF contains subsets of OpenType fonts but also Identity-H encoded text. This is very annoying because the text in the PDF can no longer be exported: I tried with a very simple text in Word, exported it to a PDF file, then re-exported it from Acrobat DC to MS Word, and the result was catastrophic.
Also, the PDF files I receive from Indian publishers use this Identity-H encoding too, and I cannot export them to MS Word, which tremendously complicates my work.
Frankly, I don't really understand what this encoding means. I thought it was an old problem from the time when Indic scripts were not standardized, but this is no longer the case: the Unicode Consortium has had a very clear encoding standard for years. So I am very surprised that this problem remains even in modern software and, once again, even when OpenType fonts are used.
I tried different PDFMaker settings for the export, for instance forcing PDFMaker to embed the whole font in the PDF document, but the problem remains (the PDF export module built into MS Word produces the same mess).
Does anybody have an explanation for this problem and a solution to offer?
Thanking you in advance,
Regards,
Pascal Garin
Identity-H is entirely normal and common. It means that the PDF directly uses codes from the font. To extract text when this encoding is used, the PDF also needs a “ToUnicode CMap”. You cannot see if one of these is present.
Exporting from InDesign or using Acrobat PDFMaker for Word should get this right, unless non-Unicode fonts are used. Don’t use such fonts.
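To make the role of the ToUnicode CMap concrete, here is a minimal Python sketch. It is only an illustration, not how Acrobat works internally: the glyph codes and the `to_unicode` table are invented for the example, and a real ToUnicode CMap is a stream inside the PDF rather than a Python dictionary. It shows that Identity-H text is just a sequence of 2-byte font codes, and that readable Unicode only comes back if a mapping is available.

```python
import struct
from typing import Dict, Optional


def decode_identity_h(raw: bytes, to_unicode: Optional[Dict[int, str]]) -> str:
    """Turn a string of 2-byte Identity-H codes into Unicode text.

    `to_unicode` is a hand-built stand-in for the PDF's ToUnicode CMap
    (hypothetical data, for illustration only).
    """
    if len(raw) % 2:
        raise ValueError("Identity-H strings consist of 2-byte codes")
    # Each code is a big-endian 16-bit number taken directly from the font.
    codes = struct.unpack(f">{len(raw) // 2}H", raw)
    if to_unicode is None:
        # Without a ToUnicode CMap the codes have no defined Unicode meaning.
        return "\ufffd" * len(codes)
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)


# Invented glyph codes for a Devanagari word. One glyph (a conjunct)
# maps to three Unicode code points, which is common in Indic fonts.
to_unicode = {
    0x0101: "\u0928",              # NA
    0x0102: "\u092e",              # MA
    0x0203: "\u0938\u094d\u0924",  # conjunct SA + VIRAMA + TA
    0x0150: "\u0947",              # vowel sign E
}
raw = bytes.fromhex("0101010202030150")

with_map = decode_identity_h(raw, to_unicode)
without_map = decode_identity_h(raw, None)
```

With the table, the four codes come back as the six code points of the word; without it, there is nothing to recover except placeholders, which matches the kind of result described in the question when exporting back to Word.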
@Test Screen Name wrote: "To extract text when this encoding is used, the PDF also needs a “ToUnicode CMap”. You cannot see if one of these is present."
Actually, you can if you use this tool: http://brendandahl.github.io/pdf.js.utils/browser/
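For a very rough first check without any tool, the sketch below simply counts the raw byte markers `/Identity-H` and `/ToUnicode` in a PDF file. This is an assumption-laden shortcut, not a real PDF parser: the function name `scan_font_markers` is made up for this example, and many PDFs store their font dictionaries inside compressed object streams, in which case the markers are not visible in the raw bytes and a count of zero proves nothing.

```python
from typing import Dict


def scan_font_markers(pdf_bytes: bytes) -> Dict[str, int]:
    """Count raw occurrences of two font-related names in a PDF file.

    Only uncompressed dictionaries are seen; treat the result as a hint.
    """
    return {
        "identity_h": pdf_bytes.count(b"/Identity-H"),
        "to_unicode": pdf_bytes.count(b"/ToUnicode"),
    }


# A tiny hand-written fragment that imitates an uncompressed font dictionary.
sample = b"<< /Type /Font /Subtype /Type0 /Encoding /Identity-H /ToUnicode 12 0 R >>"
counts = scan_font_markers(sample)

# Usage on a real file (the file name is hypothetical):
#     with open("book.pdf", "rb") as f:
#         print(scan_font_markers(f.read()))
```

If `identity_h` is greater than zero while `to_unicode` is zero, that suggests, but does not prove, that some fonts lack the mapping; the browser tool linked above gives a more reliable view.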
Hello,
I found the following thread useful: