Helping out a customer here with some Thai language issues.
I have a very simple file containing just a few Thai words (see attached).
If I use "Export" in InDesign 16.2.1, the PDF is visually fine but the diacritics ("accents") are garbled and/or duplicated if you try to copy/paste.
If I use "Print to Adobe PDF" instead, the file is perfect.
Of course, the simple solution could in theory be to print to PDF, but if other parts of the artwork workflow rely on Export, we don't really want to go there.
Is there a solution to this or is this just how InDesign works? Thinking it probably has similar issues with other scripts....
thanks
Johan
I might be missing something... all of these files look exactly the same, and if I copy/paste it also works fine. I don't have the Leelawadee font, but from what I can see it looks fine...
Can you record your screen so we can see this more clearly, please?
Thanks
Ian
added video recording here
I'm not seeing any obvious difference in the two PDFs either...
What are you using to view them?
Thanks for your comment. Yes, visually they are the same. But if you try and copy/paste the Thai text from them e.g. in Adobe reader you will get different results. This leads to other problems further down the line (e.g. when indexing the content or using proofreading tools).
Hi Johan
Just wondering if you got any resolution on this? We see the same problem, though we are doing something slightly different. We have to format in InDesign from a supplied Thai Word file and then double-check the PDF we export against the original Word file.
In older versions of InDesign this worked fine, but in 2019/2021 the order of the characters/diacritics in the PDF file is pretty much screwed up, which I think is the same as what you are seeing. Visually it looks OK, but when you copy the text (or, in our case, run an electronic comparison) the order is not correct.
Regards
Michael
Hi Johan
So did you ever get to the bottom of what the problem is? Is the composition engine in the newer version of InDesign just not able to handle Thai properly?
Regards
Michael
Hi Johan
Is this Johan from Informa?
Regards
Michael
Ahha - Michael he in Perigord
I'll email you directly as this has some relevance to one of your customers
Hi Johan,
I see the issue with the exported PDF as well.
Bug confirmed.
What I did:
Opened your "thai test export.pdf" in Acrobat Pro on my Windows 10 machine.
Copied the text to the clipboard, turned to your InDesign document where I activated the font Adobe Thai Regular from Adobe fonts and pasted the text in a text frame that was formatted with Adobe Thai Regular.
Did the same with "thai test print.pdf" in Acrobat Pro on my Windows 10 machine.
No issues with "thai test print.pdf", three different results, all wrong, with "thai test export.pdf":
Voted and commented at InDesign UserVoice.
Regards,
Uwe Laubender
( ACP )
Also tried the following workflow with InDesign 2021 on Windows 10:
Print to PostScript > distill to PDF
Copy text from PDF
Straight away that did not work.
A couple of errors sneaked into the result. See screenshot below.
What worked without flaw:
Switched to the Edit PDF workspace in Acrobat Pro DC and copied from that.
Regards,
Uwe Laubender
( ACP )
EDITED: Added a different screenshot with the error marked.
Interesting.
I can also confirm that the bug does not occur with InDesign CC 2019 version 14.0.3.433 on Windows 10.
However, I can also detect it with InDesign 2020 version 15.1.3.302.
Regards,
Uwe Laubender
( ACP )
EDITED Typo: I corrected "2021" to "2020" now.
Thanks Uwe, yes this is my experience too.
Hi Johan,
I'm not sure if that error is only with Thai script.
See this discussion:
Additional Character spaces added when viewed as text
Debbie @ Bella, Aug 08, 2021
The application the text is copied from there is not Adobe Reader or Acrobat Pro DC, but a "digital host (Zinio)" (whatever that is). Nevertheless, both issues could be related.
Regards,
Uwe Laubender
( ACP )
OK, this I haven't seen. For many years we have had code to remove such spaces in our proofreading tool (I work for a supplier of automated proofreading software), but I have not heard of any problems recently. Over the years there have also been issues with "All Caps", but I think they have solved those.
We (the DTP department in a translation agency) also got issues with Thai encoding in PDFs exported from InDesign 2021 (v16).
Just like thread starter Johan we can't use "Print to Adobe PDF" as a workaround because we are working on interactive documents which contain form fields and buttons.
Laubender mentioned that the bug would not exist in INDD2019, but our export to PDF from INDD2019 (v14.0.3) still shows issues, even if not the complete mess we get from the higher versions.
So, here we go, in this post everything refers to INDD2021:
Text encoding is fine in the INDD, compare screenshot 1.
I can properly copy text to my clipboard and place it in a TXT editor. If in the first line one of three Unicode characters gets displayed separately from the rest of the abugida, this seems OK and only has to do with how the editor decides about text placement in this line. In the last line the same three characters ('THAI CHARACTER SO SUA' + vowel mark + tone mark; U+0E2A + U+0E37 + U+0E48) combine well to properly form the Thai abugida in question.
There is a similar case further to the right, and the rest of the text is completely inconspicuous as far as I can tell.
screenshot 1)
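For anyone who wants to check a pasted string without relying on how an editor renders it, here is a minimal Python sketch that lists the code points and their Unicode names. The helper name `describe` is made up for this illustration; only the standard `unicodedata` module is used.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


# The three characters discussed above: SO SUA + vowel mark + tone mark.
syllable = "\u0e2a\u0e37\u0e48"
for line in describe(syllable):
    print(line)
```

Running the same helper on text copied from the INDD and from the PDF makes it easy to see where the two sequences start to differ.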
But in the PDF we exported from that very INDD that is no longer the case. On the surface it looks fine, but again copying over to a clean text format reveals quite a mess, see screenshot 2.
We now got multiplying tone and vowel marks all over the place (compare the purple boxes), plus empty squares and black diamond question marks which indicate invalid or replacement characters and also some extra empty spaces thrown in between. No good at all.
For a more detailed look, I chose the first complex abugida (which is also the first one showing a broken encoding) to place on the last line (in the red boxes). First it properly displays the three parts it is supposed to contain: 'THAI CHARACTER RO RUA' + vowel mark + tone mark (U+0E23 + U+0E37 + U+0E48), but then it is followed by unwanted stuff: the characters U+DBC0 and U+DEE0, and then the same tone mark another time (U+0E48).
Same story with the character we already copied in the first screenshot (orange boxes). After the three correct Unicode characters which already form the abugida (U+0E2A + U+0E37 + U+0E48), we also receive the same unwanted trio (U+DBC0 + U+DEE0 + U+0E48) as above.
screenshot 2)
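A side note on the two unwanted values: U+DBC0 and U+DEE0 fall in the UTF-16 high and low surrogate ranges. If they are read as one surrogate pair (an assumption about how the copied text was encoded, not something confirmed in this thread), the arithmetic below gives the code point they would stand for. The function names are hypothetical.

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def is_private_use(cp):
    """True if cp lies in one of the three Unicode Private Use Areas."""
    return (
        0xE000 <= cp <= 0xF8FF
        or 0xF0000 <= cp <= 0xFFFFD
        or 0x100000 <= cp <= 0x10FFFD
    )


cp = combine_surrogates(0xDBC0, 0xDEE0)
print(f"U+{cp:X}", is_private_use(cp))
```

Under that reading the pair decodes to U+1002E0, which sits in Supplementary Private Use Area-B. That would fit the empty squares and replacement characters in the screenshot, since a private-use code point has no standard meaning for a text editor to display.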
Our linguists need to be able to properly work with the text in the PDFs, that is to copy the text in its proper encoding over to comment boxes and other tools. We also want to deliver properly working PDFs to our client. Obviously with these PDFs we can't do that.
I will vote in the UserVoice post provided by Johan and soon add some more details regarding the issues we see in Thai PDFs exported from INDD2019.
For better results in 2019 I would try a different font. I have tried various fonts in 2019, but not all of them work with Thai diacritics (in InDesign, that is; in Illustrator it works fine even in the latest versions).
What would be great would be some guidance from Adobe on how to work around this. I have been in a few calls, but they are not communicating.
"Laubender mentioned that the bug would not exist in INDD2019, but our export to PDF from INDD2019 (v14.0.3) still shows issues, even if not the complete mess we get from the higher versions."
Hi Linienstraße,
could be that some glyphs that I did not test are not working with my InDesign version 14.0.3 on Windows 10.
Test Johan's sample document if you see any errors with 14.0.3. What could be also important: I only did a test with PDF Export (Print) and not with PDF Export (Interactive).
You could be OK with the following strategy:
[1] Export to PDF Export (Print) to PDF document "A".
[2] Export to PDF Export (Interactive) to PDF document "B".
[3] Open PDF "B" in Acrobat Pro and exchange all pages with the ones from "A" from step [1].
The interactive elements will be maintained.
Regards,
Uwe Laubender
( Adobe Community Professional )
Hello Johan and Laubender, thank you for the quick replies.
About using different fonts: for Thai, we are working with „Sarabun“, „Neue Frutiger Thai“ and „Noto Sans Thai“.
In the INDD unsurprisingly the text encoding is always clean, only after exporting to PDF the problems develop and then indeed differently for different fonts. But so far no font seems to work without fail. If you could recommend a looped style Thai font, which produces reliable results (at least in INDD2019), it would be very welcome.
But now more details from our file:
1) First I converted it to INDD2019, font is still „Sarabun“, export method is still „interactive PDF“.
(I also exported this as a „print PDF“ but the results in both PDF and TXT file were identical to the „interactive PDF“ export, showing the same issues at the same position, so that setting doesn’t seem to be a factor.)
Here we still got 3 issues in 2 categories:
case 1: line 2 (orange box)
PDF export messes with the sequence of the Unicode characters (THAI CHARACTER SARA UEE now appears over the wrong abugida) and adds a space:
INDD: \u0e2a\u0e37\u0e48\u0e2d\u0e2a\u0e32
PDF: \u0e2a\u0e48\u0e2d\u0e37 \u0e2a\u0e32
also case 1: line 4
INDD: \u0e40\u0e1e\u0e37\u0e48\u0e2d\u0e1b
PDF: \u0e40\u0e1e\u0e48\u0e2d\u0e37 \u0e1b
case 2: line 4 (red box)
PDF export adds invalid characters U+DBC0 + U+DEF8
INDD: \u0e23\u0e17\u0e33\u0e07
PDF: \u0e23\u0e17\udbc0\udef8\u0e33\u0e07
2) After switching the font to „Neue Frutiger Thai“ and converting to INDD2019 we are down to only one issue, exactly the same case 2 (line 4, red box) from above.
3) But with „Noto Sans Thai“ (INDD2019) again quite a few problems are cropping up in the PDF:
case 1b: line 2 (purple box)
Variant of case 1, now two characters move to the wrong position, then a space gets added
INDD: \u0e40\u0e0a\u0e35\u0e48\u0e22\u0e27
PDF: \u0e40\u0e0a\u0e22\u0e35\u0e48 \u0e27
also case 1b: line 4
INDD: \u0e07\u0e1c\u0e39\u0e49\u0e2d\u0e37\u0e48\u0e19
PDF: \u0e07\u0e1c\u0e2d\u0e39\u0e49 \u0e37\u0e48\u0e19
case 3, line 2 (blue box)
PDF export messes with the sequence of Unicode characters, two diacritics move to the wrong abugida, but no space is added.
INDD: \u0e30\u0e1c\u0e1d\u0e39\u0e36\u0e49\u0e01
PDF: \u0e30\u0e1c\u0e39\u0e49\u0e1d\u0e36\u0e01
case 2 (line 4, red box) seems to be wrong all the time, no matter which font is being applied
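The cases above can also be told apart automatically. Below is a rough Python sketch, not anybody's production tool: the names `is_suspect` and `classify` and the three labels are invented for this example. It flags surrogate or private-use code points, counts added spaces, and checks whether the remaining characters are the same ones in a different order.

```python
from collections import Counter


def is_suspect(ch):
    """True for surrogate, BMP private-use, or plane 15/16 code points."""
    cp = ord(ch)
    return 0xD800 <= cp <= 0xDFFF or 0xE000 <= cp <= 0xF8FF or cp >= 0xF0000


def classify(indd, pdf):
    """Return a set of labels describing how pdf differs from indd."""
    findings = set()
    if any(is_suspect(ch) for ch in pdf):
        findings.add("invalid_chars")
    if pdf.count(" ") > indd.count(" "):
        findings.add("extra_space")
    # Compare what is left once spaces and suspect characters are ignored.
    cleaned = [ch for ch in pdf if ch != " " and not is_suspect(ch)]
    reference = [ch for ch in indd if ch != " "]
    if cleaned != reference:
        if Counter(cleaned) == Counter(reference):
            findings.add("reordered")
        else:
            findings.add("other_difference")
    return findings


# Case 1 from the Sarabun export (line 2, orange box).
print(sorted(classify("\u0e2a\u0e37\u0e48\u0e2d\u0e2a\u0e32",
                      "\u0e2a\u0e48\u0e2d\u0e37 \u0e2a\u0e32")))
```

The "reordered" check is deliberately crude: it only says that the same characters appear in a different order, not which diacritic moved to which base character.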
Also many thanks to Laubender for the idea to assemble the final PDF from two exports; that might indeed work, even though not out of the box.
Here we need to replace step [1] with a „Print to“ -> Adobe PDF, because „Export as“-> Print PDF still gives us the same encoding issues.
... and then the placement of elements is a little off compared to the „Export as“->Interactive PDF. But this looks fixable.
And it is probably a matter of luck that the button texts (from the „Interactive PDF“) stay in proper encoding, they are usually short which should help, at least in my current test project they did.
Still, Adobe should make some effort to fix these issues. When exporting to PDF, preservation of the proper encoding should not depend on workarounds.