Known Participant
May 12, 2021
Question

Thai language export to PDF vs Print to PDF, diacritics are garbled

Helping out a customer here with some Thai language issues.

I have a very simple file containing just a few Thai words (see attached).

If I use "Export" in InDesign 16.2.1, the PDF is visually fine but the diacritics ("accents") are garbled and/or duplicated if you try to copy/paste.

If I use "Print to Adobe PDF" instead, the file is perfect.

Of course, the simple solution could in theory be to print to PDF, but if other parts of the artwork workflow rely on Export, we don't really want to go there.

Is there a solution to this, or is this just how InDesign works? I'm thinking it probably has similar issues with other scripts.

thanks

Johan

This topic has been closed for replies.

8 replies

Known Participant
February 23, 2023

The latest update from Adobe is that they want to close my ticket without giving an ETA for a solution, and they won't tell us the root cause either. This is almost 2 years after it was reported. Frustrating.

Participant
June 2, 2022

We (the DTP department in a translation agency) also have issues with Thai encoding in PDFs exported from InDesign 2021 (v16).
Just like thread starter Johan, we can't use "Print to Adobe PDF" as a workaround, because we are working on interactive documents which contain form fields and buttons.
Laubender mentioned that the bug does not exist in INDD2019, but our export to PDF from INDD2019 (v14.0.3) still shows issues, even if not the complete mess we get from the higher versions.

 

So, here we go, in this post everything refers to INDD2021:

Text encoding is fine in the INDD, compare screenshot 1.
I can properly copy text to my clipboard and place it in a TXT editor. If in the first line one of the three Unicode characters gets displayed separately from the rest of the abugida, this seems OK and only has to do with how the editor decides about text placement in this line. In the last line the same three characters ('THAI CHARACTER SO SUA' + vowel mark + tone mark; U+0E2A + U+0E37 + U+0E48) combine well to properly form the Thai abugida in question.
It is a similar case further to the right, and the rest of the text is completely inconspicuous as far as I can tell.
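
To check what actually lands on the clipboard, it can help to list the code points of the pasted string. Below is a minimal Python sketch using only the standard `unicodedata` module. The helper name `describe` is made up for illustration and is not part of any tool mentioned in this thread.

```python
import unicodedata

def describe(text):
    """Return one (code point, Unicode name, general category) tuple per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
    ]

# The cluster from the last line of screenshot 1: consonant + vowel mark + tone mark.
cluster = "\u0E2A\u0E37\u0E48"

for code_point, name, category in describe(cluster):
    print(code_point, name, category)
```

Category `Lo` is a letter and `Mn` is a nonspacing combining mark, so a healthy cluster of this kind shows one `Lo` followed by its `Mn` marks and nothing else.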

screenshot 1)

But in the PDF we exported from that very INDD that is no longer the case. On the surface it looks fine, but again copying over to a clean text format reveals quite a mess, see screenshot 2.

We now get multiplying tone and vowel marks all over the place (compare the purple boxes), plus empty squares and black diamond question marks, which indicate invalid or replacement characters, and also some extra empty spaces thrown in between. No good at all.
For a more detailed look, I placed the first complex abugida (which is also the first one showing broken encoding) on the last line (in the red boxes). First it properly displays the three parts it is supposed to contain: 'THAI CHARACTER RO RUA' + vowel mark + tone mark (U+0E23 + U+0E37 + U+0E48). But then it is followed by unwanted characters, U+DBC0 and U+DEE0, and after those the same tone mark is added a second time (U+0E48).
Same story with the character we already copied in the first screenshot (orange boxes). After the three correct Unicode characters which already form the abugida (U+0E2A + U+0E37 + U+0E48), we also receive the same unwanted trio (U+DBC0 + U+DEE0 + U+0E48) as above.
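
A note on U+DBC0 and U+DEE0: taken together they form a UTF-16 surrogate pair, that is, two code units encoding a single code point outside the Basic Multilingual Plane. Decoding the pair gives U+1002E0, which lies in Supplementary Private Use Area-B and has no standard meaning. The Python sketch below shows that arithmetic, plus a check that flags such characters in pasted text. The function names are hypothetical, and reading this as a stray private-use mapping in the exported PDF is an assumption, not something confirmed in this thread.

```python
import unicodedata

def combine_surrogates(high, low):
    """Combine a UTF-16 high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

def suspicious_characters(text):
    """List (index, code point) for private-use, lone-surrogate and U+FFFD characters."""
    flagged = []
    for index, ch in enumerate(text):
        if unicodedata.category(ch) in ("Co", "Cs") or ch == "\uFFFD":
            flagged.append((index, f"U+{ord(ch):04X}"))
    return flagged

code_point = combine_surrogates(0xDBC0, 0xDEE0)
print(f"U+{code_point:04X}", unicodedata.category(chr(code_point)))  # U+1002E0 Co

# The sequence described above: correct cluster, private-use character, repeated tone mark.
pasted = "\u0E23\u0E37\u0E48" + chr(code_point) + "\u0E48"
print(suspicious_characters(pasted))  # [(3, 'U+1002E0')]
```

A check like this only detects the damage. It does not tell you which of the duplicated marks is the legitimate one.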

screenshot 2)

Our linguists need to be able to work properly with the text in the PDFs, that is, to copy the text in its proper encoding over to comment boxes and other tools. We also want to deliver properly working PDFs to our client. Obviously, with these PDFs we can't do that.

I will vote on the UserVoice post provided by Johan and will soon add some more details regarding the issues we see in Thai PDFs exported from INDD2019.

Known Participant
June 2, 2022

For better results in 2019 I would try a different font. I have tried various fonts in 2019, but not all of them work with Thai diacritics (in InDesign, that is; in Illustrator it works fine even in the latest versions).

Community Expert
August 10, 2021

Hi Johan,

I'm not sure if that error is only with Thai script.

See this discussion:

 

Additional Character spaces added when viewed as text
Debbie @ Bella, Aug 08, 2021

https://community.adobe.com/t5/indesign/additional-character-spaces-added-when-viewed-as-text/td-p/12303013

 

The application the text is copied from is not Adobe Reader or Acrobat Pro DC, but a "digital host (Zinio)" (whatever that is). Nevertheless, both issues could be related.

 

Regards,
Uwe Laubender

( ACP )

Known Participant
August 10, 2021

OK, this I haven't seen. For many years we have had code to remove such spaces in our proofreading tool (I work for a supplier of automated proofreading software), but I have not heard of any problems recently. Over the years there have also been issues with "All Caps", but I think they have solved those.

Community Expert
August 10, 2021

I can also confirm that the bug is not present in InDesign CC 2019 version 14.0.3.433 on Windows 10.

However, I can also detect it with InDesign 2020 version 15.1.3.302.

 

Regards,
Uwe Laubender

( ACP )

 

EDITED Typo: I corrected "2021" to "2020" now.

Known Participant
August 10, 2021

Thanks Uwe, yes this is my experience too. 

Something broke with CC 2020. A bit weird that it only seems to affect Thai script.

Community Expert
August 10, 2021

Also tried the following workflow with InDesign 2021 on Windows 10:

 

Print to PostScript > distill to PDF

Copy text from PDF

 

Straight away that did not work.

A couple of errors sneaked into the result. See screenshot below.

 

What worked without flaw:

Switched to the Edit PDF workspace in Acrobat Pro DC and copied from that.

 

 

Regards,
Uwe Laubender

( ACP )

 

EDITED: Added a different screenshot with the error marked.

Known Participant
August 10, 2021

Interesting.

Community Expert
August 10, 2021

Hi Johan,

I see the issue with the exported PDF as well.

Bug confirmed.

 

What I did:

Opened your "thai test export.pdf" in Acrobat Pro on my Windows 10 machine.

Copied the text to the clipboard, turned to your InDesign document where I activated the font Adobe Thai Regular from Adobe fonts and pasted the text in a text frame that was formatted with Adobe Thai Regular.

 

Did the same with "thai test print.pdf" in Acrobat Pro on my Windows 10 machine.

 

No issues with "thai test print.pdf", three different results, all wrong, with "thai test export.pdf":

 

 

Voted and commented at InDesign UserVoice.

 

Regards,
Uwe Laubender

( ACP )

Peter Spier
Community Expert
May 12, 2021

I'm not seeing any obvious difference in the two PDFs either...

 

What are you using to view them?

Known Participant
May 17, 2021

Thanks for your comment. Yes, visually they are the same. But if you try to copy/paste the Thai text from them, e.g. in Adobe Reader, you will get different results. This leads to other problems further down the line (e.g. when indexing the content or using proofreading tools).

 

Participant
August 4, 2021

Hi Johan

Just wondering if you got any resolution on this? We see the same problem, though we are doing something slightly different. We have to format in InDesign from a supplied Thai Word file and then double-check the PDF we export against the original Word file.

In older versions of InDesign this worked fine, but in 2019/2021 the order of the characters/diacritics in the PDF file is pretty much screwed up, which I think is the same as what you are seeing. Visually it looks OK, but when you copy the text (or, in our case, run an electronic comparison) the order is not correct.

Regards

Michael
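
An electronic comparison of this kind can be sketched at the code point level, which makes reordered or duplicated marks visible even when both texts render identically. The Python sketch below is only an illustration built on the standard `difflib` and `unicodedata` modules. The function names and the sample strings are hypothetical, and it is not the comparison tool Michael's team uses.

```python
import difflib
import unicodedata

def code_points(text):
    """Format a string as a list of U+XXXX code points."""
    return [f"U+{ord(ch):04X}" for ch in text]

def diff_code_points(reference, extracted):
    """Return (operation, reference code points, extracted code points) per differing span."""
    # NFC normalization (an assumption about what should count as "equal")
    # keeps canonically equivalent sequences from showing up as differences.
    ref = unicodedata.normalize("NFC", reference)
    ext = unicodedata.normalize("NFC", extracted)
    matcher = difflib.SequenceMatcher(None, ref, ext, autojunk=False)
    return [
        (tag, code_points(ref[i1:i2]), code_points(ext[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# Hypothetical sample: the reference cluster versus a copy with the tone mark repeated.
reference = "\u0E2A\u0E37\u0E48"
extracted = "\u0E2A\u0E37\u0E48\u0E48"
print(diff_code_points(reference, extracted))
```

For the sample above this reports a single inserted U+0E48. An empty list means the two texts match code point for code point after normalization.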

Ian Sayers
Inspiring
May 12, 2021

I might be missing something... all of these files look exactly the same, and if I copy/paste, it also works fine. I don't have the Leelawadee font, but from what I can see it looks fine...

 

Can you record your screen so we can see this more clearly please ?

Thanks

Ian

Known Participant
May 17, 2021

added video recording here