Thai language export to PDF vs Print to PDF, diacritics are garbled

New Here,
May 12, 2021

Helping out a customer here with some Thai language issues.

I have a very simple file containing just a few Thai words (see attached).

If I use "Export" in InDesign 16.2.1, the PDF is visually fine but the diacritics ("accents") are garbled and/or duplicated if you try to copy/paste.

If I use "Print to Adobe PDF" instead, the file is perfect.

Of course, the simple solution could in theory be to print to PDF, but if other parts of the artwork workflow rely on Export, we don't really want to go there.

Is there a solution to this, or is this just how InDesign works? I suspect it has similar issues with other scripts...

thanks

Johan

TOPICS
Import and export

Views

586

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Adobe Community Professional,
May 12, 2021

I might be missing something... all of these files look exactly the same, and if I copy/paste, it also works fine. I don't have the Leelawadee font, but from what I can see it looks fine...

Can you record your screen so we can see this more clearly, please?

Thanks

Ian

New Here,
May 17, 2021

Added a video recording here.

Adobe Community Professional,
May 12, 2021

I'm not seeing any obvious difference in the two PDFs either...

 

What are you using to view them?

New Here,
May 17, 2021

Thanks for your comment. Yes, visually they are the same. But if you try to copy/paste the Thai text from them, e.g. in Adobe Reader, you will get different results. This leads to other problems further down the line (e.g. when indexing the content or using proofreading tools).

 

New Here,
Aug 04, 2021

Hi Johan

Just wondering if you got any resolution on this? We see the same problem, though we are doing something slightly different. We have to format in InDesign from a supplied Thai Word file and then double-check the PDF we export against the original Word file.

In older versions of InDesign this worked fine, but in 2019/2021 the order of the characters/diacritics in the PDF file is pretty much screwed up, which I think is the same as what you are seeing. Visually it looks OK, but when you copy the text (or, for us, run an electronic comparison) the order is not correct.

Regards

Michael

New Here,
Aug 04, 2021

Hi Michael,
Yeah, we also see this with electronic proofreading, and having such files as output is not very future-proof either.
The only resolutions proposed by Adobe have been to revert to 2019 or to avoid Export to PDF in InDesign (and print through Distiller instead).
Neither of those workarounds is feasible for us.
Best regards
Johan

New Here,
Aug 04, 2021

Hi Johan

So did you ever get to the bottom of what the problem is? Is the composition engine in the newer version of InDesign just not able to handle Thai properly?

Regards

Michael

New Here,
Aug 04, 2021

Hi Michael,
Since printing to PDF works, I think the root cause is the text generation that happens in the PDF library they use for the export. Since the same problem does not happen in Illustrator, I am curious to understand what the difference is, but I don't reach that level of support with Adobe, and I'm not sure if anyone else does.
Our comparison software can be configured to work around some of this, but not completely, and it is not really a long-term solution.
Best regards
Johan

New Here,
Aug 04, 2021

Hi Johan

Is this Johan from Informa?

Regards

Michael

New Here,
Aug 04, 2021

That’s me.

New Here,
Aug 04, 2021

Aha, Michael here at Perigord.

I'll email you directly as this has some relevance to one of your customers

New Here,
Aug 04, 2021

OK, I kind of suspected I would know you 😉
Please also upvote this (and maybe leave a comment) if you agree Adobe should prioritize it:
https://indesign.uservoice.com/forums/601180-adobe-indesign-bugs/suggestions/43907544-thai

Adobe Community Professional,
Aug 10, 2021

Hi Johan,

I see the issue with the exported PDF as well.

Bug confirmed.

 

What I did:

Opened your "thai test export.pdf" in Acrobat Pro on my Windows 10 machine.

Copied the text to the clipboard, turned to your InDesign document where I activated the font Adobe Thai Regular from Adobe fonts and pasted the text in a text frame that was formatted with Adobe Thai Regular.

 

Did the same with "thai test print.pdf" in Acrobat Pro on my Windows 10 machine.

 

No issues with "thai test print.pdf", three different results, all wrong, with "thai test export.pdf":

 

Copy-Paste-Tests-AdobeThai-from-Acrobat.PNG

 

Voted and commented at InDesign UserVoice.

 

Regards,
Uwe Laubender

( ACP )

Adobe Community Professional,
Aug 10, 2021

Also tried the following workflow with InDesign 2021 on Windows 10:

 

Print to PostScript > distill to PDF

Copy text from PDF

 

That did not work straight away.

A couple of errors sneaked into the result. See screenshot below.

 

What worked without flaw:

Switched to the Edit PDF workspace in Acrobat Pro DC and copied from that.

 

PrintToPostScriptDistillToPDF-CopyFromAcrobatProDC-Workflow.PNG

 

Regards,
Uwe Laubender

( ACP )

 

EDITED: Added a different screenshot with the error marked.

New Here,
Aug 10, 2021

Interesting.

Adobe Community Professional,
Aug 10, 2021

I can also confirm that the bug does not occur with InDesign CC 2019 version 14.0.3.433 on Windows 10.

However, I can also detect it with InDesign 2020 version 15.1.3.302.

 

Regards,
Uwe Laubender

( ACP )

 

EDITED Typo: I corrected "2021" to "2020" now.

New Here,
Aug 10, 2021

Thanks Uwe, yes this is my experience too. 

Something broke with CC 2020. A bit weird that it only seems to affect Thai script.

Adobe Community Professional,
Aug 10, 2021

Hi Johan,

I'm not sure if that error is only with Thai script.

See this discussion:

 

Additional Character spaces added when viewed as text
Debbie @ Bella, Aug 08, 2021

https://community.adobe.com/t5/indesign/additional-character-spaces-added-when-viewed-as-text/td-p/1...

 

The application the text is copied from is not Adobe Reader or Acrobat Pro DC, but a "digital host (Zinio)" (whatever that is). Nevertheless, both issues could be related.

 

Regards,
Uwe Laubender

( ACP )

New Here,
Aug 10, 2021

OK, this I haven't seen. For many years we have had code to remove such spaces in our proofreading tool (I work for a supplier of automated proofreading software), but I have not heard of any problems recently. Over the years there have also been issues with "All Caps", but I think they have solved this.

New Here,
Jun 02, 2022

We (the DTP department in a translation agency) also got issues with Thai encoding in PDFs exported from InDesign 2021 (v16).
Just like thread starter Johan we can't use "Print to Adobe PDF" as a workaround because we are working on interactive documents which contain form fields and buttons.
Laubender mentioned that the bug would not exist in INDD2019, but our export to PDF from INDD2019 (v14.0.3) still shows issues, even if not the complete mess we get from the higher versions.

 

So, here we go, in this post everything refers to INDD2021:

Text encoding is fine in the INDD, compare screenshot 1.
I can properly copy text to my clipboard and place it in a TXT editor. If in the first line one of the three Unicode characters gets displayed separately from the rest of the abugida, this seems OK and only has to do with how the editor decides about text placement in this line. In the last line the same three characters ('THAI CHARACTER SO SUA' + vowel mark + tone mark; U+0E2A + U+0E37 + U+0E48) combine well to properly form the Thai abugida in question.
A similar case appears further to the right, and the rest of the text is completely inconspicuous as far as I can tell.
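
To double-check what actually lands on the clipboard, it can help to list every code point with its Unicode name and general category. The following is a minimal Python sketch using only the standard `unicodedata` module; the helper name `describe` is made up for illustration and is not part of any tool mentioned in this thread.

```python
import unicodedata

def describe(text):
    """Return one 'U+XXXX NAME (category)' line per code point in text."""
    lines = []
    for ch in text:
        name = unicodedata.name(ch, "<unnamed>")
        category = unicodedata.category(ch)
        lines.append(f"U+{ord(ch):04X} {name} ({category})")
    return lines

# The three code points discussed above: consonant + vowel mark + tone mark.
for line in describe("\u0e2a\u0e37\u0e48"):
    print(line)
```

Category `Lo` marks a base letter and `Mn` a non-spacing combining mark, which makes it easier to see at a glance whether the marks still follow the consonant they belong to.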

Screenshot 1: Thai-TXT_from_INDD2021.PNG

But in the PDF we exported from that very INDD, that is no longer the case. On the surface it looks fine, but again copying over to a clean text format reveals quite a mess, see screenshot 2.

We now get multiplying tone and vowel marks all over the place (compare the purple boxes), plus empty squares and black-diamond question marks, which indicate invalid or replacement characters, and also some extra spaces thrown in between. No good at all.
For a more detailed look, I placed the first complex abugida (which is also the first to show a broken encoding) on the last line (in the red boxes). First it properly displays the three parts it is supposed to contain: 'THAI CHARACTER RO RUA' + vowel mark + tone mark (U+0E23 + U+0E37 + U+0E48), but then it is followed by unwanted stuff, namely the characters U+DBC0 and U+DEE0, and then the same tone mark another time (U+0E48).
Same story with the character we already copied in the first screenshot (orange boxes). After the three correct Unicode characters which already form the abugida (U+0E2A + U+0E37 + U+0E48), we also receive the same unwanted trio (U+DBC0 + U+DEE0 + U+0E48) as above.
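
A note on the two "invalid" values: U+DBC0 falls in the UTF-16 high-surrogate range and U+DEE0 in the low-surrogate range, so read together they form one surrogate pair, not two characters. The sketch below does the standard UTF-16 arithmetic in Python (the function name `combine_surrogates` is made up for illustration). The result lands in Supplementary Private Use Area-B. Why InDesign would write a private-use code point into the text layer is not confirmed anywhere in this thread; treating it as some internal glyph reference is only an assumption.

```python
import unicodedata

def combine_surrogates(high, low):
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

cp = combine_surrogates(0xDBC0, 0xDEE0)
# Prints the combined code point and its general category ('Co' = private use).
print(f"U+{cp:06X}", unicodedata.category(chr(cp)))  # U+1002E0 Co
```

A text editor that shows the two halves separately, or as replacement characters, is most likely just failing to pair them up.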

Screenshot 2: Thai-TXT_from_PDF2021.PNG

Our linguists need to be able to properly work with the text in the PDFs, that is to copy the text in its proper encoding over to comment boxes and other tools. We also want to deliver properly working PDFs to our client. Obviously with these PDFs we can't do that.

I will vote in the uservoice post provided by Johan and soon add some more details regarding the issues we see in Thai-PDFs exported from INDD2019 .

New Here,
Jun 02, 2022

For better results in 2019 I would try a different font. I have tried various fonts in 2019, but not all of them work with Thai diacritics (in InDesign; in Illustrator it works fine even in the latest versions).

New Here,
Jun 02, 2022

What would be great is some guidance from Adobe on how to work around this. I have been in a few calls, but they are not communicating.

Adobe Community Professional,
Jun 02, 2022

"Laubender mentioned that the bug would not exist in INDD2019, but our export to PDF from INDD2019 (v14.0.3) still shows issues, even if not the complete mess we get from the higher versions."

 

Hi Linienstraße,

It could be that some glyphs I did not test do not work with my InDesign version 14.0.3 on Windows 10.

Test Johan's sample document to see whether you get any errors with 14.0.3. What could also be important: I only tested PDF Export (Print), not PDF Export (Interactive).

 

You could be OK with the following strategy:

[1] Export with PDF Export (Print) to PDF document "A".

[2] Export with PDF Export (Interactive) to PDF document "B".

[3] Open PDF "B" in Acrobat Pro and replace all pages with the ones from "A" from step [1].

The interactive elements will be maintained.

 

Regards,
Uwe Laubender
( Adobe Community Professional )

New Here,
Jun 03, 2022

Hello Johan and Laubender, thank you for the quick replies.

About using different fonts: for Thai, we are working with „Sarabun“, „Neue Frutiger Thai“ and „Noto Sans Thai“.
In the INDD unsurprisingly the text encoding is always clean, only after exporting to PDF the problems develop and then indeed differently for different fonts. But so far no font seems to work without fail. If you could recommend a looped style Thai font, which produces reliable results (at least in INDD2019), it would be very welcome.

 

But now more details from our file:
1) First I converted it to INDD2019, font is still „Sarabun“, export method is still „interactive PDF“.

(I also exported this as a „print PDF“ but the results in both PDF and TXT file were identical to the „interactive PDF“ export, showing the same issues at the same position, so that setting doesn’t seem to be a factor.)

 

Here we still get 3 issues in 2 categories:

case 1: line 2 (orange box)
PDF export messes with the sequence of the Unicode characters (THAI CHARACTER SARA UEE now appears over the wrong abugida) and adds a space:

INDD: \u0e2a\u0e37\u0e48\u0e2d\u0e2a\u0e32

PDF: \u0e2a\u0e48\u0e2d\u0e37 \u0e2a\u0e32

also case 1: line 4

INDD: \u0e40\u0e1e\u0e37\u0e48\u0e2d\u0e1b

PDF: \u0e40\u0e1e\u0e48\u0e2d\u0e37 \u0e1b
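
The two case 1 pairs can be checked mechanically. The Python sketch below (the function name is hypothetical) tests whether the PDF text holds exactly the same code points as the INDD text once spaces are ignored. If it does, nothing was lost or invented, and the damage is limited to reordering plus the inserted space.

```python
from collections import Counter

def same_code_points_ignoring_spaces(a, b):
    """True if a and b contain the same code points, in any order, spaces ignored."""
    count_a = Counter(ch for ch in a if ch != " ")
    count_b = Counter(ch for ch in b if ch != " ")
    return count_a == count_b

# Case 1, line 2, copied from the escape sequences above.
indd = "\u0e2a\u0e37\u0e48\u0e2d\u0e2a\u0e32"
pdf = "\u0e2a\u0e48\u0e2d\u0e37 \u0e2a\u0e32"

print(indd == pdf)                                  # False
print(same_code_points_ignoring_spaces(indd, pdf))  # True
```

This only classifies the kind of damage. It cannot restore the original order, because the multiset of code points does not say which consonant each mark belonged to.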

 

case 2: line 4 (red box)
PDF export adds invalid characters U+DBC0 + U+DEF8

INDD: \u0e23\u0e17\u0e33\u0e07

PDF: \u0e23\u0e17\udbc0\udef8\u0e33\u0e07
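
For case 2, a possible post-processing step on the extracted text is to drop private-use code points before comparing. The sketch below is an assumption about how a comparison tool could work around the problem, not something Adobe or any vendor in this thread has recommended; `strip_private_use` is a made-up name. It first re-pairs surrogates that arrived as two separate escapes, then filters on the Unicode general category `Co`.

```python
import unicodedata

def strip_private_use(text):
    """Pair up UTF-16 surrogates, then remove private-use (category 'Co') code points."""
    # 'surrogatepass' lets lone surrogate escapes through the encoder; decoding
    # the resulting UTF-16 bytes joins each valid high/low pair into one code point.
    # A surrogate with no partner would still raise UnicodeDecodeError here.
    paired = text.encode("utf-16", "surrogatepass").decode("utf-16")
    return "".join(ch for ch in paired if unicodedata.category(ch) != "Co")

# Case 2, copied from the escape sequences above.
indd = "\u0e23\u0e17\u0e33\u0e07"
pdf = "\u0e23\u0e17\udbc0\udef8\u0e33\u0e07"

print(strip_private_use(pdf) == indd)  # True
```

This hides the symptom for comparison purposes only; the exported PDF itself still contains the stray code point.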

 

Thai-TXT_from_PDF2019-Sarabun.PNG

 

 


2) After switching the font to „Neue Frutiger Thai“ and converting to INDD2019 we are down to only one issue, exactly the same case 2 (line 4, red box) from above.

Thai-TXT_from_PDF2019-NFr_Thai.PNG

 

 

 

3) But with „Noto Sans Thai“ (INDD2019) again quite a few problems are cropping up in the PDF:

case 1b: line 2 (purple box)

Variant of case 1, now two characters move to the wrong position, then a space gets added

INDD: \u0e40\u0e0a\u0e35\u0e48\u0e22\u0e27

PDF: \u0e40\u0e0a\u0e22\u0e35\u0e48 \u0e27

also case1b: line 4

INDD: \u0e07\u0e1c\u0e39\u0e49\u0e2d\u0e37\u0e48\u0e19

PDF: \u0e07\u0e1c\u0e2d\u0e39\u0e49 \u0e37\u0e48\u0e19
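
One symptom of case 1b can be detected without access to the INDD text at all: a combining mark that directly follows a space has no consonant to attach to. The Python sketch below flags such orphaned marks using Unicode general categories (`orphaned_marks` is a hypothetical helper). It is a heuristic under that one assumption and will miss reorderings where every mark still sits on some letter, such as line 2 above.

```python
import unicodedata

def orphaned_marks(text):
    """Return indexes of combining marks that have no base letter before them."""
    orphans = []
    has_base = False
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        if category.startswith("M"):      # combining mark (Mn, Mc, Me)
            if not has_base:
                orphans.append(i)
        elif category.startswith("L"):    # letter: starts a new cluster
            has_base = True
        else:                             # space, punctuation, etc. ends the cluster
            has_base = False
    return orphans

# Case 1b, line 4, copied from the escape sequences above.
indd = "\u0e07\u0e1c\u0e39\u0e49\u0e2d\u0e37\u0e48\u0e19"
pdf = "\u0e07\u0e1c\u0e2d\u0e39\u0e49 \u0e37\u0e48\u0e19"

print(orphaned_marks(indd))  # []
print(orphaned_marks(pdf))   # [6, 7]
```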

 

case 3, line 2 (blue box)

PDF export messes with the sequence of Unicode characters, two diacritics move to the wrong abugida, but no space is added.

INDD: \u0e30\u0e1c\u0e1d\u0e39\u0e36\u0e49\u0e01

PDF: \u0e30\u0e1c\u0e39\u0e49\u0e1d\u0e36\u0e01

 

case 2 (line 4, red box) seems to be wrong all the time, no matter which font is being applied

Thai-TXT_from_PDF2019-NotoS_Thai.PNG

 

Also many thanks to Laubender for the idea to assemble the final PDF from two exports; that might indeed work, even though not out of the box.
Here we need to replace step [1] with a „Print to“ -> Adobe PDF, because „Export as“-> Print PDF still gives us the same encoding issues.

... and then the placement of elements is a little off compared to the „Export as“->Interactive PDF. But this looks fixable.
And it is probably a matter of luck that the button texts (from the „Interactive PDF“) stay in proper encoding, they are usually short which should help, at least in my current test project they did.

 

Still, Adobe should make some effort to fix these issues. When exporting to PDF preservation of the proper encoding should not depend on workarounds.

 
