bebarth
Community Expert
April 16, 2020
Answered

Accented characters exported in a .csv file


I wrote a script that merges the data from several forms into a .csv file attached to a document.
Everything works fine, except for the accented characters, which don't appear correctly in the .csv file.

To export the data, I use util.streamFromString with the utf-8 setting.
I tried all the other settings, but none of them gives a correct result.
Is there a way to export the accented characters correctly?

FYI, merging data with the Acrobat tool works fine.

Thanks for your answer.

 

 

2 replies

JR Boulay
Community Expert
April 21, 2020

I have the same issue as bebarth.

When created from a variable, the attachment encoding is an issue.

According to my tests it depends on the computer used:

Acrobat Mac = Western MacOS Roman

Acrobat Windows = Western Windows Latin 1

 

So far this has not caused trouble, because in the workflow my documents use, users do not switch computers to open the attachment they have just created. It is still not correct, though.

 

But we cannot use Thom's tip (updating a previously created attachment) when both the PDF and its attachment are created on the fly.

If you want a real example of this issue, install my (free) FormReport utility and use it on a Mac and on a Windows computer: the attachment encoding is not the same … but the script is the same.

(I can share the unminified, fully commented JavaScript of FormReport if needed.)

Acrobate du PDF, InDesigner et Photoshopographe
Thom Parker
Community Expert (Correct answer)
April 16, 2020

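UTF-8 is an 8-bit, byte-oriented encoding: it stores each accented character as a multi-byte sequence, and a program that reads the file as ANSI will display those bytes as separate characters. Basically, you can't put 16-bit Unicode text into an 8-bit plain-text document unchanged.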

Use "utf-16".   

 

However, why are you using a stream? The "createDataObject()" function takes a string as input. You'll save yourself some trouble if you use this function, since JavaScript is native Unicode, so all strings are Unicode. 
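A minimal sketch of the string-based approach described above, for illustration only. The helper names (csvField, buildCsv, attachCsv) and the attachment name are hypothetical; doc stands for the Acrobat Doc object (usually `this`). It passes the CSV text straight to createDataObject without going through a stream. It makes no claim about which byte encoding Acrobat chooses for the attachment.

```javascript
// Quote a CSV field only when it contains a delimiter, a quote or a line break.
function csvField(value) {
  var s = String(value);
  return /[",;\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Join rows (arrays of values) into one CSV string with CRLF line endings.
function buildCsv(rows) {
  return rows.map(function (row) {
    return row.map(csvField).join(",");
  }).join("\r\n");
}

// Attach the CSV text to the document as a data object.
// doc is assumed to be an Acrobat Doc object.
function attachCsv(doc, name, rows) {
  doc.createDataObject(name, buildCsv(rows));
}

// Usage inside Acrobat (hypothetical file name and data):
// attachCsv(this, "export.csv", [["Nom", "Ville"], ["Zo\u00E9", "Orl\u00E9ans, FR"]]);
```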

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often
bebarth
Community Expert (Author)
April 17, 2020

Thank you for your answer, Thom!

"Use "utf-16"" -> That doesn't work either.

"However, why are you using a stream? The "createDataObject()" function takes a string as input." -> Because I hadn't thought of that! But the result is the same...

However, I found a solution. I attach a .txt file that is already encoded as UTF-16, then I fill that file. That works fine...
@+
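
One way the workaround above could be scripted is sketched below. The post does not say which calls were used, so this is an assumption: it combines util.streamFromString with Doc.setDataObjectContents to overwrite an attachment that already exists in the PDF. The function name and the attachment name are hypothetical.

```javascript
// doc:  the Acrobat Doc object (usually `this`).
// util: Acrobat's global util object, passed in so the function has no hidden globals.
// attachmentName: an attachment that already exists in the PDF and was saved
//                 as UTF-16 beforehand (assumption taken from the post above).
function fillUtf16Attachment(doc, util, attachmentName, text) {
  // Convert the JavaScript string into a stream using the "utf-16" charset.
  var stream = util.streamFromString(text, "utf-16");
  // Replace the contents of the existing attachment with that stream.
  doc.setDataObjectContents(attachmentName, stream);
}

// Usage inside Acrobat (hypothetical attachment name):
// fillUtf16Attachment(this, util, "export.txt", "Nom,Ville\r\nZo\u00E9,Orl\u00E9ans");
```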

 

Legend
April 20, 2020

Hi,

I'm coming back to this post because I have a problem.

When the characters are written in quotes, as in your example, that works fine.

When the characters are placed into a variable, that works fine too:

var myVariable="Some ascii text then, ©™Σ";

createDataObject("Tst2.Txt", myVariable,"text/html; charset=utf-16");

In the script I'm writing, the variable is built up over the course of the script and read back at the end to fill the .txt file.

In the attached screenshot, you can see that the variable (lesDonnees) is displayed correctly when I print it in the console, but the special characters are not displayed correctly in the .txt file, even though the cMIMEType parameter seems to be set correctly!

Do you have any idea what's happening?

Thanks


Examine the actual contents of the TXT file to see what encoding is used. I mean look at the hex codes, don't just open the file in a text editor. Know what WinAnsi, UTF-8 and UTF-16BE look like in hex. This is a lot of learning but really vital in solving problems like this. Otherwise you are forever trying to deduce what the problem is from side effects, and you don't know whether it is the writing software or the reading software doing something unwanted.
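
As a rough reference for that comparison, the sketch below computes the bytes each encoding would produce for a short string, so they can be matched against a hex dump of the attachment. It is plain JavaScript with hypothetical helper names and does not call any Acrobat API. The single-byte function covers Latin-1, which matches WinAnsi (Windows-1252) only for characters outside the 0x80 to 0x9F range.

```javascript
// Format an array of byte values as uppercase hex pairs.
function toHex(bytes) {
  return bytes.map(function (b) {
    return (b < 16 ? "0" : "") + b.toString(16).toUpperCase();
  }).join(" ");
}

// UTF-8, no byte order mark: 1 to 4 bytes per code point.
function utf8Bytes(str) {
  var out = [];
  for (var i = 0; i < str.length; i++) {
    var cp = str.charCodeAt(i);
    // Combine a surrogate pair into a single code point.
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < str.length) {
      var low = str.charCodeAt(i + 1);
      if (low >= 0xDC00 && low <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
        i++;
      }
    }
    if (cp < 0x80) {
      out.push(cp);
    } else if (cp < 0x800) {
      out.push(0xC0 | (cp >> 6), 0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      out.push(0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F));
    } else {
      out.push(0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
               0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F));
    }
  }
  return out;
}

// UTF-16BE with a byte order mark (FE FF): two bytes per UTF-16 code unit, high byte first.
function utf16beBytes(str) {
  var out = [0xFE, 0xFF];
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    out.push(unit >> 8, unit & 0xFF);
  }
  return out;
}

// Latin-1: one byte per character, only defined up to U+00FF.
function latin1Bytes(str) {
  var out = [];
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    if (unit > 0xFF) throw new RangeError("Not representable in Latin-1: U+" + unit.toString(16));
    out.push(unit);
  }
  return out;
}

var sample = "A\u00E9"; // "A" followed by e-acute
var utf8Hex = toHex(utf8Bytes(sample));       // "41 C3 A9"
var utf16Hex = toHex(utf16beBytes(sample));   // "FE FF 00 41 00 E9"
var latin1Hex = toHex(latin1Bytes(sample));   // "41 E9"
```

If the dump shows C3 A9 where an é was expected, the file holds UTF-8 and the reader is probably interpreting it as a single-byte encoding; a lone E9 points to a Latin-1 or WinAnsi file.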