I wrote a script to merge the data of different forms into a .csv file attached to a document.
Everything works fine, except for the accented characters, which don't appear correctly in the .csv file.
To export the data, I use util.streamFromString with the "utf-8" setting.
I tried all the other settings, but none of them is correct.
Is there a way to export the accented characters correctly?
FYI, merging data with the Acrobat tool works fine.
Thanks for your answer.
UTF-8 refers to the ANSI character set. So it won't properly translate Unicode Characters. Basically, you can't put Unicode (16-bit) into a plain-text (8-bit) document.
Thank you for your answer Tom!
"Use "utf-16"" -> That doesn't work neither.
"However, why are you using a stream? The "createDataObject()" function takes a string as input." -> Because I haven't thought about that! But the result is the same...
Actually it does work, but you have to specify the correct Mime Type, since UTF-8 is the default.
createDataObject("Tst2.Txt", "Some ascii text then, ©™Σ","text/html; charset=utf-16")
By pre-attaching a file that is already UTF-16, you are pre-setting the mime type.
Your first solution would have worked if the file had been created with a UTF-16 mimetype.
It's all about being consistent with the typing all the way through the process.
Great! That works very well...
The "cMIMEType" parameter is not very well documented in the api reference!
Thanks again Tom.
I'm coming back to this post because I've run into a problem.
When the characters are written in quotes, as in your example, that works fine.
When the characters are placed into a variable, that works fine too:
var myVariable="Some ascii text then, ©™Σ";
createDataObject("Tst2.Txt", myVariable,"text/html; charset=utf-16");
In the script I'm writing, the variable is built up throughout the script and retrieved at the end to fill the .txt file.
In the attached screenshot, you can see that the variable (lesDonnees) displays correctly when recalled in the console, but the special characters are not displayed correctly in the .txt file, even though the cMIMEType parameter seems to be correctly set!
Do you have any idea on what's happening?
Examine the actual contents of the TXT file to see what encoding is used. I mean look at the hex codes, not open in a text editor. Know what WinAnsi, UTF-8 and UTF-16BE will look like. This is a lot of learning but really vital in solving problems like this. Otherwise you are forever trying to deduce what the problem is from side effects and you don't know whether it is the writing software or reading software doing something unwanted.
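If you'd rather script it than use a hex editor, here is a quick Node.js sketch (an assumption on my part; Node's `Buffer` API is not available inside Acrobat) showing what the same character looks like in each of those encodings:

```javascript
// Dump a string's bytes in hex for a given encoding, to see what each
// encoding actually looks like on disk.
function hexDump(text, encoding) {
  return Array.from(Buffer.from(text, encoding))
    .map(b => b.toString(16).padStart(2, '0'))
    .join(' ');
}

console.log(hexDump('é', 'latin1'));  // "e9": single byte, WinAnsi-style
console.log(hexDump('é', 'utf8'));    // "c3 a9": two-byte UTF-8 sequence
console.log(hexDump('é', 'utf16le')); // "e9 00": UTF-16 pair with a 00 byte
```

(Node only exposes UTF-16LE; UTF-16BE would be the same pair with the bytes swapped, "00 e9".)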
Thank you for your answer.
Opening the .txt file with Excel, the Windows (ANSI) setting displays the characters correctly.
As a first step, I know which setting to use, but I don't understand what's happening with the text stored in a variable.
Is there a way to set the encoding of the created .txt attachment correctly?
" I mean look at the hex codes, not open in a text editor."
...Sorry, but I don't understand how I can look at the hex codes!
After a lot of testing, here is what I found:
Example #1 with Tom's script -> works fine.
Example #2, where I add some accented characters to the special characters of the first script -> createDataObject("Example2.Txt", "Some ascii text then, ©™Σ plus éèàôùïîÉÈÁÀ","text/html; charset=utf-16") -> works fine too.
Example #3, where I keep only the accented characters -> createDataObject("Example2.Txt", "Some ascii text then, éèàôùïîÉÈÁÀ","text/html; charset=utf-16") -> doesn't work anymore.
So I decided to do an Example #4, where I add these special characters to my variable at the end of the script, before creating the .txt data object, and that works fine (why????).
After several other tests, I found I only need to add one of these characters (©™Σ) at the beginning or at the end of my variable for it to work.
So, for my script, the solution I found is to create the .txt data object with these symbols (or just one of them), then fill it with my variable using streamFromString and setDataObjectContents.
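A possible explanation for this behaviour (purely a guess on my part, not documented Acrobat behaviour): perhaps a single-byte encoding gets picked whenever every code point in the string fits in one byte, regardless of the declared charset, and a wider character forces real Unicode output. Note that © (U+00A9) is itself a Latin-1 character, so under this guess only ™ or Σ would actually trigger the change. A quick check of which strings contain such a character, sketched in plain Node.js (not Acrobat JS):

```javascript
// Guess: strings whose code points all fit in one byte (<= 0xFF) might be
// serialized as 8-bit text; a wider character forces a Unicode encoding.
function fitsInOneByte(text) {
  return [...text].every(ch => ch.codePointAt(0) <= 0xff);
}

console.log(fitsInOneByte('éèàôùïîÉÈÁÀ')); // true: all accented Latin-1 letters
console.log(fitsInOneByte('©™Σ'));         // false: ™ (U+2122) and Σ (U+03A3) are not Latin-1
```

That would match the tests above: the string with only accented letters fails, while any string containing ™ or Σ comes out correctly.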
Thanks for reading and providing comments on this post.
"UTF-8 refers to the ANSI character set. So it won't properly translate Unicode Characters." I have to completely disagree. UTF-8 is designed to both include readable low ASCII and all Unicode characters too. It is the best and most recommended Unicode format for most purposes, UTF-16 is not nearly so flexible.
There is a module for Notepad++ to show Hex codes https://appuals.com/how-to-install-notepad-hex-editor-plugin/
Really I find guesswork terrifying! This is all extremely well defined once you know how the different encodings are used and represented.
You seem to be a specialist and must certainly be right. For my part, I don't understand a lot about these types of formats and I therefore try to find a solution for my script.
Do you have a good cMIMEType parameter to set directly when creating the data object?
Do you know how to read the hex codes on Mac?
On a Mac I would just use the command line for short files. If you are happy with the command line, you can use
od -xa filename
which shows hex on one line, and ASCII characters on the next.
...and the result for a short file is:
What can you deduce?
We can see it is NOT UTF-16 because that would have 00 in most character pairs.
Let's look at the line containing Pr??vu
We can see the hex codes 69 6d a9 c3 75 76. The interesting part, the ? ? is a9 c3. Annoyingly these od options reverse each pair of bytes so what we actually want is c3a9.
Here we come into guesswork, but I'll assume these are recognisable European words.
Do you mean "U+FEFF" is the charset to indicate for the cMIMEType parameter?
No, it needs to get directly into the file as the first three bytes.
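To illustrate the point in Node.js (not Acrobat JS): U+FEFF is a character, not a charset name, and once encoded it becomes the file's first bytes.

```javascript
// The byte-order mark U+FEFF, encoded as UTF-8, is the three-byte
// sequence EF BB BF; as UTF-16LE it is the two bytes FF FE.
const bom8 = Buffer.from('\uFEFF', 'utf8');
console.log([...bom8].map(b => b.toString(16)));  // [ 'ef', 'bb', 'bf' ]
const bom16 = Buffer.from('\uFEFF', 'utf16le');
console.log([...bom16].map(b => b.toString(16))); // [ 'ff', 'fe' ]
```

Readers look at those first bytes to decide which encoding the rest of the file uses.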
Thanks for your help.
I have the same issue as bebarth.
When created from a variable, the attachment encoding is an issue.
According to my tests it depends on the computer used:
Acrobat Mac = Western MacOS Roman
Acrobat Windows = Western Windows Latin 1
So far so good: until now, in the process my documents use, the users do not switch computers between creating the attachment and opening it. But it is still not correct.
But we cannot use Thom's tip (updating a previously created attachment) when both the PDF and its attachment are created on the fly.
If you want a true sample of this issue, install my (free) FormReport utility and use it on a Mac and on a Windows computer: the attachment encoding is not the same … but the script is the same.