How to save a UTF-8 encoded text file?
hi People
I have a little script which reads the source text from a layer and saves it to a .txt file. This is on a Mac and all was good until recently when I tried opening the .txt file on a PC in Notepad and found my ˚ degree symbols all whack.
Resaving the .txt file in TextEdit as Unicode (UTF-8) encoding solved the problem, now opens fine in Notepad.
But ideally I'd like the script to output the .txt as UTF-8 in the first place. It's currently Western (Mac OS Roman). I've tried adding myfile.encoding = "UTF8", but the resulting file is still Western (and the special characters have wigged out again).
any help greatly appreciated../daniel
{
    var theComp = app.project.activeItem;
    var dataRO = theComp.layer("dataRO").sourceText;
    // prompt user to save file
    var theFile = new File("~/Desktop/" + theComp.name + "_output.txt");
    theFile = theFile.saveDlg("Save an ASCII export file.");
    if (theFile != null) { // check user didn't cancel dialog
        theFile.lineFeed = "windows";
        //theFile.encoding = "UTF8";
        theFile.open("w", "TEXT", "????");
        theFile.writeln("move details:");
        theFile.writeln(dataRO.value.toString());
        theFile.close(); // close only when a file was actually chosen and opened
    }
}
Have you tried setting the encoding after you open the file?
Dan
hi Dan
Thanks for the suggestion but dang it, no joy. I tried setting encoding straight after opening the file and also just before closing but both had the same effect as above... the .txt file is still a Western (Mac OS Roman) file and the special characters have wigged out...
Hi,
I remember working hard two years ago on creating a correct text file on OS X, but I don't remember whether it was a UTF-8 case or something else. As my home computer is not a Mac, I have no means to test it tonight, but anyway, here is the gist of it:
var theFile = new File(.........);
theFile.open("w", "TEXT");
theFile.encoding = "BINARY";
theFile.lineFeed = "Unix"; // the property name is lineFeed (capital F)
theFile.writeln("éà çËôù");
theFile.close();
Let me know if it is working.
hi
thanks for the suggestion but still no joy.
I was thinking it might have something to do with the Creator type but no joy there either.
at this stage it seems like the only option is to stick with the Save As in TextEdit, which is a little dull.
Hi, I was just looking at how text software knows the text encoding of a file, and I found this on Wikipedia: http://en.wikipedia.org/wiki/Byte_order_mark
So I created a UTF-8 file in Notepad and looked at the binary. At the start of the file there are these bytes: 0xEF, 0xBB, 0xBF.
So you should try adding those bytes at the start of the file.
var theFile = new File(.........);
theFile.open("w", "TEXT");
theFile.encoding = "BINARY";
theFile.lineFeed = "Unix";
// write the UTF-8 byte order mark (0xEF, 0xBB, 0xBF) first
theFile.write(String.fromCharCode(0xEF) + String.fromCharCode(0xBB) + String.fromCharCode(0xBF));
theFile.write("Your stuff éà çËôù");
theFile.close();
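For anyone wanting to check whether a file really starts with that byte order mark, here is a minimal sketch. The helper names `hasUtf8Bom` and `binaryStringToBytes` are made up for illustration, and it assumes the file content is available as a "binary string" with one character per byte (which is what reading with the "BINARY" encoding is expected to give).

```javascript
// Hypothetical helper (not from the thread): true when the byte values
// start with the UTF-8 byte order mark EF BB BF.
function hasUtf8Bom(bytes) {
    return bytes.length >= 3 &&
        bytes[0] === 0xEF &&
        bytes[1] === 0xBB &&
        bytes[2] === 0xBF;
}

// Hypothetical helper: turns a one-character-per-byte string into an
// array of byte values (assumption: each char code fits in one byte).
function binaryStringToBytes(s) {
    var bytes = [];
    for (var i = 0; i < s.length; i++) {
        bytes.push(s.charCodeAt(i) & 0xFF);
    }
    return bytes;
}

// The BOM built the same way as in the post above.
var bom = String.fromCharCode(0xEF) +
          String.fromCharCode(0xBB) +
          String.fromCharCode(0xBF);

var withBom = binaryStringToBytes(bom + "hello");
var withoutBom = binaryStringToBytes("hello");
```

Here `hasUtf8Bom(withBom)` is true and `hasUtf8Bom(withoutBom)` is false. Note that the BOM only labels the file; the bytes after it still have to be valid UTF-8 for an editor to open it.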
chauffeurdevan, thanks for the suggestion. I tried a few variants with differing results but no real success.
From my testing it seems the main problem is TextEdit (and quick preview, which is kind of a bummer).
If I save the file using theFile.encoding = "UTF-8" it opens perfectly on the PC. Interestingly, the same file opened in TextMate on the Mac works fine too, as TextMate somehow interprets it as UTF-8. For some reason TextEdit assumes it is Western (Mac OS Roman) and shows garbled characters. Inserting the BOM characters and encoding as Binary made TextEdit think it was UTF-8, but it couldn't actually open the file at all.
So the mystery remains as to how to write a UTF-8 file that both TextEdit and Notepad display correctly, but at least I have a way of writing a file that Notepad displays properly, which was the main aim... so perhaps 75% of a solution!
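A minimal sketch of the variant reported to work for Notepad, wrapped in a small function. The name `writeUtf8Text` is hypothetical, and setting the encoding before `open()` is an assumption (the post does not say where the line went). The function only relies on the object having an `encoding` property and `open()`, `write()`, and `close()` methods, as an ExtendScript File does.

```javascript
// Hypothetical helper (not from the thread). `file` is expected to behave
// like an ExtendScript File object.
function writeUtf8Text(file, text) {
    file.encoding = "UTF-8";   // assumption: set before opening the file
    if (!file.open("w")) {
        return false;          // the file could not be opened for writing
    }
    file.write(text);
    file.close();
    return true;
}

// In After Effects this might be called as (untested sketch):
//   var f = new File("~/Desktop/output.txt");
//   writeUtf8Text(f, "move details:\n23\u02DA");
```

Whether TextEdit then guesses the encoding correctly is a separate matter; this only covers the writing side.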
Hi,
Got it, it seems: the UTF-8 standard uses two or more bytes to encode accented and special characters.
I found some info, with some code, here: http://ivoronline.com/Coding/Theory/Tutorials/Encoding%20-%20Text%20-%20UTF%208.php
There were some errors in it, so I fixed them. (I didn't test it with 3- and 4-byte characters, though, so you may have to change the 0xbf back to 0x3f or something else.)
So here is the code.
function convertCharToUTF(character) {
    var utfBytes = "";
    var c = character.charCodeAt(0);
    // 0x3F is the usual continuation-byte mask; 0xBF gives the same bytes here
    // because each result is OR-ed with 0x80 anyway.
    if (c < 0x80) {
        utfBytes = String.fromCharCode(c);
    } else if (c < 0x800) {
        utfBytes = String.fromCharCode(0xC0 | c >> 6);
        utfBytes += String.fromCharCode(0x80 | c & 0xBF);
    } else if (c < 0x10000) {
        utfBytes = String.fromCharCode(0xE0 | c >> 12);
        utfBytes += String.fromCharCode(0x80 | c >> 6 & 0xBF);
        utfBytes += String.fromCharCode(0x80 | c & 0xBF);
    } else if (c < 0x200000) {
        // not reached for values returned by charCodeAt, which are below 0x10000
        utfBytes = String.fromCharCode(0xF0 | c >> 18);
        utfBytes += String.fromCharCode(0x80 | c >> 12 & 0xBF);
        utfBytes += String.fromCharCode(0x80 | c >> 6 & 0xBF);
        utfBytes += String.fromCharCode(0x80 | c & 0xBF);
    }
    return utfBytes;
}

function convertStringToUTF(stringToConvert) {
    var utfString = "";
    for (var i = 0; i < stringToConvert.length; i++) {
        utfString = utfString + convertCharToUTF(stringToConvert.charAt(i));
    }
    return utfString;
}

var theFile = new File("~/Desktop/_output.txt");
theFile.open("w", "TEXT");
theFile.encoding = "BINARY";
theFile.lineFeed = "Unix";
// write the UTF-8 byte order mark (0xEF, 0xBB, 0xBF) first
theFile.write(String.fromCharCode(0xEF) + String.fromCharCode(0xBB) + String.fromCharCode(0xBF));
theFile.write(convertStringToUTF("Your stuff éà çËôù"));
theFile.close();
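Since the 4-byte case above was not tested, here is a hedged sketch of the same idea that also handles characters above U+FFFF, which JavaScript stores as two UTF-16 surrogates. The names `utf8Bytes` and `bytesToBinaryString` are made up for illustration and are not part of the code above; it uses the conventional 0x3F mask.

```javascript
// Hypothetical helper: returns the UTF-8 byte values for a string,
// combining UTF-16 surrogate pairs into a single code point first.
function utf8Bytes(str) {
    var bytes = [];
    for (var i = 0; i < str.length; i++) {
        var c = str.charCodeAt(i);
        // A high surrogate followed by a low surrogate is one code point
        // above U+FFFF. (A lone surrogate is left as-is, which is not valid UTF-8.)
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < str.length) {
            var next = str.charCodeAt(i + 1);
            if (next >= 0xDC00 && next <= 0xDFFF) {
                c = 0x10000 + ((c - 0xD800) << 10) + (next - 0xDC00);
                i++;
            }
        }
        if (c < 0x80) {
            bytes.push(c);
        } else if (c < 0x800) {
            bytes.push(0xC0 | (c >> 6), 0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            bytes.push(0xE0 | (c >> 12),
                       0x80 | ((c >> 6) & 0x3F),
                       0x80 | (c & 0x3F));
        } else {
            bytes.push(0xF0 | (c >> 18),
                       0x80 | ((c >> 12) & 0x3F),
                       0x80 | ((c >> 6) & 0x3F),
                       0x80 | (c & 0x3F));
        }
    }
    return bytes;
}

// Hypothetical helper: one character per byte, for writing through a
// file whose encoding is set to "BINARY".
function bytesToBinaryString(bytes) {
    var s = "";
    for (var i = 0; i < bytes.length; i++) {
        s += String.fromCharCode(bytes[i]);
    }
    return s;
}
```

For example, `utf8Bytes("\u00E9")` (é) gives the two bytes 0xC3, 0xA9, and `utf8Bytes("\u02DA")` (˚) gives 0xCB, 0x9A.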

