
How to save a UTF-8 encoded text file?

Contributor ,
Jun 18, 2012

hi People

I have a little script which reads the source text from a layer and saves it to a .txt file. This is on a Mac, and all was good until recently, when I tried opening the .txt file on a PC in Notepad and found my ˚ degree symbols all whack.

Resaving the .txt file in TextEdit with Unicode (UTF-8) encoding solved the problem; it now opens fine in Notepad.

But ideally I'd like the script to output the .txt as UTF-8 in the first place. It's currently Western (Mac OS Roman). I've tried adding myfile.encoding = "UTF8", but the resulting file is still Western (and the special characters have wigged out again).

any help greatly appreciated../daniel

{
    var theComp = app.project.activeItem;
    var dataRO = theComp.layer("dataRO").sourceText;

    // prompt user to save file
    var theFile = new File("~/Desktop/" + theComp.name + "_output.txt");
    theFile = theFile.saveDlg("Save an ASCII export file.");

    if (theFile != null) {          // check user didn't cancel dialog
        theFile.lineFeed = "windows";
        //theFile.encoding = "UTF8";
        theFile.open("w", "TEXT", "????");
        theFile.writeln("move details:");
        theFile.writeln(dataRO.value.toString());
        theFile.close();            // close only when a file was chosen; theFile is null after a cancel
    }
}

TOPICS
Scripting
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Community Expert ,
Jun 18, 2012

Have you tried setting the encoding after you open the file?

Dan

Contributor ,
Jun 18, 2012

hi Dan

Thanks for the suggestion but dang it, no joy. I tried setting encoding straight after opening the file and also just before closing but both had the same effect as above... the .txt file is still a Western (Mac OS Roman) file and the special characters have wigged out...

Contributor ,
Jun 19, 2012

Hi,

I remember working hard two years ago on creating a correct text file on OS X, but I don't remember whether it was a UTF-8 issue or something else. My home computer is not a Mac, so I have no means to test it tonight, but here is the gist of it:

var theFile = new File(.........);
theFile.open("w", "TEXT");
theFile.encoding = "BINARY";
theFile.lineFeed = "Unix";      // property names are case-sensitive: lineFeed, not linefeed
theFile.writeln("éàçËôù");
theFile.close();

Let me know if it is working.

Contributor ,
Jun 19, 2012

hi

thanks for the suggestion but still no joy.

I was thinking it might have something to do with the Creator type but no joy there either.

At this stage it seems the only option is to stick with Save As in TextEdit, which is a little dull.

Contributor ,
Jun 20, 2012

Hi, I was just looking at how text software works out the encoding of a file, and I found this on Wikipedia: http://en.wikipedia.org/wiki/Byte_order_mark

So I created a UTF-8 file in Notepad and looked at the binary. At the start of the file there are these bytes: 0xEF, 0xBB, 0xBF (they show up as ï»¿ when read as Western text).

So you should try adding those bytes at the start of the file.

var theFile = new File(.........);
theFile.open("w", "TEXT");
theFile.encoding = "BINARY";
theFile.lineFeed = "Unix";      // property names are case-sensitive: lineFeed, not linefeed
theFile.write("\xEF\xBB\xBF");  // or theFile.write(String.fromCharCode(0xEF) + String.fromCharCode(0xBB) + String.fromCharCode(0xBF));
theFile.write("Your stuff éàçËôù");
theFile.close();
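To make the byte order mark idea a bit more concrete, here is a minimal sketch of two helpers for working with it. The names `UTF8_BOM`, `hasUtf8Bom` and `addUtf8Bom` are made up for illustration, and the strings are assumed to be "binary strings" (one character per byte), which is what you would pass to `write()` with `encoding = "BINARY"`. The BOM is optional in UTF-8; it only acts as a hint for editors that look for it.

```javascript
// The UTF-8 byte order mark as a binary string: bytes 0xEF, 0xBB, 0xBF.
var UTF8_BOM = String.fromCharCode(0xEF, 0xBB, 0xBF);

// Hypothetical helper: true if the binary string starts with the UTF-8 BOM.
function hasUtf8Bom(bytes) {
    return bytes.substring(0, 3) === UTF8_BOM;
}

// Hypothetical helper: prepend the BOM unless it is already there.
function addUtf8Bom(bytes) {
    return hasUtf8Bom(bytes) ? bytes : UTF8_BOM + bytes;
}
```

Note that the BOM only labels the file. The text after it still has to be real UTF-8 bytes, otherwise an editor that trusts the BOM may refuse to open the file.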

Contributor ,
Jun 21, 2012

chauffeurdevan, thanks for the suggestion. I tried a few variants with differing results but no real success.

From my testing it seems the main problem is TextEdit (and the quick preview, which is kind of a bummer).

If I save the file using theFile.encoding = "UTF-8" it opens perfectly on the PC. Interestingly, the same file opened in TextMate on the Mac works fine too, as TextMate somehow interprets it as UTF-8. For some reason TextEdit assumes it is Western (Mac OS Roman) and shows garbled characters. Inserting the BOM characters and encoding as BINARY made TextEdit think the file was UTF-8, but then it couldn't actually open the file at all.

So the mystery remains as to how to write a UTF-8 file that both TextEdit and Notepad display correctly, but at least I have a way of writing a file that Notepad displays properly, which was the main aim... so perhaps 75% of a solution!
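For anyone landing here later, the variant that worked for Notepad could be sketched roughly as below. This is only a sketch: `writeUtf8Lines` is a made-up helper name, it assumes it is handed an ExtendScript `File` object (or `null` from a cancelled `saveDlg()`), and setting `encoding` before `open()` is an assumption, since the post above doesn't say which order was used.

```javascript
// Hypothetical helper: write an array of lines to an ExtendScript File
// (or any object with the same members) as UTF-8 with Windows line endings.
// Returns false if no file was chosen or the file could not be opened.
function writeUtf8Lines(file, lines) {
    if (file === null) {            // saveDlg() returns null when the user cancels
        return false;
    }
    file.encoding = "UTF-8";        // the encoding name reported to work in this thread
    file.lineFeed = "Windows";
    if (!file.open("w")) {
        return false;
    }
    for (var i = 0; i < lines.length; i++) {
        file.writeln(lines[i]);
    }
    file.close();
    return true;
}

// Possible usage inside After Effects (not runnable outside ExtendScript):
// var theFile = new File("~/Desktop/_output.txt").saveDlg("Save a text export file.");
// writeUtf8Lines(theFile, ["move details:", dataRO.value.toString()]);
```

Keeping the null check and the `close()` call in one place means a cancelled dialog never reaches `close()`.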

Contributor ,
Jun 21, 2012

Hi,

Got it, it seems. The UTF-8 standard uses two or more bytes to encode accented and special characters.

I found some info with some code here: http://ivoronline.com/Coding/Theory/Tutorials/Encoding%20-%20Text%20-%20UTF%208.php

However, there were some errors, so I fixed them. (I didn't test 3- and 4-byte characters, though, so you may have to change the 0xbf back to 0x3f or something else.)

So here is the code.


// Convert a single character to its UTF-8 bytes, returned as a binary string
// (one character per byte).
function convertCharToUTF(character){
    var utfBytes = "";
    var c = character.charCodeAt(0);    // UTF-16 code unit, 0..0xFFFF

    // Note: 0x80 | (x & 0xBF) gives the same byte as the usual 0x80 | (x & 0x3F),
    // because the OR sets bit 7 anyway.
    if (c < 0x80) {
        utfBytes = String.fromCharCode(c);
    }
    else if (c < 0x800) {
        utfBytes = String.fromCharCode(0xC0 | c>>6);
        utfBytes += String.fromCharCode(0x80 | c & 0xBF);
    }
    else if (c < 0x10000) {
        utfBytes = String.fromCharCode(0xE0 | c>>12);
        utfBytes += String.fromCharCode(0x80 | c>>6 & 0xBF);
        utfBytes += String.fromCharCode(0x80 | c & 0xBF);
    }
    else if (c < 0x200000) {
        // Not reached as written: charCodeAt() never returns more than 0xFFFF,
        // so characters above U+FFFF arrive here as two surrogate code units.
        utfBytes += String.fromCharCode(0xF0 | c>>18);
        utfBytes += String.fromCharCode(0x80 | c>>12 & 0xBF);
        utfBytes += String.fromCharCode(0x80 | c>>6 & 0xBF);
        utfBytes += String.fromCharCode(0x80 | c & 0xBF);
    }
    return utfBytes;
}

// Convert a whole string, one character at a time.
function convertStringToUTF(stringToConvert){
    var utfString = "";
    for (var i = 0; i < stringToConvert.length; i++){
        utfString = utfString + convertCharToUTF(stringToConvert.charAt(i));
    }
    return utfString;
}

var theFile = new File("~/Desktop/_output.txt");
theFile.open("w", "TEXT");
theFile.encoding = "BINARY";
theFile.lineFeed = "Unix";      // property names are case-sensitive: lineFeed, not linefeed
theFile.write("\xEF\xBB\xBF");  // or theFile.write(String.fromCharCode(0xEF) + String.fromCharCode(0xBB) + String.fromCharCode(0xBF));
theFile.write(convertStringToUTF("Your stuff éàçËôù"));
theFile.close();
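In case characters above U+FFFF ever matter (emoji, some CJK), here is a hedged sketch of a variant that pairs up surrogates before encoding. `codePointToUtf8` and `stringToUtf8` are made-up names, and the sketch assumes strings are sequences of 16-bit UTF-16 code units as in standard JavaScript. It is only relevant when writing with `encoding = "BINARY"`; as reported earlier in the thread, `encoding = "UTF-8"` already produced a file that Notepad reads correctly.

```javascript
// Encode one Unicode code point (a number) as a UTF-8 binary string,
// one character per byte.
function codePointToUtf8(cp) {
    if (cp < 0x80) {
        return String.fromCharCode(cp);
    }
    if (cp < 0x800) {
        return String.fromCharCode(0xC0 | (cp >> 6),
                                   0x80 | (cp & 0x3F));
    }
    if (cp < 0x10000) {
        return String.fromCharCode(0xE0 | (cp >> 12),
                                   0x80 | ((cp >> 6) & 0x3F),
                                   0x80 | (cp & 0x3F));
    }
    return String.fromCharCode(0xF0 | (cp >> 18),
                               0x80 | ((cp >> 12) & 0x3F),
                               0x80 | ((cp >> 6) & 0x3F),
                               0x80 | (cp & 0x3F));
}

// Encode a whole string, combining surrogate pairs into single code points.
// Unpaired surrogates are passed through as 3-byte sequences, which strict
// UTF-8 decoders may reject.
function stringToUtf8(str) {
    var out = "";
    for (var i = 0; i < str.length; i++) {
        var cp = str.charCodeAt(i);
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < str.length) {
            var low = str.charCodeAt(i + 1);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                i++;    // skip the low surrogate, it has been consumed
            }
        }
        out += codePointToUtf8(cp);
    }
    return out;
}
```

For example, the ring symbol ˚ (U+02DA) comes out as the two bytes 0xCB 0x9A, which is exactly what shows up as "Ëš" when a UTF-8 file is read as Western text.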
