Hello all,
Is there a way to write a UTF-16 string into a ByteArray in Flash/AS3? Basically I have a string (var test:String="allan"; for example) and I would like to write that into a ByteArray with UTF-16LE encoding. In this case it would be "61 00 6C 00 6C 00 61 00 6E 00".
I've tried using utf16le.writeMultiByte( clipText, "utf-16" ); but it just comes out with what appears to be UTF-8 (or just straight ASCII, given the test string).
The use case is to save a UTF-16LE file using FileReference.save(), which I understand I can do by passing it a ByteArray with the correct character encoding in it. Passing just a string saves as UTF-8. Hence the need to convert and store into a UTF-16LE representation in a ByteArray.
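For reference, this is roughly what I'm attempting ("test.txt" is just for illustration, and FileReference.save has to be called from a user event handler):

import flash.utils.ByteArray;
import flash.net.FileReference;

var utf16le:ByteArray = new ByteArray();
utf16le.writeMultiByte( "allan", "utf-16" ); // comes out as single bytes, not UTF-16LE

var fileRef:FileReference = new FileReference();
fileRef.save( utf16le, "test.txt" ); // want: 61 00 6C 00 6C 00 61 00 6E 00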
Regards,
Allan
Have you tried:
utf16le.writeMultiByte( clipText, "unicode" );
From the list here (http://help.adobe.com/en_US/AS3LCR/Flash_10.0/charset-codes.html) it shows that "unicode" is the Label and "utf-16" is an alias that points back to it. Stranger things have happened.
Also, I'm not too up on all of this, but does the LE in UTF-16LE mean "little endian"? It seems that the default is big endian, so that might make a difference:
I tried this little test with some Hindi unicode text:
import flash.utils.ByteArray;
import flash.utils.Endian;

var m:ByteArray = new ByteArray();
//m.endian = Endian.LITTLE_ENDIAN;
m.writeMultiByte( "हिन्दी", "unicode" );
m.position = 0;
for ( var i:int = 0; i < 6; i++ )
{
    trace( m.readShort() );
}
When I comment out the line my trace is:
14601
16137
10249
19721
9737
16393
When I use that line my trace is:
2361
2367
2344
2381
2342
2368
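Converting those to hex shows the little-endian run matches the actual Devanagari code points - e.g. 2361 is 0x0939 (ह), while 14601 is 0x3909, the same code unit byte-swapped. A quick sketch to check that, if anyone wants it:

m.position = 0;
for ( var j:int = 0; j < 6; j++ )
{
    // readUnsignedShort respects the endian setting, same as readShort
    trace( "0x" + m.readUnsignedShort().toString(16).toUpperCase() );
}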
Hi Rothrock,
Thanks very much for your reply! I did indeed try the 'unicode' option, but unfortunately to no avail. Likewise I've just tried the 'unicodeFFFE' option, just to see what the difference might be - I didn't get any. Basically I'm looking for the string "allan" to be saved in a file with the following hex pattern:
61 00 6C 00 6C 00 61 00 6E 00
My text editor tells me that is UTF-16 Little Endian (you were quite right with the acronym!). So this is what I have been trying:
import flash.utils.ByteArray;
import flash.utils.Endian;
import flash.net.FileReference;

var utf16:ByteArray = new ByteArray();
utf16.endian = Endian.LITTLE_ENDIAN;
utf16.writeMultiByte( "allan", "unicode" );
var fileRef:FileReference = new FileReference();
fileRef.save( utf16, fileName );
I've tried various combinations of the endian type and the second parameter for writeMultiByte, but I just can't seem to get it - it's always outputting: 61 6C 6C 61 6E. I could of course add the zero padding in myself, but I can imagine that would break characters with a value > 0xFF.
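In case it helps anyone reproduce this, here is the sort of helper I've been using to inspect the bytes (just a quick sketch):

// Hypothetical helper: dump a ByteArray as space-separated hex pairs
function hexDump( ba:ByteArray ):String
{
    var out:Array = [];
    for ( var i:uint = 0; i < ba.length; i++ )
    {
        var b:String = ba[i].toString(16).toUpperCase();
        out.push( b.length < 2 ? "0" + b : b );
    }
    return out.join( " " );
}

trace( hexDump( utf16 ) ); // prints "61 6C 6C 61 6E" rather than the hoped-for "61 00 6C 00..."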
regards,
Allan
I'm totally out of my depth here, but that is how I learn stuff myself. So I'll flail around a bit more.
Using your example if I do this:
utf16.writeMultiByte( "allan", "utf-8" );
trace(utf16.length)
I get 5 which would be expected. But if I do:
utf16.writeMultiByte("allan","unicode");
I get 1, which I did not expect.
I know that flash really loves utf-8, so I wonder if some how all the strings are being converted to utf-8?
I went to Wikipedia and found the following string "水z " (water, z, G clef). When I tried to write it using writeMultiByte it broke at the "z".
It should be 34 6C, 7A 00, 34 D8, 1E DD, but it just breaks at 7A.
utf16.writeMultiByte("水z ", "unicode");
trace(utf16.length) // returns 3
utf16.writeMultiByte( "水 z", "unicode" );
trace(utf16.length) // returns 7
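From the Unicode FAQ, the surrogate math for a character like the G clef seems to work out like this (my own sketch - I could have the details wrong):

// Compute the UTF-16 surrogate pair for U+1D11E (G clef)
var cp:uint = 0x1D11E;
var v:uint = cp - 0x10000;              // 0xD11E
var high:uint = 0xD800 + ( v >> 10 );   // 0xD834
var low:uint = 0xDC00 + ( v & 0x3FF );  // 0xDD1E
trace( high.toString(16), low.toString(16) ); // d834 dd1e -> little-endian bytes: 34 D8 1E DD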
Also with your example I'm only getting a length of 1. You are actually getting the 61 6C 6C 61 6E back out?
So to my mind it looks like there is some bug in mixing ASCII-encodable characters with those that need higher code points. But I don't understand the standard well enough to find out. Might be worth trying to open a bug for this....
The G clef doesn't show up here in the forums and causes all kinds of trouble in the ActionScript editor, but it did "work" as a UTF-16LE character that needs 4 bytes to encode...
Hi Rothrock,
Thanks for the replies - I'm completely learning as I go here as well. I know what I want the final output file to look like - I just can't figure out how to get it into that form...
According to Wikipedia ( http://en.wikipedia.org/wiki/ActionScript ), AS3 uses UTF-16 natively, and one would need to 'convert' to UTF-8. From the FileReference documentation, this would appear to be done automatically when saving a String, but the data is left as-is when writing a ByteArray. The fact that the Adobe documentation calls UTF-16 "unicode" and UTF-8 just "UTF-8" would appear to support the idea that UTF-16 is native.
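A quick way to convince myself of that - charCodeAt appears to return the raw UTF-16 code unit (a sketch, not authoritative):

var s:String = "水z"; // U+6C34, U+007A
trace( s.length );                     // 2
trace( s.charCodeAt(0).toString(16) ); // 6c34
trace( s.charCodeAt(1).toString(16) ); // 7a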
Regarding your length traces - I tried this:
var utf16:ByteArray = new ByteArray();
utf16.writeMultiByte( "allan", "unicode" );
trace( utf16.length );
*edit* Bug FP-3693 opened.
It is bizarre that you get 5. Yes, it should be 10, and even odder is that I get only 1. So I'm going to go with the theory that they have some problems there.
I'm using CS4, publishing AS3 to the Flash Player 10 (and I also tried 9). I'm on a Mac running 10.6.2. I'm guessing you are on Windows?
I'll try it tomorrow on my work machine.
Hi Rothrock,
I'm using a Mac for the run time, but the swf is being compiled in CS4 on Windows Vista.
Here is another odd thing:
On the character encoding page ( http://livedocs.adobe.com/flash/9.0/ActionScriptLangRefV3/charset-codes.html ) Adobe list:
Character set: "Unicode" - Label: "unicode"
Character set: "Unicode (Big endian)" - Label: "unicodeFFFE"
The obvious inference from this is that "unicode" is little endian. However, looking at the Unicode web-site ( http://unicode.org/faq/utf_bom.html#bom4 ), it suggests that:
FF FE: little endian
FE FF: big endian!
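To make that concrete, a throwaway sketch of a BOM sniff (illustrative only, not production code):

// Hypothetical helper: peek at the first two bytes of UTF-16 data
function detectEndian( ba:ByteArray ):String
{
    ba.position = 0;
    var b0:uint = ba.readUnsignedByte();
    var b1:uint = ba.readUnsignedByte();
    if ( b0 == 0xFF && b1 == 0xFE ) return "little endian";
    if ( b0 == 0xFE && b1 == 0xFF ) return "big endian";
    return "no BOM";
}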
Oops... Bug FP-3695 added.
Regards,
Allan
Yeah, on my Windows machine it also returns a length of 5. So there is something very wonky all around. Sorry that we didn't get it working, but at least we figured out some stuff. Good luck.
Hi Rothrock,
Thanks for the info - and for following up on this. We'll see how the bugs progress through Adobe, hopefully they will be resolved fairly easily!
Regards,
Allan
I think I've worked out what is going on. AS3 is using UTF-16LE internally (which is documented), including surrogates etc. However, if the character code is less than U+FF then only one byte is used for the character rather than two! I can see that this is a good optimisation to make given how common ASCII is. Having said that, my understanding is that this is not "true" UTF-16, where each code unit must be represented by two bytes. I'm sure this character encoding has a name, but I can't see it on a quick scan of the Unicode documentation.
For anyone interested I've bashed together a function which will put a string into a true UTF-16 byte array:
import flash.utils.ByteArray;

private function strToUTF16( str:String ):ByteArray
{
    var utf16:ByteArray = new ByteArray();
    var iChar:uint;
    var i:uint = 0, iLen:uint = str.length;

    /* BOM first: FF FE marks the stream as little endian */
    utf16.writeByte( 0xFF );
    utf16.writeByte( 0xFE );

    while ( i < iLen )
    {
        /* charCodeAt returns the UTF-16 code unit, so surrogate
           pairs pass through as two code units, as they should */
        iChar = str.charCodeAt( i );
        if ( iChar < 0xFF )
        {
            /* one byte char - pad with a zero high byte */
            utf16.writeByte( iChar );
            utf16.writeByte( 0 );
        }
        else
        {
            /* two byte char - low byte first (little endian) */
            utf16.writeByte( iChar & 0x00FF );
            utf16.writeByte( iChar >> 8 );
        }
        i++;
    }
    return utf16;
}
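Usage would be something along these lines ("test.txt" is just a placeholder, and remember FileReference.save has to be triggered by a user event):

import flash.net.FileReference;

var data:ByteArray = strToUTF16( "allan" );
var fileRef:FileReference = new FileReference();
fileRef.save( data, "test.txt" ); // file bytes: FF FE 61 00 6C 00 6C 00 61 00 6E 00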
Phew...
Regards,
Allan
Sweet. Thanks for sharing that.
Hi Allan,
Thanks a lot for sharing this. This just saved me several days of pounding my head against the wall. 😃
First of all, thanks for the solution. What I am trying to do is export DataGrid data into Excel using an AS3 Excel library. Since you pointed out the problem with how AS3 writes bytes for UTF-16, I applied this patch to the library where it writes the bytes, but the issue is that it also reads these bytes back, as it maintains a byte array stream for all the DataGrid rows and then goes at once to write an .xls file. I think the reading logic also needs to be fixed, but I'm not sure how. Any help will be much appreciated.
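For what it's worth, the reading side would presumably be the inverse of Allan's function - a rough, untested sketch:

import flash.utils.ByteArray;

// Hypothetical inverse of strToUTF16: read UTF-16LE bytes back into a String
function utf16ToStr( ba:ByteArray ):String
{
    ba.position = 0;
    // Skip the BOM if present (FF FE)
    if ( ba.length >= 2 && ba[0] == 0xFF && ba[1] == 0xFE )
    {
        ba.position = 2;
    }
    var out:String = "";
    while ( ba.position + 1 < ba.length )
    {
        var lo:uint = ba.readUnsignedByte();
        var hi:uint = ba.readUnsignedByte();
        // fromCharCode takes UTF-16 code units, so surrogate pairs survive the round trip
        out += String.fromCharCode( ( hi << 8 ) | lo );
    }
    return out;
}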