Participating Frequently
January 17, 2010
Question

UTF-16 representation in a ByteArray


Hello all,

Is there a way to write a UTF-16 string into a ByteArray in Flash/AS3? Basically I have a string (var test:String="allan"; for example) and I would like to write that into a ByteArray with UTF-16LE encoding. In this case it would be "61 00 6C 00 6C 00 61 00 6E 00".
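The byte layout described above can be checked outside Flash. Below is a minimal sketch in TypeScript rather than ActionScript. It relies on String.charCodeAt returning 16-bit UTF-16 code units, which is true in both languages. The helper names `encodeUtf16LE` and `toHex` are hypothetical.

```typescript
// Hypothetical helper: encode a string as UTF-16LE bytes.
// charCodeAt returns 16-bit UTF-16 code units, so each unit becomes
// two bytes, low byte first (little-endian).
function encodeUtf16LE(text: string): Uint8Array {
  const bytes = new Uint8Array(text.length * 2);
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    bytes[2 * i] = unit & 0xff; // low byte
    bytes[2 * i + 1] = unit >> 8; // high byte
  }
  return bytes;
}

// Hypothetical helper: format bytes as space-separated uppercase hex pairs.
function toHex(bytes: Uint8Array): string {
  const parts: string[] = [];
  for (let i = 0; i < bytes.length; i++) {
    parts.push(("0" + bytes[i].toString(16)).slice(-2).toUpperCase());
  }
  return parts.join(" ");
}

console.log(toHex(encodeUtf16LE("allan"))); // 61 00 6C 00 6C 00 61 00 6E 00
```

The matching ActionScript loop would presumably call ByteArray.writeShort() on each charCodeAt() value, with endian set to Endian.LITTLE_ENDIAN. That variant has not been tested here.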

I've tried using utf16le.writeMultiByte( clipText, "utf-16" ), but the output appears to be UTF-8 (or just plain ASCII, given the test string).

The use case is to save a UTF-16LE file using FileReference.save(), which I understand I can do by passing it a ByteArray with the correct character encoding in it. Passing just a string saves as UTF-8. Hence the need to convert and store into a UTF-16LE representation in a ByteArray.
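For the FileReference.save() use case, some programs identify a UTF-16LE file by a leading byte-order mark (BOM). This is U+FEFF, which is stored as FF FE in little-endian order. Whether the program opening the file actually needs a BOM is an assumption you would need to check. Below is a minimal TypeScript sketch that prepends one. The function name `withUtf16LEBom` is hypothetical.

```typescript
// Hypothetical helper: UTF-16LE bytes prefixed with a byte-order mark.
// U+FEFF written little-endian is FF FE; readers that look for a BOM use it
// to detect both the encoding and the byte order.
function withUtf16LEBom(text: string): Uint8Array {
  const bytes = new Uint8Array(2 + text.length * 2);
  bytes[0] = 0xff; // BOM, low byte
  bytes[1] = 0xfe; // BOM, high byte
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    bytes[2 + 2 * i] = unit & 0xff; // low byte
    bytes[3 + 2 * i] = unit >> 8; // high byte
  }
  return bytes;
}

const fileBytes = withUtf16LEBom("allan");
console.log(fileBytes.length); // 12: 2 BOM bytes + 5 characters x 2 bytes
```

In the Flash version, these are the bytes that would go into the ByteArray passed to FileReference.save().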

Regards,
Allan

This topic has been closed for replies.

2 replies

January 18, 2012

First of all, thanks for the solution. What I am trying to do is export DataGrid data to Excel using as3 excel. Since you pointed out the problem with how AS3 writes bytes for UTF-16, I applied this patch to as3 excel where it writes bytes. The issue is that as3 excel also reads these bytes back: it maintains a ByteArray stream for all DataGrid rows and then writes the .xls file in one go. I think the reading logic also needs to be fixed, but I'm not sure how. Any help would be much appreciated.
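The reading side is the mirror image of writing: each pair of bytes, low byte first, forms one 16-bit code unit. Below is a minimal TypeScript sketch. It assumes the stream holds plain UTF-16LE with an even byte count and no byte-order mark. `decodeUtf16LE` is a hypothetical name and is not part of as3 excel. In ActionScript, the analogue would presumably be readUnsignedShort() on a ByteArray set to Endian.LITTLE_ENDIAN, but that is untested here.

```typescript
// Hypothetical helper: decode UTF-16LE bytes into a string.
// Assumes an even number of bytes and no leading byte-order mark.
function decodeUtf16LE(bytes: Uint8Array): string {
  let text = "";
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    // Low byte first, then high byte.
    const unit = bytes[i] | (bytes[i + 1] << 8);
    // Surrogate halves are appended as-is, so pairs reassemble correctly.
    text += String.fromCharCode(unit);
  }
  return text;
}

const raw = new Uint8Array([0x61, 0x00, 0x6c, 0x00, 0x6c, 0x00, 0x61, 0x00, 0x6e, 0x00]);
console.log(decodeUtf16LE(raw)); // allan
```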

Inspiring
January 17, 2010

Have you tried:

utf16le.writeMultiByte( clipText, "unicode" );

The list here (http://help.adobe.com/en_US/AS3LCR/Flash_10.0/charset-codes.html) shows that "unicode" is the label, and "utf-16" is an alias that points back to it. Stranger things have happened.

Also, I'm not too up on all of this, but does UTF-16LE mean "little endian"? It seems the ByteArray default is big endian, so that might make a difference:

I tried this little test with some Hindi unicode text:

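import flash.utils.ByteArray;
import flash.utils.Endian;

var m:ByteArray = new ByteArray();
//m.endian = Endian.LITTLE_ENDIAN; // toggle this line to compare the two traces below
m.writeMultiByte("हिन्दी", "unicode");
m.position = 0;
// six characters, two bytes each: read them back as 16-bit values
for (var i:int = 0; i < 6; i++) {
    trace(m.readShort());
}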

When I comment out the line, my trace is:

14601
16137
10249
19721
9737
16393

When I use that line, my trace is:

2361
2367
2344
2381
2342
2368

Those are the correct code points for those characters. Of course, if I use

trace(m.readMultiByte(2, "unicode"));

it traces out the proper sequence regardless of whether I have set the endianness of the array:

ि

(The fourth character, U+094D, the virama, is a special sign that joins consonants together.)
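The two traces fit a single explanation. writeMultiByte(..., "unicode") appears to write little-endian bytes regardless of the ByteArray's endian property. readShort(), however, does honour that property. So each big-endian reading is the code point with its two bytes swapped. For example, 2361 is 0x0939 and is stored as 39 09; read in big-endian order, those bytes give 0x3909, which is 14601. This interpretation is inferred from the numbers above, not from Adobe documentation. Below is a TypeScript check of the arithmetic; `swapBytes` is an illustrative name.

```typescript
// Swap the two bytes of a 16-bit value, e.g. 0x0939 <-> 0x3909.
function swapBytes(unit: number): number {
  return ((unit & 0xff) << 8) | ((unit >> 8) & 0xff);
}

// Code units of "हिन्दी", i.e. the values traced with Endian.LITTLE_ENDIAN.
const hindi = "\u0939\u093F\u0928\u094D\u0926\u0940";
const codeUnits: number[] = [];
for (let i = 0; i < hindi.length; i++) {
  codeUnits.push(hindi.charCodeAt(i));
}

// Values traced with the default big-endian setting.
const bigEndianTrace = [14601, 16137, 10249, 19721, 9737, 16393];

console.log(codeUnits.join(" ")); // 2361 2367 2344 2381 2342 2368
console.log(codeUnits.every((u, i) => swapBytes(u) === bigEndianTrace[i])); // true
```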

allanjard (Author)
Participating Frequently
January 17, 2010

Hi Rothrock,

Thanks very much for your reply! I did indeed try the 'unicode' option, but unfortunately to no avail. Likewise, I've just tried the 'unicodeFFFE' option to see what the difference might be: I didn't get any. Basically, I'm looking for the string "allan" to be saved in a file with the following hex pattern:

61 00 6C 00 6C 00 61 00 6E 00 

My text editor tells me that is UTF-16 Little Endian (you were quite right about the acronym!). So this is what I have been trying:

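import flash.net.FileReference;
import flash.utils.ByteArray;
import flash.utils.Endian;

var utf16:ByteArray = new ByteArray();
utf16.endian = Endian.LITTLE_ENDIAN;
utf16.writeMultiByte( "allan", "unicode" );

// fileName is defined elsewhere in the application
var fileRef:FileReference = new FileReference();
fileRef.save( utf16, fileName );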

I've tried various combinations of the endian type and the second parameter for writeMultiByte, but I just can't seem to get it: the output is always 61 6C 6C 61 6E. I could of course add the zero padding myself, but I imagine that would break characters with a value > 255.
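The concern about zero padding is justified. One way to model the padding shortcut is to keep each character's low byte and append 00. That matches real UTF-16LE only for U+0000 to U+00FF. Below is a TypeScript comparison of the two approaches; both helper names are hypothetical. The euro sign, U+20AC, serves as an example of a character above 255.

```typescript
// Naive approach: low byte of each character followed by 00.
// Only correct for characters U+0000..U+00FF.
function zeroPadded(text: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < text.length; i++) {
    out.push(text.charCodeAt(i) & 0xff, 0x00);
  }
  return out;
}

// Correct approach: low byte, then the real high byte of each code unit.
function utf16LE(text: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    out.push(unit & 0xff, unit >> 8);
  }
  return out;
}

console.log(zeroPadded("allan").join() === utf16LE("allan").join()); // true
console.log(zeroPadded("\u20AC").join(" "), "vs", utf16LE("\u20AC").join(" ")); // 172 0 vs 172 32
```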

regards,

Allan

Inspiring
January 17, 2010

I'm totally out of my depth here, but that is how I learn stuff myself. So I'll flail around a bit more.

Using your example, if I do this:

utf16.writeMultiByte( "allan", "utf-8" );
trace(utf16.length)

I get 5, which is what I'd expect. But if I do:

utf16.writeMultiByte("allan","unicode");

I get 1, which I did not expect.

I know that Flash really loves UTF-8, so I wonder if somehow all the strings are being converted to UTF-8?
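For reference, here are the byte counts being compared. UTF-8 stores ASCII in one byte per character. UTF-16 always uses two bytes per code unit. So "allan" should be 5 bytes as UTF-8 and 10 bytes as UTF-16, not 1. Below is a library-free TypeScript sketch of both counts. It assumes well-formed UTF-16 input, and the function names are illustrative.

```typescript
// Count UTF-8 bytes per code point: 1 byte below U+0080, 2 below U+0800,
// 3 for the rest of the 16-bit range, and 4 for a surrogate pair.
function utf8Length(text: string): number {
  let n = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit < 0x80) n += 1;
    else if (unit < 0x800) n += 2;
    else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < text.length) {
      // High surrogate, assumed to be followed by its low surrogate.
      n += 4;
      i++;
    } else n += 3;
  }
  return n;
}

// UTF-16 uses 2 bytes per code unit, so a surrogate pair takes 4 bytes.
function utf16Length(text: string): number {
  return text.length * 2;
}

console.log(utf8Length("allan"), utf16Length("allan")); // 5 10
```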

I went to Wikipedia and found the following string: "水z𝄞" (water, z, G clef). When I tried to write it using writeMultiByte, it broke at the "z".

It should be 34 6C, 7A 00, 34 D8, 1E DD, but it just breaks at 7A.

utf16.writeMultiByte("水z𝄞", "unicode");
trace(utf16.length) // returns 3

utf16.writeMultiByte( "水𝄞z", "unicode" );
trace(utf16.length) // returns 7
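The expected bytes can be reproduced by hand. The G clef is U+1D11E, which lies beyond the 16-bit range. UTF-16 therefore stores it as the surrogate pair D834 DD1E, written little-endian as 34 D8 1E DD. Below is a TypeScript sketch of the surrogate arithmetic; `toSurrogatePair` is a hypothetical name.

```typescript
// Split a code point above U+FFFF into a UTF-16 high/low surrogate pair.
function toSurrogatePair(codePoint: number): [number, number] {
  const offset = codePoint - 0x10000; // 20-bit value
  const high = 0xd800 + (offset >> 10); // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const [clefHigh, clefLow] = toSurrogatePair(0x1d11e); // G clef
console.log(clefHigh.toString(16), clefLow.toString(16)); // d834 dd1e

// Write every code unit of "水z𝄞" little-endian, low byte first.
const sample = "\u6C34z" + String.fromCharCode(clefHigh, clefLow);
const hexPairs: string[] = [];
for (let i = 0; i < sample.length; i++) {
  const unit = sample.charCodeAt(i);
  hexPairs.push(("0" + (unit & 0xff).toString(16)).slice(-2));
  hexPairs.push(("0" + (unit >> 8).toString(16)).slice(-2));
}
const utf16leHex = hexPairs.join(" ").toUpperCase();
console.log(utf16leHex); // 34 6C 7A 00 34 D8 1E DD
```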

Also with your example I'm only getting a length of 1. You are actually getting the 61 6C 6C 61 6E back out?

So to my mind, there seems to be some bug when mixing ASCII-encodable characters with characters that need higher code points. But I don't understand the standard well enough to find out. It might be worth opening a bug for this...