allanjard
Participating Frequently
January 17, 2010
Question

UTF-16 representation in a ByteArray

  • January 17, 2010
  • 2 replies
  • 7460 views

Hello all,

Is there a way to write a UTF-16 string into a ByteArray in Flash/AS3? Basically I have a string (var test:String="allan"; for example) and I would like to write that into a ByteArray with UTF-16LE encoding. In this case it would be "61 00 6C 00 6C 00 61 00 6E 00".

I've tried using utf16le.writeMultiByte( clipText, "utf-16" ); but it just comes out with what appears to be UTF-8 (or just straight ASCII, given the test string).

The use case is to save a UTF-16LE file using FileReference.save(), which I understand I can do by passing it a ByteArray with the correct character encoding in it. Passing just a string saves as UTF-8. Hence the need to convert and store into a UTF-16LE representation in a ByteArray.
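
A minimal sketch of that flow, assuming a Flash Player 10 target (clipText and fileName here are just placeholder values):

import flash.net.FileReference;
import flash.utils.ByteArray;

var clipText:String = "allan";
var fileName:String = "allan.txt";

var utf16le:ByteArray = new ByteArray();
utf16le.writeMultiByte( clipText, "utf-16" ); // expected to produce UTF-16, but see below

var fileRef:FileReference = new FileReference();
// FileReference.save() needs Flash Player 10+ and must be called from a user-event handler
fileRef.save( utf16le, fileName );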

Regards,
Allan


2 replies

January 18, 2012

First of all, thanks for the solution. What I am trying to do is export DataGrid data into Excel using as3 excel. As you pointed out the problem with how AS3 writes bytes for UTF-16, I applied this patch to as3 excel where it writes the bytes, but the issue is that as3 excel also reads these bytes back, as it maintains a ByteArray stream for all the DataGrid rows and then writes the whole thing out as an .xls file at the end. I think the reading logic also needs to be fixed, but I'm not sure how. Any help will be much appreciated.
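
A minimal sketch of the read direction, assuming the stream uses the same layout as the writer further down this thread (an optional FF FE BOM followed by little-endian two-byte code units); the function name is made up for illustration:

import flash.utils.ByteArray;

function utf16LEToString( bytes:ByteArray ):String
{
     var result:String = "";
     bytes.position = 0;
     while ( bytes.bytesAvailable >= 2 )
     {
          var lo:uint = bytes.readUnsignedByte();
          var hi:uint = bytes.readUnsignedByte();
          var code:uint = ( hi << 8 ) | lo;
          if ( code != 0xFEFF ) /* skip the BOM (U+FEFF) */
          {
               result += String.fromCharCode( code );
          }
     }
     return result;
}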

Inspiring
January 17, 2010

Have you tried:

utf16le.writeMultiByte( clipText, "unicode" );

The list here (http://help.adobe.com/en_US/AS3LCR/Flash_10.0/charset-codes.html) shows that "unicode" is the label and that "utf-16" is an alias pointing back to it. Stranger things have happened.
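
One quick way to sanity-check whether the label and the alias behave differently is to write the same string with both and dump the bytes; a rough sketch (dumpHex is just an illustrative helper):

import flash.utils.ByteArray;

function dumpHex( ba:ByteArray ):String
{
     var parts:Array = [];
     for ( var i:uint = 0; i < ba.length; i++ )
     {
          parts.push( ba[i].toString( 16 ) );
     }
     return parts.join( " " );
}

var a:ByteArray = new ByteArray();
a.writeMultiByte( "allan", "unicode" );

var b:ByteArray = new ByteArray();
b.writeMultiByte( "allan", "utf-16" );

trace( dumpHex( a ) ); // compare the two dumps
trace( dumpHex( b ) );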

Also, I'm not too up on all of this, but does UTF-16LE mean "little endian"? It seems that the default endianness of a ByteArray is big endian, so that might make a difference.

I tried this little test with some Hindi Unicode text:

import flash.utils.ByteArray;
import flash.utils.Endian;

var m:ByteArray = new ByteArray();
//m.endian = Endian.LITTLE_ENDIAN;
m.writeMultiByte( "हिन्दी", "unicode" );
m.position = 0;
for ( var i:int = 0; i < 6; i++ )
{
     trace( m.readShort() );
}

When I comment out the line my trace is:

14601
16137
10249
19721
9737
16393

When I use that line my trace is:

2361
2367
2344
2381
2342
2368

Those are the correct codes for the characters. Of course, if I use

trace( m.readMultiByte( 2, "unicode" ) );

it traces out the proper sequence regardless of whether I have set the endianness of the array:

ह ि न ् द ी

(The fourth character is a magic character for joining characters together.)
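
For completeness, that check as a loop over the whole array (a sketch, reusing m from the test above):

m.position = 0;
while ( m.bytesAvailable >= 2 )
{
     trace( m.readMultiByte( 2, "unicode" ) ); // one two-byte code unit per call
}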

allanjard (Author)
Participating Frequently
January 17, 2010

Hi Rothrock,

Thanks very much for your reply! I did indeed try the 'unicode' option, but unfortunately to no avail. Likewise I've just tried the 'unicodeFFFE' option, just to see what the difference might be - I didn't get any. Basically I'm looking for the string "allan" to be saved in a file with the following hex pattern:

61 00 6C 00 6C 00 61 00 6E 00 

My text editor tells me that is UTF-16 Little Endian (you were quite right with the acronym!). So this is what I have been trying:

import flash.net.FileReference;
import flash.utils.ByteArray;
import flash.utils.Endian;

var utf16:ByteArray = new ByteArray();
utf16.endian = Endian.LITTLE_ENDIAN;
utf16.writeMultiByte( "allan", "unicode" );

var fileRef:FileReference = new FileReference();
fileRef.save( utf16, fileName );

I've tried various combinations of the endian setting and the second parameter to writeMultiByte, but I just can't seem to get it - it always outputs 61 6C 6C 61 6E. I could of course add the zero padding in myself, but I imagine that would break characters with a code above 255.

regards,

Allan

allanjard (Author)
Participating Frequently
January 23, 2010

Hi Rothrock,

Thanks for the info - and for following up on this. We'll see how the bugs progress through Adobe; hopefully they will be resolved fairly easily!

Regards,

Allan


I think I've worked out what is going on. AS3 uses UTF-16LE internally (which is documented), including surrogates etc. However, if the character code is less than U+FF then only one byte is used for the character rather than two! I can see that this is a good optimisation to make given how common ASCII is. Having said that, my understanding is that this is not "true" UTF-16, where each character must be represented by at least two bytes. I'm sure this character encoding has a name, but I can't see it on a quick scan of the Unicode documentation.

For anyone interested I've bashed together a function which will put a string into a true UTF-16 byte array:

private function strToUTF16( str:String ):ByteArray
{
     var utf16:ByteArray = new ByteArray();
     var iChar:uint;
     var i:uint = 0, iLen:uint = str.length;

     /* BOM first */
     utf16.writeByte( 0xFF );
     utf16.writeByte( 0xFE );

     while ( i < iLen )
     {
          iChar = str.charCodeAt(i);
          trace( iChar );

          if ( iChar < 0xFF )
          {
               /* one byte char */
               utf16.writeByte( iChar );
               utf16.writeByte( 0 );
          }
          else
          {
               /* two byte char */
               utf16.writeByte( iChar & 0x00FF );
               utf16.writeByte( iChar >> 8 );
          }

          i++;
     }

     return utf16;
}

Phew...
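
For reference, a usage sketch tying this back to the FileReference.save() case from the original post (utf16Data is just an illustrative name, fileName as before, and save() must be triggered from a user-event handler):

var utf16Data:ByteArray = strToUTF16( "allan" );
var fileRef:FileReference = new FileReference();
fileRef.save( utf16Data, fileName ); // saved bytes: FF FE 61 00 6C 00 6C 00 61 00 6E 00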

Regards,

Allan