allanjard
Participating Frequently
January 17, 2010
Question

UTF-16 representation in a ByteArray

  • January 17, 2010
  • 2 replies
  • 7460 views

Hello all,

Is there a way to write a UTF-16 string into a ByteArray in Flash/AS3? Basically I have a string (var test:String="allan"; for example) and I would like to write that into a ByteArray with UTF-16LE encoding. In this case it would be "61 00 6C 00 6C 00 61 00 6E 00".

I've tried using utf16le.writeMultiByte( clipText, "utf-16" ); but it just comes out with what appears to be UTF-8 (or just straight ASCII, given the test string).

The use case is to save a UTF-16LE file using FileReference.save(), which I understand I can do by passing it a ByteArray with the correct character encoding in it. Passing just a string saves as UTF-8. Hence the need to convert and store into a UTF-16LE representation in a ByteArray.
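
A minimal sketch of that flow, assuming a Flash Player 10 target (clipText and fileName here are just placeholder values):

import flash.net.FileReference;
import flash.utils.ByteArray;

var clipText:String = "allan";
var fileName:String = "allan.txt";

var utf16le:ByteArray = new ByteArray();
utf16le.writeMultiByte( clipText, "utf-16" ); // expected to produce UTF-16, but see below

var fileRef:FileReference = new FileReference();
// FileReference.save() needs Flash Player 10+ and must be called from a user-event handler
fileRef.save( utf16le, fileName );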

Regards,
Allan


2 replies

January 18, 2012

First of all, thanks for the solution. What I am trying to do is export DataGrid data into Excel using as3 excel. As you pointed out the problem with how AS3 writes bytes for UTF-16, I applied this patch to as3 excel where it writes the bytes, but the issue is that as3 excel also reads these bytes back, as it maintains a ByteArray stream for all the DataGrid rows and then writes the whole thing out as an .xls file at the end. I think the reading logic also needs to be fixed, but I'm not sure how. Any help will be much appreciated.
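
A minimal sketch of the read direction, assuming the stream uses the same layout as the writer further down this thread (an optional FF FE BOM followed by little-endian two-byte code units); the function name is made up for illustration:

import flash.utils.ByteArray;

function utf16LEToString( bytes:ByteArray ):String
{
     var result:String = "";
     bytes.position = 0;
     while ( bytes.bytesAvailable >= 2 )
     {
          var lo:uint = bytes.readUnsignedByte();
          var hi:uint = bytes.readUnsignedByte();
          var code:uint = ( hi << 8 ) | lo;
          if ( code != 0xFEFF ) /* skip the BOM (U+FEFF) */
          {
               result += String.fromCharCode( code );
          }
     }
     return result;
}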

Inspiring
January 17, 2010

Have you tried:

utf16le.writeMultiByte( clipText, "unicode" );

The list here (http://help.adobe.com/en_US/AS3LCR/Flash_10.0/charset-codes.html) shows that "unicode" is the label and that "utf-16" is an alias pointing back to it. Stranger things have happened.
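
One quick way to sanity-check whether the label and the alias behave differently is to write the same string with both and dump the bytes; a rough sketch (dumpHex is just an illustrative helper):

import flash.utils.ByteArray;

function dumpHex( ba:ByteArray ):String
{
     var parts:Array = [];
     for ( var i:uint = 0; i < ba.length; i++ )
     {
          parts.push( ba[i].toString( 16 ) );
     }
     return parts.join( " " );
}

var a:ByteArray = new ByteArray();
a.writeMultiByte( "allan", "unicode" );

var b:ByteArray = new ByteArray();
b.writeMultiByte( "allan", "utf-16" );

trace( dumpHex( a ) ); // compare the two dumps
trace( dumpHex( b ) );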

Also, I'm not too up on all of this, but does UTF-16LE mean "little endian"? It seems that the default endianness of a ByteArray is big endian, so that might make a difference.

I tried this little test with some Hindi Unicode text:

import flash.utils.ByteArray;
import flash.utils.Endian;

var m:ByteArray = new ByteArray();
//m.endian = Endian.LITTLE_ENDIAN;
m.writeMultiByte( "हिन्दी", "unicode" );
m.position = 0;
for ( var i:int = 0; i < 6; i++ )
{
     trace( m.readShort() );
}

When I comment out the line my trace is:

14601
16137
10249
19721
9737
16393

When I use that line my trace is:

2361
2367
2344
2381
2342
2368

Those are the correct codes for the characters. Of course, if I use

trace( m.readMultiByte( 2, "unicode" ) );

it traces out the proper sequence regardless of whether I have set the endianness of the array:

ह ि न ् द ी

(The fourth character is a magic character for joining characters together.)
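
For completeness, that check as a loop over the whole array (a sketch, reusing m from the test above):

m.position = 0;
while ( m.bytesAvailable >= 2 )
{
     trace( m.readMultiByte( 2, "unicode" ) ); // one two-byte code unit per call
}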

allanjard (Author)
Participating Frequently
January 17, 2010

Hi Rothrock,

Thanks very much for your reply! I did indeed try the 'unicode' option, but unfortunately to no avail. Likewise I've just tried the 'unicodeFFFE' option, just to see what the difference might be - I didn't get any. Basically I'm looking for the string "allan" to be saved in a file with the following hex pattern:

61 00 6C 00 6C 00 61 00 6E 00 

My text editor tells me that is UTF-16 Little Endian (you were quite right with the acronym!). So this is what I have been trying:

import flash.net.FileReference;
import flash.utils.ByteArray;
import flash.utils.Endian;

var utf16:ByteArray = new ByteArray();
utf16.endian = Endian.LITTLE_ENDIAN;
utf16.writeMultiByte( "allan", "unicode" );

var fileRef:FileReference = new FileReference();
fileRef.save( utf16, fileName );

I've tried various combinations of the endian setting and the second parameter to writeMultiByte, but I just can't seem to get it - it always outputs 61 6C 6C 61 6E. I could of course add the zero padding in myself, but I imagine that would break characters with a code above 255.

regards,

Allan

allanjard (Author)
Participating Frequently
January 23, 2010

Hi Rothrock,

Thanks for the info - and for following up on this. We'll see how the bugs progress through Adobe; hopefully they will be resolved fairly easily!

Regards,

Allan


I think I've worked out what is going on. AS3 uses UTF-16LE internally (which is documented), including surrogates etc. However, if the character code is less than U+FF then only one byte is used for the character rather than two! I can see that this is a good optimisation to make given how common ASCII is. Having said that, my understanding is that this is not "true" UTF-16, where each character must be represented by at least two bytes. I'm sure this character encoding has a name, but I can't see it on a quick scan of the Unicode documentation.

For anyone interested I've bashed together a function which will put a string into a true UTF-16 byte array:

private function strToUTF16( str:String ):ByteArray
{
     var utf16:ByteArray = new ByteArray();
     var iChar:uint;
     var i:uint = 0, iLen:uint = str.length;

     /* BOM first */
     utf16.writeByte( 0xFF );
     utf16.writeByte( 0xFE );

     while ( i < iLen )
     {
          iChar = str.charCodeAt(i);
          trace( iChar );

          if ( iChar < 0xFF )
          {
               /* one byte char */
               utf16.writeByte( iChar );
               utf16.writeByte( 0 );
          }
          else
          {
               /* two byte char */
               utf16.writeByte( iChar & 0x00FF );
               utf16.writeByte( iChar >> 8 );
          }

          i++;
     }

     return utf16;
}

Phew...
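
For reference, a usage sketch tying this back to the FileReference.save() case from the original post (utf16Data is just an illustrative name, fileName as before, and save() must be triggered from a user-event handler):

var utf16Data:ByteArray = strToUTF16( "allan" );
var fileRef:FileReference = new FileReference();
fileRef.save( utf16Data, fileName ); // saved bytes: FF FE 61 00 6C 00 6C 00 61 00 6E 00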

Regards,

Allan