Known Participant
October 11, 2007
Question

How do I get a string's byte length?

I have a string and I want to get its length in bytes. How can I do that?

For example:

<cfoutput>#len('hihi,这是测试')#</cfoutput>

The output is 9.

I want the byte length, which should be 14. How can I get it?
Thanks.

7 replies

Flashcqxg (Author)
Known Participant
October 13, 2007
Thank you, Adam Cameron.

The result is 13, not 14. Why?
Inspiring
October 12, 2007
> Your code cannot run on my CFMX7.

Sure. Change it so that it does.

;-)

for (i=1; i <= len(t); i++){

=>

for (i=1; i le len(t); i=i+1){

--
Adam
Flashcqxg (Author)
Known Participant
October 12, 2007
Thank you very much, PaulH and Adam Cameron.
I tested this string in Sybase SQL Anywhere with this SQL:
select datalength('hihi,这是测试')
The SQL result is 14.
This is what I want.

To PaulH:
I tested your code with my test string. The result is:
hihi�����Dz���
cf len:=13
byte len:=13
java string length:=13
java string objectlength:=13


To Adam Cameron:
The result of your code is:
[1][h][104]
[2][i][105]
[3][h][104]
[4][i][105]
[5][�][65533]
[6][�][65533]
[7][�][65533]
[8][�][65533]
[9][�][65533]
[10][Dz][498]
[11][�][65533]
[12][�][65533]
[13][�][65533]

Neither of them gives 14!
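
The 13 and the character codes listed above are consistent with the template's bytes being GBK (or a similar double-byte Chinese encoding) but being decoded as UTF-8. That is an assumption about the setup, not something confirmed in this thread. A minimal Java sketch of that scenario follows; the class name `MisdecodeDemo` and its members are made up for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MisdecodeDemo {
    // The test string, with the non-ASCII characters written as Unicode
    // escapes so the source file's own encoding does not matter.
    static final String ORIGINAL = "hihi\uFF0C\u8FD9\u662F\u6D4B\u8BD5";

    // Encode as GBK, then (wrongly) decode those bytes as UTF-8.
    // Assumption: this mimics a GBK-saved template read as UTF-8.
    static String misdecode(String s) {
        byte[] gbkBytes = s.getBytes(Charset.forName("GBK"));
        return new String(gbkBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = misdecode(ORIGINAL);
        System.out.println("length: " + garbled.length());
        // Print each position with its numeric code, like the listing above.
        for (int i = 0; i < garbled.length(); i++) {
            System.out.println("[" + (i + 1) + "][" + (int) garbled.charAt(i) + "]");
        }
    }
}
```

On a current JDK this should report 13 characters: most of the invalid bytes become U+FFFD (65533), and one byte pair happens to form a valid two-byte UTF-8 sequence, code 498. If that is what happened, the string was already damaged before any length function saw it, so no length call could return 14.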
Inspiring
October 12, 2007
> you copied & pasted from this forum so of course the encoding is utf-8. if Flashcqxg thinks those chars are 2 bytes each, what would lead you to think the original encoding was utf-8?

Oh for goodness sake. YES I KNOW.

I was illustrating the point that simply counting the number of characters
in a string is *not* a way of determining how many bytes it occupies. The
OP's string, Adobe's facsimile of that string, my facsimile of /Adobe's
facsimile/ of that string; *a string*.

There *must* be some way of calling a function thus:

int specialGoodFunction(String s);

Which returns the number of bytes the string (in whatever encoding it is)
occupies. Which is what the OP is actually interested in. Not you finding
straw men to assault for some silly reason.
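
For reference, a Java `String` is held as UTF-16 internally and carries no record of the encoding it was read from, so a function like this has to be told which encoding to measure. A minimal sketch in Java (the class and method names are hypothetical, and GBK is assumed as the double-byte encoding the OP has in mind):

```java
import java.nio.charset.Charset;

final class ByteLength {
    // Number of bytes the string occupies when encoded with the named charset.
    static int byteLength(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        // "hihi" + fullwidth comma + four Chinese characters, as Unicode escapes.
        String t = "hihi\uFF0C\u8FD9\u662F\u6D4B\u8BD5";
        System.out.println("chars:    " + t.length());               // 9
        System.out.println("GBK:      " + byteLength(t, "GBK"));      // 14
        System.out.println("UTF-8:    " + byteLength(t, "UTF-8"));    // 19
        System.out.println("UTF-16BE: " + byteLength(t, "UTF-16BE")); // 18
    }
}
```

Since a CF string is a `java.lang.String` underneath, the same call should be reachable from CFML as `arrayLen(t.getBytes("GBK"))`, though that is untested on CFMX7 here.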

--
Adam
Inspiring
October 12, 2007
you copied & pasted from this forum so of course the encoding is utf-8. if Flashcqxg thinks those chars are 2 bytes each, what would lead you to think the original encoding was utf-8?
October 11, 2007
Whilst checking out characters' byte lengths, I found this site: http://www.fileformat.info/info/unicode/char/search.htm, which is good for looking that sort of thing up.

Just FYI.

--
Adam
Inspiring
October 11, 2007
Flashcqxg wrote:
> I want to get the byte length is 14, how can i get it?

why do you think that string's length is 14?

<cfprocessingdirective pageencoding="utf-8">
<cfscript>
t="hihi,这是测试";
jText=createObject("java","java.lang.String").init(t);
// getBytes() with no argument encodes using the JVM's default charset
b=t.getBytes();
writeoutput("#t#
<br>cf len:=#len(t)#
<br>byte len:=#arrayLen(b)#
<br>java string length:=#t.length()#
<br>java string objectlength:=#jText.length()#");
</cfscript>

all methods return "9".
October 11, 2007
But the Chinese characters are double-byte ones, so the BYTE length SHOULDN'T be 9, it should be 14: four single-byte characters and five double-byte ones (the comma is a double-byte comma too).

The STRING length might be 9, sure.

The size of the chars can be seen by doing this (code attached).
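
A minimal sketch of that idea in Java, not the attached code itself: it encodes each character separately and reports how many bytes it takes. The class and method names are hypothetical, and GBK is assumed as the double-byte encoding.

```java
import java.nio.charset.Charset;

final class CharSizes {
    // Byte count of each character (code point) of s under the given charset.
    static int[] bytesPerChar(String s, Charset cs) {
        int[] sizes = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            String one = new String(Character.toChars(cp));
            sizes[n++] = one.getBytes(cs).length;
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return sizes;
    }

    public static void main(String[] args) {
        String t = "hihi\uFF0C\u8FD9\u662F\u6D4B\u8BD5";
        int total = 0;
        for (int size : bytesPerChar(t, Charset.forName("GBK"))) {
            System.out.println(size);
            total += size;
        }
        System.out.println("total: " + total); // 4 x 1 + 5 x 2 = 14
    }
}
```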

I have to dash to work, but will look at this some more later on.

--
Adam