wladekarek
Participant
August 23, 2018
Question

urldecode codes euro sign as %u2AC instead of %u20AC


Hello!

This line:

msg = urldecode(msg,"utf-8");

This changes the value of msg from %u20AC (the euro sign) to %u2AC, which is a problem, because after that I can't decode it on the JavaScript side.

unescape('%u20AC') = '€'

unescape('%u2AC') = '%u2AC'

Because the decoded value is '%u2AC' instead of '%u20AC', every time I expect the euro sign to be shown I receive '%u2AC'.

Can I modify urldecode or use something different?
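The JavaScript behaviour described above can be reproduced with a short sketch. It assumes an environment where the legacy global unescape() is still available (browsers and Node.js); the variable names are illustrative only.

```javascript
// unescape() is a legacy global function. It decodes a %uXXXX sequence
// only when four hex digits follow the "u".
const wellFormed = unescape('%u20AC'); // four hex digits: decoded to the euro sign
const malformed = unescape('%u2AC');   // only three hex digits: returned unchanged

// decodeURIComponent() expects UTF-8 percent-encoding instead of %uXXXX.
const fromUtf8 = decodeURIComponent('%E2%82%AC');

console.log(wellFormed, malformed, fromUtf8);
```

So once the zero is lost on the server, the client has no well-formed sequence left to decode.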


2 replies

BKBK
Community Expert
August 25, 2018

Hi @wladekarek, does it help when you replace "%u20AC" with "%E2%82%AC"?

Test this

<cfscript>

msg="It costs %E2%82%AC 5.";

msg = urldecode(msg, "UTF-8");

writeoutput(msg);

</cfscript>
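If the string is produced in JavaScript before it reaches ColdFusion (an assumption, since the thread does not show the client code), the difference between the two forms may come from which encoding function is used. A minimal sketch, with a hypothetical sample string:

```javascript
// Hypothetical sample text; the thread does not show the real client-side value.
const text = 'It costs € 5.';

// Legacy escape() emits the non-standard %uXXXX form for characters above U+00FF.
const legacyEncoded = escape(text);

// encodeURIComponent() emits UTF-8 percent-encoding, the form a UTF-8 URL decoder expects.
const utf8Encoded = encodeURIComponent(text);

console.log(legacyEncoded); // It%20costs%20%u20AC%205.
console.log(utf8Encoded);   // It%20costs%20%E2%82%AC%205.
```

Sending the second form means the server-side urldecode(msg, "UTF-8") receives standard UTF-8 percent-encoding rather than a %u sequence.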

For details, see for example the Stack Overflow post on how to convert from Unicode to UTF-8 hex. In your particular case, the steps are as follows:

Unicode value = 20AC

Binary value = 10000010101100

The Unicode value U+20AC is in the range 0x0800 - 0xFFFF, so its UTF-8 encoding takes three bytes. In binary, those bytes have the form:

   1110xxxx 10xxxxxx 10xxxxxx

where each x is filled with a digit of the binary value, working from right to left. Starting therefore with the rightmost byte, and filling its 6 x positions with the digits, we get

10101100 (note: 101100 are the rightmost 6 digits of the binary value, 10000010101100)

Next, the middle byte. It is

10000010 (note: 000010 are the next 6 digits of the binary value, 10000010101100, going from right to left)

Lastly, the leftmost byte. It is

1110xx10 (note: 10 are the remaining digits of the binary value, 10000010101100, going from right to left).

This becomes 11100010, as the rules also say that we must replace any remaining x with 0.

The final binary representation is therefore

11100010   10000010  10101100

Converting each byte to hex and prefixing it with %, we get

%E2%82%AC
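The steps above can be sketched in JavaScript. This is only an illustration of the three-byte case; the function name utf8PercentEncode3 is made up for this example, and in practice encodeURIComponent() does the same job for any character.

```javascript
// Encode a code point in the range 0x0800 - 0xFFFF as three UTF-8 bytes,
// written in percent-encoded form. Hypothetical helper for illustration.
function utf8PercentEncode3(codePoint) {
  if (codePoint < 0x0800 || codePoint > 0xFFFF) {
    throw new RangeError('code point is outside the three-byte UTF-8 range');
  }
  if (codePoint >= 0xD800 && codePoint <= 0xDFFF) {
    throw new RangeError('surrogate code points cannot be encoded in UTF-8');
  }
  const bytes = [
    0xE0 | (codePoint >> 12),         // 1110xxxx: the top 4 bits
    0x80 | ((codePoint >> 6) & 0x3F), // 10xxxxxx: the middle 6 bits
    0x80 | (codePoint & 0x3F),        // 10xxxxxx: the lowest 6 bits
  ];
  return bytes
    .map((b) => '%' + b.toString(16).toUpperCase().padStart(2, '0'))
    .join('');
}

console.log(utf8PercentEncode3(0x20AC)); // %E2%82%AC
```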

Inspiring
August 23, 2018

Can you please post your code as you run it?

I run

<cfset msg = "%u20AC">
<cfdump var="#urldecode(msg,"utf-8")#">
<cfoutput>#urldecode(msg,"utf-8")#</cfoutput>

and it looks good to me.

What ColdFusion version are you on?

wladekarek
Participant
August 24, 2018

Maybe I am doing something wrong. I have this code inside:

strResult.append(msg);

msg = urldecode(msg,"utf-8");

strResult.append(msg);

and it shows that some UTF-8 characters are not decoded properly.

It is ColdFusion 8.

WolfShade
Legend
August 24, 2018

CF8 might be the problem. CF is now up to version 12 (CF2016), and CF2018 is on its way. You are way behind in your CF Server version. You should upgrade to at least CF10 (later is better) for security reasons alone.

HTH,

^ _ ^