
Encoding of UTF8 Characters in URL Strings

New Here
Mar 03, 2014

I need to link to a URL that has a right single quotation mark in it (U+2019).

This is it: "/News/Case-Studies/UNICEF-Headquarters’-Redesigned-Lobby-Space"

When I paste the link in to a browser, it works.

When I create a link with this URL as the href and click on it, it works.

When I put it in a <cflocation> tag, it does not work. I need to make this work.

When I copy the URL from the browser's address bar and paste it into a text editor, the right single quote appears as the percent-encoded string "%E2%80%99". I have not been able to recreate this encoding: URLEncodedFormat() yields a completely different string.
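For comparison, here is a minimal sketch in Python (not ColdFusion) of what standard percent-encoding does: encode the string to UTF-8 bytes and write each byte that is not URL-safe as %XX. Python's `urllib.parse.quote` uses UTF-8 by default. The second call is only a guess at why URLEncodedFormat() returned something different: if the string had already been garbled into the three characters U+00E2, U+20AC, U+2122, each of them gets encoded on its own.

```python
from urllib.parse import quote

# The path from the question, containing U+2019 RIGHT SINGLE QUOTATION MARK.
path = "/News/Case-Studies/UNICEF-Headquarters\u2019-Redesigned-Lobby-Space"

# quote() encodes to UTF-8 by default and leaves "/" unescaped (safe="/").
encoded = quote(path)
print(encoded)
# /News/Case-Studies/UNICEF-Headquarters%E2%80%99-Redesigned-Lobby-Space

# Assumption: the string was already garbled into U+00E2, U+20AC, U+2122.
# Each garbled character is UTF-8-encoded separately, giving a longer escape.
garbled = "\u00e2\u20ac\u2122"
print(quote(garbled))
# %C3%A2%E2%82%AC%E2%84%A2
```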

When I parse the URL and note the ASCII value of each character, I get three characters for the right single quote: 226, 8364, 8482.
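Those three numbers are exactly what you get when the three UTF-8 bytes of U+2019 (E2 80 99) are decoded as Windows-1252 instead of UTF-8. A short Python sketch; Windows-1252 is an assumption inferred from the values 8364 (€) and 8482 (™), not something the thread confirms:

```python
# UTF-8 bytes of U+2019 RIGHT SINGLE QUOTATION MARK
raw = "\u2019".encode("utf-8")
print(list(raw))  # [226, 128, 153]

# Decoded with the right charset: one character, code point 8217.
right = raw.decode("utf-8")
print([ord(c) for c in right])  # [8217]

# Decoded as Windows-1252: three characters, since 0x80 -> U+20AC and 0x99 -> U+2122.
wrong = raw.decode("cp1252")
print([ord(c) for c in wrong])  # [226, 8364, 8482]
```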

I came up with an encoding that worked, but it resulted in special characters in the browser's address bar (I don't remember what the technique was at this point).

What encoding can I perform on the CF server to duplicate the proper UTF-8 encoded string %E2%80%99?

Any help would be appreciated.

TOPICS
Advanced techniques
Community Expert
Mar 04, 2014

If you convert the hexadecimal 2019 in U+2019 to base 10, you get 8217. The character you want is then, in ColdFusion terms, chr(8217).

In code, it goes like this:

<cfset base10Representation = inputBaseN(2019,16)>

<cfset rightSingleQuotationMark = chr(base10Representation)>

<cfset str="/News/Case-Studies/UNICEF-Headquarters" & rightSingleQuotationMark & "-Redesigned-Lobby-Space">

Alternatively, you could do everything in one go, like this:

<cfset str2="/News/Case-Studies/UNICEF-Headquarters#chr(inputBaseN(2019,16))#-Redesigned-Lobby-Space">
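The same arithmetic in Python, for anyone checking the conversion outside ColdFusion: `int(..., 16)` plays the role of inputBaseN(..., 16), and `chr()` maps a code point to a character, as chr() does in CFML.

```python
# Hex "2019" -> decimal code point, like inputBaseN("2019", 16) in CFML.
code_point = int("2019", 16)
print(code_point)  # 8217, i.e. 2*16**3 + 0*16**2 + 1*16 + 9

# Code point -> character, like chr(8217) in CFML.
quote_mark = chr(code_point)

s = "/News/Case-Studies/UNICEF-Headquarters" + quote_mark + "-Redesigned-Lobby-Space"
```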

Community Expert
Mar 04, 2014

Whoa! What you have found is probably a bug. I could reproduce it as follows:

<cfset base10Representation = inputBaseN(2019,16)>

<cfset rightSingleQuotationMark = chr(base10Representation)>

<cfset str="http://127.0.0.1:8500/News/Case-Studies/UNICEF-Headquarters" & rightSingleQuotationMark & "-Redesigned-Lobby-Space">

<cflocation url="#str#">

It replaces the quotation mark with a space, redirecting instead to

http://127.0.0.1:8500/News/Case-Studies/UNICEF-Headquarters%20-Redesigned-Lobby-Space
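A server-side workaround sometimes suggested for redirects like this (not verified here against cflocation) is to percent-encode the non-ASCII characters yourself before the URL goes into the redirect, so the Location header carries only ASCII. A minimal Python sketch; `encode_non_ascii` is a hypothetical helper, and it deliberately leaves every ASCII character (including `:`, `/`, and any existing `%` escapes) untouched:

```python
def encode_non_ascii(url: str) -> str:
    """Hypothetical helper: percent-encode non-ASCII characters as UTF-8 bytes, keep ASCII as-is."""
    out = []
    for ch in url:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # One %XX escape per UTF-8 byte, uppercase hex.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


url = "http://127.0.0.1:8500/News/Case-Studies/UNICEF-Headquarters\u2019-Redesigned-Lobby-Space"
print(encode_non_ascii(url))
# http://127.0.0.1:8500/News/Case-Studies/UNICEF-Headquarters%E2%80%99-Redesigned-Lobby-Space
```

Note that ASCII characters that are not legal in URLs, such as spaces, pass through unchanged; the helper only addresses the non-ASCII case in this thread.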

This is obviously wrong. As a workaround for cflocation, you could use JavaScript:

<script type="text/javascript">

  <cfoutput>window.location.replace("#str#")</cfoutput>

</script>

New Here
Mar 04, 2014

I appreciate your replies, but I'm still mystified.

My user has pasted in a URL with a Unicode right single quotation mark. I need to encode it as %E2%80%99.

When I parse the right single quotation mark, I get three characters: chr(226) & chr(8364) & chr(8482).

The first byte correctly tells me that a three-byte UTF-8 sequence is beginning (asc 226, or hex E2).

I would expect the next two bytes to be asc 128 (hex 80) and asc 153 (hex 99), which I could then easily convert to %E2%80%99, the proper sequence for a right single quotation mark.
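The expectation about the bytes is right, and the lead byte also explains how E2 80 99 and 8217 are the same thing. A three-byte UTF-8 sequence has the bit pattern 1110xxxx 10xxxxxx 10xxxxxx, and concatenating the x bits gives the code point. A Python sketch; `decode_utf8_3byte` is a hypothetical name used only for illustration:

```python
def decode_utf8_3byte(b0: int, b1: int, b2: int) -> int:
    """Combine a 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx) into a code point."""
    if (b0 & 0xF0) != 0xE0:
        raise ValueError("lead byte must look like 1110xxxx")
    if (b1 & 0xC0) != 0x80 or (b2 & 0xC0) != 0x80:
        raise ValueError("continuation bytes must look like 10xxxxxx")
    # 4 bits from the lead byte, 6 bits from each continuation byte.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


cp = decode_utf8_3byte(0xE2, 0x80, 0x99)
print(cp)  # 8217, i.e. U+2019

# Going the other way: the UTF-8 bytes of chr(8217), written as percent escapes.
escaped = "".join(f"%{b:02X}" for b in chr(cp).encode("utf-8"))
print(escaped)  # %E2%80%99
```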

But the next two bytes report as different and significantly higher asc values (8364 and 8482).

I don't see how I can possibly convert this properly.

If I simply put chr(8217) where the Unicode character was, it works, but how on earth do I arrive at that?

How do I get from chr(226) & chr(8364) & chr(8482) to chr(8217)?
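If the garbled characters are all you have, the damage can usually be undone by reversing the wrong decode: encode the characters back to bytes with the charset that was wrongly applied, then decode those bytes as UTF-8. A Python sketch, again assuming the wrong charset was Windows-1252 (which is what 8364 and 8482 point to). Correcting the encoding where the string is first read is the sturdier fix.

```python
garbled = chr(226) + chr(8364) + chr(8482)  # the three characters the poster observed

# Undo the wrong decode: recover the original bytes, then decode them correctly.
repaired = garbled.encode("cp1252").decode("utf-8")
print(ord(repaired))  # 8217
```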

Still looking for ideas. Thanks.

Community Expert
Mar 04, 2014

I don't think you should proceed with chr(226) & chr(8364) & chr(8482). I do believe things were already messed up by the time you got there.

Those 3 characters stand for ’: they are its three UTF-8 bytes (E2 80 99) displayed one by one in the wrong encoding. You got that representation because the wrong encoding was applied to the right single quotation mark to start with. When you parse the right single quotation mark using the proper encoding, for example UTF-8, you should get just one character, namely chr(8217).

New Here
Mar 05, 2014

Ok, that pointed me in the right direction.

The string, of course, is stored in a database table. When I get it directly from there instead of reading it off the URL, chr(8217) is properly represented.

Conversion is then simple.

Thanks for discussing it with me.
