Known Participant
March 3, 2014
Question

Encoding of UTF8 Characters in URL Strings


I need to link to a URL that contains a right single quotation mark (U+2019).

This is it: "/News/Case-Studies/UNICEF-Headquarters’-Redesigned-Lobby-Space"

When I paste the link into a browser, it works.

When I create a link with this URL as the href and click on it, it works.

When I put it in a <CFLocation> tag, it does not work.  I need to make this work.

When I take the URL from a browser address bar and paste it into a text editor, it converts the right single quote to the percent-encoded string "%E2%80%99".  I have not been able to recreate this encoding. URLEncodedFormat() yields a completely different string.

When I parse the URL and note the character code of each character, I get three characters for the right single quote: 226 8364 8482.

I came up with an encoding that worked, but resulted in special characters in the address bar of the browser (don't remember what the technique was at this point).

What encoding can I perform on the CF server to reproduce the proper UTF-8 percent-encoded string %E2%80%99?

Any help would be appreciated.
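For reference, %E2%80%99 is simply the three UTF-8 bytes of U+2019 written in hex, each prefixed with a percent sign. The following is a minimal sketch in Python, used here only to illustrate the byte-level steps (it is not ColdFusion code); the helper name percent_encode_path is hypothetical.

```python
from urllib.parse import quote


def percent_encode_path(path: str) -> str:
    """Percent-encode a URL path as UTF-8, leaving '/' separators intact."""
    # Non-ASCII characters are replaced by their UTF-8 bytes in %XX form.
    # '/' is kept because it is listed in `safe`; '-' is an unreserved
    # character and is never encoded by quote().
    return quote(path, safe="/", encoding="utf-8")


path = "/News/Case-Studies/UNICEF-Headquarters\u2019-Redesigned-Lobby-Space"

# U+2019 occupies three bytes in UTF-8.
utf8_hex = " ".join(f"{b:02X}" for b in "\u2019".encode("utf-8"))
print(utf8_hex)  # E2 80 99

print(percent_encode_path(path))
```

Encoding only the path, and keeping the slashes as they are, avoids escaping the separators along with the quotation mark.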


BKBK
Community Expert
March 4, 2014

If you convert the hexadecimal code point 2019 (that is, U+2019) to base 10, you get 8217. In ColdFusion terms, the character you want is therefore chr(8217).

The coding goes like this:

<cfset base10Representation = inputBaseN(2019,16)>

<cfset rightSingleQuotationMark = chr(base10Representation)>

<cfset str="/News/Case-Studies/UNICEF-Headquarters" & rightSingleQuotationMark & "-Redesigned-Lobby-Space">

Alternatively, you could do everything in one go, like this:

<cfset str2="/News/Case-Studies/UNICEF-Headquarters#chr(inputBaseN(2019,16))#-Redesigned-Lobby-Space">
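The hex-to-decimal step can be checked outside ColdFusion. Here is a small Python sketch of the same arithmetic, with Python's int(..., 16) and chr() standing in for inputBaseN() and chr(); the variable names are illustrative only.

```python
# U+2019 is a hexadecimal code point; converting base 16 to base 10 gives 8217.
code_point = int("2019", 16)
right_single_quote = chr(code_point)

# Build the same path as in the ColdFusion snippet above.
url_path = (
    "/News/Case-Studies/UNICEF-Headquarters"
    + right_single_quote
    + "-Redesigned-Lobby-Space"
)

print(code_point)  # 8217
print(url_path)
```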

BKBK
Community Expert
March 4, 2014

Whoa! What you have found is probably a bug. I could reproduce it as follows:

<cfset base10Representation = inputBaseN(2019,16)>

<cfset rightSingleQuotationMark = chr(base10Representation)>

<cfset str="http://127.0.0.1:8500/News/Case-Studies/UNICEF-Headquarters" & rightSingleQuotationMark & "-Redesigned-Lobby-Space">

<cflocation  url="#str#">

It replaces the quotation mark with a space, redirecting instead to

http://127.0.0.1:8500/News/Case-Studies/UNICEF-Headquarters%20-Redesigned-Lobby-Space

This is obviously wrong. You could use JavaScript as a workaround for cflocation, as follows:

<script type="text/javascript">

  <cfoutput>window.location.replace("#str#")</cfoutput>

</script>

doncx
Author
Known Participant
March 4, 2014

I appreciate your replies, but I'm still mystified.

My user has pasted in a URL containing a Unicode right single quotation mark. I need to encode it as %E2%80%99.

When I parse the right single quotation mark, I get three character values: chr(226) & chr(8364) & chr(8482).

The first value properly tells me that a three-byte Unicode character is beginning (asc 226, or hex E2).

I would expect the next two bytes to be asc 128 (hex 80) and asc 153 (hex 99), which I could then easily convert to %E2%80%99, the proper sequence for a right single quotation mark.

But the next two bytes report as different and significantly higher asc values (8364 and 8482).

I don't see how I can possibly convert this properly.

If I simply put a chr(8217) where the unicode character was, it works, but how on earth do I arrive at that?

How do I get from chr(226) & chr(8364) & chr(8482) to chr(8217)?

Still looking for ideas.  Thanks.
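One possible explanation for the values 226, 8364 and 8482, offered as an assumption rather than a confirmed diagnosis: the UTF-8 bytes E2 80 99 were decoded as Windows-1252, where byte 0x80 is the euro sign (U+20AC, decimal 8364) and byte 0x99 is the trademark sign (U+2122, decimal 8482). Under that assumption, re-encoding the text as Windows-1252 and decoding the resulting bytes as UTF-8 recovers chr(8217). A minimal Python sketch of the round trip follows; the function name repair_mojibake is hypothetical.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as Windows-1252."""
    # Encoding as cp1252 restores the original bytes; decoding them as
    # UTF-8 then yields the intended characters.
    return text.encode("cp1252").decode("utf-8")


# Forward direction: the UTF-8 bytes of U+2019, misread as Windows-1252.
misread = b"\xe2\x80\x99".decode("cp1252")
print([ord(c) for c in misread])  # [226, 8364, 8482]

# Reverse direction: recover the single intended character.
garbled = chr(226) + chr(8364) + chr(8482)
repaired = repair_mojibake(garbled)
print([ord(c) for c in repaired])  # [8217]
```

ColdFusion documents CharsetDecode() and CharsetEncode() functions that should allow the same round trip, though that has not been tested here. If the garbling comes from the page or request encoding, setting that encoding to UTF-8 would likely address the cause rather than the symptom.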