May 18, 2006
Question

Japanese Characters working as URL parameters, turning to question marks when in URL string itself

  • May 18, 2006
  • 19 replies
  • 3009 views
I'm having some trouble getting ColdFusion to see Japanese characters in the URL string.

To clarify, if I have something like this:

http://my.domain.com/index.cfm?categorylevel0=Search&categorylevel1=%E3%82%A2%E3%82%B8%E3%82%A2%E3%83%BB%E3%83%93%E3%82%B8%E3%83%8D%E3%82%B9%E9%96%8B%E7%99%BA

All of my code works correctly and the server is able to pass the Japanese characters to the database and retrieve the correct data.

If I have this instead:

http://my.domain.com/index.cfm/Search/%E3%82%A2%E3%82%B8%E3%82%A2%E3%83%BB%E3%83%93%E3%82%B8%E3%83%8D%E3%82%B9%E9%96%8B%E7%99%BA

My script (which works fine with English characters) parses the CGI variables and converts them to the same URL parameters that I had in the first URL, using a loop and a CFSET into the url scope.

In the first example, looking at the CF debug info shows me what I expect to see:

URL Parameters:
CATEGORYLEVEL0=Search
CATEGORYLEVEL1=アジア・ビジネス開発

In the second example it shows me this:
URL Parameters:
CATEGORYLEVEL0=Search
CATEGORYLEVEL1=???·??????

Can anyone suggest a way to debug this? I'm not sure whether this is a CF problem, an IIS problem, a JRun problem or something else altogether that causes the characters to be lost when they are in the URL path itself but NOT when they are passed as a parameter.

19 replies

February 26, 2007
Adam Cameron wrote:
> I strongly suspect that having unencoded Cyrillic characters in the URL is
> probably illegal anyhow, so I shall probably translate it back to English.

Check what the JVM is using as its default encoding (file encoding, etc.).
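A minimal sketch of how that check could be done from a CF template, by reading Java system properties through java.lang.System. The property file.encoding is standard; sun.jnu.encoding is specific to Sun JVMs and is an assumption here, which is why a fallback text is passed in case it is not set. The variable name javaSystem is just illustrative.

```cfm
<!--- Ask the JVM that runs ColdFusion for its default encodings. --->
<cfset javaSystem = CreateObject("java", "java.lang.System")>
<cfoutput>
file.encoding = #javaSystem.getProperty("file.encoding", "(not set)")#<br>
<!--- Assumption: this property exists only on Sun JVMs. --->
sun.jnu.encoding = #javaSystem.getProperty("sun.jnu.encoding", "(not set)")#<br>
</cfoutput>
```

Running this on both the dev PC and the live server would show whether the two JVMs differ.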
February 25, 2007
> It's basically because CF is seeing:
> /path/??????/index.cfm rather than /path/??????/index.cfm, and obviously a ?
> is an illegal character in a file path.

Err... OK, the Cyrillic characters didn't make it through to the news feed,
it seems (it's OK on the web UI though). Imagine the second path there has
the word "Russia" in Russian, instead of question marks ;-)

--
Adam
February 25, 2007
Hi
Did anyone get to the bottom of this? I have a similar problem, but with the actual file path, not the URL string. I have directories and files in Russian.

On my dev PC (XP, using the internal Jrun webserver), this works fine.
On my live server (W2k3, IIS), I get this error:

java.io.IOException: The system cannot find the file specified
at java.io.WinNTFileSystem.canonicalize0(Native Method)

It's basically because CF is seeing:
/path/??????/index.cfm rather than /path/россия/index.cfm, and obviously a ? is an illegal character in a file path.

Note that it's not Windows or IIS' fault, as if I rename index.cfm to index.htm, it works fine.

I guess it's something not quite right with the IIS->CF connector.

I strongly suspect that having unencoded Cyrillic characters in the URL is probably illegal anyhow, so I shall probably translate it back to English.

Thoughts?

--
Adam
BKBK
Community Expert
May 23, 2006
So, the Japanese doesn't stick. Would be interesting to see what the CF team makes of it.

May 23, 2006
Ah, OK. I ran this test on both URLs and it worked fine when I accessed the page with the first URL and set the second URL with cfset, just like you did. The problem seems to happen earlier than that, with the data getting mangled before ColdFusion sets the CGI variables. CGI.PATH_INFO would be the only way to access the URL in a search-engine-friendly URL scenario, right? Unfortunately it is always this: PATH_INFO=/Search/???·??????
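If PATH_INFO is already mangled by the time CF sees it, one thing that might be worth trying is to bypass the CGI scope, ask the underlying servlet request for the URI, and decode it as UTF-8 explicitly. This is only a sketch under assumptions: getRequestURI() is the standard servlet call, but whether the IIS connector hands JRun the original percent-encoded bytes is untested here, and the code assumes the URI starts with cgi.script_name. The names rawUri, rawPathInfo and segment are illustrative.

```cfm
<!--- Assumption: the servlet request still holds the percent-encoded URI. --->
<cfset rawUri = GetPageContext().getRequest().getRequestURI()>
<!--- Keep only what follows the script name, e.g. /Search/%E3%82%A2... --->
<cfset rawPathInfo = "">
<cfif Len(rawUri) gt Len(cgi.script_name)>
    <cfset rawPathInfo = Right(rawUri, Len(rawUri) - Len(cgi.script_name))>
</cfif>
<cfset i = 0>
<cfloop list="#rawPathInfo#" delimiters="/" index="segment">
    <!--- Decode each segment explicitly as UTF-8 rather than the default charset. --->
    <cfset url["categorylevel#i#"] = URLDecode(segment, "utf-8")>
    <cfset i = i + 1>
</cfloop>
```

If the debug output still shows question marks with this approach, the characters are being lost before the request reaches the servlet layer.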
May 22, 2006
Sure. I just meant that it does not set anything at all because of this if:

<cfif Len(cgi.query_string) neq 0>
...
</cfif>

The length of cgi.query_string is 0, so execution never gets past this check.
BKBK
Community Expert
May 23, 2006
My suggestion was that you test with the first URL, not the second. However, I can see a source of confusion: I overlooked your delimiter, "/". It should be "&" and "=" in this case. With these modifications, we get

<cfif Len(cgi.query_string) neq 0>
<cfset i = 0>
<cfloop list="#cgi.query_string#" delimiters="&" index="currentcatname">
<cfoutput>categorylevel#i# = #ListGetAt(currentcatname,2,"=")#</cfoutput><br>
<cfset i = i + 1>
</cfloop>
</cfif>

If it is a failing of ColdFusion, the above test should fail, too.

Now, an adaptation of the same test to your second url.

<cfset url2 = "http://my.domain.com/index.cfm/Search/%E3%82%A2%E3%82%B8%E3%82%A2%E3%83%BB%E3%83%93%E3%82%B8%E3%83%8D%E3%82%B9%E9%96%8B%E7%99%BA">

<cfset query_str = ListGetAt(replacenocase(url2,".cfm/","?"),2,"?")>
<cfif Len(query_str) neq 0>
<cfset i = 0>
<cfloop list="#query_str#" delimiters="/" index="currentcatname">
<cfoutput>categorylevel#i# = #currentcatname#</cfoutput><br>
<cfset i = i + 1>
</cfloop>
</cfif>
BKBK
Community Expert
May 22, 2006
> It doesn't get past that first if statement

Please clarify.
May 22, 2006
Thanks for the fast response, but no dice on this. It doesn't get past that first if statement; cgi.query_string seems to lose everything, not just the UTF-8 characters.
BKBK
Community Expert
May 22, 2006
I would use cgi.query_string directly and modify the ambiguous statement, <cfset "url.categorylevel#i#" = currentcatname>. Do things get better with

<cfif Len(cgi.query_string) neq 0>
<cfset i = 1>
<cfloop list="#cgi.query_string#" delimiters="/" index="currentcatname">
<cfset categorylevel["#i#"] = currentcatname>
<cfset i = i + 1>
</cfloop>
</cfif>


added edit: needs correcting; see my next posts


May 22, 2006
You bet. The 3rd post in this topic has the exact code I'm using for that.

If it's helpful, here it is again:


<cfset cgipathinfo = cgi.SCRIPT_NAME & cgi.path_info>
<cfset query_string_length = Len(cgipathinfo)-Len(CGI.SCRIPT_NAME)>

<cfif query_string_length neq 0>
<cfset query_string = Right(cgipathinfo, query_string_length)>
<cfset i = 0>
<cfloop list="#query_string#" delimiters="/" index="currentcatname">
<cfset "url.categorylevel#i#" = currentcatname>
<cfset i = i + 1>
</cfloop>
</cfif>

It's not surprising that it doesn't work at this step, since in the debug info the CGI variables show up like this:

CGI Variables:
HTTP_ACCEPT=text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
HTTP_ACCEPT_ENCODING=gzip,deflate
HTTP_ACCEPT_LANGUAGE=en-us,en;q=0.5
PATH_INFO=/Search/???·??????
QUERY_STRING=

Note the question marks in PATH_INFO and the blank QUERY_STRING.
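For what it's worth, question marks like these are typically what you get when text is pushed through a charset that cannot represent it. The sketch below reproduces that effect in isolation. It assumes CF MX 7 or later (for CharsetDecode and CharsetEncode), and windows-1252 is only an example charset picked because it has no Japanese characters; which conversion actually happens between IIS, the connector and JRun is not known from this thread.

```cfm
<!--- Build the Japanese text from its UTF-8 percent-encoding, so this
      template does not depend on the file's own page encoding. --->
<cfset original = URLDecode("%E3%82%A2%E3%82%B8%E3%82%A2", "utf-8")>
<!--- Round-trip through windows-1252 (example charset, an assumption). --->
<cfset lossyBytes = CharsetDecode(original, "windows-1252")>
<cfset lossy = CharsetEncode(lossyBytes, "windows-1252")>
<cfoutput>
original length: #Len(original)#<br>
after round trip: #lossy#<br>
</cfoutput>
```

If the round-tripped value comes out as question marks, that matches the symptom in PATH_INFO and points at a charset conversion somewhere before CF populates the CGI scope.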