February 2, 2011
Answered

Storing Unicode (Khmer) woes


Hey all,

I am working on an application that needs to accept the Khmer language in various text inputs and store it in an MSSQL database. I have gotten most of it working, but still have one bug. I can display Khmer characters if they are typed in. If I copy and paste Khmer text directly into my database and query for it, it comes out properly. The issue is that when I take Khmer text from a form field and insert it, it is transformed into a bunch of ?????.

Here are the steps I've taken so far to enable Unicode on my website:

- Configured the datasource to accept high ASCII values and Unicode

- Configured the database table columns to be of type nvarchar

- Added

     <cfscript>
        SetEncoding("form","utf-8");
        SetEncoding("url","utf-8");
       </cfscript>

       <cfcontent type="text/html; charset=utf-8">

to my application.cfm file.

- Added <META http-equiv="Content-Type" content="text/html; charset=utf-8"> in the head of my pages.

- Added <cfprocessingdirective pageEncoding="utf-8"> on the page that attempts to update the database.

It's weird. If I copy and paste Khmer directly into the DB and query for it, that works fine. If I hard-code some Khmer on a page, that displays fine too. If I type Khmer into a form and dump the form value back out, that works. It's only when a form value is saved to the database and pulled back out that it is mangled. You can see an example of what I'm talking about here:

http://www.psasmart.com/test.cfm

And here is the code that makes that page.

<cfscript>
        SetEncoding("form","utf-8");
        SetEncoding("url","utf-8");
</cfscript>

<cfcontent type="text/html; charset=utf-8">

<META http-equiv="Content-Type" content="text/html; charset=utf-8">
<cfprocessingdirective pageEncoding="utf-8">

If you have the Khmer language pack installed this:  <h2>ម៉ោង​ផ្សាយ​-រលក​ធ</h2> should appear as cambodian text.
<hr />

<form name="submitForm" method="post" accept-charset="utf-8">
     Now enter some Khmer text to save to the database: <input name="text" type="text" value="ម៉ោង​ផ្សាយ​-រលក​ធ">
    <br />
    <input type="submit" name="submit" value="submit" >
</form>

<cfoutput>

     <cfif isdefined("form.submit")>
          This is the same text as entered in the form: <h2>#form.text#</h2><br />
     
          
          <cfquery name="update" datasource="#application.dsn#" >
               Update serverSettings
               SET khmerReadWriteTest ='#form.text#'
          </cfquery>
     </cfif>
     
     <Cfquery name="getKhemer" datasource="#application.dsn#">
          select khmerReadTest, khmerReadWriteTest
          from serverSettings
     </Cfquery>


     This is the same text as entered in the form but saved to the db and queried for then displayed:  <h2>#getKhemer.khmerReadWriteTest#</h2>
     This is some Sample Khmer Text Inputed Directly in the database then queried for and displayed:  <h2>#getKhemer.khmerReadTest#</h2>
</cfoutput>

    Correct answer tooMuchTrouble

    On 2/3/2011 12:17 AM, kenji776 said:

    > Hey all, I am working on an application that needs to accept the Khmer language in various text inputs and mssql database. I have gotten most of it working, but still have one bug. I can display Khmer characters if they are typed in. If I copy and paste Khmer text directly in my database, and query for it, it comes out properly. The issue is when I take Khmer text from a form field and insert it, then it is transformed into a bunch of ?????.

    You should already know the answer to this. By the way, it's not just Khmer; it's any Unicode-encoded text.

    First, the usual suspects: what DB driver? Are you 100% sure you're using the correct DSN?

    Then this caught my eye:

    SET khmerReadWriteTest ='#form.text#'

    Either use cfqueryparam (good practice, and you turned on Unicode in the DSN anyway) or Unicode hinting:

    SET khmerReadWriteTest=N'#form.text#'

    Guess you didn't look closely enough at my "greek test" code.
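
As a rough illustration of the cfqueryparam route suggested above, the UPDATE from the test page could be written as below. This is a minimal sketch, not a verified fix: it assumes the datasource's Unicode option is enabled (as described in the question), so that a `cf_sql_varchar` parameter is sent to SQL Server as Unicode rather than in the database code page. The table, column, and datasource names are taken from the original page.

```cfml
<!--- Sketch only: same UPDATE as the test page, with the form value bound as a parameter. --->
<!--- Assumption: the DSN has the high ASCII / Unicode option turned on, so the bound value travels as Unicode. --->
<cfif structKeyExists(form, "submit")>
    <cfquery name="update" datasource="#application.dsn#">
        UPDATE serverSettings
        SET khmerReadWriteTest = <cfqueryparam value="#form.text#" cfsqltype="cf_sql_varchar">
    </cfquery>
</cfif>
```

Binding the value also keeps the form input out of the SQL text itself, which the inline '#form.text#' version does not.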


    February 2, 2011

    I'm pretty darn sure it's the correct DSN, but will double check.

    I normally do use cfqueryparam; I just removed it for now because I thought some of the validation it performs might be treating the Unicode as some kind of injection attack and cleaning it. It was late and I was grasping at straws. I'll add it back.

    I'm not familiar with Unicode hinting. I'll have to do some more research on that.

    Thanks for the tips. I know I am probably being dumb, but this is my first ever attempt at a multilingual site, so I'm totally just learning as I go. Thanks for your patience.
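
For background on the Unicode hinting mentioned in the answer: in SQL Server, a literal written as '...' is a varchar constant in the database's code page, while N'...' is an nvarchar (Unicode) constant. Characters that have no equivalent in that code page are typically replaced with ?, which would match the symptom described in the question. The following is a small sketch for seeing the difference side by side; the query name `hintTest` and the column aliases are made up for illustration, and it assumes the template is saved as UTF-8.

```cfml
<cfprocessingdirective pageEncoding="utf-8">

<!--- Hypothetical check: the same Khmer literal selected without and with the N prefix. --->
<!--- withoutHint is a varchar literal; withHint is an nvarchar (Unicode) literal. --->
<cfquery name="hintTest" datasource="#application.dsn#">
    SELECT 'ម៉ោង'  AS withoutHint,
           N'ម៉ោង' AS withHint
</cfquery>

<cfoutput>
    Without N prefix: #hintTest.withoutHint#<br />
    With N prefix: #hintTest.withHint#
</cfoutput>
```

If the first column comes back as question marks and the second as Khmer, the literal itself is where the text is being lost, not the form encoding or the nvarchar column.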