Participating Frequently
February 4, 2008
Question

Site conversion from win-1251 to utf-8

We have a site that worked fine under CF5, with all data kept in MSSQL (nvarchar fields). Now we have to move the site to CF8. The main problem is that all data from the DB appears as a set of unreadable symbols. The goal is to convert the data in the DB from the win-1251 charset to utf-8.
1. How can we do this conversion with standard tools, without writing a special conversion script?
2. Or is it possible to make the site work normally with the old win-1251 charset, without any conversion? (Right now, after the move, all characters are unreadable.)

    11 replies

    Inspiring
    February 13, 2008
    what db? is it unicode capable?

    cf5 doesn't know encoding from a hole in the ground, it's likely your data is garbaged as far as cfmx or anything else that understands encoding is concerned.

    i would recommend converting the existing data to unicode. cheapest way is to make cfhttp calls from cf8 server to cf5 server pages that simply dump out the data. cf8 re-inserts the data correctly as unicode. or depending on your db, you might dump out the data as csv & use a tool like unifier ( http://www.melody-soft.com/html/unifier.html)

    if this isn't possible you might get away w/the existing data but that depends somewhat on your db.

    i moved house & office at the same time & lost all my internet connections (won't have any until maybe saturday), so that's why i've been AWOL on these i18n/g11n issues.
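One way to picture the "garbaged" data described above, as a minimal Python sketch. It assumes (nothing in the thread confirms this) that the old setup stored each windows-1251 byte as if it were a Latin-1 character. The helper name `repair_mojibake` and the sample word are made up for illustration.

```python
def repair_mojibake(garbled: str, wrong: str = "latin-1", right: str = "cp1251") -> str:
    """Undo a wrong decode: recover the original bytes, then decode them correctly."""
    return garbled.encode(wrong).decode(right)


original = "Привет"

# Simulate the assumed fault: cp1251 bytes read as if they were Latin-1.
garbled = original.encode("cp1251").decode("latin-1")
print(garbled)  # Ïðèâåò

# Reverse the fault to get the Cyrillic text back.
repaired = repair_mojibake(garbled)
print(repaired)  # Привет
```

If the bytes were actually misread under a different code page (cp1252, for example), `wrong` would have to change to match, so it is worth checking a few rows with known content first.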
    Inspiring
    February 13, 2008
    S_mart wrote:
    > regional settings. Literally yesterday we tried to changes regional settings on
    > our development server and first of all receive one picture of work(when
    > system pick up new setting) then another(then all settings return) that very
    > strange and unstable.

    look elsewhere, changing regional settings won't make cf "strange and unstable".

    S_mart (Author)
    Participating Frequently
    February 7, 2008
    To Adam:
    You probably don't have problems with Russian because you did everything in utf-8 from the start. We also had no problems with projects that we built for CF7 and moved to CF8; in that case everything is OK. Unfortunately, we can't change the regional settings on the hosting server because it hosts many sites, not only ours. Besides that, CF reacts very strangely to a change of regional settings. Just yesterday we tried changing the regional settings on our development server: first we got one behavior (when the system picked up the new settings), then another (after all the settings reverted). That is very strange and unstable.
    Inspiring
    February 7, 2008
    Russian! About 50% of my year last year was working on a Russian-language
    site. Quite a tricky undertaking for someone such as myself who knows not
    a jot of the language concerned. Moscow was a very nice city though :-)

    Let me think about this and come back to you tomorrow with some questions.

    We didn't have to do anything special to get Cyrillic text to do the round
    trip from DB->CF->browser->CF->DB. But then again that was with the
    default (UK) install of MS SQL Server, and maybe your possible
    regional-specific settings cause a challenge.

    My confusion about CP-1251 and CP-1252 aside, did you manage to extract DB
    data and present on an HTML page by telling all three components what
    character encoding to read / receive / deliver?

    --
    Adam
    S_mart (Author)
    Participating Frequently
    February 6, 2008
    To Adam:
    No, we are talking about CP-1251, not 1252, and unfortunately 1251 isn't a subset of ISO-8859-1.

    To Azadi:
    You may be right: CF5 used ODBC drivers and CF8 uses JDBC drivers. But it's difficult to find the necessary ODBC drivers for CF8, and much more difficult to install them on the hosting server.

    Some interesting facts:
    - Although nvarchar fields store data as Unicode, data entered on a page with cp-1251 encoding and data entered on a page with utf-8 look different in the table. So the stored code values are different after all?

    - If the Windows server where CF8 is installed has its locale set to Russian (for which windows-1251 is the native charset), then regardless of the page charset the data is always saved and shown correctly in Russian.

    - If we dump the table with the data that was stored in 1251 to a CSV file, open it in an editor, save the file as Unicode (not UTF-8), and then import the data into a new table, the data is shown correctly on the site with the utf-8 charset!
    It seems we have no other way than to do it like this. Nobody has offered another way.
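The "open the CSV and save it as Unicode" step can also be scripted. The Python sketch below is only an illustration under assumptions: that the dump really contains windows-1251 bytes, and that "Unicode" in the editor means UTF-16 with a byte order mark, as it usually does in Windows editors. The function name `reencode_file` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def reencode_file(src: Path, dst: Path, src_enc: str = "cp1251", dst_enc: str = "utf-16") -> None:
    """Read text in src_enc and write the same text in dst_enc."""
    # newline="" leaves the CSV line endings exactly as they were.
    with open(src, "r", encoding=src_enc, newline="") as fin, \
         open(dst, "w", encoding=dst_enc, newline="") as fout:
        for line in fin:
            fout.write(line)


# Demo with a throwaway file standing in for the real table dump.
workdir = Path(tempfile.mkdtemp())
dump = workdir / "dump_cp1251.csv"
converted = workdir / "dump_utf16.csv"
dump.write_bytes("1,Привет\r\n2,Москва\r\n".encode("cp1251"))

reencode_file(dump, converted)

# Reading the result back as UTF-16 should give the original text.
roundtrip = converted.read_bytes().decode("utf-16")
print(roundtrip)
```

This only helps if the dump itself contains the right bytes; if the export step already mangled the text, the file would need repairing before re-encoding.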
    Inspiring
    February 6, 2008
    > UNICODE UCS-2 character set

    Where's Paul Hastings when we need him? He seems to know all this sh!t
    inside out.

    I'm out of my depth now, but first you were talking about CP-1252, now
    you're talking about UCS-2, which is UTF-16, not UTF-8 according to
    Wikipedia. And I'm not sure where CP-1252 fits into that lot at all.

    Have you tried specifying ISO-8859-1 instead of UTF-8? It sounds to me
    (again, via Wikipedia) that CP-1252 is a subset of ISO-8859-1.

    http://en.wikipedia.org/wiki/CP-1252
    http://en.wikipedia.org/wiki/UCS-2

    Basically, it doesn't matter what format your data is stored in, as long as
    you tell the DB drivers and CF what to expect when the DB returns data.
    After that, it's a matter of telling the HTTP response to also identify
    itself as the same encoding scheme to the browser engine.

    --
    Adam
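Since several encodings are being mixed up at this point in the thread, a small Python check may help keep them apart. It is only an illustration: it shows how the same byte reads under CP-1251, CP-1252 and ISO-8859-1, and how UTF-8 and UTF-16 (used here as a stand-in for what nvarchar stores) produce different bytes for the same text. The sample word is arbitrary.

```python
# One byte, three single-byte code pages.
raw = bytes([0xE4])
as_cp1251 = raw.decode("cp1251")    # Cyrillic letter
as_cp1252 = raw.decode("cp1252")    # Western European letter
as_latin1 = raw.decode("latin-1")   # same as cp1252 for this byte
print(as_cp1251, as_cp1252, as_latin1)

# CP-1252 and ISO-8859-1 differ in the 0x80-0x9F range.
euro_cp1252 = b"\x80".decode("cp1252")   # euro sign
c1_latin1 = b"\x80".decode("latin-1")    # C1 control character U+0080
print(repr(euro_cp1252), repr(c1_latin1))

# The same text yields different byte sequences in each encoding.
word = "Привет"
cp1251_bytes = word.encode("cp1251")
utf8_bytes = word.encode("utf-8")
utf16_bytes = word.encode("utf-16-le")
print(len(cp1251_bytes), len(utf8_bytes), len(utf16_bytes))
```

So CP-1251 and CP-1252 disagree on nearly every byte above 0x7F, and CP-1252 itself is not a subset of ISO-8859-1, because it assigns printable characters where ISO-8859-1 has control codes.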
    Inspiring
    February 5, 2008
    a shot in the dark, but couldn't the problem be with the db connection
    drivers? they have definitely changed with the cf versions...
    if so, there should be some setting you can apply to the driver to
    enforce one encoding over another for the connection... (i do not work
    with mssql much so i can't say for sure...)

    ---
    Azadi Saryev
    Sabai-dee.com
    http://www.sabai-dee.com
    S_mart (Author)
    Participating Frequently
    February 5, 2008
    You are right. Using <cfheader name="Content-Type" value="text/html; charset=windows-1251"> lets us set the required charset on the page. Unfortunately, it does not solve the problem.

    >It sounds to me like the data coming from the DB is UTF-8
    As we wrote, the data is stored in nchar and nvarchar fields (MSSQL 2000). These are Unicode character data types that use the UNICODE UCS-2 character set. But that did not prevent CF5 from showing them correctly! We don't understand what changed in the internal logic of the CF server so that this data no longer displays correctly.

    >This sort of issue has done the rounds on these forums many times, and there's a lot of discussion already in place.
    We looked for similar topics but only found recommendations to use utf-8 encoding everywhere (cfprocessingdirective, meta, setEncoding, cfheader, cfcontent). We found only one similar problem, where somebody wanted to move an Arabic site from CF5 to CF7. In the end he was advised to convert the database fields from char to nchar and the site to utf-8. We do not want to convert the site to utf-8 because it contains a lot of data. Besides, we already have nchar fields.
    Inspiring
    February 5, 2008
    If the browser is detecting the returned HTML as UTF-8, you might want to
    tell it it's not. Have a look at CFHEADER:
    http://livedocs.adobe.com/coldfusion/8/htmldocs/help.html?content=Tags_g-h_07.html

    But it's probably thinking it's UTF-8 because it *is* UTF-8.

    Note: CFPROCESSINGDIRECTIVE is only relevant if your CFM templates have
    specific character-set data in them; it has no bearing on how data is
    processed.

    And the META tag doesn't have any impact on data processing either, it just
    tells the browser what charset to treat the data as, when rendering it.

    It sounds to me like the data coming from the DB is UTF-8.

    This sort of issue has done the rounds on these forums many times, and
    there's a lot of discussion already in place. As well as reading the docs,
    did you search the forums (you're probably better off using Google than the
    forums' own search) and read all that bumpf too?

    --
    Adam
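The point that the header and META tag only label the bytes, and do not convert them, can be shown with a short Python sketch. This is an illustration, not ColdFusion behavior: the helper `charset_from_content_type` is a made-up name, and the response body is simulated.

```python
from email.message import Message


def charset_from_content_type(header_value: str, default: str = "utf-8") -> str:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset() or default


# Bytes as a windows-1251 page would send them (simulated body).
body = "Привет".encode("cp1251")

# A client that trusts the header decodes the body correctly.
declared = charset_from_content_type("text/html; charset=windows-1251")
right = body.decode(declared)

# A client that assumes UTF-8 gets replacement characters instead.
wrong = body.decode("utf-8", errors="replace")
print(declared, right, wrong)
```

The bytes are identical in both cases; only the label changes how they are read. If the server is actually sending UTF-8 bytes, declaring windows-1251 would produce garbage in the same way.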
    S_mart (Author)
    Participating Frequently
    February 5, 2008
    Thank you for the answer, but we have read the documentation. Unfortunately, using
    <cfprocessingdirective pageencoding="windows-1251"> and <meta http-equiv="Content-Type" content="text/html; charset=windows-1251"> has no effect. The text appears unreadable, and the browser still detects the page as utf-8.