Site conversion from win-1251 to utf-8

New Here ,
Feb 04, 2008
We have a site that worked fine under CF5, with all data kept in MSSQL (nvarchar field type). Now we have to move the site to CF8. The main problem is that all data from the DB appears as a set of unreadable symbols. Our goal is to convert the data in the DB from the win-1251 charset to utf-8.
1. How can we do this conversion with standard tools, without writing a special conversion script?
2. Or is it possible to force the site to work normally with the old win-1251 charset, without any conversion? (Right now, after moving the site, all symbols are unreadable.)
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Feb 04, 2008
S_mart wrote:

Read the ColdFusion documentation for 'Internationalization' of
applications. You can define your application to use the windows-1251
character set. There are several HTML and CFML tags and settings involved.

HTH
Ian
New Here ,
Feb 05, 2008
Thank you for the answer, but we have read the documentation. Unfortunately, using
<cfprocessingdirective pageencoding="windows-1251"> and <meta http-equiv="Content-Type" content="text/html; charset=windows-1251"> has no effect. The text is still unreadable, and the browser still detects the page as utf-8.
LEGEND ,
Feb 05, 2008
If the browser is detecting the returned HTML as UTF-8, you might want to
tell it it's not. Have a look at CFHEADER:
http://livedocs.adobe.com/coldfusion/8/htmldocs/help.html?content=Tags_g-h_07.html

But it's probably thinking it's UTF-8 because it *is* UTF-8.

Note: CFPROCESSINGDIRECTIVE is only relevant if your CFM templates have
specific character-set data in them; it has no bearing on how data is
processed.

And the META tag doesn't have any impact on data processing either, it just
tells the browser what charset to treat the data as, when rendering it.

It sounds to me like the data coming from the DB is UTF-8.
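
To illustrate the point that a charset header or META tag only labels the bytes and never changes them, here is a minimal Python sketch. Python is used only because it makes the bytes easy to inspect. The `render` helper is hypothetical and stands in for what a browser does with a declared charset; it is not part of ColdFusion.

```python
# Sketch: a declared charset is only a label for decoding, it does not
# convert the bytes that the server actually sends.

def render(body: bytes, declared_charset: str) -> str:
    """Hypothetical stand-in for a browser: decode the response body
    using whatever charset the response claims to use."""
    return body.decode(declared_charset, errors="replace")

text = "Привет"

# Case 1: the server really sends UTF-8 bytes.
utf8_body = text.encode("utf-8")
labelled_correctly = render(utf8_body, "utf-8")
labelled_wrongly = render(utf8_body, "windows-1251")

# Case 2: the server really sends windows-1251 bytes.
cp1251_body = text.encode("windows-1251")
cp1251_labelled_correctly = render(cp1251_body, "windows-1251")

print(labelled_correctly == text)         # label matches the bytes
print(labelled_wrongly == text)           # label does not match the bytes
print(cp1251_labelled_correctly == text)  # label matches the bytes
```

If the page turns unreadable after the header is switched to windows-1251, that suggests the bytes being sent were not windows-1251 in the first place.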

This sort of issue has done the rounds on these forums many times, and
there's a lot of discussion already in place. As well as reading the docs,
did you search the forums (you're probably better off using Google than the
forums' own search) and read all that bumpf too?

--
Adam
New Here ,
Feb 05, 2008
You are right. Using <cfheader name="Content-type" value="text/html; charset=windows-1251"> lets us set the necessary charset on the page. Unfortunately, it does not solve the problem.

>It sounds to me like the data coming from the DB is UTF-8
As we wrote, the data is kept in nchar and nvarchar fields (MSSQL2000). These are Unicode character data types and use the UNICODE UCS-2 character set. But that did not prevent CF5 from showing them correctly! We don't understand what changed in the internal logic of the CF server so that this data is no longer shown correctly.

>This sort of issue has done the rounds on these forums many times, and there's a lot of discussion already in place.
We looked for similar topics but only found recommendations to use utf-8 encoding everywhere (cfprocessingdirective, meta, setEncoding, cfheader, cfcontent). We found only one similar problem, where somebody wanted to move an Arabic site from CF5 to CF7. In the end he was advised to convert the database fields from char to nchar and the site to utf-8. We do not want to move the site to utf-8 because it contains a lot of data. Besides, we already have nchar fields.
LEGEND ,
Feb 05, 2008
a shot in the dark, but couldn't the problem be with the db connection
drivers? they have definitely changed with the cf versions...
if so, there should be some setting you can apply to the driver to
enforce one encoding over another for the connection... (i do not work
with mssql much so i can't say for sure...)

---
Azadi Saryev
Sabai-dee.com
http://www.sabai-dee.com
LEGEND ,
Feb 05, 2008
> UNICODE UCS-2 character set

Where's Paul Hastings when we need him? He seems to know all this sh!t
inside out.

I'm out of my depth now, but first you were talking about CP-1252, now
you're talking about UCS-2, which is UTF-16, not UTF-8 according to
Wikipedia. And I'm not sure where CP-1252 fits into that lot at all.

Have you tried specifying ISO-8859-1 instead of UTF-8? It sounds to me
(again, via Wikipedia) that CP-1252 is a subset of ISO-8859-1.

http://en.wikipedia.org/wiki/CP-1252
http://en.wikipedia.org/wiki/UCS-2

Basically, it doesn't matter what format your data is stored in, as long as
you tell the DB drivers and CF what to expect when the DB returns data.
After that, it's a matter of telling the HTTP response to also identify
itself as the same encoding scheme to the browser engine.

--
Adam
New Here ,
Feb 06, 2008
2 Adam
No, we are talking about CP-1251, not 1252, and unfortunately 1251 isn't a subset of ISO-8859-1.

2 Azadi.
Possibly you are right: CF5 used ODBC drivers and CF8 uses JDBC drivers. But it is difficult to find the necessary ODBC drivers for CF8, and much more difficult to install them on the hosting.

Some interesting facts:
- Although nvarchar fields keep data in Unicode, data entered from a page with cp-1251 encoding and data entered from a page with utf-8 look different in the table. So is the encoding of the stored data different after all?

- If the Windows server where CF8 is installed has its regional settings set to Russian at that moment (windows-1251 is the native charset for Russian), then regardless of the page charset, data is always saved and shown correctly in Russian.

- If we dump the table with data that was kept in 1251 to a csv file, open it in an editor, save the file as Unicode (not UTF-8), and then import the data into a new table, the data is shown correctly on the site with the utf-8 charset!
It seems we have no other way than to do it like this. Nobody has offered another way.
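
The manual "open in an editor and save as Unicode" step from the last point could be scripted roughly as below. This is only a sketch under assumptions: the dump is assumed to be a plain windows-1251 text file, "Unicode" is assumed to mean UTF-16 with a byte order mark (which is what Windows editors usually call Unicode), and the `reencode_file` helper and the file names are made up for illustration.

```python
# Sketch: re-encode a CSV dump from windows-1251 to UTF-16 before import.
import tempfile
from pathlib import Path

def reencode_file(src, dst, src_encoding="cp1251", dst_encoding="utf-16"):
    """Read `src` as text in `src_encoding` and write it to `dst`
    in `dst_encoding`. Python's "utf-16" codec writes a BOM first."""
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding=dst_encoding)

# Small self-contained demo with a temporary directory (hypothetical names).
workdir = Path(tempfile.mkdtemp())
dump_path = workdir / "dump_cp1251.csv"
unicode_path = workdir / "dump_utf16.csv"

sample = "id,title\n1,Привет\n2,Новости\n"
dump_path.write_bytes(sample.encode("cp1251"))  # simulate the 1251 dump

reencode_file(dump_path, unicode_path)
roundtrip = unicode_path.read_text(encoding="utf-16")
print(roundtrip == sample)
```

Check a few rows by eye before importing the whole table: if the dump is not really windows-1251, the output will be readable by the import tool but still wrong.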
LEGEND ,
Feb 06, 2008
Russian! About 50% of my year last year was working on a Russian-language
site. Quite a tricky undertaking for someone such as myself who knows not
a jot of the language concerned. Moscow was a very nice city though :-)

Let me think about this and come back to you tomorrow with some questions.

We didn't have to do anything special to get Cyrillic text to do the round
trip from DB->CF->browser->CF->DB. But then again that was with the
default (UK) install of MS SQL Server, and maybe your possible
regional-specific settings cause a challenge.

My confusion about CP-1251 and CP-1252 aside, did you manage to extract DB
data and present on an HTML page by telling all three components what
character encoding to read / receive / deliver?

--
Adam
New Here ,
Feb 07, 2008
2 Adam
You probably don't have problems with Russian because you did everything in utf-8 from the start. We also had no problems with projects that we built for CF7 and moved to CF8; in that case everything is ok. Unfortunately, we can't change the regional settings on the hosting, because the server hosts many sites, not only ours. Besides that, CF reacts very strangely to changes in regional settings. Just yesterday we tried changing the regional settings on our development server and first got one behavior (when the system picked up the new settings), then another (when all the settings reverted), which is very strange and unstable.
LEGEND ,
Feb 13, 2008
S_mart wrote:
> regional settings. Literally yesterday we tried to changes regional settings on
> our development server and first of all receive one picture of work(when
> system pick up new setting) then another(then all settings return) that very
> strange and unstable.

look elsewhere, changing regional settings won't make cf "strange and unstable".

Enthusiast ,
Feb 13, 2008
what db? is it unicode capable?

cf5 doesn't know encoding from a hole in the ground, it's likely your data is garbaged as far as cfmx or anything else that understands encoding is concerned.

i would recommend converting the existing data to unicode. cheapest way is to make cfhttp calls from cf8 server to cf5 server pages that simply dump out the data. cf8 re-inserts the data correctly as unicode. or depending on your db, you might dump out the data as csv & use a tool like unifier ( http://www.melody-soft.com/html/unifier.html)

if this isn't possible you might get away w/the existing data but that depends somewhat on your db.
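
As a rough illustration of what "garbaged" data can look like and how it might be repaired in place, here is a Python sketch. It rests on an assumption that the thread does not confirm: that CF5 passed windows-1251 bytes through untouched and each byte ended up stored as the Latin-1 character with the same code. The real mapping depends on the server code page and the driver, so test it on a few sample rows first. The `repair_mojibake` helper is hypothetical.

```python
# Sketch: undo a wrong byte-to-character mapping, assuming every stored
# character is really one windows-1251 byte widened as Latin-1.

def repair_mojibake(garbled: str, stored_as: str = "latin-1",
                    really: str = "cp1251") -> str:
    """Turn the stored characters back into the original bytes, then
    decode those bytes with the encoding they were actually written in."""
    return garbled.encode(stored_as).decode(really)

original = "Привет"

# Simulate the assumed damage: 1251 bytes read as if they were Latin-1.
garbled = original.encode("cp1251").decode("latin-1")

repaired = repair_mojibake(garbled)
print(repaired == original)
```

Rows that were already stored correctly (for example, entered through a utf-8 page) will raise an encoding error or come out wrong with this approach, so they need to be detected and skipped.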

i moved house & office at the same time & lost all my internet connections (won't have any until maybe saturday), so that's why i've been AWOL on these i18n/g11n issues.