January 18, 2007
Answered

Problem with UTF-8 encoding

  • January 18, 2007
  • 23 replies
  • 3250 views
The problem is that although I have finally gotten the static text to display correctly, the dynamic text, which is queried from a MySQL database, is not being displayed correctly.

I have checked the database: the Spanish, French, and other translations of the content are there with the correct lettering. I have updated the MySQL drivers to 5.0 as recommended by Adobe, and I have added ?useUnicode=true&characterEncoding=UTF-8 to the JDBC URL string, as suggested by another forum. I have even checked all the page properties to make sure that they are in UTF-8 encoding. Below is a sample of the code I am using. What is wrong with the code, or what do I need to change to fix this problem? You may check the site at www.scoringag.com and try the language translations to see further examples of the problem.

We are using MX7, MySQL 4.1, and Jconnect 5.0.

Sample code below:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>

<!---
**
* CF MX Admin "Application.cfm" file
* This file establishes the cfadmin application, as well as creates handles
* to the services using the factory via CFOBJECT.
*
* Copyright (c) 2001 Macromedia. All Rights Reserved.
* DO NOT REDISTRIBUTE THIS SOFTWARE IN ANY WAY WITHOUT THE EXPRESSED
* WRITTEN PERMISSION OF MACROMEDIA.
--->

<!--- Set multi-language utf-8 values here
---------------------------------------------------------------------->
<cfprocessingdirective pageencoding="utf-8">

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<cfset URLenChar = "utf-8" >

<!--- Set encoding to utf-8. --->
<cfset setEncoding("URL", "utf-8")>
<cfset setEncoding("Form", "utf-8")>

<cfparam name="url.login" default="">

<!--- Set the output encoding to utf-8 --->
<cfcontent type="text/html; charset=utf-8">

</head>

<cfset SESSION.locale='es'>

<!--- <div id="home_contents"> --->
<style type="text/css">
<!--
.style2 {color: #ff0000}
-->
</style>

<div id="content">
<table align="center" width="100%">

<tr><center>

<div align="center" style="width:100%; font-size:13px; font-weight:500; color:#000000; "><br />
<a href="http://www.cfsan.fda.gov/~dms/fsbtac23.html" target="_blank" class="style2">*** Important Information (please read)! ***<br />
FDA Fact Sheet ScoringAg has the Solution! </a><br />
<a href="Public/docs/Acciones de la FDA en la nueva legislacion del Bioterrorismo.pdf" target="_blank" class="style2">Haga clic para aquí ver
los Hechos de los USA FDA - en Español</a> <br />
<br />
<cfscript>ssite.translate('#SESSION.Locale#', 1, 111);</cfscript></div><br />

Inspiring
January 19, 2007
> And while you two are debating the issue...

Heh. Oops.

> I removed the Dateformat tag in the
> copyright clause

Can you post the relevant line of code?

I thought you said the problem was content coming from the DB?

--
Adam
Participant
January 19, 2007
Bad code:
<img src="#images#button_green.jpg" alt="" border="0"> <a href="http://www.scoringsystem.com/scoringsystem/sandbox/copyright/copyright.cfm" target="_blank">&copy;#objtranslate.translate('#SESSION.Locale#', 4, 11)# 2002 - #DateFormat(now(), "yyyy")# / #objtranslate.translate('#SESSION.Locale#', 4, 15)#<!---&copy; Copyright 2002 - now()/Terms of Service---></a>

Changed code:
<img src="#images#button_green.jpg" alt="" border="0"> <a href="http://www.scoringsystem.com/scoringsystem/sandbox/copyright/copyright.cfm" target="_blank">&copy;#objtranslate.translate('#SESSION.Locale#', 4, 11)# 2002 - 2007 / #objtranslate.translate('#SESSION.Locale#', 4, 15)#<!---&copy; Copyright 2002 - now()/Terms of Service---></a>

Why it failed and why it is now working I don't know or care; I will leave that to the experts like you.

David Gamache
Correct answer
January 19, 2007
And while you two are debating the issue, I removed the Dateformat tag in the copyright clause at the bottom of the page, and the problem is fixed. Don't ask why; I don't know, but it works now. Go figure. Now I move on to my next problem: a real-time video feed of a cow walking. Don't ask, I just do, just do :)

David Gamache
Inspiring
January 19, 2007
>> Well it's either me or someone on my team. All of which are developers,
>> rather than gibbons, so ought to know what they're doing.
>
> you assume too much.

I would rather pick up the problem and deal with it (by instructing the
miscreant of the ins and outs of UTF-8 and CF's incapabilities in that
regard), than have a sledge-hammer/walnut approach such as yours.

There's also the fact that in over 3000 CF templates (>10MB of raw
character data) in our (multi-lingual, I might add) software, there is not
yet one instance of UTF-8 data being present on a CF template.
Which kinda puts into perspective how sensible - in my mind - it is to
globally "deal with" a situation that is in fact not that common. Of
course our s/w is not statistically representative of everyone's situation,
but it's some sort of measure.

But go your hardest... I'm not trying to convince you to do anything other
than what already makes you happy. I *am* perhaps trying to offer an
alternative position to your opinion it's a "good practice", though, I
guess.

--
Adam
Inspiring
January 19, 2007
> again, the BOM is optional.

I think we could be talking @ cross-purposes. Either that or one or both
of us is being dense.

If I create a NEW text file in notepad.exe, it defaults to ANSI. If I then
insert into that file UTF-8 content, notepad.exe NOTICES this, and when I
go to save it as ANSI (no BOM), says "well... you better not... you'll
mangle your data". So notepad.exe can tell when a file has UTF-8 content
WITHOUT the BOM being there. As it should. Like you said.

As you say, the BOM is entirely optional. So an application needs to use
*some other mechanism* to detect if it should be parsing as plain old ASCII
text, or whether it needs to treat it as UTF-8.

If notepad.exe can do this without a special <cfprocessingdirective-like>
tag, or a BOM, then blimin' CF should be capable of doing it too. One
certainly should NOT have to MANUALLY advise CF - in EVERY FILE - what it
should be doing. Bloody ridiculous.

--
Adam
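
For readers following the BOM argument above, here is a minimal sketch of how an application could guess whether a file is UTF-8 when no BOM is present. It is written in Python rather than CFML, the function name `sniff_encoding` and its return labels are hypothetical, and it is not a description of how Notepad or ColdFusion actually decides. The idea is only that bytes which are not pure ASCII but still decode cleanly as UTF-8 are very likely UTF-8.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Heuristically classify raw file bytes (hypothetical helper)."""
    # An explicit UTF-8 byte order mark settles the question immediately.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    # Pure 7-bit content is valid ASCII (and therefore also valid UTF-8).
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    # Non-ASCII bytes that form well-formed UTF-8 sequences are almost
    # certainly UTF-8; random 8-bit text rarely passes this check.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown-8bit"


if __name__ == "__main__":
    print(sniff_encoding("Español".encode("utf-8")))
    print(sniff_encoding("Español".encode("latin-1")))
```

The check is a heuristic: it cannot tell which single-byte code page an "unknown-8bit" file uses, which is one reason an explicit declaration is sometimes preferred.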
Inspiring
January 19, 2007
Adam Cameron wrote:
> Well it's either me or someone on my team. All of which are developers,
> rather than gibbons, so ought to know what they're doing.

you assume too much.

> We can agree to disagree, which is fine, but I think your practice is a poor
> one. It's far more useful to leave the simple ASCII files alone, and IFF a
> file has UTF-8 content in it, for whatever reason, THEN mark it accordingly.

again, the BOM is optional.

> Sure. Which implies that it's not adequate to rely on it being there. So the
> responsibility falls onto the application reading the file to determine
> whether the content is UTF-8 or not. If NOTEPAD can manage it, I puzzle as
> to why CF cannot, and relies on people like you to put
> <cfprocessingdirective> at the top of every template.

only if the BOM isn't there--once again it's optional.

Inspiring
January 19, 2007
>actually that's exactly what i'm telling you. unless you have 100% perfect control over all your cf pages, all the time, somebody can come along & edit them.

Well it's either me or someone on my team. All of which are developers,
rather than gibbons, so ought to know what they're doing.

We can agree to disagree, which is fine, but I think your practice is a
poor one. It's far more useful to leave the simple ASCII files alone, and
IFF a file has UTF-8 content in it, for whatever reason, THEN mark it
accordingly. It is then a flag to anyone reviewing it that it's there,
like a warning "yes, I meant it to be like this, there is a reason, HEED".


>> - if it's such good practice - it is that CF cannot work out for itself
>> that the file has UTF-8 content(*), and why it's up to the developer to
>> tell it. You can't have it both ways.
>
> once again, the BOM is optional.

Sure. Which implies that it's not adequate to rely on it being there. So
the responsibility falls onto the application reading the file to determine
whether the content is UTF-8 or not. If NOTEPAD can manage it, I puzzle as
to why CF cannot, and relies on people like you to put
<cfprocessingdirective> at the top of every template.

--
Adam
Inspiring
January 19, 2007
Adam Cameron wrote:
> You can't tell me it's "good practice" to include that tag on EVERY FILE in
> an application, "just in case". Because that would only lead me to ask why

actually that's exactly what i'm telling you. unless you have 100% perfect
control over all your cf pages, all the time, somebody can come along & edit them.

> - if it's such good practice - it is that CF cannot work out for itself
> that the file has UTF-8 content(*), and why it's up to the developer to
> tell it. You can't have it both ways.

once again, the BOM is optional.

> Do YOU, Paul, put <cfprocessingdirective> at the top of ALL your files?

for real work, pretty much so, those are my good practices. i do admit to
knocking tests/demos off without it.

> Anyway, "just in case" scenarios should not apply to source code, should
> it? The developer will (well: SHOULD) know whether their templates have
> UTF-8 data within it.

see above.

Inspiring
January 19, 2007
> actually it's good practice to use <cfprocessingdirective> as the BOM is
> optional for utf-8.

Which would only be relevant IF the file contained UTF-8 data. Like I
said.

You can't tell me it's "good practice" to include that tag on EVERY FILE in
an application, "just in case". Because that would only lead me to ask why
- if it's such good practice - it is that CF cannot work out for itself
that the file has UTF-8 content(*), and why it's up to the developer to
tell it. You can't have it both ways.

Do YOU, Paul, put <cfprocessingdirective> at the top of ALL your files?

Anyway, "just in case" scenarios should not apply to source code, should
it? The developer will (well: SHOULD) know whether their templates have
UTF-8 data within it.

--
Adam

(*) Especially when the file DOES have a UTF-8 BOM.
Inspiring
January 19, 2007
Sabaidee wrote:
> hmm.... makes me wonder if it has anything to do with the text from db being
> returned through a cfc... i am not sure what default encoding/character set is
> used by CF in that case and how to change it.

for cfmx (cf6 & above) it's utf-8, that should be common knowledge. for cf5 &
older versions it's supposed to be latin-1 but cf never really paid much
attention to encoding in those versions.
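
To make the UTF-8 versus Latin-1 distinction above concrete, here is a small Python sketch (not CFML) of what happens when UTF-8 bytes are read as Latin-1 somewhere between the database and the browser. The helper names are hypothetical, and this only illustrates the general symptom; it does not claim to reproduce the exact cause of the problem in this thread.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_misread(garbled: str) -> str:
    """Undo the mistake above by reversing the two steps."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    garbled = misread_as_latin1("Español")
    print(garbled)                  # EspaÃ±ol
    print(repair_misread(garbled))  # Español
```

Seeing two characters such as "Ã±" where one accented letter was expected is the usual sign that UTF-8 data was interpreted as a single-byte encoding at some layer.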
Inspiring
January 19, 2007
Adam Cameron wrote:
> You only need <cfprocessingdirective> if the FILE ITSELF has UTF-8
> characters in it. You DO NOT need it if it's simply processing UTF-8 data.

actually it's good practice to use <cfprocessingdirective> as the BOM is
optional for utf-8.