UTF-8 page encoding is not coming out true

Report · Aug 27, 2010

Hello, I'm trying to get my site up to speed with UTF-8 and it's giving me a heck of a time. I've checked and double-checked, and everything I can see says the file should be decoded as UTF-8.

Inside my text editor, CFEclipse, the file encoding is set to UTF-8.
In application.cfc, I have:
<cfcomponent output="false">
    <cfprocessingdirective pageencoding="utf-8">
    <cfcontent type="text/html; charset=UTF-8">
    <cfset SetEncoding("URL", "UTF-8")>
    <cfset SetEncoding("Form", "UTF-8")>
    ...
</cfcomponent>
Each page uses valid HTML5, like this:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">

Even so, smart quotes end up looking like this: â€œ and â€�, and other characters like em dashes and accented characters share a similar fate.

I checked the page properties (right click → page info), and it says the page is in UTF-8. I've also pasted this bit of code into the page, and it outputs utf-8: <cfset theEncoding = getEncoding("URL")><cfoutput>#theEncoding#</cfoutput>

What more could possibly need to be done for this to be output by the browser as true UTF-8?

Report · Aug 27, 2010

uh, the database & its driver?

btw, unless your CFCs are really outputting something, the cfcontent, etc. isn't needed.

Report · Aug 30, 2010

These are pages that aren't running through a database, though.

Here is another bit I found out that might clear things up and get to the source of the problem.

If I put <cfprocessingdirective pageEncoding="utf-8"> at the top of my CFM pages, everything works out. The special characters show up fine. This feels like a pretty big kludge, though. I don't want to have to put this at the top of every single one of my pages. I'm told the server should be able to handle it all without relying on a page-by-page encoding declaration.

So, why would application.cfc not be rendering the page correctly where a cfprocessingdirective on the page itself does?

Report · Aug 30, 2010

actually, using cfprocessingdirective is considered good practice. if you're not using this tag & you're getting mojibake, then for sure your text isn't properly/fully encoded as UTF-8, either as plain text or from your db.

or perhaps the server's default encoding (UTF-8) has been changed?

if some of your text is being copied from word (the smart quotes say the answer to this is probably "yes") then a browser often can't figure out the page's encoding on its own (latin-1 text with a sprinkling of word's "funny" chars which could be unicode). including the cfprocessingdirective tag forces cf to use that encoding. this is a compile time thing.
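
To make the mojibake concrete, here is a small sketch of what happens when UTF-8 bytes are read with the wrong charset. It is written in Python purely as an illustration of the bytes involved, not as ColdFusion code. The helper name `misread_utf8` is made up for this example, and Windows-1252 as the wrong charset is an assumption (a common default on Windows machines), not something confirmed for the server in this thread.

```python
# Illustration only: Python stands in for whatever reads the file's bytes.
# Assumption: the misreading side uses Windows-1252 (cp1252).

def misread_utf8(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong charset."""
    utf8_bytes = text.encode("utf-8")
    # errors="replace" turns bytes the wrong charset cannot map into U+FFFD
    return utf8_bytes.decode(wrong_encoding, errors="replace")

# Left smart quote U+201C is the bytes E2 80 9C in UTF-8.
# Read as cp1252 those are three separate characters: â € œ
garbled_left = misread_utf8("\u201c")

# Right smart quote U+201D is E2 80 9D. Byte 9D is unassigned in cp1252,
# so the last character becomes the replacement character U+FFFD.
garbled_right = misread_utf8("\u201d")

# An accented letter such as é (U+00E9, bytes C3 A9) becomes Ã©
garbled_e_acute = misread_utf8("\u00e9")
```

Under that assumption the results match the garbage quoted earlier in the thread: the opening quote turns into three characters, and the closing quote ends in a replacement character because one of its bytes has no Windows-1252 mapping.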

you can gain the same thing by using a BOM but for utf-8 it's entirely optional (and pretty much useless for its intended purpose, by definition utf-8 only has the one order). and some editors don't like it (for instance eclipse from its java roots i guess).
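
For reference, the UTF-8 BOM is just the three bytes EF BB BF at the very start of a file. The sketch below (again Python, as a byte-level illustration only) shows how to check for it and what a file saved with and without one looks like. The function name `starts_with_utf8_bom` and the sample text are hypothetical, chosen for this example.

```python
import codecs

def starts_with_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string begins with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

sample = "caf\u00e9"  # hypothetical page text containing an accented character

# The 'utf-8-sig' codec prepends the BOM when encoding; plain 'utf-8' does not.
with_bom = sample.encode("utf-8-sig")
without_bom = sample.encode("utf-8")

# Apart from the three leading bytes, the two encodings are identical.
same_payload = with_bom[len(codecs.BOM_UTF8):] == without_bom

# Decoding with 'utf-8-sig' strips the BOM if present and accepts its absence.
roundtrip_with = with_bom.decode("utf-8-sig")
roundtrip_without = without_bom.decode("utf-8-sig")
```

To check a real template, read it in binary mode (`open(path, "rb")`) and pass the first few bytes to the helper.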