Hello, I'm trying to get my site up to speed with UTF-8, and it's giving me a heck of a time. I've checked and double-checked, and everything I can see says the file should be decoded as UTF-8.
Even so, smart quotes end up looking like this: “ and � and other characters like em dashes and accented characters share a similar fate.
I checked the page properties (right click → Page Info), and it says the page is in UTF-8. I've also pasted this bit of code into the page, and it outputs utf-8: <cfset theEncoding = getEncoding("URL")><cfoutput>#theEncoding#</cfoutput>
What more could possibly need to be done for this to be output by the browser as true UTF-8?
uh, the database & its driver?
btw, unless your CFCs are really outputting something, the cfcontent, etc. isn't needed.
These are pages that aren't running through a database, though.
Here is another nifty little bit that I found out that might clear some things up a bit and get to the source of the problem.
If I put <cfprocessingdirective pageEncoding="utf-8"> at the top of my CFM pages, everything works out. The special characters show up fine. This feels like a pretty big kludge, though. I don't want to have to put this at the top of every single one of my pages. I'm told the server should be able to handle it all without relying on a page-by-page encoding declaration.
So, why would application.cfc not be rendering the page correctly where cfprocessingdirective does?
actually using cfprocessingdirective is considered a good practice. if you're
not using this tag & you're getting mojibake, then for sure your text isn't
properly/fully encoded as UTF-8. either as plain text or from your db.
or perhaps the server's default encoding (UTF-8) has been changed?
if some of your text is being copied from word (the smart quotes say the answer
to this is probably "yes") then a browser often can't figure out the page's
encoding on its own (latin-1 text with a sprinkling of word's "funny" chars
which could be unicode). including the cfprocessingdirective tag forces cf to
use that encoding. this is a compile time thing.
you can gain the same thing by using a BOM, but for utf-8 it's entirely optional
(and pretty much useless for its intended purpose, since by definition utf-8 only
has the one byte order). and some editors don't like it (for instance eclipse,
from its java roots i guess).
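
For illustration, here is a minimal sketch of the cfprocessingdirective approach discussed above. It assumes the .cfm file itself is saved as UTF-8 in the editor. The cfcontent line and the sample text are illustrative additions, not taken from the original pages. As far as I know, the directive is read when the page is compiled and applies only to the file that contains it, which would explain why a setting in application.cfc alone doesn't cover each template.

```cfm
<!--- Assumption: this file is saved as UTF-8 in the editor. --->
<!--- Read at compile time; tells CF how to decode THIS source file. --->
<cfprocessingdirective pageEncoding="utf-8">

<!--- Illustrative: declare the charset of the response sent to the browser. --->
<cfcontent type="text/html; charset=utf-8">

<!--- Sample text only: smart quotes and accented characters. --->
<cfoutput>“smart quotes” and accents: é ü ñ</cfoutput>
```

If the special characters still come out garbled with this in place, the likelier culprit is the file's saved encoding (for example, text pasted from Word into a file stored as latin-1) rather than the server settings.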