Inspiring
March 25, 2009
Question

cffile and UTF-8

  • March 25, 2009
  • 16 replies
  • 4720 views
Hello Community!

I have a program that uploads a file to a remote FTP server. I am using cffile to write the file, and it MUST be uploaded in UTF-8 format. Despite that, the file is being uploaded as ASCII or ANSI, anything except UTF-8.

This is my line of code:
<cffile action="write" file="#f_dir##f_name#" output="#dataHeader#" charset="utf-8">

charset="utf-8" is not working for me.

Does anybody else have the same problem? Any thoughts?

Thanks!

Ysais.

16 replies

Inspiring
March 31, 2009
Ok, I just sent them to you.
Inspiring
March 31, 2009
Paul,

You said here:

"don't use either "

I think the party I am sending this file to is just opening it in Notepad++, and when he sees that it says ANSI there, he requests a different file. The file is supposed to be processed on their servers, but I haven't received any output from the processing software, only feedback from the person who administers the servers.

This has turned out to be a time-consuming project for me.

Thanks a lot!
Inspiring
March 31, 2009
apocalipsis19 wrote:
> The file should just be UTF-8. That would solve my problem.

again, can i see a zipped up version before & after uploading?
Inspiring
March 31, 2009
Sure! How do I send it to you?
Inspiring
March 31, 2009
Thanks Paul!

The file should just be UTF-8. That would solve my problem.


I am just opening the file in those text editors to see the encoding of the file.
Inspiring
March 31, 2009
Mack wrote:
> I found this java bug that is related to the problem. It's about reading
> UTF-8 files with BOM but if it's not transparent on read I doubt it's
> transparent on write:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058

and sun marked that "bug" as "Closed, Will Not Fix". sun's not going to fix
something that it considers not "broken" (there are also "bugs" related to java
not compiling source with a BOM as well) or that will create backwards
compatibility problems--a BOM is optional for utf-8 (and pretty much useless in
utf-8 anyway) but required for utf-16 which java handles ok (if i remember rightly).

and just an FYI, sun usually gives i18n bugs short shrift. some locale resource
bugs (and i mean real bugs like stuff where they get currency/numeric formatting
dead wrong) have been around for >5 years.
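
To make the read-side behaviour concrete, here is a minimal sketch in Python rather than Java or CFML, chosen only because it is easy to run anywhere. Python's plain `utf-8` codec behaves the way the linked report describes for Java: it does not consume a leading BOM, so the BOM shows up as the character U+FEFF at the start of the decoded text. The `utf-8-sig` codec is Python's opt-in variant that strips it. The sample bytes are made up for illustration.

```python
import codecs

# Hypothetical file content: a UTF-8 BOM followed by one line of text.
raw = codecs.BOM_UTF8 + "name\n".encode("utf-8")

# The plain codec keeps the BOM as U+FEFF at the start of the string.
plain = raw.decode("utf-8")

# The "-sig" variant removes a leading BOM if one is present.
stripped = raw.decode("utf-8-sig")

print(repr(plain))
print(repr(stripped))
```

A consumer that decodes with a BOM-unaware decoder has to drop that first character itself, which is one reason a BOM in UTF-8 can cause more trouble than it solves.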

Inspiring
March 30, 2009
apocalipsis19 wrote:
> Well,
>
> I have done further research on this issue and all of my code is correct. The
> problem is the underlying JVM. It does not properly support adding the Byte
> Order Mark to a UTF-8 file. Some people suggest adding the file through Java
> code inside the cfscript tags.
>
> I will look deeper into this and I continue to appreciate any ideas you
> guys give me!

I found this java bug that is related to the problem. It's about reading
UTF-8 files with BOM but if it's not transparent on read I doubt it's
transparent on write:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058

--
Mack
Inspiring
March 30, 2009
apocalipsis19 wrote:
> I have done further research on this issue and all of my code is correct. The
> problem is the underlying JVM. It does not properly support adding the Byte
> Order Mark to a UTF-8 file. Some people suggest adding the file through Java

a BOM is *optional* for utf-8 by definition (and if you read the definition
you'll see why it's also pretty much un-needed). is the app on the other end
expecting a BOM?

> code inside the cfscript tags.

if your research is correct about the JVM & BOM writing (i think not, it's
optional so the app should handle writing it to a new file), then it's six of
one, half dozen of the other.


what is the app on the other end expecting *exactly*? can you put up the before
& after data (zipped up to preserve encoding)?
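
One likely reason two editors disagree about the same file: when the content is pure ASCII and there is no BOM, the ASCII, Windows-1252 ("ANSI") and UTF-8 encodings produce exactly the same bytes, so an editor can only guess. The sketch below uses Python just to show the bytes; the sample strings are invented and are not the poster's data.

```python
import codecs

# Hypothetical sample data: one ASCII-only string, one with an accented letter.
ascii_only = "id,name,amount\n1,Smith,10.50\n"
with_accent = "id,name,amount\n1,Jos\u00e9,10.50\n"

# ASCII-only text encodes to identical bytes in all three encodings,
# so nothing in the file says "UTF-8" rather than "ANSI".
identical = (
    ascii_only.encode("ascii")
    == ascii_only.encode("cp1252")
    == ascii_only.encode("utf-8")
)

# A non-ASCII character is where the encodings finally diverge.
differs = with_accent.encode("cp1252") != with_accent.encode("utf-8")

# The optional UTF-8 BOM is the three bytes EF BB BF.
bom = codecs.BOM_UTF8

print(identical, differs, bom.hex())
```

If the receiving application only needs valid UTF-8, an ASCII-only file without a BOM already qualifies, whatever label an editor shows for it.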
Inspiring
March 30, 2009
Paul,

The application on the other end is expecting a UTF-8 encoded file. What really troubled me at first is that when I opened the file with EditPlus it said the file was UTF-8, but when I opened it with Notepad++ it said it was ANSI. My charset attribute is set to UTF-8 in my cffile tags. The transferMode attribute in the cfftp tag is set to BINARY. I will continue submitting the file until I fix this problem.

Mack,

Thanks for the link, I am looking into that. I will post in here whatever happens for future reference or other fellows' reference.

If you guys come up with something else I will be more than happy to read about it.

Thanks!

Ysais.
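
A binary-mode transfer should leave the bytes untouched, and that can be checked without relying on any editor by comparing a digest of the local file with a digest of a copy downloaded back from the server. This is a rough sketch in Python; the helper names `sha256_of` and `same_bytes` are made up for illustration, and the demo uses temporary files in place of the real local and downloaded copies.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a, path_b):
    """True when both files contain exactly the same bytes."""
    return sha256_of(path_a) == sha256_of(path_b)


if __name__ == "__main__":
    # Stand-ins for the file written locally and the copy fetched back.
    folder = tempfile.mkdtemp()
    local = os.path.join(folder, "local.txt")
    fetched = os.path.join(folder, "fetched.txt")
    for path in (local, fetched):
        with open(path, "wb") as handle:
            handle.write("Jos\u00e9\n".encode("utf-8"))
    print(same_bytes(local, fetched))
```

If the digests match, the transfer step did not change the encoding, and any remaining question is about what was written in the first place.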
Inspiring
March 30, 2009
Well,

I have done further research on this issue and all of my code is correct. The problem is the underlying JVM. It does not properly support adding the Byte Order Mark to a UTF-8 file. Some people suggest adding the file through Java code inside the cfscript tags.

I will look deeper into this and I continue to appreciate any ideas you guys give me!

Thanks!

Ysais.
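
If the receiver really does insist on a BOM, the usual approach in any language is to write the three BOM bytes yourself ahead of the UTF-8 encoded text, rather than expecting the encoder to add them. The sketch below shows the idea in Python, not CFML or Java; `write_utf8` and its `with_bom` flag are hypothetical names used only for this example.

```python
import codecs
import os
import tempfile


def write_utf8(path, text, with_bom=False):
    """Write text as UTF-8 bytes, optionally preceded by the EF BB BF marker."""
    data = text.encode("utf-8")
    if with_bom:
        # The encoder does not add a BOM on its own, so prepend it explicitly.
        data = codecs.BOM_UTF8 + data
    with open(path, "wb") as handle:
        handle.write(data)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "header.txt")
    write_utf8(target, "id,name\n", with_bom=True)
    with open(target, "rb") as handle:
        print(handle.read()[:3].hex())
```

Writing in binary mode keeps the bytes exactly as built, so no platform newline translation or default encoding gets in the way.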
Inspiring
March 30, 2009
apocalipsis19 wrote:
> My problem still persists.

I think you have only 2 steps: CFFILE and CFFTP. I'd check after each
step whether the file is *really* UTF-8, cutting the problem in half.

--
Mack
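
A check like the one suggested above can be done on the raw bytes instead of trusting an editor's label. This is a minimal sketch in Python, assuming you can get at the file after each step; `describe_encoding` and `write_sample` are hypothetical helper names, and the sample content is invented.

```python
import codecs
import os
import tempfile


def describe_encoding(path):
    """Report byte-level facts that editors rely on when guessing an encoding."""
    with open(path, "rb") as handle:
        data = handle.read()
    has_bom = data.startswith(codecs.BOM_UTF8)
    # Look at the content after the BOM, if there is one.
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    try:
        body.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    # Only bytes >= 0x80 can tell UTF-8 apart from plain ASCII.
    has_non_ascii = any(byte >= 0x80 for byte in body)
    return {"has_bom": has_bom, "valid_utf8": valid_utf8, "has_non_ascii": has_non_ascii}


def write_sample(text, encoding):
    """Write text to a temporary file in the given encoding; return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding=encoding, newline="") as handle:
        handle.write(text)
    return path


if __name__ == "__main__":
    sample = write_sample("id,name\n1,Jos\u00e9\n", "utf-8")
    print(describe_encoding(sample))
    os.remove(sample)
```

Running it once on the file cffile produced and once on the copy that landed on the FTP server shows which step, if either, changed the bytes.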
Inspiring
March 30, 2009
My problem still persists.