
Problem with UTF filenames on Mac OS X

New Here, Aug 14, 2013

Hello,

I found a really annoying problem. I save a file whose name contains non-ASCII (UTF-8) characters and read it back with the function getFiles(), but the two name strings are unexpectedly NOT equal on OS X 10.8.

var EnteredManually = new File("~/čas.txt");

var PrepareFile = EnteredManually.open("w");
EnteredManually.encoding = "UTF-8";
EnteredManually.writeln("aaa");
EnteredManually.close();

// getFiles() returns an array; take the first match
// (this assumes čas.txt is the only .txt file in ~/)
var GetFromFilename = Folder("~/").getFiles("*.txt")[0];

if (GetFromFilename.name == EnteredManually.name) {
    alert("Is equal!");
} else {
    alert(GetFromFilename.name);
    alert(EnteredManually.name);
    alert("Is NOT equal!\nBecause:\n"
        + decodeURI(GetFromFilename) + " != " + decodeURI(EnteredManually) + "\n"
        + encodeURI(GetFromFilename) + " != " + encodeURI(EnteredManually));
}

I'm sure that this works without problems on Windows with Windows paths, but not on Mac.

The problem might be that OS X stores filenames in Unicode Normalization Form D (decomposed); see http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames.
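To illustrate the suspected cause, here is a minimal sketch in plain ES3-style JavaScript (so it should also run in ExtendScript). It compares a composed and a decomposed spelling of the same name; the helper codeUnits() is a hypothetical name introduced only for this example.

```javascript
// Composed (NFC): "č" is one code unit, U+010D LATIN SMALL LETTER C WITH CARON
var composed = "\u010Das.txt";
// Decomposed (NFD): "c" followed by U+030C COMBINING CARON
var decomposed = "c\u030Cas.txt";

// Hypothetical helper: list the UTF-16 code units of a string as 4-digit hex
function codeUnits(s) {
    var out = [];
    for (var i = 0; i < s.length; i++) {
        var hex = s.charCodeAt(i).toString(16).toUpperCase();
        while (hex.length < 4) { hex = "0" + hex; }
        out.push(hex);
    }
    return out.join(" ");
}

// The two strings look identical on screen but compare as different
var sameText = (composed == decomposed); // false

var composedUnits = codeUnits(composed);     // "010D 0061 0073 002E 0074 0078 0074"
var decomposedUnits = codeUnits(decomposed); // "0063 030C 0061 0073 002E 0074 0078 0074"
```

If the name returned by getFiles() shows the second pattern while the name typed in the script shows the first, the mismatch comes from normalization and not from the file encoding.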

Is there any workaround to deal with this problem and make the script truly platform independent?

Thank you

-Frank

TOPICS
Scripting
Community Expert, Aug 14, 2013

The problem is that on Windows, č is stored as a single precomposed character, whereas on a Mac it's split into two characters (the base letter plus a combining accent). The only really secure way to avoid these problems is not to use accented characters in file and folder names. And avoid spaces, too: there are many FTP servers around that don't accept spaces in file names. Use underscores instead, for instance.

Peter

New Here, Aug 14, 2013

I thought that ExtendScript, as a high-level programming language, would handle these low-level things for me. Stupid idea.

So I have to redesign the whole file management in my project.

It can only be solved by:

1) Saving the filename directly into the created file (e.g. file.open("w"); file.writeln(file.name)) and reading it back.

2) Re-encoding the filename as a Base64 string: really strange, but it could help.

3) Not using accented characters: not acceptable to me in the 21st century.

Or any other ideas?

Community Expert, Aug 14, 2013

The question is what operating systems find acceptable. We all agree that accented characters should be usable on all platforms. But so long as that's not the case, you'd better not use them.

Peter

New Here, Aug 14, 2013

As I wrote at first, my code works fine on Windows because both strings (the one in JS and the filename read back) are in Unicode Normalization Form C (composed), e.g. „č“ is a single character (2 bytes in UTF-8).

On Mac you get the filename in Normalization Form D (decomposed), where „č“ is c + ˇ (a combining accent).

Yes, the basic problem lies in the really old-fashioned HFS+ file system.

But the Mac OS X API has had methods since 2003 for converting the decomposed form to the composed form and vice versa: http://developer.apple.com/library/mac/qa/qa1235/_index.html
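Until the scripting host does this conversion itself, a script can recompose names before comparing them. The following is only a sketch under stated assumptions: COMPOSE_TABLE, composeName() and sameFileName() are hypothetical names, the table covers just a handful of Czech letters, and it handles only a base letter followed by a single combining mark. It is not a full Unicode normalizer.

```javascript
// Hypothetical lookup table: base letter + combining mark -> precomposed letter.
// Only a few letters are listed; extend it for the alphabets you need.
var COMPOSE_TABLE = {
    "c\u030C": "\u010D", // č
    "C\u030C": "\u010C", // Č
    "s\u030C": "\u0161", // š
    "z\u030C": "\u017E", // ž
    "r\u030C": "\u0159", // ř
    "e\u030C": "\u011B", // ě
    "a\u0301": "\u00E1", // á
    "e\u0301": "\u00E9", // é
    "i\u0301": "\u00ED"  // í
};

// Replace every decomposed pair found in the table with its composed form;
// all other characters are copied through unchanged.
function composeName(name) {
    var result = "";
    var i = 0;
    while (i < name.length) {
        var pair = name.substr(i, 2);
        if (pair.length == 2 && COMPOSE_TABLE.hasOwnProperty(pair)) {
            result += COMPOSE_TABLE[pair];
            i += 2;
        } else {
            result += name.charAt(i);
            i += 1;
        }
    }
    return result;
}

// Compare two names after bringing both to the composed form
function sameFileName(a, b) {
    return composeName(a) == composeName(b);
}
```

In ExtendScript the inputs would presumably be decoded first, for example sameFileName(decodeURI(GetFromFilename.name), decodeURI(EnteredManually.name)), since the name property may be URI-encoded; treat that detail as an assumption to verify on your setup.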

So I don't understand why Adobe can't use them in their software.

It is similar to the problem that Adobe CS cannot be installed on a case-sensitive file system (a new feature in Mac OS X 10.4).

I'm sorry, I'm a little bit angry because I spent hours debugging this stupid problem.

Overall, thank you for your time and patience.

Community Expert, Aug 15, 2013

@erbenfr: see also the function cleanUmlaut() by Martin Fischer, from back in 2006, at:

http://www.hilfdirselbst.ch/foren/Skript-Export_der_Dateinamen_von_verkn%FCpften_Bildern_mit_Umlaute...

(Discussion in German)

You could easily expand that function, as Johannes Puff did in 2012:

http://www.hilfdirselbst.ch/gforum/gforum.cgi?post=494790#494790

Uwe

New Here, Aug 15, 2013

Thank you.

In the end I dealt with this problem another way: using indexes/hashes instead of filenames and saving the filenames as text in a file.
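The actual code was not posted, so the following is only a guess at what such an index could look like. The idea is that files on disk get plain ASCII ids as names, and the human-readable names live inside a text index, so the script never has to compare names returned by the file system. The function names buildIndex(), makeId() and parseIndex() and the tab-separated format are assumptions made for this sketch.

```javascript
// Hypothetical helper: zero-padded ASCII id, e.g. 1 -> "0001"
function makeId(n) {
    var id = String(n);
    while (id.length < 4) { id = "0" + id; }
    return id;
}

// Build the text of an index file: one "id<TAB>original name" line per entry.
// The files on disk would be named after the id only (e.g. "0001.txt").
function buildIndex(fileNames) {
    var lines = [];
    for (var i = 0; i < fileNames.length; i++) {
        lines.push(makeId(i + 1) + "\t" + fileNames[i]);
    }
    return lines.join("\n");
}

// Read the index text back into an object that maps id -> original name
function parseIndex(text) {
    var map = {};
    var lines = text.split("\n");
    for (var i = 0; i < lines.length; i++) {
        var tab = lines[i].indexOf("\t");
        if (tab > 0) {
            map[lines[i].substring(0, tab)] = lines[i].substring(tab + 1);
        }
    }
    return map;
}

var indexText = buildIndex(["\u010Das.txt", "notes.txt"]);
var names = parseIndex(indexText);
```

Because the original names are written and read as file content (with the file's encoding set to UTF-8), they come back exactly as the script wrote them, whatever form the file system uses for its own names.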

For another project I will consider your proposals or this: https://github.com/walling/unorm (a really robust solution).

Explorer, Aug 17, 2013

Hi,

I think I have the same problem, and I have spent a lot of time on this bug…

I have a file name from a user dialog box that isn't equal to the name of the same file taken from an InDesign link :(

Could you give me an example of your code?

I would appreciate your help because I can't fix this problem.

Thanks!
