encoding and exporting text (JS CS3 MAC and PC)

Report · Feb 29, 2012

Hi -- I am trying to export text from InDesign as an ASCII file. But I am ending up with a zero K file.

var myX = new File(myTarget);
myX.encoding = "ASCII";
myX.lineFeed = "unix";
myX.open("w");
myX.write(myString);
myX.close();

I have success with this:

var myX = new File(myTarget);
myX.encoding = "UTF-8";
myX.lineFeed = "unix";
myX.open("w");
myX.write("\uFEFF" + myString);
myX.close();

But the person on the other end of the export says I need to convert UTF-8 to ASCII encoding.

Any help would be greatly appreciated.

Report · Mar 01, 2012

Did "the person on the other end of the export" explain why you should use ASCII? Try both encodings and see what the difference is.

Peter

Report · Mar 01, 2012

There are some utf-8 characters they can't handle.

Report · Mar 01, 2012

Try this: type some random text in a text frame and run your ASCII script to export it. You will get the text. Now insert any 'special' character -- any at all. Accented characters, curly quotes, en dash, em space, Hebrew, Greek, a footnote, an Insert Page Number Here. Now you no longer can export to "ascii" because otherwise the file would contain characters that are not available in the ASCII format.
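To see in advance whether a given string would survive an ASCII export, something like the following could be run on it first. It is only a sketch: `isPureAscii` is a made-up helper name, not part of the InDesign scripting API, and it simply checks that every character code is 127 or lower.

```javascript
// Hypothetical helper: true when every character is 7-bit ASCII (codes 0-127).
function isPureAscii(text) {
    for (var i = 0; i < text.length; i++) {
        if (text.charCodeAt(i) > 127) {
            return false; // found a character that ASCII cannot represent
        }
    }
    return true;
}

var plain = "Plain text, nothing fancy.";
var fancy = "Caf\u00E9 \u201Cquoted\u201D \u2013 done"; // e acute, curly quotes, en dash
var plainIsSafe = isPureAscii(plain); // true
var fancyIsSafe = isPureAscii(fancy); // false
```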

jmw107 wrote:
... the person on the other end of the export says I need to convert UTF-8 to ASCII encoding.

You cannot easily "convert" any random UTF-8 text to ASCII. One way would be to throw away all non-ASCII characters, another way is to intelligently replace them with basic characters -- e acute to e, double left curly quote to " and so on. But it's not something Javascript "does" for you.
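The "intelligent replacement" route could look roughly like this. The table and the function name `replaceKnownCharacters` are invented for illustration, and the table only covers a handful of characters; anything not listed passes through unchanged, so it would still need a separate pass to deal with whatever is left.

```javascript
// Hypothetical replacement table; extend it with whatever your documents contain.
var ASCII_FALLBACKS = {
    "\u00E9": "e",   // e acute
    "\u201C": "\"",  // left double curly quote
    "\u201D": "\"",  // right double curly quote
    "\u2018": "'",   // left single curly quote
    "\u2019": "'"    // right single curly quote
};

// Swaps each character that has an entry in the table; leaves the rest alone.
function replaceKnownCharacters(text) {
    var out = "";
    for (var i = 0; i < text.length; i++) {
        var ch = text.charAt(i);
        out += ASCII_FALLBACKS.hasOwnProperty(ch) ? ASCII_FALLBACKS[ch] : ch;
    }
    return out;
}
```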

Inside the InDesign UI you can search for non-ASCII character with GREP: look for

[^[:ascii:]]

and then you can replace them with something appropriate. Unfortunately, JavaScript's regular expressions are a slightly different flavor that does not support [:ascii:], so you cannot use this expression as-is inside a script to clean up your text.
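Within a script, comparing character codes can stand in for that class. As a rough sketch (the name `listNonAscii` is made up here), this collects every distinct non-ASCII character in a string as a U+XXXX label, which may help when deciding what needs a replacement:

```javascript
// Hypothetical helper: returns the distinct non-ASCII characters in `text`
// as "U+XXXX" labels, in order of first appearance.
function listNonAscii(text) {
    var seen = {};
    var found = [];
    for (var i = 0; i < text.length; i++) {
        var code = text.charCodeAt(i);
        if (code > 127 && !seen[code]) {
            seen[code] = true;
            var hex = code.toString(16).toUpperCase();
            while (hex.length < 4) {
                hex = "0" + hex; // pad to four hex digits
            }
            found.push("U+" + hex);
        }
    }
    return found;
}
```

Note that this works on JavaScript string units, so a character outside the Basic Multilingual Plane would be reported as its two surrogate halves.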

Report · Mar 01, 2012

Hi -- Thanks for the explanation.

My first thought is I could build a translation resource doc for most of the non-ASCII characters, but I have run into a more significant issue: if the story has notes, I cannot save it as ASCII (zero K). I tried hiding the notes, but that has no effect. Removing the notes works, but obviously I do not want to do that. Can you think of any way around this?

As an alternative, can anyone suggest a command line tool for converting a UTF-8 file to ANSII? One that would turn any non-ANSII characters into ? would be acceptable. I tried iconv, but it fails to convert the files. I know this question may not be appropriate for an InDesign forum, but I suspect many of the developers here use tools like this to overcome these kinds of problems.

Thanks again

Report · Mar 01, 2012

ANSI =/= ASCII

If you have your string in JavaScript, all you have to do is kick out all characters with a Unicode value greater than 255. For "true" ASCII you also have to ditch all 8-bit characters, i.e. everything above code 127. It's straightforward to replace them with a '?'; you could make it nicer by substituting "acceptable" replacements for the most common characters, such as a straight double quote instead of double curlies, removing accents from Latin-1 characters, replacing en and em dashes with hyphens, etc. Even replacing the ellipsis with "..." wouldn't do any harm. But you're going to have to make some compromises.
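A minimal sketch of that idea, assuming a made-up function name `toAscii` and an optional `maxCode` argument (127 for true ASCII, 255 to keep the 8-bit range). It only handles dashes and the ellipsis "nicely"; everything else above the limit becomes a '?'.

```javascript
// Hypothetical helper: replaces every character above `maxCode` with "?".
// `maxCode` defaults to 127 (7-bit ASCII); pass 255 to keep 8-bit characters.
function toAscii(text, maxCode) {
    var limit = maxCode || 127;
    // A few "nicer" substitutions before falling back to "?".
    var nicer = text
        .replace(/[\u2013\u2014]/g, "-")  // en dash, em dash
        .replace(/\u2026/g, "...");       // ellipsis
    var out = "";
    for (var i = 0; i < nicer.length; i++) {
        out += nicer.charCodeAt(i) > limit ? "?" : nicer.charAt(i);
    }
    return out;
}

var strict = toAscii("caf\u00E9 \u2013 \u03B1\u2026");  // "caf? - ?..."
var eightBit = toAscii("caf\u00E9 \u03B1", 255);        // keeps the e acute, Greek alpha becomes "?"
```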

Is your client really okay with you sending "processed" text? Depending on what goes in, the output may or may not resemble anything coherent anymore. Imagine a phrase in Greek.

>There are some utf-8 characters they cannot handle.

Wot nonsense. "Some"!? It's All or Nothing, I'd say. If they send you a list of those they can, or cannot, you could make a better translation.

>.. notes ..

Wait, you have notes -- as in "footnotes"? That concept simply does not exist in ASCII, ANSI, or for that matter, in UTF-8.

Report · Mar 01, 2012

Sorry, typo on my part -- I meant ASCII not ANSI

What I mean by notes is the hidden text you can put in stories. They cause my ASCII exports to show up as zero K. If I export these stories as UTF-8, the notes show up as a white square in Notepad. If you look at them in a hex editor they read ef bb bf.

Report · Mar 01, 2012

You mean, in UTF-8 they look like ef bf bd, right? That is InDesign's Placeholder marker (see also http://www.fileformat.info/info/unicode/char/fffd/index.htm). It's useless to include this in your export because (a) InDesign uses this code for a lot of different functions, and (b) there is nothing "associated" with it after exporting. You can safely remove them from your string before you output it as text; you don't have to remove them from your InDesign document.

Report · Mar 01, 2012

Oh wait, that must be the UTF-8 code your client was having difficulties with. Try your very first UTF-8 export again, but this time remove all of these codes before writing your string to the file. It's simple, add this line before writing:

myString = myString.replace (/\uFFFD/g, '');

Well I say "simple" but when I learned this particular trick it was a moment of "why didn't anyone tell me this five years ago!?".

Report · Mar 01, 2012

Thanks ... but I inserted that line and I am still getting that code. Could I be missing something?

Report · Mar 02, 2012

My bad! I knew U+FFFD could occasionally pop up in text copied straight out of InDesign, so I checked its UTF-8 encoding and that is "EF BF BD". That's why I thought you got it wrong, and advised to remove U+FFFD.

But ... you got it right after all. The one you said it was, "EF BB BF", is the UTF-8 encoding of yet another code that ID uses as a 'placeholder': U+FEFF. Since that's also a special code in Unicode (the byte order mark), your client is having problems with it.

Change the replacement line to this one (this time around, I made sure to run your script and check the result before posting ...).

myString = myString.replace (/\uFEFF/g, "");
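For anyone who wants to check which Unicode character a byte sequence in a hex editor belongs to, here is a rough sketch that works out the UTF-8 bytes for a code point. The name `utf8Hex` is made up, and the function only covers code points below U+10000, which is enough for the two markers discussed here.

```javascript
// Hypothetical helper: UTF-8 bytes (as hex) for one code point below U+10000.
function utf8Hex(code) {
    var bytes;
    if (code < 0x80) {
        bytes = [code];                                   // 1 byte
    } else if (code < 0x800) {
        bytes = [0xC0 | (code >> 6),                      // 2 bytes
                 0x80 | (code & 0x3F)];
    } else {
        bytes = [0xE0 | (code >> 12),                     // 3 bytes
                 0x80 | ((code >> 6) & 0x3F),
                 0x80 | (code & 0x3F)];
    }
    var parts = [];
    for (var i = 0; i < bytes.length; i++) {
        var hex = bytes[i].toString(16).toUpperCase();
        parts.push(hex.length < 2 ? "0" + hex : hex);
    }
    return parts.join(" ");
}

var feff = utf8Hex(0xFEFF); // "EF BB BF"
var fffd = utf8Hex(0xFFFD); // "EF BF BD"
```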

Report · Mar 05, 2012

Thanks ... it worked great.

I really appreciate the help.
