
encoding and exporting text (JS CS3 MAC and PC)

New Here, Feb 29, 2012

Hi -- I am trying to export text from InDesign as an ASCII file. But I am ending up with a zero K file.

var myX = new File(myTarget);
myX.encoding = "ASCII";
myX.lineFeed = "unix";
myX.open("w");
myX.write(myString);
myX.close();

I have success with this:

var myX = new File(myTarget);
myX.encoding = "UTF-8";
myX.lineFeed = "unix";
myX.open("w");
myX.write("\uFEFF" + myString);
myX.close();

But the person on the other end of the export says I need to convert UTF-8 to ASCII encoding.

Any help would be greatly appreciated.

TOPICS
Scripting

Community Expert, Mar 01, 2012

Did "the person on the other end of the export" explain why you should use ASCII? Try both encodings and see what the difference is.

Peter

New Here, Mar 01, 2012

There are some UTF-8 characters they can't handle.

Community Expert, Mar 01, 2012

Try this: type some random text in a text frame and run your ASCII script to export it. You will get the text. Now insert any 'special' character -- any at all: accented characters, curly quotes, an en dash, an em space, Hebrew, Greek, a footnote, an Insert Page Number marker. Now you can no longer export to "ascii", because the file would otherwise contain characters that do not exist in ASCII.

jmw107 wrote:

...  the person on the other end of the export says I need to convert UTF-8 to ASCII encoding.

You cannot easily "convert" any random UTF-8 text to ASCII. One way would be to throw away all non-ASCII characters; another way is to intelligently replace them with basic characters -- e acute to e, double left curly quote to " and so on. But it's not something JavaScript "does" for you.

Inside the InDesign UI you can search for non-ASCII characters with GREP: look for

[^[:ascii:]]

and then you can replace them with something appropriate. Unfortunately, JavaScript's regular expressions are a slightly different flavor from InDesign's GREP and do not support the [:ascii:] class, so you cannot use this exact expression inside a script to clean up your text.
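
A plain character range covers the same ground as [:ascii:] in a JavaScript regular expression. Here is a minimal sketch, assuming the story text is already in a string. The function name findNonAscii is made up for illustration, and the code is written in ES3 style on the assumption that this is what ExtendScript accepts:

```javascript
// Report every non-ASCII character in a string with its position and code point.
function findNonAscii(text) {
    var found = [];
    var re = /[^\x00-\x7F]/g; // anything outside 7-bit ASCII
    var m;
    while ((m = re.exec(text)) !== null) {
        var hex = m[0].charCodeAt(0).toString(16).toUpperCase();
        while (hex.length < 4) { hex = "0" + hex; } // pad to U+XXXX form
        found.push({ index: m.index, ch: m[0], code: "U+" + hex });
    }
    return found;
}
```

An empty result means the string is safe to write with an ASCII encoding. Anything else tells you which characters need a replacement first.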

New Here, Mar 01, 2012

Hi -- Thanks for the explanation.

My first thought is I could build a translation resource doc for most of the non-ASCII characters, but I have run into a more significant issue: if the story has notes, I cannot save it as ASCII (zero K). I tried hiding the notes, but that has no effect. Removing the notes works, but obviously I do not want to do that. Can you think of any way around this?

As an alternative, can anyone suggest a command line tool for converting a UTF-8 file to ANSI? One that would turn any non-ANSI characters into ? would be acceptable. I tried iconv, but it fails to convert the files. I know this question may not be appropriate for an InDesign forum, but I suspect many of the developers here use tools like this to overcome these kinds of problems.

Thanks again

Community Expert, Mar 01, 2012

ANSI =/= ASCII

If you have your string in JavaScript, all you have to do is kick out all characters with a Unicode value greater than 255. For "true" ASCII you also have to ditch all 8-bit characters, i.e. everything above code 127. It's straightforward to replace them with a '?'; you could make it nicer by substituting "acceptable" replacements for the most common characters, such as a straight double quote instead of double curlies, removing accents from Latin-1 characters, replacing en and em dashes with hyphens, etc. Even replacing the ellipsis with "..." wouldn't do any harm. But you're going to have to make some compromises.
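
That two-stage idea can be sketched like this: swap the most common typographic characters for ASCII stand-ins, then turn whatever is still above code 127 into '?'. The table ASCII_FALLBACKS and the name toAscii are illustrative assumptions, and the table is deliberately incomplete:

```javascript
// Hypothetical lookup table; extend it with whatever the client agrees to.
var ASCII_FALLBACKS = {
    "\u2018": "'", "\u2019": "'",   // single curly quotes
    "\u201C": "\"", "\u201D": "\"", // double curly quotes
    "\u2013": "-", "\u2014": "-",   // en dash, em dash
    "\u2026": "...",                // ellipsis
    "\u00E9": "e", "\u00E8": "e",   // e acute, e grave
    "\u00E0": "a", "\u00FC": "u"    // a grave, u umlaut
};

// Replace each non-ASCII character with its fallback, or '?' if none is listed.
function toAscii(text) {
    return text.replace(/[^\x00-\x7F]/g, function (ch) {
        return ASCII_FALLBACKS.hasOwnProperty(ch) ? ASCII_FALLBACKS[ch] : "?";
    });
}
```

With this table, a curly-quoted "café" followed by an en dash, a Greek alpha and an ellipsis comes out as "cafe" - ?... (the alpha has no entry, so it becomes a question mark).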

Is your client really okay with you sending "processed" text? Depending on what goes in, the output may or may not resemble anything coherent anymore. Imagine a phrase in Greek.

>There are some utf-8 characters they cannot handle.

Wot nonsense. "Some"!? It's All or Nothing, I'd say. If they send you a list of those they can, or cannot, handle, you could make a better translation.

>..  notes ..

Wait, you have notes -- as in "footnotes"? That concept simply does not exist in ASCII, ANSI, or for that matter, in UTF-8.

New Here, Mar 01, 2012

Sorry, typo on my part -- I meant ASCII not ANSI

What I mean by notes is the hidden text you can put in stories. They cause my ASCII exports to show up as zero K. If I export these stories as UTF-8, the notes show up as a white square in Notepad. If you look at them in a hex editor they read ef bb bf.

Community Expert, Mar 01, 2012

You mean, in UTF-8 they look like ef bf bd, right? That is InDesign's placeholder marker (see also http://www.fileformat.info/info/unicode/char/fffd/index.htm). It's useless to include this in your export because (a) InDesign uses this code for a lot of different functions, and (b) there is nothing "associated" with it after exporting. You can safely remove them from your string before you output it as text; you don't have to remove them from your InDesign document.

Community Expert, Mar 01, 2012

Oh wait, that must be the UTF-8 code your client was having difficulties with. Try your very first UTF-8 export again, but this time remove all of these codes before writing your string to the file. It's simple, add this line before writing:

myString = myString.replace (/\uFFFD/g, '');

Well, I say "simple", but when I learned this particular trick it was a moment of "why didn't anyone tell me this five years ago!?".

New Here, Mar 01, 2012

Thanks ... but I inserted that line and I am still getting that code. Could I be missing something?

Community Expert, Mar 02, 2012

My bad! I knew U+FFFD could occasionally pop up in text copied straight out of InDesign, so I checked its UTF-8 encoding and that is "EF BF BD". That's why I thought you got it wrong, and advised to remove U+FFFD.

But ... you got it right after all. The one you said it was, "EF BB BF", translates to yet another code that ID uses as a 'placeholder': U+FEFF. Since that is also a special code in Unicode (the byte order mark), your client is having problems with it.

Change the replacement line to this one (this time around, I made sure to run your script and check the result before posting ...).

myString = myString.replace (/\uFEFF/g, "");
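
The two byte sequences are easy to mix up, so it can help to compute them rather than read them off a hex dump. For a code point between U+0800 and U+FFFF, UTF-8 uses three bytes. This is a small sketch, and utf8Bytes3 is a made-up helper name:

```javascript
// UTF-8 bytes, as hex, for a code point in the three-byte range U+0800..U+FFFF.
function utf8Bytes3(codePoint) {
    var bytes = [
        0xE0 | (codePoint >> 12),         // 1110xxxx: top 4 bits
        0x80 | ((codePoint >> 6) & 0x3F), // 10xxxxxx: middle 6 bits
        0x80 | (codePoint & 0x3F)         // 10xxxxxx: low 6 bits
    ];
    var out = [];
    for (var i = 0; i < bytes.length; i++) {
        out.push(bytes[i].toString(16).toUpperCase());
    }
    return out.join(" ");
}

utf8Bytes3(0xFEFF); // "EF BB BF"
utf8Bytes3(0xFFFD); // "EF BF BD"
```

The sequence seen in the hex editor, ef bb bf, therefore matches U+FEFF, which is the character the replacement line above strips out.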

New Here, Mar 05, 2012

Thanks ... it worked great.

I really appreciate the help.
