[xml] ascii 172 being added

Nov 08, 2012

Hello,

I started off by pulling text from a textFrame which contained "© LICENSING". I stored this string in an XML object (variable name 'x') and then wrote it to an XML file like so:

filePath.encoding = 'UTF8';
filePath.open('w');
filePath.write(x.toXMLString());
filePath.close();

Now when I look in the xml file, it has the actual copyright symbol character instead of & # 1 6 9 ; (I added spaces so hopefully this won't be converted in the message).

It seems that when my XML file contains a copyright symbol character and I read that data back into an XML object in InDesign, character 172 is added before the copyright symbol. If I then put that text in a textFrame, you can't see the extra character (ASCII 172), but if you step through with the arrow keys you can tell that it's there.

Any ideas on why ASCII 172 is being added before the copyright symbol? By the way, I'm using InDesign CS5.5 on OS X Lion.

Here's an example of the xml data that was saved to the xml file:

<?xml version="1.0" encoding="UTF-8"?>
<cbAttrs>
<copyright id="10-1" w="69.16" h="32.1990625000001" x="-57.2081831359861" y="243.181297317146">© LICENSING</copyright>
</cbAttrs>

And here's the code I'm using to pull the data in from the xml file:

var xmlPath = File('/Path/to/xml');
if (xmlPath.open('r')) {
    var xmlText = xmlPath.read();
    xmlPath.close();
    var x = XML(xmlText);
}
var a = x.copyright.toString();
$.writeln(a[0].charCodeAt());
$.writeln(a);

Which gives me:

172

¬© LICENSING

Thanks for looking at my questions.

Nov 09, 2012

This is the problem:

filePath.encoding = 'UTF8';

.. since © is outside the range of 7-bit ASCII, it gets written as a two-byte UTF-8 sequence, 0xC2 0xA9 (see http://www.fileformat.info/info/unicode/char/a9/index.htm).

For the moment it's a mystery to me why you get the not sign (172) instead of the UTF-8 lead byte 0xC2 (194); perhaps the binary value got converted somewhere else in your workflow, or during copy-and-paste.

So at least your XML file gets created correctly. What remains is why reading it back in fails to honor the UTF-8 declaration in the very first line. You could check whether adding the encoding line to your read routine solves it.
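One possible explanation for the 172, assuming the file was read back with Mac OS Roman as the default encoding (an assumption; nothing in this thread confirms which default was in effect): © is stored as the UTF-8 bytes 0xC2 0xA9, and in Mac OS Roman byte 0xC2 is ¬ (code 172) while byte 0xA9 is ©. The sketch below reproduces that with a hand-written two-entry table. The names utf8Bytes, decodeMacRoman and MAC_ROMAN are made up for illustration and are not part of ExtendScript.

```javascript
// Encode one code point below U+0800 as UTF-8 bytes (enough for U+00A9).
function utf8Bytes(codePoint) {
    if (codePoint < 0x80) {
        return [codePoint];
    }
    return [0xC0 | (codePoint >> 6), 0x80 | (codePoint & 0x3F)];
}

// Partial Mac OS Roman table: only the two high bytes needed for this demo.
var MAC_ROMAN = { 0xC2: 0x00AC, 0xA9: 0x00A9 };

// Decode bytes as if they were Mac OS Roman (assumed default encoding).
function decodeMacRoman(bytes) {
    var out = [];
    for (var i = 0; i < bytes.length; i++) {
        var b = bytes[i];
        out.push(b < 0x80 ? b : MAC_ROMAN[b]);
    }
    return out;
}

var bytes = utf8Bytes(0xA9);           // [194, 169], i.e. 0xC2 0xA9
var misread = decodeMacRoman(bytes);   // [172, 169]
var text = String.fromCharCode(misread[0], misread[1]); // "¬©"
```

Under that assumption, the single character © comes back as two characters, which matches the "¬© LICENSING" output above.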

Nov 09, 2012

Hi [Jongware],

Thank you very much for your reply. I think your idea of setting the encoding when reading the file may be the solution. I added the encoding like so:

if (xmlPath.open('r')) {
    xmlPath.encoding = 'UTF8';
    var xmlText = xmlPath.read();
    xmlPath.close();
    var x = XML(xmlText);
}

I ran my test again, and this time ASCII 172 was not added. I will do some more testing, but I'm thinking this should do the trick.
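A minimal sketch of keeping both ends on the same encoding, assuming `file` behaves like an ExtendScript File object (with an encoding property and open, read, write and close methods). The helper names writeUtf8 and readUtf8 are hypothetical and not part of the code posted in this thread.

```javascript
// Write text as UTF-8. `file` is assumed to behave like an ExtendScript File.
function writeUtf8(file, text) {
    file.encoding = 'UTF8';
    if (!file.open('w')) {
        return false;
    }
    var ok = file.write(text);
    file.close();
    return ok;
}

// Read text back, telling the file object that the bytes are UTF-8.
function readUtf8(file) {
    file.encoding = 'UTF8';
    if (!file.open('r')) {
        return null; // could not open the file
    }
    var text = file.read();
    file.close();
    return text;
}
```

Usage could then look like: var x = XML(readUtf8(File('/Path/to/xml'))); with the caller checking for null first.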

Thanks again for taking the time to help, as well as teach me some things I wasn't aware of.
