K.Daube
Community Expert
December 14, 2020
Solved

XML and UNICODE - a quarrelling couple


Obviously xml has nothing to do with x-mas...

This is my input file (UTF-8 with BOM):

<?xml version="1.0" encoding="UTF-8"?>
<saves>
<!-- is Does the prolog help for Unicode support? -->
<item>
<!-- what happens to this & that comment? -->
<name>Wäßrige lösung</name>
<info>Wenig substanz wird in viel wasser gelöst.</info>
...

 

 

 

This is my simple script:

// WriteXmlData-b.jsx
// Read UTF data and write it back
#target framemaker

main ();

function main () {
    var oXmlSettings, xmlData, sXmlFile = "WriteXml.xml"; // file exists in the same dir as this script

    $.bp(true); // stop in the debugger here
    xmlData = GetXMLdata (sXmlFile);
    WriteXMLdata (xmlData, sXmlFile);
} // --- End main --------------------------------

function WriteXMLdata (xmlData, sXmlFile) { // =============================================
    var fXmlFile;
    fXmlFile = new File($.fileName.replace (/[^\\\/]+$/i , sXmlFile)); // file in script folder

    try {
        fXmlFile.open("w");
        fXmlFile.write(xmlData);
        fXmlFile.close();
    } catch (e) {
        alert("" + e.message + "\nThere are problems writing the XML file!");
    }
    return true;
} // --- End WriteXMLdata ------------------------

function GetXMLdata (sXmlFile) { // =======================================================
    var fXmlFile, xData;

    if ((sXmlFile == null) || (sXmlFile == undefined)) {
        return false;
    } else {
        fXmlFile = new File($.fileName.replace (/[^\\\/]+$/i , sXmlFile)); // file in script folder
    }
    if (fXmlFile.exists === false) {
        alert ("XML file «" + sXmlFile + "» not found.");
        return false;
    }
    fXmlFile.open("r");
    try {
        xData = new XML(fXmlFile.read());
        fXmlFile.close();
        return xData;
    } catch (e) {
        alert ("Read error on XML file «" + sXmlFile + "». Error:\n "+ e);
        fXmlFile.close();
        return false;
    }
} // --- End GetXMLdata --------------------------

 

 

 

When running this program step by step, I see the internal representation.


After writing I see in the file (it is still UTF-8 with BOM):

  1. Where is the prolog?
  2. Why do I have Chinese characters? The strange characters actually are U+4000, U+07C0 and U+6000.
  3. What good is Unicode for XML if it is necessary to use entities for everything but the ASCII characters?

K.Daube (Author)
Community Expert
December 14, 2020

This is absolutely strange:

I have modified the main script to use two files, one for input and one for output and also to explicitly keep the comments:

 

function main () { // ============================
    var oXmlSettings, xmlData, sXmlFileIn = "WriteXml-saved.xml", sXmlFileOut = "WriteXml.xml";

    $.bp(true);
    oXmlSettings = XML.defaultSettings();
    oXmlSettings.ignoreComments = false; // keep comments when parsing
    XML.setSettings(oXmlSettings);

    xmlData = GetXMLdata (sXmlFileIn);
    WriteXMLdata (xmlData, sXmlFileOut);
} // --- End main --------------------------------

 

Before running the script I checked the output file: it exists, is empty, is UTF-8 with BOM (and hence is 3 bytes long).

After the script has run, the file is encoded in the Windows code page, but it contains all items except the prolog:

 

<saves>
  <!-- is Does the prolog help for Unicode support? -->
  <item>
    <!-- what happens to this & that comment? -->
    <name>Wäßrige lösung</name>
    <info>Wenig substanz wird in viel wasser gelöst.</info>
...

 

  What the heck is going on here?
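
The "3 bytes long" observation for the empty file matches the UTF-8 encoding of the byte order mark U+FEFF. The following is only a sketch to illustrate that, written in plain JavaScript that should also be acceptable to ExtendScript; `utf8Bytes` is a hypothetical helper name and not part of any Adobe API.

```javascript
// Hypothetical helper: returns the UTF-8 byte values of a string.
function utf8Bytes(str) {
    var bytes = [], i, code, next;
    for (i = 0; i < str.length; i++) {
        code = str.charCodeAt(i);
        // Combine a surrogate pair into a single code point.
        if (code >= 0xD800 && code <= 0xDBFF && i + 1 < str.length) {
            next = str.charCodeAt(i + 1);
            if (next >= 0xDC00 && next <= 0xDFFF) {
                code = 0x10000 + ((code - 0xD800) << 10) + (next - 0xDC00);
                i++;
            }
        }
        if (code < 0x80) {
            bytes.push(code);                                   // 1 byte (ASCII)
        } else if (code < 0x800) {
            bytes.push(0xC0 | (code >> 6),                      // 2 bytes
                       0x80 | (code & 0x3F));
        } else if (code < 0x10000) {
            bytes.push(0xE0 | (code >> 12),                     // 3 bytes
                       0x80 | ((code >> 6) & 0x3F),
                       0x80 | (code & 0x3F));
        } else {
            bytes.push(0xF0 | (code >> 18),                     // 4 bytes
                       0x80 | ((code >> 12) & 0x3F),
                       0x80 | ((code >> 6) & 0x3F),
                       0x80 | (code & 0x3F));
        }
    }
    return bytes;
}

var bomBytes = utf8Bytes("\uFEFF");    // [239, 187, 191], i.e. EF BB BF
var umlautBytes = utf8Bytes("\u00E4"); // "ä" -> [195, 164], i.e. C3 A4
```

In a Windows code page such as Windows-1252, "ä" is stored as the single byte E4, so a hex viewer shows quickly which of the two encodings a written file actually uses.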

Klaus Göbel (Solved!)
Legend
December 14, 2020

Hello Klaus my friend,

if you save a file like this

fXmlFile.open("w");
fXmlFile.write(xmlData);

... then you create a new file that is saved as UTF-8 (hopefully, but I am not sure).

 

But if you want to save a file with UTF-8 AND BOM, you have to do this:

    var BOM = "\uFEFF";
    fXmlFile.open('w');
    fXmlFile.encoding = "UTF-8";
    fXmlFile.write(BOM);
    fXmlFile.write(xmlData);
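
The lines above could be wrapped into a small helper. This is only a sketch that keeps the same order of calls as shown; `writeUtf8WithBom` is a made-up name, and the `file` argument is assumed to be an ExtendScript `File` object (or anything offering the same `open`, `write` and `close` methods and an `encoding` property).

```javascript
// Hypothetical helper: write text as UTF-8 with a leading BOM.
// "file" is assumed to behave like an ExtendScript File object.
function writeUtf8WithBom(file, text) {
    var BOM = "\uFEFF";
    if (!file.open("w")) {      // open() is expected to return false on failure
        return false;
    }
    file.encoding = "UTF-8";    // set explicitly instead of relying on the default
    file.write(BOM);            // byte order mark first
    file.write(String(text));   // then the serialized XML
    file.close();
    return true;
}
```

In the script from the question, the call would presumably look like `writeUtf8WithBom(fXmlFile, xmlData)` inside `WriteXMLdata`, replacing the bare `open`/`write`/`close` sequence.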

 

 

K.Daube (Author)
Community Expert
December 14, 2020

Thanks Klaus,

I completely forgot that in the WriteXMLdata function I always create a new file to write the full XML structure.

And now I get what I want, even if the input file does not have a prolog:

<saves>
  <!-- Does the prolog help for Unicode support? -->
  <item>
    <!-- what happens to this & that comment? -->
    <name>Wäßrige lösung</name>
    <info>Wenig substanz wird in viel wasser gelöst.</info>
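
As the listings in this thread show, the prolog does not survive the round trip through the XML object. If it is wanted in the output file, one option is to add it back as plain text before writing. The following is a sketch under that assumption; `ensureProlog` is a hypothetical helper, and the prolog text is simply the one from the input file above.

```javascript
// Hypothetical helper: prepend an XML prolog unless the text already starts with one.
function ensureProlog(xmlText) {
    var prolog = '<?xml version="1.0" encoding="UTF-8"?>';
    var text = String(xmlText);
    // An XML declaration must be the very first thing in the document.
    if (/^<\?xml\s/.test(text)) {
        return text;
    }
    return prolog + "\n" + text;
}

var withProlog = ensureProlog("<saves>\n</saves>");
```

The result would then be passed to the write call in place of the raw XML data.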

Klaus, you are a great mind-saver!