evaluateXPathExpression vs XML-rules: which is better?

May 07, 2015

Dear forum,

A couple of days ago I wrote a script which finds xml-elements using an xPath expression and applies a paragraph and/or a character style. I know that I can achieve this in two ways: using evaluateXPathExpression or XML-rules. Naturally, a question arose: which of them is preferable? evaluateXPathExpression seems more elegant: it is less wordy and doesn’t depend on the glue code.jsx file, but it fails in some situations. I did some tests a year or two ago; I don’t remember all the details, but both methods worked for xml-structures without namespaces. With a namespace, a script based on an xml-rule always worked, but a script which used evaluateXPathExpression didn’t.

Let me illustrate with an example what I mean.

I made two almost identical scripts:

Version 1.0 uses the xml-rule approach

Version 1.1 uses evaluateXPathExpression

(I'm not posting the code on the forum because it's lengthy.)

Here’s a sample document for testing (InDesign CC 2014, about 3 Mb).

I am going to find <p> in <caption> in <fig> and apply paragraph style “Body”.

Version 1.0 works at the first try – one element is found as expected.

Version 1.1 says: “No XML-elements have been found”. However, if I remove this stuff: the namespaces and attributes, it works:

[Screenshots: the XML structure before and after removing the namespaces and attributes]

Why does this happen?

Yet another question: in the scripting guide they wrote: “InDesign’s XML rules support a limited subset of the XPath 1.0 specification.” Here's an extract from it — the “XPath limitations” section.

Does this mean that evaluateXPathExpression provides more possibilities, i.e. that I can use ALL the expressions found, say, here without any limitations?

Thank you in advance.

Kasyan

May 08, 2015

XML rules were available earlier, first in CS3. In early builds of CS4 evaluateXPathExpression had a memory leak bug that caused the InDesign Servers to crash in the long run, but I could not reproduce that in a later CS4 release, so it was probably fixed. Speed-wise the calls were roughly the same with our very limited queries; initially evaluateXPathExpression was slightly faster.

XML rules do have some advanced features: they can work on multiple XPaths in parallel, and your script can skip whole subtrees. I've never used them; instead I worked with a very stripped-down procedure that I wrote before the glue code was available. Nobody forces you to use the provided glue code. Anyway, the project where I did most XML scripting was never migrated beyond CS4, so I don't have much experience with newer versions.

I hope that some day Adobe adds full support for namespaces; until then I would strongly recommend stripping them out with an XSLT on import. There are ways to work with XPaths even with namespaces, but I have no experience with them in regular InDesign. Could you please share your XPath expressions for further discussion? Maybe we can get it going.

Btw, I do know that ExtendScript's XML objects have a different xpath() implementation and better namespace support, but that area has other problems, e.g. size limitations and performance problems when you pass large XML objects to functions.

Better be careful with XSL as well: the last time I looked, Adobe was using the XSL processor "Sablotron", which PHP replaced with libxslt back in 2003 for good reasons. On the other hand, I also remember seeing traces of AXE. In a more recent plugin project where I really need namespaces I use my own XML subsystem for storage and libxslt for XSL. It is just more fun with EXSLT.

Dirk

May 08, 2015

One addition: it might be sufficient to explicitly declare the xmlns:xml prefix used by your xml:lang attribute. That's an already known problem ...

Re: evaluateXPathExpression

May 08, 2015

Revisiting some old messages, there is an even better way: xmlRuleProcessors.add() has a second, optional argument.

var root = app.activeDocument.xmlElements.item(0);
var xpath = "//*";
var proc = null;
try {
    // The optional second argument is a list of [prefix, namespace URI] pairs.
    proc = app.xmlRuleProcessors.add([xpath], [["xml", "http://www.w3.org/XML/1998/namespace"]]);
    var match = proc.startProcessingRuleSet(root);
    var node = null;
    while (match != undefined) {
        node = match.element;
        $.writeln(node.markupTag.name + ": " + node.contents);
        match = proc.findNextMatch();
    }
} finally {
    // If add() itself failed, there is no processor to clean up.
    if (proc) {
        proc.endProcessingRuleSet();
        proc.remove();
    }
}
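
When the namespaces in a document are not known in advance, the prefix table passed to xmlRuleProcessors.add() could be collected from the xmlns:* declarations instead of being hard-coded. The following is a minimal sketch in plain JavaScript, not tested inside InDesign: buildPrefixTable is a hypothetical helper, and it assumes the declarations have already been read into plain {name, value} objects (for instance from each element's xmlAttributes).

```javascript
// Hypothetical helper: builds a [[prefix, uri], ...] table from attribute records.
// Assumption: `attributes` is an array of plain objects { name: String, value: String }.
function buildPrefixTable(attributes) {
    // The "xml" prefix is bound to this URI by the XML Namespaces recommendation.
    var table = [["xml", "http://www.w3.org/XML/1998/namespace"]];
    var seen = { xml: true };
    for (var i = 0; i < attributes.length; i++) {
        var m = /^xmlns:(.+)$/.exec(attributes[i].name);
        if (!m) continue;                           // not a namespace declaration
        var prefix = m[1];
        if (seen.hasOwnProperty(prefix)) continue;  // keep the first binding only
        seen[prefix] = true;
        table.push([prefix, attributes[i].value]);
    }
    return table;
}

// Example input: one declaration, one ordinary attribute, one rebinding of the same prefix.
var table = buildPrefixTable([
    { name: "xmlns:xlink", value: "http://www.w3.org/1999/xlink" },
    { name: "id", value: "f1" },
    { name: "xmlns:xlink", value: "http://example.com/other" }
]);
```

Keeping only the first binding per prefix is a simplification: nested elements are allowed to rebind a prefix to a different URI, and a document that does so would need separate handling.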

May 08, 2015

Thank you for your reply, Dirk.

I’m writing this script for a client who gets xml-files from his customer, imports them into InDesign and makes books. In other words, I have no idea where they get the xml-structure from and how they are going to use it after the InDesign document is complete. All I know is the script shouldn’t disarrange it: only find elements according to certain criteria and apply formatting to their associated text.

It is obvious that I can’t post the sample document that my client gave me for testing, but here are a few screenshots to give you an idea of the xml-structure.

It’s quite complex: namespaces can appear not only in the root element, but also within nested elements. So I can’t know beforehand which namespaces will be used in the next book and handle them by script.

So far, we tested it with simple expressions:

//fig/caption/p

//boxed-text/caption/p

//boxed-text/list/list-item/p

But in the future we are going to use more complex ones, e.g. with attributes having certain values.

Your samples are very useful to me. I’ve never used xmlRuleProcessor before and didn’t know how to handle namespaces.

Thank you again.

Kasyan

Jan 02, 2016

Hi all,

After a few months I decided to answer my own question in the hope that it may come in handy for future generations.

Here’s a screenshot of the script’s dialog box I’m talking about:

I added a couple of radio buttons so the user could choose either the xml-rules or the evaluateXPathExpression option.

After some testing I came to the conclusion that the latter is much better because it allows overcoming xPath limitations (not all, of course).

Here’s a quote from the XML rules chapter (pages 200-201):

Due to the one-pass nature of this implementation, the following XPath expressions are specifically excluded:

No ancestor or preceding-sibling axes, including .., ancestor::, preceding-sibling::.

No path specifications in predicates; for example, foo[bar/c].

No last() function.

No text() function or text comparisons; however, you can use InDesign scripting to examine the text content of an XML element matched by a given XML rule.

No compound Boolean predicates; for example, foo[@bar=font or @c=size].

No relational predicates; for example, foo[@bar < font or @c > 3].

No relative paths; for example, doc/chapter.
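
The quoted restrictions can be turned into a rough pre-check that warns before an expression is handed to an XML rule. This is only a heuristic sketch in plain JavaScript: findXmlRuleLimitations is a hypothetical name, the regular expressions are approximations rather than an XPath parser, and the checks cover only the items quoted above.

```javascript
// Hypothetical helper: returns the labels of the quoted XML-rules limitations
// that an XPath string appears to hit. Pattern matching only, so false
// positives and false negatives are possible.
function findXmlRuleLimitations(xpath) {
    var checks = [
        { label: "ancestor or preceding-sibling axis", test: /\.\.|ancestor::|preceding-sibling::/ },
        { label: "last() function", test: /\blast\s*\(/ },
        { label: "text() function", test: /\btext\s*\(/ },
        { label: "compound Boolean predicate", test: /\[[^\]]*\s(or|and)\s[^\]]*\]/ },
        { label: "relational predicate", test: /\[[^\]]*[<>][^\]]*\]/ },
        { label: "path in predicate", test: /\[[^\]@'"]*\w\/\w[^\]]*\]/ },
        { label: "relative path", test: /^[^\/]/ }
    ];
    var found = [];
    for (var i = 0; i < checks.length; i++) {
        if (checks[i].test.test(xpath)) {
            found.push(checks[i].label);
        }
    }
    return found;
}

// Example calls with expressions that appear in this thread and in the quote.
var r1 = findXmlRuleLimitations("/bookstore/book[last()]");
var r2 = findXmlRuleLimitations("//fig/caption/p");
var r3 = findXmlRuleLimitations("foo[bar/c]");
```

An empty result does not guarantee that a rule will work: union expressions such as //title | //price are not in the quoted list, so they are not flagged, yet they failed with XML rules in the tests reported further down.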

Frankly speaking, I’m not an xPath-man and don’t quite understand what it means. But my client wanted the script to work with the last() function.

When I tried to use it with xml-rules, the script wrote the following error to the console: “Adobe InDesign cannot process^1XPath expression '^2', line: 43”.

But with evaluateXPathExpression it worked as expected.

Since I know almost nothing about xPath, I used the information found on these two pages – XML and XPath and XPath Syntax – because it’s presented in a way that is easy for me to understand. (I know some coding gurus here on the scripting forum will be angry at me for mentioning http://www.w3schools.com but please have mercy on me – a mere mortal retoucher.) I created a couple of documents for testing -- copied the xml-structure and imported it into InDesign – and tested them against the path expressions posted as examples on the pages. Here’s an archive with the InDesign documents (CC 2014) – their IDML versions are also included – and the scripts I used for testing.

Here are the examples that work with evaluateXPathExpression, but don’t work with xml-rules producing an error:

/bookstore/book[last()] Selects the last book element that is the child of the bookstore element

/bookstore/book[last()-1] Selects the last but one book element that is the child of the bookstore element

/bookstore/book[position()<3] Selects the first two book elements that are children of the bookstore element

/bookstore/book[price>35.00] Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00

/bookstore/book[price>35.00]/title Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00

//book/title | //book/price Selects all the title AND price elements of all book elements

//title | //price Selects all the title AND price elements in the document

/bookstore/book/title | //price Selects all the title elements of the book element of the bookstore element AND all the price elements in the document

. (a dot) Selects the current node

This one fails silently with xml-rules, producing no error:

bookstore//book Selects all book elements that are descendant of the bookstore element, no matter where they are under the bookstore element

This one works with neither: xml-rules produce an error, while evaluateXPathExpression fails silently:

//@lang Selects all attributes that are named lang

The following work with neither, failing silently in both cases:

bookstore/book Selects all book elements that are children of bookstore

bookstore//book Selects all book elements that are descendant of the bookstore element, no matter where they are under the bookstore element

Here are the simple scripts I used for testing (I just copied and pasted xPath expressions into the relevant lines):

evaluateXPathExpression

var scriptName = "evaluateXPathExpression test",
doc;
Main();
//===================================== FUNCTIONS  ======================================
function Main() {
    var xmlElement;
  
    if (app.documents.length == 0) ErrorExit("Please open a document and try again.", true);
    doc = app.activeDocument;
  
    var xmlRoot = doc.xmlElements[0];
    var xmlElements = xmlRoot.evaluateXPathExpression("/bookstore/book[last()]");

    if (xmlElements.length > 0) {
        $.writeln("==================\r");
    }
    else {
        $.writeln("No elements were found");
    }

    for (var i = 0; i < xmlElements.length; i++) {
        xmlElement = xmlElements[i];
        $.writeln((i + 1) + " - " + xmlElement.contents.substring(0, 30).replace(/\s+/g, " ") + "...");
    }
}
//--------------------------------------------------------------------------------------------------------------------------------------------------------
function ErrorExit(error, icon) {
    alert(error, scriptName, icon);
    exit();
}

XML-rules

var scriptName = "XML-rules test",
doc,
glueFile = new File(app.filePath + "/Scripts/xml rules/glue code.jsx"),
count = 0;

PreCheck();
app.doScript(glueFile, ScriptLanguage.JAVASCRIPT);
Main();

//===================================== FUNCTIONS  ======================================
function Main() {
    $.writeln("==================\r");
    var ruleSet = [new ProcessTag];
    with (doc) {
        var elements = xmlElements;
        try {
            __processRuleSet(elements[0], ruleSet);
        }
        catch(err) {
            $.writeln(err.message + ", line: " + err.line);
        }
    }
}
//--------------------------------------------------------------------------------------------------------------------------------------------------------
function ProcessTag() {
    this.name = "ProcessTag";
    this.xpath = "/bookstore/book[last()]";
    this.apply = function(xmlElement, ruleProcessor) {
        ProcessAttributes(xmlElement);
        return true;
    }
}
//--------------------------------------------------------------------------------------------------------------------------------------------------------
function ProcessAttributes(xmlElement) {
    count++;
    $.writeln(count + " - " + xmlElement.contents.substring(0, 30).replace(/\s+/g, " ") + "...");
}
//--------------------------------------------------------------------------------------------------------------------------------------------------------
function PreCheck() {
    if (app.documents.length == 0) ErrorExit("Please open a document and try again.", true);
    doc = app.activeDocument;
    if (!app.activeDocument.saved) ErrorExit("The current document has not been saved since it was created. Please save the document and try again.", true);
    if (!glueFile.exists) ErrorExit("\"glue code.jsx\" should be located in the \"Scripts > xml rules\" folder for this script to work.", true);
}
//--------------------------------------------------------------------------------------------------------------------------------------------------------
function ErrorExit(error, icon) {
    alert(error, scriptName, icon);
    exit();
}

Now a very important note: when I tested xPath expressions using evaluateXPathExpression, it worked well with my simple test files, but when my client gave me his “real” document with a complex xml-structure, it stopped working for some reason.

A couple of years ago (or more) I found out that it was somehow associated with namespaces: no namespaces – no problem.

Now, after a more careful investigation I’ve discovered that it depends on the attributes in all the parent xml-elements starting from the element we’re looking for and up to the root. The problem occurs because of the elements whose names contain colons.

For example, we’re looking for //sec/p[last()] (the last p element in every sec element). In the screenshot below, it’s marked in green.

The elements marked in red prevent the script from working properly.

My idea was to temporarily replace colons, say, with underscores -- or any other valid character, in case underscores are used in attribute names – so I came up with the following script:

/* Copyright 2015, Kasyan Servetsky
December 21, 2015
Written by Kasyan Servetsky
http://www.kasyan.ho.com.ua
e-mail: askoldich@yahoo.com */
//======================================================================================
var scriptName = "Prepare xml-attributes",
glueFile = new File(app.filePath + "/Scripts/xml rules/glue code.jsx"),
txt,
count = 0;

PreCheck();
app.doScript(glueFile, ScriptLanguage.JAVASCRIPT);
CreateDialog();

//===================================== FUNCTIONS  ======================================
function Main() {
    var w = new Window("window", scriptName);
    txt = w.add("statictext", undefined,  "Processing xml-attribute #0000. Please be patient. It may take a while.");
    w.show();
  
    var ruleSet = [new ProcessTag];
    with (doc) {
        var elements = xmlElements;
        __processRuleSet(elements[0], ruleSet);
    }
  
    w.close();
  
    var report = count + " item" + ((count == 1) ? " was" : "s were") + " processed.";
    alert("Finished. " + report, scriptName);
}
//--------------------------------------------------------------------------------------------------------------------------------------------------------
function ProcessTag() {
    this.name = "ProcessTag";
    this.xpath = "//*";
    this.apply = function(xmlElement, ruleProcessor) {
        ProcessAttributes(xmlElement);
        return true;
    }
}
//--------------------------------------------------------------------------------------------------------------------------------------------------------
function ProcessAttributes(xmlElement) {
    var xmlAttribute;
  
    if (xmlElement.xmlAttributes.length > 0) {
        for (var i = xmlElement.xmlAttributes.length - 1; i >= 0; i--) {
            xmlAttribute = xmlElement.xmlAttributes[i];
            if (set.rbSel == 0) {
                if (xmlAttribute.name.match(/:/) != null) {
                    xmlAttribute.name = xmlAttribute.name.replace(/:/g, "_");
                }
            }
            else {
                if (xmlAttribute.name.match(/_/) != null) {
                    xmlAttribute.name = xmlAttribute.name.replace(/_/g, ":");
                }
            }
        }
        count++;
        txt.text = "Processing xml-attribute #" + count + ". Please be patient. It may take a while.";
    }
}
//--------------------------------------------------------------------------------------------------------------------------------------------------------
function CreateDialog() {
    GetDialogSettings();
    var w = new Window("dialog", scriptName);

    w.p = w.add("panel", undefined, "");
    w.p.orientation = "column";
    w.p.alignChildren = "left";
  
    w.p.rb0 = w.p.add("radiobutton", undefined, "Prepare xml-attributes");
    w.p.rb1 = w.p.add("radiobutton", undefined, "Restore original xml-attributes");
  
    if (set.rbSel == 0) {
        w.p.rb0.value = true;
    }
    else if (set.rbSel == 1) {
        w.p.rb1.value = true;
    }

    w.buttons = w.add("group");
    w.buttons.orientation = "row"; 
    w.buttons.alignment = "center";
    w.buttons.ok = w.buttons.add("button", undefined, "OK", {name:"ok" });
    w.buttons.cancel = w.buttons.add("button", undefined, "Cancel", {name:"cancel"});
    var showDialog = w.show();
  
    if (showDialog == 1) {
        if (w.p.rb0.value == true) {
            set.rbSel = 0;
        }
        else if (w.p.rb1.value == true) {
            set.rbSel = 1;
        }
      
        app.insertLabel("Kas_" + scriptName, set.toSource());
        Main();
    }
}
//--------------------------------------------------------------------------------------------------------------------------------------------------------
function GetDialogSettings() {
    set = eval(app.extractLabel("Kas_" + scriptName));

    if (set == undefined) {
        set = { rbSel: 0 };
    }
}
//--------------------------------------------------------------------------------------------------------------------------------------------------------
function PreCheck() {
    if (app.documents.length == 0) ErrorExit("Please open a document and try again.", true);
    doc = app.activeDocument;
    if (!app.activeDocument.saved) ErrorExit("The current document has not been saved since it was created. Please save the document and try again.", true);
    if (!glueFile.exists) ErrorExit("\"glue code.jsx\" should be located in the \"Scripts > xml rules\" folder for this script to work.", true);
}
//--------------------------------------------------------------------------------------------------------------------------------------------------------
function ErrorExit(error, icon) {
    alert(error, scriptName, icon);
    exit();
}
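
One caveat with the restore step: it turns every underscore back into a colon, so an attribute name that contained an underscore before the preparation step (say, content_type) would come back as content:type. A possible refinement, sketched here in plain JavaScript and not tested inside InDesign, is to pick a placeholder that occurs in none of the attribute names. pickPlaceholder, encodeName, decodeName and the candidate list are hypothetical, and whether InDesign accepts each candidate in an attribute name is an assumption to verify.

```javascript
// Hypothetical helpers for a reversible colon substitution.

// Returns the first candidate string that does not occur in any of the names.
function pickPlaceholder(names, candidates) {
    for (var i = 0; i < candidates.length; i++) {
        var free = true;
        for (var j = 0; j < names.length; j++) {
            if (names[j].indexOf(candidates[i]) !== -1) {
                free = false;
                break;
            }
        }
        if (free) return candidates[i];
    }
    throw new Error("No unused placeholder among the candidates");
}

// Replaces every colon with the placeholder, and back again.
function encodeName(name, placeholder) { return name.split(":").join(placeholder); }
function decodeName(name, placeholder) { return name.split(placeholder).join(":"); }

// Example attribute names (made up for illustration).
var names = ["xml:lang", "content_type", "xlink:href"];
var placeholder = pickPlaceholder(names, ["_", "-", "__", "_colon_"]);
var restored = [];
for (var k = 0; k < names.length; k++) {
    restored.push(decodeName(encodeName(names[k], placeholder), placeholder));
}
```

The chosen placeholder would have to be stored between the prepare and restore runs, for example alongside the dialog settings that the script already saves with app.insertLabel.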

You can test it against the xPath test-2-with namespaces.indd which is included in the archive.

First I use the first radio button -- "Prepare xml-attributes" -- to replace colons with underscores:

[Screenshots: before, progress bar, after, final report]

Finally, after making changes to the document using evaluateXPathExpression -- applying styles, clearing overrides, etc. -- I restore the colons with the second radio button so the original xml-structure remains intact.

In my opinion, using evaluateXPathExpression provides huge possibilities. My client told me that the script saved him hundreds of working hours. I'd be glad to get feedback on this topic: new problems and their solutions, new xPath examples with comments -- what they should do, if they work, or not ... why they don't work, etc.

I can't promise I will reply to every question because I have my regular work to do and my family to care for, but I'll read everything with great interest and look into it if time permits.

P.S. Sorry for the lengthy post -- the forum software crashed three or four times as I wrote it.

Jan 02, 2016

Hi Kasyan,

The lengthy post will be appreciated by the few of us who haven't given up on InDesign XML. I've also reduced my participation in the Adobe forums due to the lousy forum software.

Regarding one of your problems - to find the lang attribute:

Apparently XML rules can only return elements, so when you search for attributes InDesign internally chokes, and the debug build visits the debugger to report several internal errors.

If you use the following xpaths with the xml rule script that I posted above, you'll get the elements.

var xpath = "//*[@lang]";

var xpath = "//*[@xml:lang]";

Other, more complicated xpaths failed.

//~ var xpath = "//*[@*[local-name()='lang']]";

//~ var xpath = "//@*[local-name()='lang']";

On XML namespaces:

http://www.w3.org/TR/REC-xml-names/#ns-decl

An attribute such as xmlns:foo="http://example.com/abc" with the prefix "xmlns" declares a namespace binding for its local name "foo", so that the local name can be used as prefix further on. For real XML processing, the relevant part of the namespace is the URL, not the prefix! Any other namespace binding xmlns:bar="http://example.com/abc" leading to the same namespace URL should be treated as equivalent, while a nested element can also override the same prefix with a different URL xmlns:foo="http://example.com/not-abc" suggesting a different meaning!

So your approach of substituting the colon may work for now, but it is not formally correct. For example, the XML-based XMP metadata has provisions where you can suggest a prefix, but it may choose a different one on collisions.
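
To illustrate the point about prefixes, here is a small sketch in plain JavaScript that compares attribute names by namespace URI and local name rather than by prefix. expandName and sameName are hypothetical helpers, and the bindings object stands for whatever xmlns declarations are in scope for the element in question.

```javascript
// Hypothetical helper: resolves a qualified attribute name such as "foo:id"
// against the prefix bindings in scope and returns { uri, local }.
function expandName(qname, bindings) {
    var pos = qname.indexOf(":");
    if (pos === -1) {
        // An unprefixed attribute name is in no namespace.
        return { uri: "", local: qname };
    }
    var prefix = qname.substring(0, pos);
    if (!bindings.hasOwnProperty(prefix)) {
        throw new Error("Unbound prefix: " + prefix);
    }
    return { uri: bindings[prefix], local: qname.substring(pos + 1) };
}

// Two names are the same when both the namespace URI and the local name match.
function sameName(a, b) {
    return a.uri === b.uri && a.local === b.local;
}

// Different prefixes, same URI: equivalent names.
var a = expandName("foo:id", { foo: "http://example.com/abc" });
var b = expandName("bar:id", { bar: "http://example.com/abc" });
// Same prefix rebound to another URI in a nested element: a different name.
var c = expandName("foo:id", { foo: "http://example.com/not-abc" });
```

Here sameName(a, b) is true although the prefixes differ, while sameName(a, c) is false although the prefixes are identical, which is why a plain text substitution on the prefix cannot be correct in general.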

Cheers,

Dirk

Jan 03, 2016

Hi Dirk,

Thank you for your explanation. It's very helpful to me.

Yes, with the latest project my solution -- or rather workaround -- worked, but in the future I'm expecting more xml-based documents with different structures; we'll see if it works on them.

Currently the main problem for me is that I don't see "the big picture". The client I write scripts for lives on the other side of the globe; he, in turn, has a customer who gives him the work. I do my small part of the job (scripts) but have no idea where it starts (where they get the xml-structure from) and where it ends (where they finally put it). I wish I could do such work completely -- from beginning to end -- but I don't think I can find such a job in my country.

Regards,

Kasyan
