Inspiring
November 4, 2025
Question

Extracting XML tags on a given page

Hello there. 

Does anyone know if it is possible to extract text on a given page that is tagged with a specific XML tag? 
Let's say that I have business cards with a name, a phone number and an e-mail address. 
These elements may or may not be inside the same text box, but each is tagged [with the colored brackets] with its own unique tag, and they are all on the same page. 
The InDesign file may have multiple pages and I would like to target individual pages separately. 

Is it possible to extract the text itself if I know the name of the tag?

 

The purpose is to create a template for data merge and then, after the merge, generate QR codes with a custom script that uses the tagged details to create a vCard. 

 

Thanks for your time, 

David

1 reply

m1b
Community Expert
November 5, 2025

Hi @DavidSmerda, are you using ExtendScript? If so, here is a quick script that might be helpful. Look at the attached demo.indd to see the setup. It's very simple, and I'm no expert on XML tags, but I use them this way quite a bit.

- Mark

 

/**
 * @file Read XML Tags.js
 *
 * Example showing reading XML tagged text from active document.
 *
 * @author m1b
 * @version 2025-11-05
 * @discussion https://community.adobe.com/t5/indesign-discussions/extracting-xml-tags-on-a-given-page/m-p/15577321/thread-id/640522
 */
function main() {

    var doc = app.activeDocument;

    var firstNames = findXmlElements({ doc: doc, tagName: 'FirstName', returnText: true });
    var lastNames = findXmlElements({ doc: doc, tagName: 'LastName', returnText: true });

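    // names are paired by index, which assumes both tags occur
    // the same number of times and in the same order
    for (var i = 0, first, last, page; i < firstNames.length; i++) {

        // skip text with no parent text frame (e.g. overset text)
        if (0 === firstNames[i].parentTextFrames.length)
            continue;

        first = firstNames[i].contents;
        last = lastNames[i] ? lastNames[i].contents : '';
        page = firstNames[i].parentTextFrames[0].parentPage;

        // parentPage is null when the frame sits on the pasteboard
        alert('Page ' + (page ? page.name : '(pasteboard)') + ': ' + first + ' ' + last);

    }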

}
app.doScript(main, ScriptLanguage.JAVASCRIPT, undefined, UndoModes.ENTIRE_SCRIPT, 'Do Script');

/**
 * Returns an array of XML Elements.
 * If no `tagName` or `xpath` is provided,
 * will return all the XML Elements except root.
 * @author m1b
 * @version 2025-11-05
 * @param {Object} options
 * @param {Document} options.doc - an InDesign Document.
 * @param {String} [options.tagName] - an XMLTag name (default: all tags).
 * @param {String} [options.xpath] - an XPath to the element (default: all `tagName` elements).
 * @param {Boolean} [options.untag] - whether to untag the XMLElement, if true then `returnText` will be true (default: false).
 * @param {Boolean} [options.returnText] - whether to return Text, if true will return Text (default: false).
 * @param {Boolean} [options.excludeOrphans] - whether to exclude orphans — texts that aren't actually in document (default: true).
 * @returns {Array<XMLElement>|Array<Text>} - array of found XMLElements or Texts.
 */
function findXmlElements(options) {

    options = options || {};

    if (options.doc == undefined)
        throw Error('findXmlElements: no `doc` supplied.');

    var doc = options.doc;
    var xpath = options.xpath || './/' + (options.tagName || '*');
    var result = [];

    var found = doc.xmlElements[0].evaluateXPathExpression(xpath);

    for (var i = 0, text; i < found.length; i++) {

        text = resolve(found[i].texts[0].toSpecifier());

        if (
            !text.isValid
            || !text.parentStory.isValid
            || (
                false !== options.excludeOrphans
                && 'XmlStory' === text.parent.constructor.name
            )
        )
            continue;

        // `text` is now known to be tagged text that is placed in the document

        if (true === options.untag) {
            found[i].untag();
            options.returnText = true;
        }

        if (true === options.returnText)
            result.push(text);
        else
            result.push(found[i]);

    }

    return result;

}

P.S. The function abstracts away the parts that are tricky to remember. The `untag` option is good for when you want to set the tagged text's contents once and only once (the text will not be found if you run the script again).
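To target one page at a time, as asked in the question, the texts returned with `returnText: true` could be filtered by their page name. This is only a rough sketch: `filterTextsByPage` is a hypothetical helper, not part of the script above, and it assumes each text exposes `parentTextFrames[0].parentPage.name` the same way `main` uses it, and that `parentPage` is null for frames on the pasteboard.

```javascript
/**
 * Hypothetical helper (not part of the original script).
 * Keeps only the texts whose first parent text frame
 * is on the page with the given name.
 * @param {Array<Text>} texts - e.g. the result of findXmlElements with returnText: true.
 * @param {String} pageName - the page name, e.g. '2'.
 * @returns {Array<Text>}
 */
function filterTextsByPage(texts, pageName) {

    var result = [];

    for (var i = 0, frames, page; i < texts.length; i++) {

        frames = texts[i].parentTextFrames;

        // text with no parent text frame (e.g. overset) cannot be placed on a page
        if (!frames || 0 === frames.length)
            continue;

        page = frames[0].parentPage;

        // assumption: parentPage is null for frames on the pasteboard
        if (!page || page.name !== String(pageName))
            continue;

        result.push(texts[i]);

    }

    return result;

}

// Possible usage inside InDesign ('Email' is an assumed tag name):
// var emails = findXmlElements({ doc: doc, tagName: 'Email', returnText: true });
// var emailsOnPage2 = filterTextsByPage(emails, '2');
```

Page names are compared as strings because `page.name` is a string, so passing the number 2 works as well.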

 

Edit 2025-11-05: updated my old findXmlElements function so that it excludes "orphan" elements (XML elements that don't exist in the document).
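For the vCard step mentioned in the question, assembling the vCard text from the extracted values might look roughly like the sketch below. The names `buildVCard` and `escapeVCardText` and the `details` field names are hypothetical, it assumes vCard 3.0, it does not fold long lines, and it does not generate the QR code itself.

```javascript
// Escape the characters that are special in vCard 3.0 text values:
// backslash, line breaks, comma and semicolon.
function escapeVCardText(value) {
    return String(value)
        .replace(/\\/g, '\\\\')
        .replace(/\r\n|\r|\n/g, '\\n')
        .replace(/([,;])/g, '\\$1');
}

/**
 * Hypothetical helper: builds vCard 3.0 text from a plain object.
 * @param {Object} details - assumed fields: firstName, lastName, phone, email (all optional strings).
 * @returns {String} - the vCard text, with lines separated by CRLF.
 */
function buildVCard(details) {

    var first = escapeVCardText(details.firstName || '');
    var last = escapeVCardText(details.lastName || '');
    var full = first + (first && last ? ' ' : '') + last;

    var lines = [
        'BEGIN:VCARD',
        'VERSION:3.0',
        'N:' + last + ';' + first + ';;;',
        'FN:' + full
    ];

    // phone and email are only written when supplied
    if (details.phone)
        lines.push('TEL:' + escapeVCardText(details.phone));

    if (details.email)
        lines.push('EMAIL:' + escapeVCardText(details.email));

    lines.push('END:VCARD');

    return lines.join('\r\n');

}
```

Tagged text contents may end with a paragraph return, so it is probably worth trimming the values before passing them in.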

Inspiring
November 5, 2025

Perfect, that is exactly what I needed. I will check it thoroughly after I return home. 

I'm coming from Acrobat and its amazing JavaScript reference. Sadly, I did not find anything similar for InDesign (a reference with functioning code snippets), except for an old CS6 scripting brochure, which wasn't very useful for extracting text. 


Thank you again for your help.

m1b
Community Expert
November 5, 2025

You're welcome! I use these docs, which are a bare-bones presentation of the DOM, with no code snippets unfortunately. However, I think they are the best we have at present.