Extracting XML tags on a given page

Question

Hello there.

Does anyone know whether it is possible to extract text on a given page that is marked with a specific XML tag?
Let's say I have a business card with a name, a phone number and an e-mail address.
These elements may or may not be inside the same text box, but each is tagged (shown with the colored brackets) with a unique tag, and they are all on the same page.
The InDesign file may have multiple pages and I would like to target individual pages separately.

Is it possible to extract the text itself if I know the name of the tag?

The purpose is to create a template for Data Merge and, after the merge, generate QR codes with a custom script that uses the tagged details to create a vCard.

Thanks for your time,

David

m1b · Answer

Hi @DavidSmerda, are you using ExtendScript? If so, here is a quick script that might be helpful. Look at the attached demo.indd to see the setup. It's very simple, and I'm no expert on XML tags, but I use them this way quite a bit.

- Mark

/**
 * @file Read XML Tags.js
 *
 * Example showing reading XML tagged text from active document.
 *
 * @author m1b
 * @version 2025-11-05
 * @discussion https://community.adobe.com/t5/indesign-discussions/extracting-xml-tags-on-a-given-page/m-p/15577321/thread-id/640522
 */
function main() {

    var doc = app.activeDocument;

    var firstNames = findXmlElements({ doc: doc, tagName: 'FirstName', returnText: true });
    var lastNames = findXmlElements({ doc: doc, tagName: 'LastName', returnText: true });

    for (var i = 0, first, last, page; i < firstNames.length; i++) {

        if (0 === firstNames[i].parentTextFrames.length)
            continue;

        first = firstNames[i].contents;
        last = lastNames[i] ? lastNames[i].contents : '';
        page = firstNames[i].parentTextFrames[0].parentPage;

        // parentPage is null when the frame sits on the pasteboard
        if (!page)
            continue;

        alert('Page ' + page.name + ': ' + first + ' ' + last);

    }

}
app.doScript(main, ScriptLanguage.JAVASCRIPT, undefined, UndoModes.ENTIRE_SCRIPT, 'Do Script');

/**
 * Returns an array of XML Elements.
 * If no `tagName` or `xpath` is provided,
 * will return all the XML Elements except root.
 * @author m1b
 * @version 2025-11-05
 * @param {Object} options
 * @param {Document} options.doc - an InDesign Document.
 * @param {String} [options.tagName] - an XMLTag name (default: all tags).
 * @param {String} [options.xpath] - an XPath to the element (default: all `tagName` elements).
 * @param {Boolean} [options.untag] - whether to untag the XMLElement, if true then `returnText` will be true (default: false).
 * @param {Boolean} [options.returnText] - whether to return Text, if true will return Text (default: false).
 * @param {Boolean} [options.excludeOrphans] - whether to exclude orphans — texts that aren't actually in document (default: true).
 * @returns {Array<XMLElement>|Array<Text>} - array of found XMLElements or Texts.
 */
function findXmlElements(options) {

    options = options || {};

    if (options.doc == undefined)
        throw Error('findXmlElements: no `doc` supplied.');

    var doc = options.doc;
    var xpath = options.xpath || './/' + (options.tagName || '*');
    var result = [];

    var found = doc.xmlElements[0].evaluateXPathExpression(xpath);

    for (var i = 0, text; i < found.length; i++) {

        text = resolve(found[i].texts[0].toSpecifier());

        if (
            !text.isValid
            || !text.parentStory.isValid
            || (
                false !== options.excludeOrphans
                && 'XmlStory' === text.parent.constructor.name
            )
        )
            continue;

        if (true === options.untag) {
            found[i].untag();
            options.returnText = true;
        }

        if (true === options.returnText)
            result.push(text);
        else
            result.push(found[i]);

    }

    return result;

}

P.S. The function wraps the parts that are tricky to remember. The `untag` option is useful when you want to set the tagged text's contents once and only once: after untagging, the text will not be found if you run the script again.
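
To handle each page separately, rather than pairing `firstNames[i]` with `lastNames[i]` by index, one option is to group the found texts by the page they sit on. The sketch below is only an illustration: `collectFieldsByPage` is a hypothetical helper (not part of the script above), and it assumes it receives the Text arrays returned by `findXmlElements` with `returnText: true`, keyed by tag name. The tag names in the usage comment are examples, not names taken from the demo file.

```javascript
/**
 * Hypothetical helper: groups tagged text contents by page name.
 * @param {Object} textsByTag - e.g. { FirstName: [Text, ...], Phone: [Text, ...] }
 * @returns {Object} - e.g. { '1': { FirstName: 'Ada', Phone: '...' }, '2': { ... } }
 */
function collectFieldsByPage(textsByTag) {

    var pages = {};

    for (var tagName in textsByTag) {

        if (!textsByTag.hasOwnProperty(tagName))
            continue;

        var texts = textsByTag[tagName];

        for (var i = 0; i < texts.length; i++) {

            var frames = texts[i].parentTextFrames;

            // skip text with no parent frame (e.g. overset)
            // and frames with no parent page (e.g. on the pasteboard)
            if (!frames || 0 === frames.length || !frames[0].parentPage)
                continue;

            var pageName = frames[0].parentPage.name;

            if (!pages.hasOwnProperty(pageName))
                pages[pageName] = {};

            // if a tag occurs more than once on a page, keep the first occurrence
            if (!pages[pageName].hasOwnProperty(tagName))
                pages[pageName][tagName] = texts[i].contents;

        }

    }

    return pages;

}

// Possible usage inside InDesign (tag names are examples only):
// var doc = app.activeDocument;
// var fields = collectFieldsByPage({
//     FirstName: findXmlElements({ doc: doc, tagName: 'FirstName', returnText: true }),
//     Phone: findXmlElements({ doc: doc, tagName: 'Phone', returnText: true })
// });
// fields['1'] would then hold the tagged details found on the page named '1'.
```

Keying by `page.name` keeps the details for each card together even when a tag is missing on one of the pages, which index-based pairing cannot guarantee.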

Edit 2025-11-05: updated my old findXmlElements function so that it excludes "orphan" elements (XML elements that don't exist in the document).
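
Since the original question mentions turning the tagged details into a vCard, here is a minimal sketch of that step. It is an assumption about what the custom script might need, not something from the demo file: `buildVCard` and `cleanVCardText` are hypothetical names, the `details` property names are made up for illustration, and the output follows the basic vCard 3.0 text layout (only `N`, `FN`, `TEL` and `EMAIL`). Values are trimmed first because tagged text can include a trailing paragraph return.

```javascript
/**
 * Hypothetical helper: trims a value and escapes the characters
 * that have special meaning in vCard text values.
 */
function cleanVCardText(value) {
    return String(value)
        .replace(/^\s+|\s+$/g, '')      // trim (ExtendScript has no String.trim)
        .replace(/\\/g, '\\\\')         // backslash first, so later escapes survive
        .replace(/\r\n|\r|\n/g, '\\n')  // line breaks inside a value
        .replace(/;/g, '\\;')
        .replace(/,/g, '\\,');
}

/**
 * Hypothetical helper: builds a vCard 3.0 string.
 * @param {Object} details - { firstName, lastName, phone, email } (all optional).
 * @returns {String} - vCard text with CRLF line endings.
 */
function buildVCard(details) {

    var first = cleanVCardText(details.firstName || '');
    var last = cleanVCardText(details.lastName || '');
    var fullName = (first + ' ' + last).replace(/^\s+|\s+$/g, '');

    var lines = [
        'BEGIN:VCARD',
        'VERSION:3.0',
        'N:' + last + ';' + first + ';;;',
        'FN:' + fullName
    ];

    // optional properties are only written when a value was supplied
    if (details.phone)
        lines.push('TEL:' + cleanVCardText(details.phone));

    if (details.email)
        lines.push('EMAIL:' + cleanVCardText(details.email));

    lines.push('END:VCARD');

    return lines.join('\r\n');

}
```

The returned string could then be handed to whatever QR code step the script uses; that part is left out here because it depends on the setup.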
