Extracting XML tags on a given page

Explorer, Nov 04, 2025

Hello there. 

Does anyone know whether it is possible to extract text on a given page that is marked with a specific XML tag? 
Let's say I have a business card with a name, a phone number and an e-mail address. 
These elements may or may not be inside the same text box, but each is tagged [with the colored brackets] with a unique tag, and they are all on the same page. 
The InDesign file may have multiple pages, and I would like to target individual pages separately. 

Is it possible to extract the text itself if I know the name of the tag?

 

The purpose is to create a template for Data Merge and, after the merge, generate QR codes with a custom script that uses the tagged details to create a vCard. 

 

Thanks for your time, 

David
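For the vCard step described in the question, a minimal sketch in plain JavaScript (ES3-style, so it should also run in ExtendScript) might look like the following. The function names `buildVCard` and `escapeVCardText`, and the `card` property names, are hypothetical and not from the thread; the output follows the vCard 3.0 layout, and the `CELL` phone type is an assumption.

```javascript
/**
 * Trims a value and escapes the characters that are special in
 * vCard text values (backslash, newline, comma, semicolon).
 * Hypothetical helper, not part of any InDesign API.
 */
function escapeVCardText(value) {
    return String(value)
        .replace(/^\s+|\s+$/g, '')      // tagged text may end with a return
        .replace(/\\/g, '\\\\')
        .replace(/\r\n|\r|\n/g, '\\n')
        .replace(/([,;])/g, '\\$1');
}

/**
 * Builds a vCard 3.0 string from a plain object.
 * Assumed shape: { firstName, lastName, phone, email }, all optional.
 */
function buildVCard(card) {
    var first = escapeVCardText(card.firstName || '');
    var last = escapeVCardText(card.lastName || '');
    var lines = [
        'BEGIN:VCARD',
        'VERSION:3.0',
        'N:' + last + ';' + first + ';;;',
        'FN:' + (first + ' ' + last).replace(/^\s+|\s+$/g, '')
    ];
    if (card.phone)
        lines.push('TEL;TYPE=CELL:' + escapeVCardText(card.phone));
    if (card.email)
        lines.push('EMAIL;TYPE=INTERNET:' + escapeVCardText(card.email));
    lines.push('END:VCARD');
    // vCard lines are separated by CRLF.
    return lines.join('\r\n');
}
```

The values passed in would be the `contents` of the tagged texts on one page; the returned string is what a QR code step would encode.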

TOPICS
Scripting
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Community Expert, Nov 04, 2025

Hi @DavidSmerda, are you using ExtendScript? If so, here is a quick script that might be helpful. Look at the attached demo.indd to see the setup. It's very simple, and I'm no expert on XML tags, but I use them this way quite a bit.

- Mark

 

Screenshot 2025-11-05 at 16.20.23.png

/**
 * @file Read XML Tags.js
 *
 * Example showing reading XML tagged text from active document.
 *
 * @author m1b
 * @version 2025-11-05
 * @discussion https://community.adobe.com/t5/indesign-discussions/extracting-xml-tags-on-a-given-page/m-p/15577321/thread-id/640522
 */
function main() {

    var doc = app.activeDocument;

    var firstNames = findXmlElements({ doc: doc, tagName: 'FirstName', returnText: true });
    var lastNames = findXmlElements({ doc: doc, tagName: 'LastName', returnText: true });

    for (var i = 0, first, last, page; i < firstNames.length; i++) {

        if (0 === firstNames[i].parentTextFrames.length)
            continue;

        first = firstNames[i].contents;
        last = lastNames[i] ? lastNames[i].contents : '';
        page = firstNames[i].parentTextFrames[0].parentPage;

        alert('Page ' + page.name + ': ' + first + ' ' + last);

    }

}
app.doScript(main, ScriptLanguage.JAVASCRIPT, undefined, UndoModes.ENTIRE_SCRIPT, 'Do Script');

/**
 * Returns an array of XML Elements.
 * If no `tagName` or `xpath` is provided,
 * will return all the XML Elements except root.
 * @author m1b
 * @version 2025-11-05
 * @param {Object} options
 * @param {Document} options.doc - an InDesign Document.
 * @param {String} [options.tagName] - an XMLTag name (default: all tags).
 * @param {String} [options.xpath] - an XPath to the element (default: all `tagName` elements).
 * @param {Boolean} [options.untag] - whether to untag the XMLElement, if true then `returnText` will be true (default: false).
 * @param {Boolean} [options.returnText] - whether to return Text, if true will return Text (default: false).
 * @param {Boolean} [options.excludeOrphans] - whether to exclude orphans — texts that aren't actually in document (default: true).
 * @returns {Array<XMLElement>|Array<Text>} - array of found XMLElements or Texts.
 */
function findXmlElements(options) {

    options = options || {};

    if (options.doc == undefined)
        throw Error('findXmlElements: no `doc` supplied.');

    var doc = options.doc;
    var xpath = options.xpath || './/' + (options.tagName || '*');
    var result = [];

    var found = doc.xmlElements[0].evaluateXPathExpression(xpath);

    for (var i = 0, text; i < found.length; i++) {

        text = resolve(found[i].texts[0].toSpecifier());

        if (
            !text.isValid
            || !text.parentStory.isValid
            || (
                false !== options.excludeOrphans
                && 'XmlStory' === text.parent.constructor.name
            )
        )
            continue;

        // Optionally remove the tag so that a later run will not find this text again.
        if (true === options.untag) {
            found[i].untag();
            options.returnText = true;
        }

        if (true === options.returnText)
            result.push(text);
        else
            result.push(found[i]);

    }

    return result;

};

P.S. The function abstracts away the parts that are tricky to remember. The `untag` option is useful when you want to set the tagged text's contents once and only once (after untagging, the text will not be found if you run the script again).

 

Edit 2025-11-05: updated my old findXmlElements function so that it excludes "orphan" elements (XML elements that don't exist in the document).
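To target pages separately, as the question asks, the Text objects returned by `findXmlElements` could be grouped by the page they sit on rather than paired by array index. This is only a sketch: `collectFieldsByPage` is a hypothetical helper, not part of the script above, and it relies solely on the `contents`, `parentTextFrames` and `parentPage` properties that the script already uses.

```javascript
/**
 * Groups tagged texts by the name of the page they sit on.
 * `textsByField` maps a field name to an array of Text objects, e.g.
 *   { FirstName: findXmlElements({ doc: doc, tagName: 'FirstName', returnText: true }) }
 * Returns an object keyed by page name, e.g. { '1': { FirstName: 'Ann' } }.
 */
function collectFieldsByPage(textsByField) {
    var pages = {};
    for (var field in textsByField) {
        if (!textsByField.hasOwnProperty(field))
            continue;
        var texts = textsByField[field];
        for (var i = 0; i < texts.length; i++) {
            var frames = texts[i].parentTextFrames;
            // Overset text has no frame; a frame on the pasteboard has no parentPage.
            if (0 === frames.length || !frames[0].parentPage)
                continue;
            var pageName = frames[0].parentPage.name;
            if (!pages[pageName])
                pages[pageName] = {};
            // If a tag occurs more than once on a page, the first occurrence wins.
            if (undefined === pages[pageName][field])
                pages[pageName][field] = String(texts[i].contents);
        }
    }
    return pages;
}
```

With this, `collectFieldsByPage({ ... })['2']` would hold only the fields found on the page named "2", so a missing or orphaned tag on one page cannot shift the values of another page.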

Explorer, Nov 05, 2025

Perfect, that is exactly what I needed. I will check it thoroughly after I return home. 

I'm coming from Acrobat and its amazing JavaScript reference. Sadly, I did not find anything similar for InDesign (a reference with working code snippets), except for an old CS6 scripting brochure, which wasn't very useful for extracting text. 


Thank you again for your help.

Community Expert, Nov 05, 2025

You're welcome! I use these docs, which are a bare-bones presentation of the DOM, unfortunately with no code snippets. However, I think they are the best we have at present.
