Hello there.
Does anyone know if it is possible to extract the text on a given page that is marked with a specific XML tag?
Let's say that I have a business card with a name, a phone number and an e-mail address.
These elements may or may not be inside the same text box, but each one is tagged (shown with the colored brackets) with a unique tag, and they are all on the same page.
The InDesign file may have multiple pages and I would like to target individual pages separately.
Is it possible to extract the text itself if I know the name of the tag?
The purpose is to create a template for data merge and then, after the merge, generate QR codes with a custom script that uses the tagged details to create a vCard.
Thanks for your time,
David
Hi @DavidSmerda Are you using ExtendScript? If so, here is a quick script that might be helpful for you. Look at the attached demo.indd to see the setup. It's very simple and I'm not an expert on XML tags, but I use them this way quite a bit.
- Mark
/**
* @file Read XML Tags.js
*
* Example showing reading XML tagged text from active document.
*
* @author m1b
* @version 2025-11-05
* @discussion https://community.adobe.com/t5/indesign-discussions/extracting-xml-tags-on-a-given-page/m-p/15577321/thread-id/640522
*/
function main() {
    var doc = app.activeDocument;
    var firstNames = findXmlElements({ doc: doc, tagName: 'FirstName', returnText: true });
    var lastNames = findXmlElements({ doc: doc, tagName: 'LastName', returnText: true });
    // note: pairs first and last names by index, so both tags must occur in the same order
    for (var i = 0, first, last, page; i < firstNames.length; i++) {
        // skip overset text that has no text frame
        if (0 === firstNames[i].parentTextFrames.length)
            continue;
        first = firstNames[i].contents;
        last = lastNames[i] ? lastNames[i].contents : '';
        page = firstNames[i].parentTextFrames[0].parentPage;
        // parentPage is null for frames on the pasteboard
        if (!page)
            continue;
        alert('Page ' + page.name + ': ' + first + ' ' + last);
    }
}
app.doScript(main, ScriptLanguage.JAVASCRIPT, undefined, UndoModes.ENTIRE_SCRIPT, 'Do Script');
/**
* Returns an array of XML Elements.
* If no `tagName` or `xpath` is provided,
* will return all the XML Elements except root.
* @author m1b
* @version 2025-11-05
* @param {Object} options
* @param {Document} options.doc - an Indesign Document.
* @param {String} [options.tagName] - an XMLTag name (default: all tags).
* @param {String} [options.xpath] - an XPath to the element (default: all `tagName` elements).
* @param {Boolean} [options.untag] - whether to untag the XMLElement, if true then `returnText` will be true (default: false).
* @param {Boolean} [options.returnText] - whether to return Text, if true will return Text (default: false).
* @param {Boolean} [options.excludeOrphans] - whether to exclude orphans — texts that aren't actually in document (default: true).
* @returns {Array<XMLElement>|Array<Text>} - array of found XMLElements or Texts.
*/
function findXmlElements(options) {
    options = options || {};
    if (options.doc == undefined)
        throw Error('findXmlElements: no `doc` supplied.');
    var doc = options.doc;
    var xpath = options.xpath || './/' + (options.tagName || '*');
    var result = [];
    var found = doc.xmlElements[0].evaluateXPathExpression(xpath);
    for (var i = 0, text; i < found.length; i++) {
        // resolve a fresh reference to the element's text
        text = resolve(found[i].texts[0].toSpecifier());
        // skip invalid texts and, unless disabled, orphans that live only in the XML structure
        if (
            !text.isValid
            || !text.parentStory.isValid
            || (
                false !== options.excludeOrphans
                && 'XmlStory' === text.parent.constructor.name
            )
        )
            continue;
        if (true === options.untag) {
            found[i].untag();
            options.returnText = true;
        }
        if (true === options.returnText)
            result.push(text);
        else
            result.push(found[i]);
    }
    return result;
}
P.S. The function abstracts the parts which are tricky to remember. The `untag` option is good for when you want to set the tagged text's contents once and only once (it will not be found if you run the script again).
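To target pages separately, as the question asks, the texts returned by `findXmlElements` could be grouped by page name. This is only a minimal sketch and not part of the script above: `groupTextsByPage` is a hypothetical helper, and the tag names in the usage comment (`Name`, `Phone`, `Email`) are assumptions about how the template is tagged. It keeps the first match per tag on each page and is written in plain ES3-style JavaScript so that it should also run in ExtendScript.

```javascript
/**
 * Groups tagged texts by the name of the page they sit on.
 * Hypothetical helper (an assumption, not part of the original script).
 * @param {Object} textsByTag - tag name mapped to an array of Texts,
 *   e.g. { Name: [Text, ...], Email: [Text, ...] }.
 * @returns {Object} - page name mapped to { tagName: contents }.
 */
function groupTextsByPage(textsByTag) {
    var pages = {};
    for (var tagName in textsByTag) {
        if (!textsByTag.hasOwnProperty(tagName))
            continue;
        var texts = textsByTag[tagName];
        for (var i = 0; i < texts.length; i++) {
            var frames = texts[i].parentTextFrames;
            // skip overset text (no frame) and frames on the pasteboard (no page)
            if (0 === frames.length || !frames[0].parentPage)
                continue;
            var pageName = frames[0].parentPage.name;
            pages[pageName] = pages[pageName] || {};
            // the first match for a tag on a page wins
            if (undefined === pages[pageName][tagName])
                pages[pageName][tagName] = String(texts[i].contents);
        }
    }
    return pages;
}

// Possible usage inside InDesign (tag names are assumptions):
// var cards = groupTextsByPage({
//     Name: findXmlElements({ doc: doc, tagName: 'Name', returnText: true }),
//     Phone: findXmlElements({ doc: doc, tagName: 'Phone', returnText: true }),
//     Email: findXmlElements({ doc: doc, tagName: 'Email', returnText: true })
// });
// alert(cards['2'].Email); // the e-mail address tagged on page "2"
```

Grouping by page name avoids pairing results by array index, which only works when every tag appears exactly once per page and in the same order.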
Edit 2025-11-05: updated my old findXmlElements function so that it excludes "orphan" elements (XML elements that don't exist in the document).
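For the vCard step mentioned in the question, one possible sketch is below. `escapeVCardValue` and `buildVCard` are hypothetical helpers, the property names on `card` are assumptions, and the output follows the basic vCard 3.0 layout (`N`, `FN`, `TEL`, `EMAIL`, CRLF line endings). Generating the QR code itself is not covered here.

```javascript
/**
 * Escapes a value for a vCard 3.0 text field:
 * backslash, line breaks, semicolon and comma.
 * Hypothetical helper (an assumption, not part of the original script).
 */
function escapeVCardValue(value) {
    return String(value)
        .replace(/\\/g, '\\\\')
        .replace(/\r\n|\r|\n/g, '\\n') // InDesign paragraphs end in \r
        .replace(/;/g, '\\;')
        .replace(/,/g, '\\,');
}

/**
 * Builds a minimal vCard 3.0 string.
 * @param {Object} card - { firstName, lastName, phone, email }; the
 *   property names are assumptions, and every field is optional.
 * @returns {String} - vCard text with CRLF line endings.
 */
function buildVCard(card) {
    var first = escapeVCardValue(card.firstName || '');
    var last = escapeVCardValue(card.lastName || '');
    // String.prototype.trim is avoided so the code stays ES3-compatible
    var fullName = (first + ' ' + last).replace(/^\s+|\s+$/g, '');
    var lines = [
        'BEGIN:VCARD',
        'VERSION:3.0',
        'N:' + last + ';' + first + ';;;',
        'FN:' + fullName
    ];
    // the phone number is passed through as typed
    if (card.phone)
        lines.push('TEL:' + String(card.phone));
    if (card.email)
        lines.push('EMAIL:' + escapeVCardValue(card.email));
    lines.push('END:VCARD');
    return lines.join('\r\n');
}

// Example with made-up data:
// buildVCard({ firstName: 'Ada', lastName: 'Lovelace', email: 'ada@example.com' });
```

Escaping matters mainly for names or company-style values that contain commas or semicolons, because an unescaped semicolon would be read as a field separator inside `N`.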
Perfect,
that is exactly what I needed. I will check it thoroughly after I return home.
I'm coming from Acrobat and its amazing JavaScript Reference. Sadly I did not find anything similar for InDesign (a reference with functioning code snippets), except for an old CS6 scripting brochure, which wasn't very useful in the case of extracting text.
Thank you again for your help.
You're welcome! I use these docs, which are a bare-bones presentation of the DOM, with no code snippets unfortunately. However, I think they are the best we have at present.