Inspiring
November 15, 2018
Question

Tag element for text extraction from PDF


We use templates a lot. Every project gets a five-digit code to identify it. This code is put into the same text frame for every project.

For automation beyond InDesign, it would be wonderful if I could somehow tag this text frame and then, once the PDF is created, extract whatever text was inside it. I'd like to be able to do this without the clunky GUI-driven Acrobat Pro, perhaps with a CLI tool like exiftool or muPDF.

I was looking into the Tags panel in InDesign, but I have not found any way to read the contents of a tagged element without Acrobat Pro's Tags panel.

Does anyone have any ideas for how this could be done?


1 reply

Community Expert
November 16, 2018

How about adding this code to the file's XMP metadata, which can then be extracted with XMP tools like exiftool? The code would be stored in the file's XMP rather than tied to the text frame itself. You could keep using the text frame as you do today and add one new step that also writes the project code to the XMP. I presume the code is unique per document, so this could work.

Adding the info can easily be done through the File Info option in InDesign, or via scripting (if you feel the need to use a new property rather than an existing one).
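To illustrate the extraction side of the XMP idea, here is a minimal Python sketch that pulls a custom property out of the XMP packet embedded in a PDF. It assumes the packet is stored uncompressed and unencrypted, which is common but not guaranteed, and the namespace URI and property name (`ProjectCode`) are hypothetical placeholders, not anything InDesign defines. A dedicated tool such as exiftool would be the more robust choice in practice.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical namespace and property name, chosen for illustration only.
PROJECT_NS = "http://example.com/ns/project/1.0/"
PROJECT_PROP = "ProjectCode"

# An XMP packet sits between "<?xpacket begin=...?>" and "<?xpacket end=...?>".
XPACKET_RE = re.compile(rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=", re.DOTALL)


def read_project_code(pdf_bytes):
    """Return the project code from the first XMP packet that has one, else None."""
    tag = "{%s}%s" % (PROJECT_NS, PROJECT_PROP)
    for match in XPACKET_RE.finditer(pdf_bytes):
        try:
            root = ET.fromstring(match.group(1).strip())
        except ET.ParseError:
            continue  # Not well-formed XML; try the next packet.
        # Case 1: the property is written as a child element.
        for elem in root.iter(tag):
            if elem.text and elem.text.strip():
                return elem.text.strip()
        # Case 2: the property is written as an attribute (compact RDF form).
        for elem in root.iter():
            value = elem.get(tag)
            if value:
                return value.strip()
    return None
```

Usage would look like `read_project_code(open("job.pdf", "rb").read())`, where `job.pdf` is a stand-in file name. If it returns `None`, the packet may be compressed or the property may be stored under a different name.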

-Manan

toddm72Author
Inspiring
November 16, 2018

Thanks for the reply Manan!

The problem is that these templates are used by internal staff, freelancers, and potentially anybody. I was mucking around with event handlers and start-up scripts, and made a handler that extracts the contents of the TextFrame on close and adds it to the XMP. But it would be a big ask to install something like that on a bunch of freelance designers' computers. And if any old client-supplied designer uses the template, it would be unreasonable to expect them to update the XMP manually as well.

It's got me thinking though.

Community Expert
November 16, 2018

In that case, how about tagging the project box with a tag, say Project ID, and exporting a corresponding XML file alongside the PDF or INDD file using File > Export > XML?

Though this would be an additional file apart from the ones you have been using. Just bouncing ideas.
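Reading the code back out of such an exported XML file could look roughly like the Python sketch below. It is only an illustration under assumptions: XML element names cannot contain spaces, so the tag is written here as the hypothetical `ProjectID`, and the actual element name and nesting depend on how the frame is tagged in the template.

```python
import xml.etree.ElementTree as ET

# Assumed element name; adjust to match the tag actually used in the template.
PROJECT_TAG = "ProjectID"


def project_id_from_xml(xml_text):
    """Return the text of the first non-empty ProjectID element, or None."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, including the root element itself.
    for elem in root.iter(PROJECT_TAG):
        # itertext() also collects text held in nested child elements.
        text = "".join(elem.itertext()).strip()
        if text:
            return text
    return None
```

For example, `project_id_from_xml(open("job.xml", "rb").read())` would read the exported file, with `job.xml` standing in for whatever name the export was given.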

-Manan
