Tag element for text extraction from PDF

Question

We use templates a lot. Every project gets a 5-digit code to identify it. This code is put into the same text frame for every project.

For automation beyond InDesign, it would be wonderful if I could somehow tag this text frame and then, once the PDF is created, extract whatever text was inside it. I'd like to be able to do this without the clunky, GUI-driven Acrobat Pro, ideally with a CLI tool like exiftool or MuPDF.

I was looking into the Tags panel in InDesign, but I have not found any way to read the contents of a tagged element without Acrobat Pro's Tags panel.

Does anyone have any ideas for how this could be done?

Manan Joshi · Answer

How about adding this code to the file's XMP metadata, which can then be extracted with XMP-aware tools like exiftool? We would be adding the code to the file's XMP rather than to the text frame specifically, so you could keep using the text frame as it's being used today and add one new step that writes the project code to the XMP as well. I presume the code is unique per document, so this could work.

Adding the info can easily be done with the File Info option in InDesign, and it can also be done via scripting (for example, if you feel the need to use a new property rather than an existing one).

-Manan
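
For the extraction side, here is a minimal Python sketch of how the code might be read back out of the exported PDF with exiftool. It rests on several assumptions that are not from the thread: the project code was entered in the Description field of File Info (stored in XMP as dc:description), that field is carried into the PDF on export, and exiftool is installed and on the PATH. The function names `parse_project_code` and `read_project_code` are made up for illustration.

```python
import json
import re
import subprocess

# Assumption: the code lives in the File Info "Description" field,
# which exiftool exposes as XMP-dc:Description. Change this if you
# store the code in a different property.
TAG = "XMP-dc:Description"


def parse_project_code(exiftool_json):
    """Return the first standalone 5-digit code in exiftool's JSON output, or None."""
    records = json.loads(exiftool_json)
    if not records:
        return None
    value = records[0].get("Description")
    if value is None:
        return None
    # exiftool may emit a purely numeric value as a JSON number, so
    # normalise to a string before searching.
    match = re.search(r"(?<!\d)\d{5}(?!\d)", str(value))
    return match.group(0) if match else None


def read_project_code(pdf_path):
    """Ask exiftool for the tag as JSON and extract the project code from it."""
    result = subprocess.run(
        ["exiftool", "-j", f"-{TAG}", pdf_path],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if exiftool reports a failure
    )
    return parse_project_code(result.stdout)
```

Calling `read_project_code("project.pdf")` (a hypothetical file name) would return the code as a string, or `None` if the field is empty or holds no 5-digit run. Parsing is kept separate from the exiftool call so the extraction rule can be checked without a PDF at hand.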
