I am creating a diagnostic tool for PDFs, and one of the things it flags is whether the document has metadata, specifically keywords. The tool's output is a tab-delimited file. I use the following JavaScript to report the PDF's keywords:
dataLine += this.info.keywords;
The issue is that when I manually delete the text from the Keywords field, save the file, close it, reopen it, and then run this script, the output is the exact text I just deleted. Why isn't the output blank? Document Properties shows nothing in the Keywords field.
For instance, if I type "test" (and nothing else) into the Keywords field, save the file, close it, reopen it, and then run this script, the output is "test", as expected. But when I then delete "test" from the Keywords field, save, close, reopen, and run the script again, the output is still "test". I would expect it to be blank.
One thing I tried that did not help was deleting the cache file that stores everything I have typed into the metadata fields (C:\Users\cboccio\AppData\Roaming\Adobe\XMP\FileInfoLibPrefs.txt).
I am using Adobe Acrobat XI and I can post the entire script if that would be helpful.
Any ideas are appreciated.
Thanks,
Chris
The issue is that the keywords are stored in two locations within the metadata. The Info part of the metadata, also called the core properties, was an original part of the PDF spec, before XML metadata existed. When XML metadata came along it became the main data, and Info became a legacy thing, sort of stuffed in sideways. For some reason the keywords get put in both places. To get rid of them you have to delete the keywords from the "Dublin Core Properties" in the XML metadata. It's odd: you can add keywords through the Info object and they stick, but you can only delete them from the XML metadata.
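To see both locations for yourself, here is a minimal sketch that looks for keywords in an XMP packet string. It assumes the packet uses the usual `pdf:` prefix for `pdf:Keywords` (element or attribute form) and the `dc:` prefix for `dc:subject` (an `rdf:Bag` of `rdf:li` items). The function names `findKeywordLocations` and `clearKeywordsInXmp` are hypothetical, and the regex approach is only a quick diagnostic; a real XML parser would be more robust. It sticks to `var` and plain functions so it should also run in Acrobat's older JavaScript engine.

```javascript
// Hypothetical helper: report keywords found in an XMP packet string.
// Assumes the common "pdf:" and "dc:" prefixes; regexes, not a real XML parser.
function findKeywordLocations(xmp) {
  // pdf:Keywords may be written as an element or as an attribute on rdf:Description
  var pdfElem = /<pdf:Keywords>([\s\S]*?)<\/pdf:Keywords>/.exec(xmp);
  var pdfAttr = /pdf:Keywords="([^"]*)"/.exec(xmp);
  // dc:subject (Dublin Core) holds keywords as rdf:li items inside an rdf:Bag
  var subject = /<dc:subject>([\s\S]*?)<\/dc:subject>/.exec(xmp);
  var items = [];
  if (subject) {
    var liRe = /<rdf:li[^>]*>([\s\S]*?)<\/rdf:li>/g;
    var m;
    while ((m = liRe.exec(subject[1])) !== null) {
      items.push(m[1].trim());
    }
  }
  return {
    pdfKeywords: pdfElem ? pdfElem[1] : (pdfAttr ? pdfAttr[1] : null),
    dcSubject: items
  };
}

// Hypothetical helper: strip keywords from both XMP locations (same assumptions).
function clearKeywordsInXmp(xmp) {
  return xmp
    .replace(/<pdf:Keywords>[\s\S]*?<\/pdf:Keywords>/g, "")
    .replace(/\s*pdf:Keywords="[^"]*"/g, "")
    .replace(/<dc:subject>[\s\S]*?<\/dc:subject>/g, "");
}

// Example packet fragment with "test" in both places
var sample = '<rdf:Description rdf:about=""' +
  ' xmlns:dc="http://purl.org/dc/elements/1.1/"' +
  ' xmlns:pdf="http://ns.adobe.com/pdf/1.3/">' +
  '<pdf:Keywords>test</pdf:Keywords>' +
  '<dc:subject><rdf:Bag><rdf:li>test</rdf:li></rdf:Bag></dc:subject>' +
  '</rdf:Description>';
var before = findKeywordLocations(sample); // pdfKeywords "test", dcSubject ["test"]
var after = findKeywordLocations(clearKeywordsInXmp(sample)); // pdfKeywords null, dcSubject []
```

In Acrobat, the document's XMP is available as a string through `this.metadata`, so a diagnostic line could report `findKeywordLocations(this.metadata)` alongside `this.info.keywords`. Writing a cleaned packet back with `this.metadata = clearKeywordsInXmp(this.metadata);` is an untested assumption here: check the result in Document Properties after saving and reopening.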
You'll find examples of parsing and modifying the XML metadata here: