Participating Frequently
July 8, 2024
Question

How can an Acrobat plugin store document-wide information in a PDF file?


I'm looking for a way for our Acrobat plugin to store some document-wide information in a PDF file.

 

The info would be a simple list of strings.

 

The list of strings would be displayed and edited in a dialog box displayed by our plugin.

 

The user would later select one of the strings when using our plugin to add one of our Acrobat plugin's custom annotations to the PDF file.

 

Multiple annotations can be added to the PDF.

 

The idea is for the list of strings to be stored in one place in the PDF and for each of our plugin's annotations to store an integer index into the list of strings in the dictionary associated with the annotation.
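
The layout described above can be modeled in plain C, independent of the Acrobat SDK. This is only a minimal sketch: the names StringList, AnnotRef, StringListAdd and AnnotResolve, and the fixed capacity, are assumptions made for illustration and are not SDK calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory model; none of these names come from the Acrobat SDK. */
#define MAX_STRINGS 16

typedef struct {
    const char *items[MAX_STRINGS]; /* the single document-wide list */
    size_t count;                   /* number of entries in use */
} StringList;

typedef struct {
    int stringIndex; /* the integer each custom annotation would store */
} AnnotRef;

/* Append a string; returns its index, or -1 if the list is full. */
static int StringListAdd(StringList *list, const char *s)
{
    assert(list != NULL && s != NULL);
    if (list->count >= MAX_STRINGS) return -1;
    list->items[list->count] = s;
    return (int)list->count++;
}

/* Resolve an annotation's index; returns NULL if it is out of range. */
static const char *AnnotResolve(const StringList *list, AnnotRef ref)
{
    if (ref.stringIndex < 0 || (size_t)ref.stringIndex >= list->count) return NULL;
    return list->items[ref.stringIndex];
}
```

One consequence of storing positions rather than the strings themselves: removing or reordering entries changes what existing indexes point to, so the editing dialog would presumably have to either disallow that or update the affected annotations.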

 

So far I have these ideas:

 

 

APPROACH #1: Store the list of strings in the XMP metadata in the PDF file 

 

This approach would use the functions

 

PDDocGetXAPMetadata() and PDDocSetXAPMetadata()

 

or

 

PDDocCountXAPMetadataArrayItems(), PDDocGetXAPMetadataArrayItem(), PDDocSetXAPMetadataArrayItem()

 

I read about PDF XMP XML metadata in "Developing with PDF - Dive into the Portable Document Format" by Leonard Rosenthol - a great book.

 

But I don't like this idea - the user can see the XMP metadata in Acrobat just by viewing document properties and opening the Additional Metadata dialog.

 

So far, I've only tried using PDDocGetXAPMetadata() and PDDocSetXAPMetadata() - these functions get and set ALL of the XMP metadata XML for the PDF file.

 

PDDocGetXAPMetadata() returns the data in an altered XML layout - if a PDF has multiple <rdf:Description> XML tags in its XMP metadata, the child tags of the separate <rdf:Description> XML tags are converted into attributes of a single <rdf:Description> XML tag, which means future calls to PDDocSetXAPMetadata() will change the original format of the PDF's XMP metadata XML.

 

And there is another issue - PDDocSetXAPMetadata() saves the XMP XML data, but leaves the previous copy of the XMP XML data in the PDF file, so that the file grows larger each time you call it. Future calls to PDDocGetXAPMetadata() correctly return the last XMP XML that was set, but why are previous versions of the XMP XML being left in the PDF file?

 

Maybe I'll have better luck with PDDocCountXAPMetadataArrayItems(), PDDocGetXAPMetadataArrayItem(), PDDocSetXAPMetadataArrayItem().

 


APPROACH #2: Create an indirect array and store an entry for it in the PDF file's Catalog Dictionary

 

This approach would use code like:

 

AVDoc avDoc = AVAppGetActiveDoc();
PDDoc pdDoc = AVDocGetPDDoc(avDoc);
CosDoc cosDoc = PDDocGetCosDoc(pdDoc);
CosObj catalogDict = CosDocGetRoot(cosDoc);

// Look up our custom key in the Catalog; a missing key comes back as a CosNull object.
CosObj myStringListArray = CosDictGetKeyString(catalogDict, "MyAcrobatPluginName_MyListOfStrings");

if ( CosObjGetType( myStringListArray ) == CosNull ) {
    // First use in this document: create an indirect array and register it in the Catalog.
    myStringListArray = CosNewArray( cosDoc, true, 1 );
    CosDictPutKeyString( catalogDict, "MyAcrobatPluginName_MyListOfStrings", myStringListArray );
}

 

// Use CosArrayInsert(), CosArrayLength(), CosArrayGet(), CosArrayRemove() to get/set the contents of myStringListArray.

 

But is it legal for an Acrobat plugin to add any entry it wants to a PDF file's Catalog Dictionary? Or must all entries in the Catalog Dictionary be limited to those listed in the PDF standard?

 

A separate issue - the potential for name collisions with data stored by other Acrobat plugins exists - must the Catalog Dictionary entry name be prefixed with some kind of registered name for the plugin?

 


APPROACH #3:

 

Use a "PDF version extension" as described at https://opensource.adobe.com/dc-acrobat-sdk-docs/library/plugin/Plugins_Documents.html#pdf-version-extensions

 

This approach involves adding an "Extensions" dictionary to the PDF file's Catalog Dictionary.

 

Then we would add a child dictionary for our Acrobat plugin to the "Extensions" dictionary, like this:

 

<</Type /Catalog
/Extensions
<</ADBE
<< /BaseVersion /1.7 /ExtensionLevel 3 >>
/MYCOMPANY
<< /BaseVersion /1.0 /ExtensionLevel 1 >>
>>
>>

 

The question is - can the child dictionary for MYCOMPANY store any information the plugin wants, like another dictionary entry with an indirect reference to a list of strings?

 


Thom Parker
Community Expert
July 8, 2024

There is nothing wrong with adding a custom entry to the Doc Catalog. It's always a good idea to follow the dictionary extensions convention; however, unless you pick a common name, a name clash is unlikely.

 

You could also use the "info" dictionary. And this would expose the entries to UI, where it could be edited manually. 

Edit: I see that you don't like the Metadata (info object) idea.  The metadata is a big xml string that's stored in a stream object, so unless a "SaveAs" is done, a whole new copy is tacked onto the end of the PDF each time the PDF is saved. 

 

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often
Participating Frequently
July 8, 2024

Good to hear that adding a custom entry to the Doc Catalog is okay. 

 

Since I first posted, I've read some of the PDF standard that backs up what you say - if an app doesn't recognize some piece of custom information in a PDF file, it should just ignore it.

Thom Parker
Community Expert
July 8, 2024

Yes, PDF viewers will ignore anything in the PDF that they are not interested in.  The bigger issue is saving the PDF, especially when changing the PDF format for optimization or some other purpose.  In most situations Acrobat ignores custom entries in the Catalog, but entries in other specialized dictionaries may not be as well behaved. I'd suggest creating a custom entry just to see what happens.

 

For your purposes it makes no difference at all whether the values under your dictionary are direct or indirect if they are only being used in one place.  If the array values will be referenced in the annotations added later, or changed frequently, then it can help to make them indirect. It's also good to make the dictionary added to the doc catalog indirect. However, the custom object adds so little to the file size that it's not going to make any difference.

 

You can prototype this structure, modify it, and generally see how your plug-in operates on the PDF file with this developer plug-in

https://www.windjack.com/product/pdfcanopener/

 

 

Bernd Alheit
Community Expert
July 8, 2024

You can use PDF dictionary extensions:
https://opensource.adobe.com/dc-acrobat-sdk-docs/library/plugin/index.html#pdf-dictionary-extensions

 
Or use a hidden form text field.

 

Participating Frequently
July 8, 2024

After doing some more reading, and reading both your and Thom Parker's replies, I think I should do the following:

 

Get a 4-letter prefix for our company registered by following the process at https://github.com/adobe/pdf-names-list

 

I read about that process in the document "Adobe Supplement to ISO 32000-1 BaseVersion 1.7 Extension Level 3.pdf".

 

That document is mentioned on the web page https://opensource.adobe.com/dc-acrobat-sdk-docs/library/plugin/Plugins_Documents.html#pdf-version-extensions and there is a link to the document at https://pdfa.org/resource/pdf-specification-archive/

 

Then do one of the following two approaches:

 

APPROACH #1

 

Create a custom catalog dictionary entry like

 

%PDF-1.7
<</Type /Catalog
    /ABCD 1 0 R % Reference to a dictionary with company ABCD's document-wide plugin data.
>>

 

where ABCD is our company's unique prefix and the value for that dictionary entry is an indirect reference to another dictionary with all our plugin's document-wide data like:

 

1 0 obj % object ID 1, generation 0
<<
/PluginVersion 1 % The version of our plugin's document-wide data format - version 1.
/ListOfStrings 2 0 R % Reference to the array of strings.
>>
endobj

 

2 0 obj % object ID 2, generation 0
[ (string1) (string2) (string3) ] % The array of strings.
endobj
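
In PDF file syntax, each string in such an array is written as a literal string in parentheses, with backslashes and parentheses inside it escaped by a backslash. The following is a rough sketch of that escaping in plain C. PdfLiteralString is a hypothetical helper invented for illustration, not an Acrobat SDK function; inside a plug-in the Cos layer would normally take care of this when the file is written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an Acrobat SDK call: writes `s` as a PDF literal
 * string "(...)" into `out`, escaping '\\', '(' and ')'. Returns the number
 * of bytes written (excluding the terminating NUL), or 0 if the buffer is
 * too small (in which case the contents of `out` are unspecified). */
static size_t PdfLiteralString(const char *s, char *out, size_t outSize)
{
    size_t n = 0;
    if (outSize < 3) return 0;              /* need room for "()" plus NUL */
    out[n++] = '(';
    for (; *s != '\0'; ++s) {
        int needsEscape = (*s == '\\' || *s == '(' || *s == ')');
        /* keep room for this character, the closing ')' and the NUL */
        if (n + (needsEscape ? 2 : 1) + 2 > outSize) return 0;
        if (needsEscape) out[n++] = '\\';
        out[n++] = *s;
    }
    out[n++] = ')';
    out[n] = '\0';
    return n;
}
```

For example, passing the C string "string1" would produce (string1), which is the form each element of the array above would take in the file.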


APPROACH #2

 

Follow something more like https://opensource.adobe.com/dc-acrobat-sdk-docs/library/plugin/Plugins_Documents.html#pdf-version-extensions

 

Create a catalog dictionary Extensions entry like:

 

%PDF-1.7
<</Type /Catalog
/Extensions
<</ABCD
    << /BaseVersion /1.7
        /ExtensionLevel 1 % Interpret this as the version of our plugin's document-wide data format - version 1.
        /ABCD 1 0 R % Reference to a dictionary with company ABCD's document-wide plugin data.
    >>
>>
>>

where ABCD is our company's unique prefix and the /ABCD entry inside its Extensions dictionary is an indirect reference to another dictionary with all our plugin's document-wide data like:

 

1 0 obj % object ID 1, generation 0
<<
/PluginVersion 1 % The version of our plugin's document-wide data format - version 1.
/ListOfStrings 2 0 R % Reference to the array of strings.
>>
endobj

 

2 0 obj % object ID 2, generation 0
[ (string1) (string2) (string3) ] % The array of strings.
endobj

 

I would greatly appreciate hearing people's reactions to these suggestions.