PDFs created by macOS seem to trigger syntax errors when checked by Acrobat's Preflight. (Ghostscript also grumbles when processing them.)
Acrobat says that "The value associated with the key is of an incorrect type. Key: AAPL:Keywords; Type: CosArray." Looking at the raw data, it appears that Apple creates its own duplicate "Apple" set of any keywords used. The 'standard' PDF Keywords are stored as one string, with each word separated by a comma. Apple's keywords are stored as an array of individual strings.
28 0 obj
(First, second, third)
endobj
29 0 obj
[ (First) (second) (third) ]
endobj
1 0 obj << /Title 22 0 R /Author 24 0 R /Subject 25 0 R /Producer 23 0 R /Creator
26 0 R /CreationDate 27 0 R /ModDate 27 0 R /Keywords 28 0 R /AAPL:Keywords 29 0 R >>
Sure enough, after editing the Apple keywords to be one string instead of an array, the PDF passes the Preflight syntax check.
In the 1.7 PDF Spec, it explicitly says "The value associated with any key ... must be a text string" (p. 843). However, I can't find this limitation in the 1.3 Spec, and that's the version Apple uses. The 1.7 Spec also says that "Any entry whose value is not known should be omitted from the dictionary rather than included with an empty string as its value." Apple uses an empty array [ ( ) ] with no keys, which again seems to break the spec.
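To make the two rules quoted above concrete, here is a minimal Python sketch that checks an Info dictionary which has already been parsed into a plain dict. The function name `check_info_dict` and the sample data are hypothetical, it does not parse a PDF itself, and it is a simplification that treats every entry alike. It is not a description of how Preflight works internally.

```python
def check_info_dict(info):
    """Return (key, problem) pairs for entries that break the two quoted rules."""
    problems = []
    for key, value in info.items():
        if not isinstance(value, str):
            # Rule 1 (as quoted): the value must be a text string.
            problems.append((key, "not a text string: " + type(value).__name__))
        elif value == "":
            # Rule 2 (as quoted): unknown values should be omitted, not left empty.
            problems.append((key, "empty string; entry should be omitted"))
    return problems


# Hypothetical sample mirroring the objects shown earlier in this post.
sample = {
    "/Keywords": "First, second, third",
    "/AAPL:Keywords": ["First", "second", "third"],
}

for key, problem in check_info_dict(sample):
    print(key, "->", problem)
```

With the sample above, only the /AAPL:Keywords entry is reported, which matches what Preflight complains about.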
I've found this syntax used on several different versions, as far back as Mountain Lion.
It also seems a bit mad that the modification date and creation date point to the same object (27 0 R). Modifying the PDF seems to change only the file system's modification date, not the one stored in the PDF.
So, my questions are:
Is Apple breaking the spec, or is the syntax checker wrong?
Is the difference in version numbers significant in this regard? (I'd always assumed each PDF version is a superset of the previous.)
Does anyone know why on earth Apple needs to duplicate keywords in its own special Apple field?
Lots of lovely technical detail would be welcome. Thanks.
Let's assume that answers along the lines of "Don't use Apple's PDFs" have already been made. 😉
Thanks for writing this up - I came across this post while trying to understand an error I was getting from PyPDF4. It crashes when trying to copy the /AAPL:Keywords field from one document to another because it assumes all the document info values will be strings. It's good to know that isn't just a mistake in that library. Since the field seems to violate the spec, for my use case I'm just going to strip it out during copying.
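A minimal sketch of that stripping step, assuming the document info has been read into a plain dict. The function name `strip_non_string_entries` and the sample values are hypothetical, and the dict simply stands in for whatever the PDF library returns; no PyPDF4 calls are shown here.

```python
def strip_non_string_entries(info):
    """Return a copy of info that keeps only entries whose values are strings."""
    return {key: value for key, value in info.items() if isinstance(value, str)}


# Hypothetical document info, including Apple's array-valued keywords entry.
source_info = {
    "/Title": "Example",
    "/Keywords": "First, second, third",
    "/AAPL:Keywords": ["First", "second", "third"],
}

cleaned = strip_non_string_entries(source_info)
print(sorted(cleaned))
```

Filtering on the value type rather than on the key name means any other array-valued entry would be dropped as well, which seemed the safer choice for copying. The standard /Keywords entry still carries the same words, so nothing is lost.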
Apple does describe the field a little in the Core Graphics docs - https://developer.apple.com/documentation/coregraphics/kcgpdfcontextkeywords - though I don't see an explanation of why it's necessary.