Participating Frequently
October 30, 2018
Question

Hexadecimal encoding format in PDF

I need help with the hexadecimal encoding format in PDF files, for both text and date/time stamps, e.g.:

/Lang <hexadecimals>

/Author <hexadecimals>

/Created <hexadecimals>

etc.

Please point me to the relevant page(s) in the PDF reference/specification where the encoding format is explained in detail. I must admit I didn't succeed on my own.
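For the date and time stamps: once a value has been decrypted and decoded, the standard /CreationDate and /ModDate entries use the date format from ISO 32000-1, 7.9.4 ("Dates"), D:YYYYMMDDHHmmSSOHH'mm', where every field after the year is optional and O is +, - or Z. A minimal Python sketch of a parser, assuming the input is already plain text; the function name is illustrative, and custom keys such as /Created, /LastSaved or /CTP_TimeStamp are not covered, since their format is up to whichever software wrote them.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSSOHH'mm' (ISO 32000-1, 7.9.4). Every field after the year is
# optional; the apostrophes in the offset are matched loosely because
# producers disagree about whether the trailing one is present.
_PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([+\-Z])(?:(\d{2})'?)?(?:(\d{2})'?)?)?$"
)


def parse_pdf_date(text: str) -> datetime:
    """Parse a PDF date string that has already been decrypted and decoded."""
    m = _PDF_DATE.match(text.strip())
    if m is None:
        raise ValueError(f"not a PDF date: {text!r}")
    year, month, day, hour, minute, second, sign, off_h, off_m = m.groups()
    tz = None  # no offset given: return a naive datetime
    if sign == "Z":
        tz = timezone.utc
    elif sign is not None:
        offset = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
        tz = timezone(offset if sign == "+" else -offset)
    return datetime(
        int(year), int(month or 1), int(day or 1),
        int(hour or 0), int(minute or 0), int(second or 0),
        tzinfo=tz,
    )
```

For example, parse_pdf_date("D:20181030123456+01'00'") gives an aware datetime for 12:34:56 at UTC+01:00, while a bare "D:2018" gives a naive datetime for 1 January 2018.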

Braniac
November 2, 2018

The point of the encryption is not relevant. This information is in the specification you have been so critical of.

Braniac
November 2, 2018

This file is encrypted, so naturally all strings are encrypted.

Participating Frequently
November 2, 2018

Strange. What's the point of encrypting document information that any PDF reader displays anyway? In any case, I need to decrypt this data. Could you help me by telling me where to look for the encryption flag and decryption key in a file, and refer me to the most relevant section(s) of the reference?

lrosenth
Adobe Employee
November 2, 2018

The Encrypt dictionary referenced from the trailer has the info you need to get decryption started.
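To see whether a file is encrypted at all, look for an /Encrypt entry in the trailer. In PDF 1.5 files that use cross-reference streams, the trailer entries sit in the cross-reference stream's dictionary, which is stored uncompressed, so a plain byte search still finds them. The sketch below is a deliberately crude scan, not a real parser: find_encrypt_ref is a hypothetical helper name, and it only handles the usual case where /Encrypt is an indirect reference.

```python
import re
from typing import Optional, Tuple

# An indirect reference such as "/Encrypt 12 0 R" in a trailer or
# cross-reference stream dictionary.
_ENCRYPT_REF = re.compile(rb"/Encrypt\s+(\d+)\s+(\d+)\s+R")


def find_encrypt_ref(pdf: bytes) -> Optional[Tuple[int, int]]:
    """Return (object number, generation) of the last /Encrypt reference, or None.

    Crude byte scan, not a parser: it ignores cross-reference tables entirely
    and does not handle a direct (inline) /Encrypt dictionary.
    """
    matches = list(_ENCRYPT_REF.finditer(pdf))
    if not matches:
        return None
    # Incremental updates append newer trailers, so prefer the last match.
    obj_num, gen_num = matches[-1].groups()
    return int(obj_num), int(gen_num)
```

The object it points to carries entries such as /Filter (usually /Standard), /V, /R, /O, /U and /P, which select and parameterize the algorithms in ISO 32000-1, 7.6.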

Details of the algorithms used, etc., can be found in ISO 32000-1:2008, 7.6 (“Encryption”).
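As a rough illustration of that clause: once you have the file encryption key (derived from the user password and the /O, /P and /ID values by Algorithm 2, not shown here), each string is decrypted with a per-object key built by Algorithm 1 of 7.6.2. A minimal Python sketch, assuming an RC4-based handler (for example /V 1 or 2) and a file_key that has already been computed; the function names are illustrative, and AES, where each string begins with a 16-byte IV and is decrypted in CBC mode, is not covered.

```python
import hashlib


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4; the same call both encrypts and decrypts."""
    s = list(range(256))
    j = 0
    for i in range(256):  # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:  # keystream generation, XORed into the data
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def object_key(file_key: bytes, obj_num: int, gen_num: int, aes: bool = False) -> bytes:
    """Per-object key after Algorithm 1 (ISO 32000-1, 7.6.2)."""
    material = (
        file_key
        + (obj_num & 0xFFFFFF).to_bytes(3, "little")  # low 3 bytes, low byte first
        + (gen_num & 0xFFFF).to_bytes(2, "little")    # low 2 bytes, low byte first
    )
    if aes:
        material += b"sAlT"  # extra salt appended only for AES
    # Use the first n + 5 bytes of the MD5 digest, at most 16.
    return hashlib.md5(material).digest()[: min(len(file_key) + 5, 16)]
```

For a string in object 6, generation 0, the plaintext would be rc4(object_key(file_key, 6, 0), raw_bytes), where raw_bytes is the hex-decoded string.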

Braniac
November 2, 2018

Please do as I asked and share a link to an ACTUAL file showing the issue, not a few carefully chosen lines. Your problem is clear; you do not need to repeat it. We need to understand why you have formed this strange view.

I think I have been clear: the PDF specification is now maintained by ISO. To join in their discussions you need to be nominated by your national standards body. I love the idea, though, that you could put questions to Adobe's programmers. In more than 20 years I have had so many questions that would be easy to solve if that were possible.

Participating Frequently
November 2, 2018
Braniac
November 1, 2018

You will realize that even if the PDF 1.5 specification has limitations or errors, there will never be an update. All updates, clarifications and corrections now happen in ISO.

Anyway, please share a link to a file with an Info dictionary that you cannot decode. We have given you the full information as we know it, but a particular file may help us understand your particular issue.

Participating Frequently
November 2, 2018

I don't understand why you're asking for a particular file, but in case it helps, here is the beginning of one:

%PDF-1.5
%32b word

1 0 obj
<<
/Lang <hexstring>
/MarkInfo <<
/Marked true
>>
/Metadata 2 0 R
/Outlines 3 0 R
/PageLayout /OneColumn
/Pages 4 0 R
/StructTreeRoot 5 0 R
/Type /Catalog
>>
endobj

6 0 obj
<<
/Author <hexstring>
/CTPClassification <hexstring>
/CTP_BU ( w)
/CTP_IDSID ( w)
/CTP_TimeStamp <hexstring>
/CTP_WWID ( w)
/Company <hexstring>
/Created <hexstring>
/CreationDate <hexstring>
/Creator <hexstring>
/Keywords <hexstring>
/LastSaved <hexstring>
/ModDate <hexstring>
/Producer <hexstring>
/SourceModified <hexstring>
/Title <hexstring>
/TitusGUID <hexstring>
>>
endobj

2 0 obj
<<
/Length xxxx
/Subtype /XML
/Type /Metadata
>>
stream
binary data follows

This isn't limited to a single file; I have run into it with many of them. I guess it may have something to do with the software used to produce the PDF, such as Acrobat PDFMaker 18 or Adobe PDF Library 15.0, as in the file I picked at random for illustration.

So could you please forward this question either to someone maintaining the PDF specification or, better yet, to the programmers maintaining the hex string decoder in Adobe Reader or the hex string encoder in Acrobat PDFMaker or Adobe PDF Library? They should be able to answer this immediately.

Braniac
October 31, 2018

This has been described clearly in every PDF Reference from Adobe as well as the later ISO standards. These are strings, whether you can interpret them or not.

The example you posted looks as if it is in an encrypted file, where of course all strings - including metadata and hexadecimal strings - are encrypted too.

Braniac
October 30, 2018

These are hexadecimal string literals. For example, <412042> and (A B) are equivalent.
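Decoding the hexadecimal notation itself is simple; the rules are in ISO 32000-1, 7.3.4.3. A small Python sketch (the function name is illustrative). In an encrypted file the resulting bytes are still ciphertext; decryption is a separate step after this one.

```python
# PDF white-space characters: NUL, TAB, LF, FF, CR and SPACE.
PDF_WHITESPACE = "\x00\t\n\x0c\r "
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_hex_string(body: str) -> bytes:
    """Decode the text between < and > of a PDF hexadecimal string.

    Per ISO 32000-1, 7.3.4.3, white space is ignored, and an odd number of
    digits is treated as if the missing final digit were 0.
    """
    digits = "".join(ch for ch in body if ch not in PDF_WHITESPACE)
    if not set(digits) <= HEX_DIGITS:
        raise ValueError(f"not a hexadecimal string body: {body!r}")
    if len(digits) % 2:
        digits += "0"
    return bytes.fromhex(digits)
```

decode_hex_string("412042") returns b"A B", the same bytes as the literal string (A B).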

Be aware that some text strings are Unicode: UTF-16BE, starting with the byte order mark FE FF.
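After hex decoding (and decryption, if the file is encrypted), the bytes of a text string such as /Title or /Author still have to be interpreted. A Python sketch of the usual byte-order-mark check; treating Latin-1 as a stand-in for PDFDocEncoding is a simplifying assumption, noted in the code, and the function name is illustrative.

```python
def decode_text_string(raw: bytes) -> str:
    """Interpret the raw (already decrypted) bytes of a PDF text string.

    A leading FE FF byte order mark means UTF-16BE (ISO 32000-1, 7.9.2.2);
    PDF 2.0 additionally allows UTF-8 with an EF BB BF mark. Anything else is
    PDFDocEncoding, approximated here with Latin-1 (the two differ in some
    code points, e.g. 0x18-0x1F and 0x80-0x9F).
    """
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be")
    if raw.startswith(b"\xef\xbb\xbf"):
        return raw[3:].decode("utf-8")
    return raw.decode("latin-1")
```

decode_text_string(b"\xfe\xff\x00A\x00B") returns "AB", while plain ASCII bytes come back unchanged.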

Participating Frequently
October 31, 2018

No, it's one byte per character, so not Unicode, but not ANSI/ASCII as in your example either. Here are some examples:

/Creator<15552FFECC9BFCE2A52B2299ACD59E477D1E9E6E46C4C4B67BFBAAD1>

is decoded as "Acrobat PDFMaker 18 for Word"

/Producer <155232F3CBDAD886B34F28BDAFCC9A47240F977B0E9B>

is decoded as "Adobe PDF Library 15.0"
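The two examples above are consistent with the encryption explanation in this thread. Both values come from the same Info dictionary, so under the standard security handler they are encrypted with the same per-object key; with RC4 that means the same keystream, restarting at the first byte of each string. XORing the /Creator ciphertext with its known plaintext therefore yields keystream bytes that also decrypt /Producer. A short Python check (the helper and variable names are illustrative):

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


creator_ct = bytes.fromhex("15552FFECC9BFCE2A52B2299ACD59E477D1E9E6E46C4C4B67BFBAAD1")
producer_ct = bytes.fromhex("155232F3CBDAD886B34F28BDAFCC9A47240F977B0E9B")

# Known plaintext of /Creator recovers the keystream bytes for this object...
keystream = xor_bytes(creator_ct, b"Acrobat PDFMaker 18 for Word")
# ...and the same keystream turns /Producer back into readable text.
print(xor_bytes(producer_ct, keystream).decode("ascii"))
```

This prints Adobe PDF Library 15.0. The ciphertexts are exactly as long as their plaintexts (28 and 22 bytes), which also fits RC4 rather than AES. This only demonstrates the point; decrypting strings whose plaintext you don't already know still requires the key.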

lrosenth
Adobe Employee
October 31, 2018

ISO 32000-2 is the PDF 2.0 specification, but this encoding also appears in PDF files that declare an earlier version; the examples above come from a file beginning with %PDF-1.5.

So please point me to the relevant section of the public PDF specifications rather than recommending that I buy an ISO standard. I posted this question because I failed to find this encoding scheme.


You will find the same thing in ISO 32000-1 (the PDF 1.7 specification) at the same clause # (7.3.4.3).

You will find it in every version of the Adobe PDF specification starting with 1.0 (where you can find it in clause 4.4, Strings).

Be aware that for the last 10 years, PDF has been an ISO standard. If you plan to work with PDF, you need the official standard, and that's ISO 32000 (parts 1 and 2).