Hexadecimal encoding format in PDF

Report · Oct 30, 2018

I need help with hexadecimal encoding format in PDF files for both text and date and time stamps, e.g.

/Lang <hexadecimals>

/Author <hexadecimals>

/Created <hexadecimals>

etc.

Please point me to the relevant page(s) of the PDF reference/specification where the encoding format is explained in detail. I must admit I didn't succeed in finding them on my own.

Report · Oct 30, 2018

These are string literals. Equivalent: <412042> and (A B)
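To make the equivalence concrete, here is a minimal sketch of hexadecimal string decoding. The function name `decode_hex_string` is made up for illustration. It assumes the two rules from the Hexadecimal Strings clause of the specification: white space between digits is ignored, and a missing final digit is taken to be 0.

```python
import re


def decode_hex_string(token: str) -> bytes:
    """Decode a PDF hexadecimal string token such as '<412042>' into raw bytes."""
    body = token.strip()
    if not (body.startswith("<") and body.endswith(">")):
        raise ValueError("not a hexadecimal string token")
    # White space between the hex digits carries no meaning and is dropped.
    digits = re.sub(r"\s+", "", body[1:-1])
    # An odd number of digits means the final digit is assumed to be 0.
    if len(digits) % 2:
        digits += "0"
    return bytes.fromhex(digits)


print(decode_hex_string("<412042>"))  # b'A B'
```

The result is a sequence of raw bytes. How those bytes are then interpreted (text, date, encrypted data) is a separate question from the hex notation itself.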

Be aware that some string constants are Unicode (UTF-16BE, marked by a leading FEFF byte order mark).
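As a rough sketch of how a reader might tell the two kinds of text string apart: bytes starting with FE FF are treated as big-endian UTF-16, and anything else as a single-byte encoding. The function name `decode_text_string` is hypothetical, and Latin-1 is used here only as a stand-in for PDFDocEncoding, which Python does not ship. The two agree for ordinary ASCII text but not for every byte value.

```python
import codecs


def decode_text_string(raw: bytes) -> str:
    """Turn the bytes of an unencrypted PDF text string into a Python str."""
    if raw.startswith(codecs.BOM_UTF16_BE):
        # Leading FE FF marks a big-endian UTF-16 text string; skip the marker.
        return raw[2:].decode("utf-16-be")
    # Assumption: Latin-1 approximates PDFDocEncoding for this sketch.
    return raw.decode("latin-1")


print(decode_text_string(bytes.fromhex("FEFF004100200042")))  # A B
print(decode_text_string(b"A B"))                             # A B
```

This only applies once the string bytes are in the clear. If the file is encrypted, the bytes must be decrypted first.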

Report · Oct 31, 2018

No, it's one byte per character, so not Unicode, but not ANSI/ASCII as in your example either. Here are some examples:

/Creator<15552FFECC9BFCE2A52B2299ACD59E477D1E9E6E46C4C4B67BFBAAD1>

is decoded as "Acrobat PDFMaker 18 for Word"

/Producer <155232F3CBDAD886B34F28BDAFCC9A47240F977B0E9B>

is decoded as "Adobe PDF Library 15.0"

Report · Oct 31, 2018

So that's just hex encoding as described in ISO 32000-2, 7.3.4.3, Hexadecimal Strings.

Report · Oct 31, 2018

Is this encoding described in PDF specs published by Adobe (ISO standards are not freely available in general)?

Report · Oct 31, 2018

ISO 32000-2 is the PDF 2.0 spec, but this encoding can also be found in PDF files with an earlier spec version; e.g., these examples are from a file beginning with %PDF-1.5.

So please point me to the relevant section of the public PDF specifications, rather than recommending that I buy an ISO standard. I posted this question because I failed to find this encoding scheme.

Report · Oct 31, 2018

You will find the same thing in ISO 32000-1 (the PDF 1.7 specification) at the same clause # (7.3.4.3).

You will find it in every version of the Adobe PDF specification starting with 1.0 (where you can find it in clause 4.4, Strings).

Be aware that for the last 10 years, PDF is an ISO standard. If you plan to work with PDF – you need the official standard – and that’s ISO 32000 (parts 1 and 2).

Report · Nov 01, 2018

From the publicly available draft of the standard, 7.3.4.3:

"A hexadecimal string shall be written as a sequence of hexadecimal digits (0–9 and either A–F or a–f) encoded as ASCII characters"

But this is not what I need, isn't it clear? I need the encoding format not for the hexadecimal digits (since I don't have any problem reproducing them here), but for the strings in hexadecimal form -- where is it explained?

If forum support staff can't help me with this, perhaps this should be forwarded to either maintainers of PDF specifications or programmers of PDF Reader who maintain hexadecimal string decoder?

Report · Nov 01, 2018

I'm sorry you do not like my explanation enough to even mention it. The maintainers of the standard are ISO - you know, the people you refuse to pay for their work...

Report · Nov 01, 2018

Explanations so far are not helpful at all -- such encoding format can be easily found in public files with unencrypted contents, let alone metadata as mentioned above. And since PDF spec of file in question is 1.5, I understand it must be described in pre-ISO PDF specifications made publicly available by Adobe https://www.adobe.com/devnet/pdf/pdf_reference_archive.html

Report · Oct 31, 2018

This has been described clearly in every PDF Reference from Adobe as well as the later ISO standards. These are strings, whether you can interpret them or not.

The example you posted looks as if it is in an encrypted file, where of course all strings - including metadata and hexadecimal strings - are encrypted too.
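The two strings quoted earlier in the thread can be checked against this. Each ciphertext is exactly as long as its stated plain text (28 and 22 bytes), and XOR-ing the /Creator bytes with "Acrobat PDFMaker 18 for Word" gives a byte sequence that also turns the /Producer bytes into readable text. That is what one would expect if both strings, which sit in the same object, were encrypted with the same stream-cipher key (RC4 being the usual candidate). That last part is an inference from the numbers, not something the file header states. A small sketch, with `xor_bytes` as an illustrative helper name:

```python
creator_ct = bytes.fromhex(
    "15552FFECC9BFCE2A52B2299ACD59E477D1E9E6E46C4C4B67BFBAAD1"
)
creator_pt = b"Acrobat PDFMaker 18 for Word"
producer_ct = bytes.fromhex("155232F3CBDAD886B34F28BDAFCC9A47240F977B0E9B")


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, stopping at the end of the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


# Recover the keystream from the known /Creator pair ...
keystream = xor_bytes(creator_ct, creator_pt)
# ... and apply it to the /Producer bytes.
print(xor_bytes(producer_ct, keystream).decode("ascii"))  # Adobe PDF Library 15.0
```

So the hex notation is ordinary. What the bytes hide is an encryption layer, which is why they map to no character encoding.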

Report · Nov 01, 2018

You will realize that even if the PDF 1.5 specification has limitations or errors, there will never be an update. All updates, clarifications and corrections now happen in ISO.

Anyway, please share a link to such a file, which has an Info dictionary that you cannot decode. We have given you full information as we know it, but a particular file may help us understand your particular issue.

Report · Nov 02, 2018

I don't understand why you're asking for a particular file, but if it's of any help, here is an example of the beginning:

%PDF-1.5
%32b word
1 0 obj
<<
/Lang <hexstring>
/MarkInfo <<
/Marked true
>>
/Metadata 2 0 R
/Outlines 3 0 R
/PageLayout /OneColumn
/Pages 4 0 R
/StructTreeRoot 5 0 R
/Type /Catalog
>>
endobj
6 0 obj
<<
/Author <hexstring>
/CTPClassification <hexstring>
/CTP_BU ( w)
/CTP_IDSID ( w)
/CTP_TimeStamp <hexstring>
/CTP_WWID ( w)
/Company <hexstring>
/Created <hexstring>
/CreationDate <hexstring>
/Creator <hexstring>
/Keywords <hexstring>
/LastSaved <hexstring>
/ModDate <hexstring>
/Producer <hexstring>
/SourceModified <hexstring>
/Title <hexstring>
/TitusGUID <hexstring>
>>
endobj
2 0 obj
<<
/Length xxxx
/Subtype /XML
/Type /Metadata
>>
stream
binary data follows

It's not just a single file I have this problem with; there are many of them. I guess it may have something to do with the software used to produce the PDF, such as Acrobat PDFMaker 18 or Adobe PDF Library 15.0, as in the file I picked at random for illustration.

So could you please forward this question either to someone maintaining PDF specification, or better yet, programmers maintaining hex string decoder in Adobe Reader or hex string encoder in Acrobat PDFMaker or Adobe PDF Library? They should be able to answer this immediately.

Report · Nov 02, 2018

Please do as I asked and share the link of an ACTUAL file showing the issue, not a few carefully chosen lines. Your problem is clear, you do not need to repeat it. We need to understand why you have formed this strange view.

I think I have been clear: the PDF specification is now maintained by ISO. To join in their discussions you need to be nominated by your national standards body. I love the idea, though, that you could put questions to Adobe's programmers. In more than 20 years I have had so many questions that would be easy to solve if that were possible.

Report · Nov 02, 2018

Here you are https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e-2100-speci...

Report · Nov 02, 2018

This file is encrypted, so naturally all strings are encrypted.

Report · Nov 02, 2018

Strange. What's the point of encrypting file information displayed by any PDF reader? But I need to decrypt this data anyway -- could you help me by telling me where to look for the encryption flag and decryption key in a file, and refer me to the most relevant section(s) of the reference?

Report · Nov 02, 2018

The Encrypt dictionary referenced from the trailer has the information you need to get the decryption started.
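As a quick first check, one can scan the raw file bytes for an /Encrypt entry before attempting any real parsing. This is only a heuristic sketch, not a PDF parser: it assumes the trailer dictionary (or the cross-reference stream dictionary that plays that role in newer files) is stored uncompressed, and the name `looks_encrypted` is made up for illustration.

```python
import re

# Matches "/Encrypt 12 0 R" (indirect reference) or "/Encrypt <<" (inline dictionary).
# It does not match other keys that merely start with the same letters,
# such as /EncryptMetadata.
ENCRYPT_ENTRY = re.compile(rb"/Encrypt\s*(?:\d+\s+\d+\s+R|<<)")


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Return True if the raw bytes appear to contain an /Encrypt entry."""
    return ENCRYPT_ENTRY.search(pdf_bytes) is not None


sample = b"trailer\n<< /Size 10 /Root 1 0 R /Encrypt 12 0 R >>\nstartxref"
print(looks_encrypted(sample))  # True
```

If the check is positive, every string and stream in the file should be assumed encrypted unless the Encrypt dictionary says otherwise.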

Details of the algorithms used, etc, can be found in ISO 32000-1:2008, 7.6 (“Encryption”)
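For orientation, here is a sketch of the last step of that clause for RC4-encrypted files: deriving the per-object key and decrypting one string. It assumes the file encryption key has already been computed from the Encrypt dictionary and the password (that part is not shown), and it does not cover AES, which needs an extra salt and an initialization vector. The names `object_key`, `rc4` and `decrypt_string` are illustrative only.

```python
import hashlib


def object_key(file_key: bytes, obj_num: int, gen_num: int) -> bytes:
    """Per-object key: MD5 of file key + 3 low bytes of object number
    + 2 low bytes of generation number, truncated to min(n + 5, 16) bytes."""
    material = (
        file_key
        + obj_num.to_bytes(3, "little")
        + gen_num.to_bytes(2, "little")
    )
    return hashlib.md5(material).digest()[: min(len(file_key) + 5, 16)]


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4; the same call encrypts and decrypts."""
    s = list(range(256))
    j = 0
    for i in range(256):  # key scheduling
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:  # keystream generation and XOR
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def decrypt_string(file_key: bytes, obj_num: int, gen_num: int, raw: bytes) -> bytes:
    """Decrypt the bytes of one string belonging to object (obj_num, gen_num)."""
    return rc4(object_key(file_key, obj_num, gen_num), raw)
```

Because the key depends on the object and generation numbers, all strings inside one object (for example, every entry of the Info dictionary in object 6 0) share the same key.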

Report · Nov 02, 2018

The point of encryption is not relevant. This information is in the specification you have been so critical of.

Adobe Community
