Participating Frequently
June 11, 2020
Question

Is the ToUnicode CMap editable?


This may sound very specific and advanced, but I really don't know much about PDFs yet.

What I want to do: take an existing PDF and export it from Adobe Acrobat Pro DC so that it conforms to the PDF/A-2u standard, which is meant for long-term archiving of PDFs. I don't think any experience with the standard is necessary to answer my question, though.

Some PDFs can't be exported that way because they contain ToUnicode mappings that don't conform to the standard. More specifically: "'ToUnicode'-cmap contains zero as a Unicode value". I reckon this is not a huge issue, but I'd still like my PDFs to conform to the standard in the end.

Is there any way to access these mappings? I imagine them as a simple dictionary with key-value pairs of glyphs and Unicode values. As such, they should be easy to change. I can't find anything in Acrobat, though.

Can anyone help me with this please?


1 reply

Legend
June 11, 2020

In PDF internals, a ToUnicode map is a text stream embedded in the PDF. It is almost always compressed, so it cannot be read or edited directly without software that can decode PDF structures.
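
To make "compressed text stream" concrete, here is a minimal Python sketch. It assumes the stream uses the common /FlateDecode filter (zlib compression); other filters would need different handling. The sample CMap text and the helper name `inflate_stream` are made up for illustration and do not come from any real file.

```python
import zlib

# Hypothetical ToUnicode CMap body, shortened for illustration.
SAMPLE_CMAP = b"""/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
2 beginbfchar
<0003> <0020>
<0011> <0000>
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
end
end
"""


def inflate_stream(raw: bytes) -> bytes:
    """Decode the bytes of a stream whose /Filter is /FlateDecode."""
    return zlib.decompress(raw)


# Stand-in for the bytes that sit between `stream` and `endstream` in the file.
compressed = zlib.compress(SAMPLE_CMAP)

cmap_text = inflate_stream(compressed).decode("ascii")
print(cmap_text)
```

In this made-up sample, the `<0011> <0000>` line is the kind of entry that would trigger the "contains zero as a Unicode value" complaint. Locating the stream inside a real PDF (following the font's /ToUnicode reference) is not shown here; that part is what a PDF library is for.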

Smogshaik (Author)
Participating Frequently
June 11, 2020

Thank you for the reply! Do you happen to know of any software that can decode those structures? I would assume only Adobe software can do that truly well. Since I'm a bit of a programmer, I'm keen on learning more!

Legend
June 12, 2020

The format of PDF is not a secret. It's described in this roughly 750-page specification (ISO 32000-1:2008): https://www.adobe.com/content/dam/acom/en/devnet/pdf/PDF32000_2008.pdf
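
For the programming angle: once decoded, a ToUnicode CMap is fairly close to the key-value dictionary imagined in the question. The specification covers it in its section on ToUnicode CMaps (9.10.3, if I recall the numbering correctly). Below is a rough Python sketch that reads `bfchar` and simple `bfrange` entries into a dictionary and flags codes mapped to U+0000. The sample text and the function name `parse_to_unicode` are hypothetical, and the parser is deliberately simplified: it ignores the array form of `bfrange` and assumes each range destination is a single 16-bit value.

```python
import re

# Hypothetical, already-decompressed fragment of a ToUnicode CMap.
SAMPLE = """2 beginbfchar
<0003> <0020>
<0011> <0000>
endbfchar
1 beginbfrange
<0024> <0026> <0041>
endbfrange
"""

HEX = r"<([0-9A-Fa-f]+)>"


def parse_to_unicode(cmap_text: str) -> dict:
    """Map character codes to Unicode strings (simplified, see lead-in)."""
    mapping = {}
    # bfchar: one source code, one UTF-16BE destination string per entry.
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(HEX + r"\s*" + HEX, block):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    # bfrange: codes lo..hi map to consecutive values starting at dst.
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, dst in re.findall(HEX + r"\s*" + HEX + r"\s*" + HEX, block):
            start = int(dst, 16)
            for offset, code in enumerate(range(int(lo, 16), int(hi, 16) + 1)):
                mapping[code] = chr(start + offset)
    return mapping


mapping = parse_to_unicode(SAMPLE)

# Codes whose Unicode value contains U+0000, the case PDF/A preflight rejects.
zero_codes = [code for code, text in mapping.items() if "\x00" in text]
print([f"{code:04X}" for code in zero_codes])
```

Fixing such an entry means writing the edited stream back into the file with a correct /Length, which is why doing this through a PDF library is more practical than editing the file by hand.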