
Is the ToUnicode CMap editable?

New Here, Jun 11, 2020


This may sound very specific and advanced, but I really don't know much about PDFs yet.

What I want to do: take an existing PDF and export it from Adobe Acrobat Pro DC so that it conforms to the PDF/A-2u standard. PDF/A is a family of standards for the long-term archiving of PDFs. I don't think any experience with these standards is necessary to answer my question, though.

Some PDFs can't be exported that way because they contain ToUnicode mappings that don't conform to the standard. More specifically, the error is: "'ToUnicode'-cmap contains zero as a Unicode value". I reckon this isn't a huge issue, but I'd still like my PDFs to conform to the standard in the end.

Is there any way to access these mappings? I imagine them as a simple dictionary of key-value pairs mapping glyphs to Unicode values, so it should be easy to change them. I can't find anything in Acrobat, though.

Can anyone help me with this please?
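The dictionary picture in the question is close to how a decompressed ToUnicode CMap reads: each `beginbfchar` ... `endbfchar` section lists `<source code> <Unicode value>` pairs, with the Unicode value written as UTF-16BE hex. Below is a minimal Python sketch, assuming you already have the CMap as decompressed text and handling only `bfchar` entries (a `bfrange` entry can also map to zero and is not covered). It builds that dictionary and lists the source codes whose value contains U+0000, which is what the PDF/A error refers to. `parse_bfchar` and `zero_mapped_codes` are hypothetical helper names, not part of any library.

```python
import re

# One "beginbfchar ... endbfchar" section of a ToUnicode CMap.
BFCHAR_BLOCK = re.compile(r"beginbfchar(.*?)endbfchar", re.S)
# One "<source code> <destination>" pair; both are hex strings.
PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]*)>")


def parse_bfchar(cmap_text: str) -> dict[int, str]:
    """Map source character codes to Unicode text (bfchar entries only)."""
    mapping: dict[int, str] = {}
    for block in BFCHAR_BLOCK.findall(cmap_text):
        for src, dst in PAIR.findall(block):
            # Destinations are UTF-16BE, so <0041> is "A" and <0000> is U+0000.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def zero_mapped_codes(cmap_text: str) -> list[int]:
    """Return the source codes whose Unicode value contains U+0000."""
    return sorted(code for code, text in parse_bfchar(cmap_text).items() if "\x00" in text)


SAMPLE_CMAP = """\
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
1 begincodespacerange
<00> <FF>
endcodespacerange
2 beginbfchar
<01> <0041>
<02> <0000>
endbfchar
endcmap
"""

if __name__ == "__main__":
    print(parse_bfchar(SAMPLE_CMAP))       # {1: 'A', 2: '\x00'}
    print(zero_mapped_codes(SAMPLE_CMAP))  # [2]
```

Parsing pairs left to right keeps each source code matched with its own destination, as long as the section contains only well-formed pairs.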

Topics: Edit and convert PDFs, PDF forms, Standards and accessibility

LEGEND, Jun 11, 2020


In PDF internals, a ToUnicode CMap is a text stream embedded in the PDF. It is almost always compressed, so it can't simply be read or edited without software that can decode PDF structures.
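To see what such a stream contains, it has to be decompressed first. Below is a minimal Python sketch, assuming the stream uses `/FlateDecode` (zlib) with no predictor and no second filter, and that you have already cut the bytes of that one indirect object out of the file. Slicing between the `stream` and `endstream` keywords with a regex is a shortcut; a real PDF reader would use the stream's `/Length` entry instead. `inflate_stream` is a hypothetical helper name.

```python
import re
import zlib


def inflate_stream(obj_bytes: bytes) -> bytes:
    """Return the decompressed body of one FlateDecode stream object.

    Naive: assumes obj_bytes holds a single 'stream ... endstream' section
    whose only filter is /FlateDecode (no DecodeParms/predictor).
    """
    # The 'stream' keyword is followed by CRLF or LF before the data starts.
    match = re.search(rb"stream\r?\n(.*?)endstream", obj_bytes, re.S)
    if match is None:
        raise ValueError("no stream body found")
    # A decompressobj stops at the end of the zlib data and leaves the
    # end-of-line bytes before 'endstream' in .unused_data instead of failing.
    return zlib.decompressobj().decompress(match.group(1))


if __name__ == "__main__":
    cmap = b"1 beginbfchar\n<02> <0000>\nendbfchar\n"
    body = zlib.compress(cmap)
    obj = (b"7 0 obj\n<< /Filter /FlateDecode /Length %d >>\nstream\n" % len(body)
           + body + b"\nendstream\nendobj\n")
    # CMaps are PostScript-like ASCII text; latin-1 decodes any byte.
    print(inflate_stream(obj).decode("latin-1"))
```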

New Here, Jun 11, 2020


Thank you for the reply! Do you happen to know of any software that can decode those structures? I would assume only Adobe software can do that truly well. Since I'm a bit of a programmer, I'm keen on learning more!

LEGEND, Jun 12, 2020


The PDF format is not a secret. It's described in this 1,000-page book: https://www.adobe.com/content/dam/acom/en/devnet/pdf/PDF32000_2008.pdf

New Here, Jun 12, 2020


While I'm thankful for this and will certainly look into it, it doesn't directly answer my question of whether there is a reliable way (or piece of software) to decompress and edit the text streams within a PDF. Maybe reading the document will help, though. I've also ordered a book by one of the Adobe developers who worked on the format.
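Once a CMap is decompressed, the edit itself is plain text substitution. Below is a minimal Python sketch, assuming the offending entries are single-code `bfchar` entries whose destination is exactly `<0000>` (a `bfrange` entry could also map to zero and is left alone here) and that the caller picks a meaningful replacement character for each glyph. `replace_zero_mappings`, `recompress`, and the `"-"` replacement are hypothetical, for illustration only. Writing the result back is the harder part: the recompressed stream has a new length, so the object's `/Length` entry and the cross-reference offsets after it must be updated as well. qpdf's QDF mode (`qpdf --qdf --object-streams=disable`) together with its `fix-qdf` utility is one tool intended for this kind of hand-editing.

```python
import re
import zlib

# Captures the text of each "beginbfchar ... endbfchar" section.
BFCHAR_BLOCK = re.compile(r"(beginbfchar)(.*?)(endbfchar)", re.S)
# One "<source> <destination>" pair; pairs are consumed left to right.
PAIR = re.compile(r"<([0-9A-Fa-f]+)>(\s*)<([0-9A-Fa-f]*)>")


def replace_zero_mappings(cmap_text: str, replacement: str) -> str:
    """Point every bfchar entry whose destination is exactly <0000> at `replacement`."""
    # Destination strings are UTF-16BE hex, e.g. "-" becomes 002D.
    new_dst = replacement.encode("utf-16-be").hex().upper()

    def fix_pair(pair):
        src, gap, dst = pair.groups()
        if dst == "0000":
            dst = new_dst
        return f"<{src}>{gap}<{dst}>"

    def fix_block(block):
        return block.group(1) + PAIR.sub(fix_pair, block.group(2)) + block.group(3)

    return BFCHAR_BLOCK.sub(fix_block, cmap_text)


def recompress(cmap_text: str) -> bytes:
    """Flate-compress the edited CMap text again (its length will differ)."""
    return zlib.compress(cmap_text.encode("latin-1"))


if __name__ == "__main__":
    original = "2 beginbfchar\n<01> <0041>\n<02> <0000>\nendbfchar\n"
    edited = replace_zero_mappings(original, "-")
    print(edited)
    # The object's /Length entry would need to be set to this new size.
    print(len(recompress(edited)))
```

Matching whole pairs in order, rather than searching for `<0000>` on its own, avoids mistaking a source code of `<0000>` for a zero destination.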

LEGEND, Jun 12, 2020


You might look into the tool PDF Can Opener. 
