Strange Font Encoding in PDF files

Explorer, Oct 25, 2021

I have received a number of multipage (150+ pages) PDF documents from a client that will require extensive revision. I have discovered that a great deal of the type in these documents is custom encoded, with unusual font names such as MSTT31c750 (Embedded Subset), Type 1, Encoding: Custom. There are a LOT of them, around 80 instances. All the usual tricks to import or extract text, even using the otherwise excellent Markzware PDFMarkz utility, produce "Missing Fonts" for these.

This is where it gets STRANGE. Attempts to replace them with a common font such as Myriad, Arial, or Helvetica produce gibberish text, as does the "default" font. Even copying and pasting the text, or saving as a Word or TXT file, produces gibberish, even when pasting into a text editor. VERY strange, and frustrating. The fairly extreme solution of exporting a page as an image file, creating a new PDF of the page, and running OCR produces copy that would require extensive manual correction.

The originating application seems to be Adobe PageMaker 6.52 / Distiller 4.0 for Windows.

The best guess I have is that this is some sort of font-encoding DRM/copy-protection scheme, or possibly some sort of variable typeface with non-standard encoding, based on the "font names". What's really crazy is that these LOOK like fairly common, ordinary fonts... But I need to be able to either edit or extract this copy for the client's revisions. I do realize that having an editor manually retype the entire document may be the eventual solution, though a time-consuming and therefore costly one.

Anyone seen anything like this?

TOPICS: Edit and convert PDFs, Scan documents and OCR

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
1 ACCEPTED SOLUTION
Community Expert, Oct 25, 2021

Not a DRM issue.

"Back in the day", TrueType font support in PostScript printers was pretty much non-existent, so Windows downloaded fonts like Arial to PostScript printers as PS-compatible outlines. That is where the weird names you are seeing come from: they were generated on the fly. Since the fonts were only meant for output, they were also given an abbreviated custom encoding, likewise created on the fly, that covers only the limited subset of characters being embedded. Editing a PDF back then was not really a thing (outside of using a program like PitStop), so it didn't matter what the encoding was. Of course, NOW it is an issue, but right now, outside of a few tricks, there's no way to correct or change EXISTING type and edit it the way you want. You should be able to type NEW content in the proper font (e.g. Arial), even on the same line as the old stuff.

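To make the point about on-the-fly encodings concrete, here is a toy model in Python. It is only a sketch, not how any real printer driver or Distiller actually assigns codes: it assumes each distinct character gets the next free one-byte code in order of first appearance, starting at 0x21. That starting value and all the function names are made up for illustration.

```python
def build_subset_encoding(text, first_code=0x21):
    """Toy model: give each distinct character the next free code,
    in order of first appearance (first_code is an arbitrary assumption)."""
    encoding = {}
    for ch in text:
        if ch not in encoding:
            encoding[ch] = first_code + len(encoding)
    return encoding


def encode(text, encoding):
    """Store the text as custom one-byte codes, as the page content would."""
    return bytes(encoding[ch] for ch in text)


def naive_extract(codes):
    """Roughly what copy/paste does: read the codes as a standard encoding."""
    return codes.decode("latin-1")


def extract_with_map(codes, encoding):
    """Recovery only works if the code-to-character table is known."""
    reverse = {code: ch for ch, code in encoding.items()}
    return "".join(reverse[code] for code in codes)


original = "Hello World"
encoding = build_subset_encoding(original)
codes = encode(original, encoding)

print(naive_extract(codes))               # gibberish, not the original text
print(extract_with_map(codes, encoding))  # Hello World
```

In this model the page still displays correctly, because the embedded font draws the right shape for each code. Only the mapping back to real characters is missing, which is why substituting Arial or copying the text out gives gibberish.
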
LEGEND, Oct 26, 2021

This is not at all unusual. It isn't a copy protection scheme, just the accidental fallout from software, fonts and systems older than PDF itself. It's the best part of 20 years since PageMaker itself was discontinued...

There is no way to "repair" the encodings. Indeed, they aren't broken, but just aren't predictable or useful.

Explorer, Oct 26, 2021

I did raise an eyebrow that there is someone out there still using PageMaker 6 in 2017... but hey.

Community Expert, Oct 26, 2021

I don't think they are. These sound like really old files, especially with the mention of Distiller 4.0. The Document properties should show the original creation date.
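
If you would rather read the date out of the file than from the dialog, the creation date is stored as a string of the form D:YYYYMMDDHHmmSSOHH'mm'. Below is a minimal Python sketch of parsing such a string. The sample value is invented for illustration, not taken from the files in question, and `parse_pdf_date` is just a name chosen for this example.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date string: D:YYYYMMDDHHmmSSOHH'mm'; everything after the year is optional.
_PDF_DATE = re.compile(
    r"^(?:D:)?(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?:(?P<sign>[+\-Z])(?:(?P<tzh>\d{2})'?)?(?:(?P<tzm>\d{2})'?)?)?$"
)


def parse_pdf_date(value):
    """Convert a PDF date string into a datetime (naive if no offset is given)."""
    match = _PDF_DATE.match(value.strip())
    if match is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    g = match.groupdict()

    tz = None
    if g["sign"] == "Z":
        tz = timezone.utc
    elif g["sign"] in ("+", "-"):
        offset = timedelta(hours=int(g["tzh"] or 0), minutes=int(g["tzm"] or 0))
        tz = timezone(offset if g["sign"] == "+" else -offset)

    # Missing month/day default to 1, missing time fields to 0.
    return datetime(
        int(g["year"]), int(g["month"] or 1), int(g["day"] or 1),
        int(g["hour"] or 0), int(g["minute"] or 0), int(g["second"] or 0),
        tzinfo=tz,
    )


# Hypothetical value of the kind an older file might carry.
sample = "D:19991025143000-05'00'"
print(parse_pdf_date(sample).isoformat())  # 1999-10-25T14:30:00-05:00
```

Getting the raw string out of the PDF is a separate step that this sketch does not cover.
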

Community Expert, Oct 26, 2021

SamuraiArtGuy:

You may want to consider a different workflow. This sounds like a project far beyond the limited editing abilities of a PDF editor.

Obviously, having the original files wouldn't help much, as they are PageMaker (although I could convert them to InDesign for you if they are available), but you could look at placing the existing PDF into a new InDesign document, making your changes on overlays over individual pages, and re-exporting a new PDF. Or you could insert the changed pages back into the existing PDF document, replacing the originals. In the long run, you will have more flexibility and better control for further changes. You might even try a PDF-to-InDesign converter to attempt to recover something editable.

If you'd care to share a sample document that's particularly troubling, I could suggest some approaches.

Explorer, Oct 26, 2021

Thank you, folks. I had my suspicions as soon as I saw "PageMaker" in the metadata.

The workflow is intended to end up in InDesign. I would be bat guano insane to attempt this editing within Acrobat DC. I discovered the problem using the otherwise excellent Markzware PDFMarkz utility to convert/import the 164-page document. The various options are all varying degrees of tedious, such as bringing in individual pages of the original and pasting the edits over them on a new layer. That approach has its limitations, and lacks design flexibility. I can get reasonable OCR from 600-dpi exports of individual pages (vs. the first attempt at 300 dpi), which I can also use to recover a multitude of individual inline graphics. And bless the Gods of Design, the "Copy with Formatting" feature in Acrobat DC turns out to recover about 90% of the text from individual pages. So we won't have to have a copy editor retype all the text from the entire document.
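
For a rough sense of what the jump from 300 to 600 dpi buys, here is a small back-of-the-envelope calculation in Python. The US Letter page size and the 10 pt body text are assumptions chosen for illustration, not details of the actual document, and the function names are made up.

```python
POINTS_PER_INCH = 72


def page_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a page exported at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def text_height_px(point_size, dpi):
    """Nominal height of type at this point size, in pixels."""
    return point_size / POINTS_PER_INCH * dpi


for dpi in (300, 600):
    width, height = page_pixels(8.5, 11, dpi)  # US Letter is an assumption
    print(f"{dpi} dpi: {width} x {height} px, "
          f"10 pt type is about {text_height_px(10, dpi):.1f} px tall")
```

Doubling the resolution doubles the height of every character in pixels and quadruples the pixel count per page, so the OCR engine has more detail to work with at the cost of larger exports.
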

So I think the path of least resistance is to re-create these documents with the full suite of InDesign's layout tools, and re-set the text I can extract, recover, or OCR. Still tedious, but not brutal. And at the end of the day, the client will have a fresh new original document that can be freely revised, which is the right way to do it. Also, Arial can be banished in favor of OpenType versions of Myriad Pro or Helvetica Neue, with more typographic flexibility.

Thank you both for your insights and expertise.
