Strange Font Encoding in PDF files
- October 26, 2021
- 5 replies
- 9325 views
I have received a number of multipage (150+ pages) PDF documents from a client that will require extensive revision. I have discovered that a great deal of the type in these documents is custom encoded, with unusual font names such as MSTT31c750 (Embedded Subset), Type 1, Encoding: Custom. There are a LOT of them, around 80 instances. All the usual tricks to import or extract the text, even using the otherwise excellent Markzware PDFMarkz utility, produce "Missing Fonts" for these.
This is where it gets STRANGE. Attempts to replace them with a common font such as Myriad, Arial, or Helvetica produce gibberish text, as does the "default" font. Even copying and pasting the text, or saving as a Word or TXT file, produces gibberish, even when pasting into a plain text editor. VERY strange, and frustrating. The fairly extreme solution of exporting a page as an image file, creating a new PDF of the page, and running OCR produces copy that would require extensive manual correction.
The originating application seems to be Adobe PageMaker 6.52 / Distiller for Windows 4.0.
My best guess is that this is some sort of font encoding DRM/copy protection scheme, or possibly some sort of variable typeface with non-standard encoding, judging by the "font names". What's really crazy is that these LOOK like fairly common, ordinary fonts... But I need to be able to either edit or extract this copy for the client's revisions. I do realize that having an editor manually retype the entire document may be the eventual solution, but it would be time consuming and therefore costly.
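To make that guess concrete, here is a minimal Python sketch of what I suspect is going on. It is only an illustration under my own assumptions: it does not read a real PDF, and every function name in it is hypothetical. It assumes the subset font hands out character codes in order of first use, so the stored codes have nothing to do with ASCII, and that a code-to-character table could be rebuilt from a short passage whose correct reading is known (for example, a few lines retyped from the page).

```python
# Hypothetical illustration only: this does not parse a PDF, and none of
# these names come from a real PDF library.

def build_subset_encoding(text, first_code=0x21):
    """Assign codes in order of first use (assumed behaviour of a
    custom-encoded subset font)."""
    encoding = {}
    for ch in text:
        if ch not in encoding:
            encoding[ch] = chr(first_code + len(encoding))
    return encoding


def encode(text, encoding):
    """Return what a naive copy/paste would see: the raw codes read as text."""
    return "".join(encoding[ch] for ch in text)


def learn_mapping(garbled, known_plain):
    """Rebuild a code -> character table from a garbled string and its
    known correct reading."""
    if len(garbled) != len(known_plain):
        raise ValueError("sample and reading must have the same length")
    mapping = {}
    for code, ch in zip(garbled, known_plain):
        # A code must always stand for the same character.
        if mapping.setdefault(code, ch) != ch:
            raise ValueError(f"code {code!r} maps to two different characters")
    return mapping


def decode(garbled, mapping, unknown="?"):
    """Translate garbled text back, marking codes not seen in the sample."""
    return "".join(mapping.get(code, unknown) for code in garbled)


page = "Strange font encoding"
encoding = build_subset_encoding(page)
garbled = encode(page, encoding)       # gibberish, like my pasted text
mapping = learn_mapping(garbled, page)
recovered = decode(garbled, mapping)   # readable again
print(repr(garbled))
print(recovered)
```

If this is really how the files are built, each of the 80 or so fonts would need its own table, since every subset would number its glyphs independently.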
Anyone seen anything like this?
