Caesar cipher

Question

Has anyone dealt with mapping issues and potential font transformations making a PDF illegible, but only for specific characters, lines, words, and relevant phrases?

The result is felt at the parsing level, not typically at the human-reading level.

I know it’s quite vague, but it’s also not buried in the weeds where it seems to want me to live. lol

thanks

Amal Jaiswal · Accepted Answer

Hi @10546,

Hope you are doing well, and thanks for the detailed description. This is actually a common and frustrating issue with PDFs where the text looks fine but doesn't extract or parse correctly. It almost always comes down to font encoding or a broken character mapping. Please try the steps below and see if they help:

1. Check if it's a scan vs. real text. Open the PDF, select some of the affected text with the Selection tool, and copy-paste it into a plain text editor (like Notepad). If it pastes as gibberish or symbols, that confirms a font/encoding mapping issue rather than a visual rendering bug.
2. Run Acrobat's built-in OCR/Text Recognition. Go to Tools > Scan & OCR > Recognize Text. Even if the document already has "real" text, re-running OCR can rebuild a clean ToUnicode mapping and often fixes invisible-but-broken character mapping.
3. Check the font embedding. Go to File > Properties > Fonts tab. Look specifically at the fonts used for the affected words/lines. If you see a font listed as "(Embedded Subset)" using a custom or non-standard encoding, that's frequently the culprit. Subsetted fonts sometimes remap glyph IDs in ways that look correct visually but break parsing/extraction.
4. Try "Save As" to a fresh PDF. Use File > Save As Other > PDF, or better, print to a new PDF (Print > Adobe PDF as the printer) and see if the issue persists in the regenerated file. This tells us whether it's a structural issue with the file itself or something else downstream.
5. If it's isolated to specific characters/phrases, check whether those happen to share a font, were pasted in from another source (like a Word doc or web page), or contain special characters. Mixed-source documents are a common cause of this exact "only some characters" pattern.
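If the parsing is done by a script, the copy-and-paste check described above can be roughly automated. The Python sketch below is only an illustration under assumptions, not an Adobe tool: it takes text that your own extraction tool has already produced and flags lines with many private-use, control, unassigned, or replacement characters, which is how a missing or broken ToUnicode mapping often surfaces. The helper names (`is_suspicious`, `suspicious_lines`, `guess_shift`) and the 0.2 threshold are made up for this example. `guess_shift` additionally assumes the damage is a constant code-point offset (the "Caesar cipher" pattern from the thread title), which may not match your file.

```python
import unicodedata

# Hypothetical helpers for illustration; none of this is part of Acrobat.

def is_suspicious(ch: str) -> bool:
    """True for characters that rarely occur in correctly mapped text."""
    if ch in "\n\r\t":
        return False
    if ch == "\ufffd":  # Unicode replacement character
        return True
    # Co = private use, Cc = control, Cn = unassigned
    return unicodedata.category(ch) in {"Co", "Cc", "Cn"}


def suspicious_lines(text: str, threshold: float = 0.2):
    """Yield (line_number, ratio, line) for lines at or above the threshold."""
    for number, line in enumerate(text.splitlines(), start=1):
        visible = [ch for ch in line if not ch.isspace()]
        if not visible:
            continue
        ratio = sum(is_suspicious(ch) for ch in visible) / len(visible)
        if ratio >= threshold:
            yield number, ratio, line


def guess_shift(text: str, max_shift: int = 40) -> int:
    """Guess a constant code-point offset that makes the text look like English.

    Assumption: every character was moved by the same amount. Real broken
    mappings are often irregular, in which case this guess is meaningless.
    """
    common = set("etaoinshrdlu ")  # frequent English letters plus space

    def score(shift: int) -> int:
        total = 0
        for ch in text:
            code = ord(ch) + shift
            if 0 <= code < 0x110000 and chr(code) in common:
                total += 1
        return total

    # Prefer the highest score; on ties prefer the smallest absolute shift.
    return max(range(-max_shift, max_shift + 1),
               key=lambda s: (score(s), -abs(s)))


def apply_shift(text: str, shift: int) -> str:
    """Undo a constant offset found by guess_shift."""
    return "".join(chr(ord(ch) + shift) for ch in text)
```

A typical use would be to run `suspicious_lines` over the extracted text first to see which lines are affected, then try `guess_shift` only on those lines. If the recovered text still reads as gibberish, the mapping is probably not a simple offset and the font-level checks in the steps above are the better route.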
Could you let us know:

- Is this PDF originally a scan, or created digitally (e.g., exported from Word/InDesign)?
- Does the issue happen with one specific file, or all PDFs you create/open?
- What tool is doing the "parsing" downstream? Is it your own script, another app, or something else?

That'll help us pinpoint whether this is a font subsetting issue or something specific to your workflow.

~Amal
