Character Encoding Error Using Acrobat Pro 2020 OCR

April 15, 2023

Acrobat Pro 2020.005.30467 (had an update this week)

Windows 10 Pro 19045.2728

Experience: Just a user; no programming.

I finalize reports that need to pass the basic 508 requirement (not for the web).

With these reports, I usually have to insert a scanned signature page (scanned by various people using various scanning equipment), and then OCR it so that I can apply tags. The signature page is prepared with a wet (not digital) signature. Most of the time after applying OCR, the signatures get tagged as figures with a description that they are approvals/signatures. No problem. No issues.

Some reports are required to have a Quality Control (QC) statement. These are usually sent as a digital Adobe Signed document that is made accessible after applying OCR.

Included in the reports are appendices of various documents (shipping docs, invoices, data sheets, other reports, etc.) that have to be scanned, OCRd, and made accessible. Sometimes these documents are sent electronically as pictures but then have to be OCRd to make accessible. All important information is tagged as text and not figures.

For some reason, Acrobat is frequently creating character encoding errors after the OCR or applying tags to the digitally signed QC statements. The majority of character encoding errors are with the signatures of the scanned pages.

In the past this happened occasionally with math equations or scientific formulas created in MS Word (Word to PDF), but now it's happening a lot with signatures and other items.

I've tried all sorts of things: saving as a picture and then back to a PDF to tag it; saving as a PDF; saving to a PDF using the PDF printer driver; popping it into Adobe Illustrator and back to a PDF. I tried using the Preflight tool. I looked at the fonts installed on my system. I can't even remember all of the things I have tried.

(I do not have Adobe PhotoShop.)

In many cases, the source document is the document being used and has to be OCRd.

The important question is: why are all of these character encoding errors occurring now when using Acrobat's OCR feature?

Thank you for any help or direction.

Enthusiastic_Rocket1587 (Author)

Participating Frequently

Have there been any updates on this issue, or issue with encoding errors?

I'm trying to 508 a data sheet with wet signatures that was scanned. The company logo symbol, the chemical structure, and other items are generating encoding errors. I've tried editing the Acrobat file to make sure it is using fonts Arial or TNR. So frustrating. Acrobat Versions below.

AnandSri

Legend

Hello,

I hope you're doing well, and we apologize for the delayed response and the trouble.

Please ensure you have the latest version of Acrobat installed on the machine: 24.005.20421 (optional update, Feb 24, 2025). To check for pending updates, go to Menu > Help > Check for updates and install any that are available. Then reboot the machine and check again.

If the issue persists, could you please share the file with us and the logs from the machine for further investigation?

Thanks,

Anand Sri.

Bevi Chagnon - PubCom.com

Legend

Some background info...

The accessibility checkers are constantly being updated and, consequently, are finding more errors with our PDFs.

Which checker did you use? The accessibility checker built into Acrobat? Or a third-party checker like PAC or CommonLook? If so, which version of the checker?

Character encoding errors are usually caused by one of these situations:

- The font used in the PDF is not available on your computer, so some glyphs might be missing (shown by blank spaces, dots, white boxes, white boxes with an x, or an incorrect character).

- A symbol or punctuation mark was used from a non-Unicode font, such as an old TrueType or PostScript Type 1 font, and that symbol isn't recognized by assistive technologies. A common instance of this is when the bullet character is used from an old TrueType Symbol font. Since the font isn't Unicode, the bullet glyph uses Symbol #183, which is not a Unicode bullet (U+2022) and causes a character encoding error.

(Note that the Symbol font installed with Windows and MS Office is being updated by Microsoft; for decades the default Symbol font has been a TrueType non-Unicode version, but recent updates to Windows are replacing it with an OpenType/Unicode version. Yea! Hopefully this will reduce some of the encoding errors when people make bulleted lists in MS Word: they'll actually get the Unicode 2022 bullet glyph.)
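
To make the Symbol-bullet situation concrete, here is a minimal Python sketch (not anything Acrobat itself runs). It assumes you have already copied or exported the text from the PDF into a plain string, and it assumes that a bullet from the old Symbol font tends to surface as the private-use code point U+F0B7 (0xF000 + 183) instead of the real bullet U+2022. The function name `find_suspect_characters` is hypothetical, made up for this illustration.

```python
import unicodedata


def find_suspect_characters(text):
    """Return (index, char, reason) tuples for characters that have no
    meaningful Unicode identity and may trip a character encoding check."""
    suspects = []
    for index, char in enumerate(text):
        if char == "\ufffd":
            # U+FFFD is what decoders emit when a character could not be mapped.
            suspects.append((index, char, "replacement character"))
        elif unicodedata.category(char) == "Co":
            # Private Use Area: the meaning depends entirely on the font.
            suspects.append((index, char, "private use area"))
        elif unicodedata.category(char) == "Cn":
            # Code point not assigned in this Python's Unicode database.
            suspects.append((index, char, "unassigned code point"))
    return suspects


# Assumed sample: first bullet came from the old Symbol font, second is a true Unicode bullet.
sample = "\uf0b7 First item\n\u2022 Second item"
for index, char, reason in find_suspect_characters(sample):
    print(f"position {index}: U+{ord(char):04X} ({reason})")
```

Only the first bullet should be reported here, since U+2022 is an ordinary punctuation character. A clean result from a check like this does not prove the PDF will pass; it only looks at the text you managed to extract, not at the font mappings inside the file.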

I'm concerned that this is happening to OCR'd content. The original scan doesn't have any fonts at all because it's graphical text (printed or scanned in). The OCR of the scan uses Adobe's default built-in fonts to create the invisible OCR text (the live OCR'd text is hidden and we see only the original printed/scanned text, but assistive technologies access the live hidden text).

To the best of my knowledge, the only way you could have a character error with OCR'd text is when that special built-in Acrobat font is missing on your computer. Maybe it was deactivated or uninstalled?

Signatures usually are OCR'd as figures, not text, so there shouldn't be a text error at all unless the OCR utility found some text-like elements in it, like maybe a printed name underneath the wet signature.

It would help if you can post some screen captures of the text that is being flagged and the error message.

Bevi Chagnon | Designer, Trainer, & Technologist for Accessible Documents | PubCom | Classes & Books for Accessible InDesign, PDFs & MS Office

Enthusiastic_Rocket1587 (Author)

Participating Frequently

Character Encoding - Failed

I'm using the OCR built into Acrobat and the accessibility checker built into Acrobat.

No recent updates. Started happening in 2020-2022.

Yes, signatures are OCR'd as figures, not text. Yes, there is a printed name underneath the wet signature, but that hasn't changed. I select the whole wet signature and printed signature and make it into a figure with alt text, but the "Character Encoding - Failed" error will often present itself.

I have no idea if any fonts have been deleted. I haven't deleted any. The error is intermittent, so that is odd too. I will get it on one PDF but not another. It doesn't matter how many times I redo (even with a fresh file) the PDF with the error, I still get the error.

I'm wondering if there is some kind of glitch happening because of some Windows conflict. IDK.

I'm unable to share the current document, because it is confidential.

Thank you for your kind assistance.

Bevi, thank you for posting the Acrobat Accessibility Series | Adobe Document Cloud.

Enthusiastic_Rocket1587 (Author)

Participating Frequently

Below is an instance where the graphs originated in Excel and were pasted as pictures into Word. In the Accessibility Checker in Acrobat Pro 2020, you can see that the encoding error is picking up text from the bottom picture. I confirmed it is a picture, just like the graph at the top. Why? This page was converted from Word to PDF; no scanning or OCR.

Thank you.

Meenakshi_Negi

Legend

Hi kga-rti-kga,

Thank you for reaching out and reporting this.

Please share the following information with us to investigate this issue:

- Did you start experiencing this after the recent update?

- Share the screenshot of the error message.

- Share the screen recording of the workflow for a better understanding.

It will help us to replicate the behavior at our end.

Thanks,

Meenakshi

Test Screen Name

Legend

What do you mean by "character encoding issue"? Do you mean that the OCR produces the wrong text (not a message)? Acrobat can't OCR handwriting, so it won't be able to accurately OCR handwritten signatures in particular, which are often illegible even to humans.
