Character encoding issues when a document is autotagged

Dec 03, 2024

Recently, when I autotag a document, it leads to character encoding issues. But they don't always show up as a failure in the accessibility checker. Sometimes letters just *disappear*: I can see them on the page, but they're no longer in the content containers when I check the tag tree, and they aren't voiced by a screen reader. Some examples are "refective, fltered, beneft, specifc, defned". I'm unsure why some seemingly random characters are just missing. This has happened with several PDFs, and I don't have access to the source documents, either.

When I check character encoding before autotagging, no issues come up. But in most cases I have to autotag, because the documents don't have a space at the end of each line of text in a paragraph, meaning the last word of one line and the first word of the next get smooshed together, and autotagging is the only way I've found to fix that.

Does anyone know why this is happening or if I can fix it? Furthermore, is there any way to catch it early on, rather than while listening to the document after fully remediating it? Thank you.

Dec 03, 2024

Hi @CM1002,

Thanks for posting your issue to Adobe. Would it be possible for you to share the PDF file in which you are seeing the issue?

Regards

Ravi

Feb 26, 2025

Hi @CM1002,

Hope you are doing well. Sorry for the trouble, and the delayed response.

In case you are still looking for a solution, you might want to try the steps below:

Extract the Text Layer & Check for Encoding Issues

1. Run "Save As" → Plain Text (.txt) in Acrobat.
2. Open the text file to see if letters are already missing before autotagging.
3. If characters are already missing at this point, the problem is in the PDF's own text encoding rather than in autotagging.
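
A possible way to narrow down what to look for in the exported text file: the examples in the original post ("refective", "beneft", "specifc") are all missing "fl" or "fi", which would be consistent with ligature glyphs being dropped. That is an inference from the examples, not something confirmed in this thread. Under that assumption, the Python sketch below lists every word in the exported text that contains a Unicode ligature character (U+FB00 to U+FB06), so you know which words to inspect in the tag tree after autotagging. The function name and the file name in the comment are made up for illustration.

```python
import unicodedata

# Unicode "Alphabetic Presentation Forms" Latin ligatures: ff, fi, fl, ffi, ffl, long s-t, st.
LIGATURES = {chr(cp) for cp in range(0xFB00, 0xFB07)}


def find_ligature_words(text):
    """Return (word, expanded_word) pairs for words containing a ligature character.

    Assumption: the text export keeps ligatures as single code points.
    If the export already expands or drops them, this finds nothing,
    so an empty result does not prove the PDF is clean.
    """
    hits = []
    for word in text.split():
        if any(ch in LIGATURES for ch in word):
            # NFKC normalization expands a ligature into its plain letters.
            hits.append((word, unicodedata.normalize("NFKC", word)))
    return hits


# Example with an in-memory string; for a real export you would read the
# .txt file instead, e.g. open("before_autotag.txt", encoding="utf-8").read()
# (hypothetical file name).
sample = "a re\ufb02ective \ufb01lter"
for original, expanded in find_ligature_words(sample):
    print(repr(original), "->", expanded)
```

Words reported here are the ones worth checking first once the document has been autotagged.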

Force a Proper Unicode Text Layer

1. Open Preflight (Ctrl + Shift + X).
2. Under Fixups, search for “Embed missing fonts” and apply it.
3. If embedding the fonts doesn’t help, run OCR (even if the text is already selectable):
  - Go to Scan & OCR → Recognize Text in This File → Set as Editable Text.

Check & Manually Correct in the Tags Panel

1. Open the Tags panel (View > Show/Hide > Navigation Panes > Tags).
2. Check whether the missing characters exist in the actual tag tree.
3. If they are missing, try manually retyping the word in the tag’s Properties.

If autotagging is corrupting the text, try:

1. Export the PDF as Word (.docx).
2. Open it in Word, check the text integrity, and reconvert it to PDF.
3. Then manually tag the new PDF in Acrobat.

Before fully remediating:

- Use Read Out Loud (Shift + Ctrl + Y in Acrobat) to test early.
- Try exporting as a Tagged PDF and reopening it to catch missing characters.
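
To catch dropped letters without listening to the whole document, one option is to export the text once before autotagging and once after, then compare the two exports word by word. The Python sketch below is a minimal version of that idea using only the standard library. It assumes both exports contain the same words in roughly the same order; the function names are hypothetical, and the sample strings simply reuse the words from the original post.

```python
import difflib
import re
import unicodedata


def _words(text):
    # Expand ligatures (NFKC) so "ﬁ" compares equal to "fi", then keep letter runs only.
    return re.findall(r"[^\W\d_]+", unicodedata.normalize("NFKC", text))


def _is_subsequence(short, long):
    # True if every character of `short` appears in `long` in the same order.
    remaining = iter(long)
    return all(ch in remaining for ch in short)


def find_shortened_words(before, after):
    """Return (before_word, after_word) pairs where the later word lost characters."""
    a, b = _words(before), _words(after)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only look at replaced runs of equal length, so words pair up one to one.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for old, new in zip(a[i1:i2], b[j1:j2]):
                if len(new) < len(old) and _is_subsequence(new, old):
                    pairs.append((old, new))
    return pairs


# Sample strings standing in for the two text exports.
before_text = "a reflective, filtered benefit"
after_text = "a refective, fltered beneft"
for old, new in find_shortened_words(before_text, after_text):
    print(old, "->", new)
```

Any pair it reports is a word that got shorter between the two exports, which is the pattern described in the original post. Words that were merged or split between exports are skipped by this sketch.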

Hope this helps.

-Souvik
