
Character encoding issues when a document is autotagged

Community Beginner, Dec 03, 2024

Recently, when I autotag a document, I get character encoding issues. They don't always show up as a failure in the accessibility checker, though. Sometimes letters just *disappear*: I can see them on the page, but they're no longer in the content containers when I check the tag tree, and they aren't voiced by a screen reader. Some examples are "refective, fltered, beneft, specifc, defned". I'm unsure why some seemingly random characters are missing. This has happened with several PDFs, and I don't have access to the source documents either. When I check character encoding before autotagging, no issues come up. But in most cases I have to autotag, because the documents don't have a space at the end of each line of text in a paragraph, meaning the last word of one line and the first word of the next get smooshed together, and autotagging is the only way I've found to fix that. Does anyone know why this is happening or whether I can fix it? Also, is there any way to catch it early on, rather than while listening to the document after fully remediating it? Thank you.

TOPICS
PDF , Standards and accessibility
Adobe Employee, Dec 03, 2024

Hi @CM1002,

Thanks for posting your issue to Adobe. Would it be possible for you to share the PDF file in which you are seeing the issue?

Regards

Ravi

Adobe Employee, Feb 26, 2025

Hi @CM1002,

Hope you are doing well. Sorry for the trouble and the delayed response.

In case you are still looking for a solution, you might want to try the steps below:

Extract the Text Layer & Check for Encoding Issues

  • Run "Save As" → Plain Text (.txt) in Acrobat.
  • Open the text file to see if letters are already missing before autotagging.
  • If characters are already missing there, the problem is in the PDF's own text encoding rather than in autotagging.
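
If the exported text file is long, a small script can do the first pass for you. The following is only a rough sketch and not an Adobe tool: the function name `find_suspect_characters` is made up for illustration, and the list of "suspicious" characters (Unicode ligature code points, the replacement character, private-use code points, stray control characters) is an assumption about what tends to go wrong in a damaged text layer.

```python
import unicodedata


def find_suspect_characters(text):
    """Return (line_no, column, char, reason) for characters worth a second look.

    The categories checked here are an assumption about typical text-layer
    damage, not an exhaustive or official list.
    """
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            code = ord(ch)
            if ch == "\ufffd":
                reason = "replacement character (decoding failed)"
            elif 0xFB00 <= code <= 0xFB06:
                # Single code points such as the "fi" or "fl" ligature.
                reason = "Latin ligature: " + unicodedata.name(ch)
            elif unicodedata.category(ch) == "Co":
                reason = "private-use code point (no standard meaning)"
            elif unicodedata.category(ch) == "Cc" and ch != "\t":
                reason = "control character"
            else:
                continue
            findings.append((line_no, col, ch, reason))
    return findings
```

To use it, read the exported file into a string (for example with `open(path, encoding="utf-8", errors="replace").read()`, adjusting the encoding to whatever your export actually uses) and print the returned tuples. An empty result does not prove the text is intact, since letters that are simply absent leave nothing to flag.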

Force a Proper Unicode Text Layer

    1. Open Preflight (Ctrl + Shift + X).
    2. Under Fixups, search for “Embed missing fonts” and apply it.
    3. If the font embedding doesn’t help, use OCR (even if text is selectable):
      • Go to Scan & OCR → Recognize Text → In This File → Set as Editable Text.

Check & Manually Correct in the Tags Panel

  • Open the Tags panel (View > Show/Hide > Navigation Panes > Tags).
  • Check if missing characters exist in the actual tag tree.
  • If they are missing, try manually retyping the word in the tag’s Properties.
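
To build a list of words that need retyping, it may help to notice that every example in the question ("refective, fltered, beneft, specifc, defned") is missing an "l" or "i" directly after an "f". The sketch below assumes that pattern holds for your files, which is an inference from those examples and not something confirmed here. `suggest_repairs` and the `vocabulary` set are hypothetical names; you would supply your own word list.

```python
def suggest_repairs(word, vocabulary):
    """Suggest known words formed by re-inserting letters after an 'f'.

    Assumption: the lost letters are the ones that normally follow 'f'
    in common ligatures (fi, fl, ff, ffi, ffl). `vocabulary` is a set of
    lowercase words that you trust to be spelled correctly.
    """
    lower = word.lower()
    if lower in vocabulary:
        return []  # already a known word, nothing to repair
    suggestions = set()
    for i, ch in enumerate(lower):
        if ch != "f":
            continue
        for tail in ("i", "l", "f", "fi", "fl"):
            candidate = lower[: i + 1] + tail + lower[i + 1:]
            if candidate in vocabulary:
                suggestions.add(candidate)
    return sorted(suggestions)


# Hypothetical word list for illustration only.
vocabulary = {"reflective", "filtered", "benefit", "specific", "defined"}
for damaged in ["refective", "fltered", "beneft", "specifc", "defned"]:
    print(damaged, "->", suggest_repairs(damaged, vocabulary))
```

Words that come back with a suggestion are good candidates to check in the tag tree first. Anything the script proposes still needs a human check against the visible page.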

If autotagging is corrupting the text, try:

  • Export the PDF as Word (.docx).
  • Open in Word → Check text integrity → Reconvert to PDF.
  • Then manually tag in Acrobat.

Before fully remediating:

  • Use Read Out Loud (Shift + Ctrl + Y in Acrobat) to test early.
  • Try exporting as a Tagged PDF and reopening to catch missing characters.
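
Another way to catch this early, sketched here under assumptions, is to export the text once before autotagging and once after, then compare the two exports word by word. The helper names `words` and `changed_words` are made up for this example, and it assumes both exports list the text in roughly the same reading order.

```python
import difflib
import re


def words(text):
    """Split text into runs of letters (digits and punctuation are ignored)."""
    return re.findall(r"[^\W\d_]+", text)


def changed_words(before_text, after_text):
    """Return (before, after) pairs for word runs that differ between exports."""
    before = words(before_text)
    after = words(after_text)
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(before[i1:i2]), " ".join(after[j1:j2])))
    return changes


# Tiny made-up example using two of the words from the question.
before = "The reflective surface is defined here"
after = "The refective surface is defned here"
for old, new in changed_words(before, after):
    print(repr(old), "->", repr(new))
```

Expect some noise: if the untagged export runs the last word of one line into the first word of the next, those joins will also show up as differences, so the list is a starting point for review and not a pass/fail check.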

Hope this helps.


-Souvik
