
Text Selection Problem After OCR (Highlight Problem)

New Here, Apr 20, 2024

Hi everyone!

I created a PDF with selectable text using OCR. When I select text in this PDF, the selection skips some words, as shown in the screenshot. Copying and pasting works without any problem; only the selection highlight skips words. I just want everything I select to be fully highlighted so that I don't get confused while working.

 

Thank you in advance for your help.

 

Screenshots

4f439d47-e825-462d-920f-25480cdbc3f4.jpeg

TOPICS
Edit and convert PDFs , PDF , Scan documents and OCR
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
1 ACCEPTED SOLUTION
Community Expert, Apr 20, 2024

Hi, @Habib36866182f990. Yeah, that's a pretty extreme example of something that is often seen with OCRed text.

 

When you run OCR, there are three different output options to choose from; here they are:

 

Searchable Image

Ensures that text is searchable and selectable. This option keeps the original image, deskews it as needed, and places an invisible text layer over it. The selection for Downsample Images in this same dialog box determines whether the image is downsampled and to what extent. Because the image itself can be altered in this way, this first option is typically not acceptable to a FedGov agency (or any entity that needs a document of record to have the proper "provenance").

Searchable Image (Exact)

Ensures that text is searchable and selectable. This option keeps the original image and places an invisible text layer over it. It is recommended for cases requiring maximum fidelity to the original image. Typically, this is what a FedGov agency requires when you submit a scanned image of text.

Editable Text & Images (formerly known as ClearScan)

Synthesizes a new custom font that closely approximates the original and preserves the page background using a low-resolution copy.

 

If you read over these options, it's pretty clear that you had your settings on the first one (intentionally or not). So, you are successfully capturing all the text (as you noted), but the invisible text you're selecting is not necessarily aligned with the visible text in the original image.

 

This is a bit annoying but harmless. It does make it harder to select a specific word when the highlight is not aligned with it.
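One way to picture the misalignment is as two rectangles per word: the box of the invisible text (which the highlight follows) and the box of the word as it appears in the image. The sketch below is only an illustration under that assumption; the `Box` type, the sample coordinates, and the 0.5 threshold are all made up and do not come from Acrobat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page units (hypothetical coordinates)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # An inverted rectangle (no overlap) has zero area.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def overlap_ratio(hidden: Box, visible: Box) -> float:
    """Fraction of the visible word's area covered by the hidden text box."""
    inter = Box(
        max(hidden.x0, visible.x0),
        max(hidden.y0, visible.y0),
        min(hidden.x1, visible.x1),
        min(hidden.y1, visible.y1),
    )
    visible_area = visible.area()
    return inter.area() / visible_area if visible_area else 0.0


def misaligned_words(words, threshold=0.5):
    """Return words whose hidden box covers less than `threshold` of the visible word."""
    return [text for text, hidden, visible in words
            if overlap_ratio(hidden, visible) < threshold]


# Made-up sample: (word, hidden text-layer box, box of the word in the image).
sample = [
    ("Text", Box(10, 10, 40, 20), Box(10, 10, 40, 20)),         # aligned
    ("Selection", Box(45, 10, 100, 20), Box(80, 10, 135, 20)),  # shifted left
]
print(misaligned_words(sample))
```

In this toy data the second word is flagged because its highlight would cover well under half of the visible word, which is roughly what a skipped-looking selection amounts to.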

 

If you want to try the other two options, you'll need to go back to your original scan (before you ran the OCR), because Acrobat will not let you re-OCR text that has already been OCRed. If you need help finding these options, let me know which version of Acrobat you are using and, if it's the latest version, whether you are using the new or old user interface.

New Here, Sep 14, 2024

I have a more serious problem. After OCR, I cannot select text at all! That was never a problem before. What happened, and how can I get that capability back? If the problem persists, I will have to find a different OCR program.

Adobe Employee, Sep 23, 2024

Hi @clemensR 

Sorry for the inconvenience.

Can you let me know the following:

  1. Is the issue happening for all of your OCRed files?
  2. Can you please share a sample scanned file with me so I can investigate the issue?
  3. What is the exact OCR setting in Adobe Acrobat you are using to OCR the file?
  4. What is the version number of your Adobe Acrobat (it can be found by selecting About Adobe Acrobat Pro in the Help menu), and what are your OS version details?
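As an optional aside for the first question: a rough way to check, outside Acrobat, whether a file has any extractable text on each page. This is only a sketch, assuming the third-party `pypdf` package is installed; the file name `scan.pdf` and both helper names are hypothetical. An empty result from text extraction suggests that a page has no text layer, but it is not proof.

```python
def load_page_texts(path):
    """Read the extractable text of each page (requires the third-party pypdf package)."""
    from pypdf import PdfReader  # imported lazily so the helper below works without it

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


def pages_without_text(page_texts):
    """Return 1-based page numbers whose extracted text is empty or whitespace only."""
    return [n for n, text in enumerate(page_texts, start=1) if not text.strip()]


# Example with made-up page texts; for a real file, use load_page_texts("scan.pdf").
print(pages_without_text(["Hello world", "", "  \n"]))
```

If every page of an OCRed file shows up in that list, the text layer is probably missing altogether, which is a different situation from a text layer that exists but cannot be selected in the viewer.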

Thanks,

Shakti K

Adobe Employee, Sep 23, 2024

Hi @Habib36866182f990 

Can you please share a sample scanned document with me so we can investigate at our end?

Also, let me know which language you chose when running OCR on the content in Adobe Acrobat Pro.

Thanks,

Shakti K
