Participant
April 20, 2024
Answered

Text Selection Problem After OCR (Highlight Problem)

  • April 20, 2024
  • 2 replies
  • 2205 views

Hi everyone!

I created a PDF with selectable text using OCR. When I select text in this PDF, the highlight skips words, as shown in the screenshot. Copying and pasting work without any problem; only the selection highlight is affected. I just want the text I select to be fully highlighted so that I don't get confused while working.

Thank you in advance for your help.

Screenshots

This topic has been closed for replies.
Correct answer gary_sc

Hi, @Habib36866182f990. Yeah, that's a pretty extreme example of something that is often seen with OCRed text.

When you run OCR, there are three different output options to choose from; here they are:

Searchable Image

Ensures that text is searchable and selectable. This option keeps the original image, deskews it as needed, and places an invisible text layer over it. The Downsample Images setting in the same dialog box determines whether the image is downsampled and to what extent. Because the image can be altered in this way, this option is typically not acceptable to a FedGov agency (or any entity that needs a document of record to have the proper "provenance").

Searchable Image (Exact)

Ensures that text is searchable and selectable. This option keeps the original image and places an invisible text layer over it. It is recommended for cases requiring maximum fidelity to the original image. Typically, this is what a FedGov agency requires if submitting a scanned image of text.

Editable Text & Images (formerly known as ClearScan)

Synthesizes a new custom font that closely approximates the original and preserves the page background using a low-resolution copy.

If you read over these options, it's pretty clear that you had your settings set to the first one (intentionally or not). So, you are successfully capturing all the text (as you said), but the invisible text layer you're selecting is not necessarily aligned with the original text in the image.

This is a bit annoying but harmless. It does make it a bit of a challenge to select a specific word that is not aligned.
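
To get a feel for how a small misalignment between the text layer and the image can add up, here is a minimal Python sketch. It assumes, purely for illustration, that the two are rotated against each other by a small skew angle about the page center; how Acrobat actually positions the text layer is not something this thread confirms. The page size, the 1-degree angle, and the word positions are all hypothetical.

```python
import math

def rotate_point(x, y, cx, cy, degrees):
    """Rotate the point (x, y) about (cx, cy) by the given angle in degrees."""
    theta = math.radians(degrees)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(theta) - dy * math.sin(theta),
            cy + dx * math.sin(theta) + dy * math.cos(theta))

def drift(x, y, cx, cy, degrees):
    """Distance between a point and its position after the rotation."""
    rx, ry = rotate_point(x, y, cx, cy, degrees)
    return math.hypot(rx - x, ry - y)

# Hypothetical US Letter page (612 x 792 points) and a hypothetical 1-degree skew.
PAGE_W, PAGE_H = 612.0, 792.0
CX, CY = PAGE_W / 2, PAGE_H / 2
SKEW_DEGREES = 1.0

# Hypothetical word positions (x, y) in points, from page center to a corner.
words = {
    "center": (306.0, 396.0),
    "mid": (450.0, 500.0),
    "corner": (580.0, 760.0),
}

for name, (x, y) in words.items():
    print(f"{name}: drift = {drift(x, y, CX, CY, SKEW_DEGREES):.1f} pt")
```

Under these assumptions the drift is zero at the page center and grows to several points near the corners, which would be enough for a highlight to sit visibly off a word in small print.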

If you want to try the other two options, you'll need to go back to your original scan (before you ran the OCR), because Acrobat will not let you re-OCR text that has already been OCRed. If you need help finding these options, let me know which version of Acrobat you are using and, if it's the latest version, whether you are using the new or old user interface.

2 replies

Adobe Employee
September 23, 2024

Hi @Habib36866182f990 

Can you please share a sample scanned document with me so that we can investigate at our end?

Also, let me know which language you chose when running OCR on the content in Adobe Acrobat Pro.

Thanks,

Shakti K

gary_sc
Community Expert
April 20, 2024

clemensR
Participant
September 14, 2024

I have a more serious problem. After OCR, I cannot select text at all! That was never a problem before. What happened, and how can I use that feature again? If the problem persists, I will have to find a different OCR program.

Adobe Employee
September 23, 2024

Hi @clemensR 

Sorry for the inconvenience caused.

Can you let me know the following:

  1. Is the issue happening with all of your OCRed files?
  2. Can you please share a sample scanned file with me so that I can investigate the issue?
  3. What is the exact OCR setting you are using in Adobe Acrobat to OCR the file?
  4. What is the version number of your Adobe Acrobat (it can be found by selecting About Adobe Acrobat Pro in the Help menu), and which OS version are you using?

Thanks,

Shakti K