OCR and line numbering
Dear Colleagues,
We need help with an issue in Acrobat Pro. We use OCR frequently and need to resolve this.
The text recognition is very good, but whatever the settings, the line numbering is always output as text, which is very time-consuming to delete. Do you have any ideas on how to adjust the PDF so that the Word output lets us remove those line numbers quickly? Thank you very much in advance.
Best regards,
Martin
In Acrobat Pro, redact the line numbers.
To elaborate a bit: Use the Mark for Redaction tool on the first page to draw a rectangle over the area where the line numbers appear. Assuming that's the same for all pages, right-click the comment and select "Repeat mark across pages". Then apply the redactions and export the text.
If you do not want the line numbers, why do you not crop them out during your scanning process?
Also, remember that Acrobat does not scan by itself; it relies on a driver interface called TWAIN to access your scanner's software. So you have complete control of the process from your scanner's software. Simply put: if you do not want the numbers, don't scan them in the first place!
I get my scanned PDFs from the court or from other lawyers, so I don't have the option of not scanning the line numbers.
It would be great if Adobe AI could recognize the vertical column of numbers half an inch to the left of every line of text, consider the possibility that it might be unwanted line numbering, and offer an option to leave those numbers out when exporting as a Word document.
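Until something like that exists, the same idea can be approximated after export rather than inside Acrobat. The following is only a minimal sketch in Python, assuming the exported text has been saved as plain text and that each numbered line begins with its line number. The function name `strip_line_numbers` and the parameters `max_number` and `max_gap` are hypothetical and not part of any Adobe tool. A leading number is removed only when it continues the running sequence (allowing a small gap for numbers the OCR missed) or restarts at 1 for a new page.

```python
import re

# Optional indent, a 1-3 digit number, then either end of line
# or whitespace followed by the rest of the line.
_LEADING_NUMBER = re.compile(r"^\s*(\d{1,3})(?:\s+(.*))?$")


def strip_line_numbers(text: str, max_number: int = 40, max_gap: int = 2) -> str:
    """Remove leading line numbers from exported text (hypothetical helper).

    Assumption: numbers run 1..max_number on each page. A number is stripped
    only if it is the expected next number (or up to max_gap beyond it),
    or if it is 1, which is treated as the start of a new page.
    """
    cleaned = []
    expected = 1
    for line in text.splitlines():
        match = _LEADING_NUMBER.match(line)
        if match:
            number = int(match.group(1))
            in_sequence = expected <= number <= expected + max_gap
            if number <= max_number and (in_sequence or number == 1):
                # Keep only the text after the number (empty for bare numbers).
                cleaned.append(match.group(2) or "")
                expected = number + 1
                continue
        # Not a plausible line number: keep the line unchanged.
        cleaned.append(line)
    return "\n".join(cleaned)


if __name__ == "__main__":
    sample = "1 The plaintiff alleges\n2 that the defendant\n3 paid the invoice"
    print(strip_line_numbers(sample))
```

The sequence check is a deliberate trade-off: it avoids deleting most genuine numbers that happen to start a line, but a real number that coincides with the expected line number would still be removed, so the result should be proofread.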
Have you tried to redact the numbers as suggested above?
Also, depending on how and with what the PDFs were generated, it's possible that the numbers are on their own layer, in which case simply deleting that layer in Acrobat would remove them.
One other question: are you receiving the PDFs with the text as an image (so that you need to run OCR), or are you receiving PDFs that are already searchable? (Just verifying this issue.)
As an aside, you can decide whether the following is worth the time: if you open the PDFs in Photoshop, you can crop them and save them as Photoshop PDFs. These tend to be much larger in file size. If that's an issue, you can then open them in Acrobat and resave them as "Reduced Size PDF", which should bring them back to a more normal file size.

