Improving Acrobat OCR accuracy for lines in the document (e.g. blank lines for signatures or table grid lines)
I am using Acrobat Pro. I often use the OCR function on documents that I get from clients as photos. Often, these documents include lines, such as underlined text, lined spaces for a signature, or grid lines in tables. Whether Acrobat does its OCR with the “Scan & OCR” tool or the “Edit PDF” tool, it consistently treats these lines as part of the background imagery. It does an okay job recognizing underlining of text (but not better than okay) and pretty much never recognizes blank lines as pseudo-text (in the case of spaces for signatures) or as formatting (in the case of the boxes in and around tables).
Often, the images I get from clients include yellowing or low-res smearing in the white background of the document. OCRing turns these into editable images. I would like to be able to select and delete these image backgrounds, but because Acrobat’s OCR treats the blank lines and table lines described above as part of the background image, deleting the background fuzz also causes these “real” and necessary lines to disappear. I’m left keeping the background fuzz in place, resulting in larger file sizes and dirtier documents.
Is there a way to improve this with settings at my end? Is this something that needs fixing at Adobe’s end? Thanks.