How do I extract text line-by-line from an OCR-scanned PDF saved as "editable text and images"?
I am trying to convert some photocopied bank statements into a more usable form. I was able to use the OCR scanning tool to create a PDF file that contains editable text and images. Now I need to know how to extract the editable text from that file line by line, the way the "Read Out Loud" tool reads it.

The main body of each page is a table of transactions with an mm/dd date on the left, a description, and a dollar amount. If I try to select just that table with the mouse, the selection expands upward and downward as I drag across the page and picks up editable text at the top and bottom of the page, which I don't want. If I then paste the selected text into a plain text file, the result is completely jumbled and cannot be parsed into what I want. The copy operation seems to proceed column by column, top to bottom within each column and then left to right, over the entire page.

It is clearly possible to process the editable text line by line, left to right, because the "Read Out Loud" tool does exactly that. So how do I extract the editable text in a line-by-line fashion? Do I have to write code to parse the PDF file? God, I hope not. There must be a better way. Help!
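
In case code does turn out to be unavoidable, here is a minimal sketch of what I imagine the line-by-line step would look like. It assumes that some PDF library has already given me every word together with its x/y position on the page; that extraction step is not shown. The `(x, y, text)` tuple layout, the function name `group_into_lines`, and the `y_tolerance` parameter are all names I made up for illustration, not part of any real tool.

```python
from typing import List, Tuple

# Hypothetical word record: (x, y, text), with y increasing down the page.
Word = Tuple[float, float, str]


def group_into_lines(words: List[Word], y_tolerance: float = 3.0) -> List[str]:
    """Group positioned words into text lines, top to bottom, left to right.

    Words whose y coordinate lies within y_tolerance of the first word of
    the current line are treated as belonging to that line. OCR output is
    rarely perfectly aligned, so the tolerance is an assumption that would
    need tuning per document.
    """
    lines: List[Tuple[float, List[Word]]] = []  # (anchor y, words on the line)

    # Visit words from the top of the page to the bottom.
    for word in sorted(words, key=lambda w: (w[1], w[0])):
        if lines and abs(word[1] - lines[-1][0]) <= y_tolerance:
            lines[-1][1].append(word)
        else:
            lines.append((word[1], [word]))

    # Within each line, order words left to right and join them with spaces.
    return [
        " ".join(w[2] for w in sorted(line_words, key=lambda w: w[0]))
        for _, line_words in lines
    ]


# Made-up sample data in the column-by-column order that copy/paste produces.
sample = [
    (50.0, 100.0, "01/15"), (50.0, 120.0, "01/16"),
    (120.0, 101.0, "GROCERY"), (120.0, 120.0, "GAS"),
    (300.0, 100.5, "45.00"), (300.0, 119.5, "12.30"),
]
for line in group_into_lines(sample):
    print(line)
```

With words regrouped like this, each output line would hold one transaction's date, description, and amount, which is the form I am after. I would still much rather not have to write it myself.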
