Looking for pdf info that could point to the right data pre-OCR

Report · Feb 16, 2018

I am creating an application that will give the user a very limited, specific set of information from a large PDF file (300 to 400 pages). OCRing the whole file is too time-consuming and gives me too much irrelevant information. I can zero in on the data I need based on a couple of things: the name of the bookmark, and the general layout of the pages of that bookmarked exhibit. The pages are not PDF forms; they are basically boilerplate forms that present data in a uniform way. I would like to write code that harvests that text data. Essentially, I want to use the position of the data on the form to tell me what the data is. So, for example, I would know that "01/01/1998" is a birthdate based on where it sits on the page. Is this possible?

Report · Feb 16, 2018

Sorry, this got marked as answered: it is not.

Report · Feb 16, 2018

You would still need OCR. This is referred to as zoned OCR. Acrobat doesn't do that.
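
As a rough illustration of the zoned idea, here is a minimal Python sketch of the step that comes after recognition: deciding what a piece of text means from where it sits on the page. It assumes some OCR engine has already returned words with bounding boxes; the thread does not name one, so the `Word` records, the `Zone` rectangles, the coordinates (top-left origin, arbitrary units) and the field names below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A named rectangle on the page (hypothetical layout, top-left origin)."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass(frozen=True)
class Word:
    """One recognized word and its bounding box, as an OCR engine might report it."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def fields_from_zones(words, zones):
    """Assign each word to the first zone containing its center point."""
    found = {zone.name: [] for zone in zones}
    for word in words:
        cx = (word.x0 + word.x1) / 2
        cy = (word.y0 + word.y1) / 2
        for zone in zones:
            if zone.contains(cx, cy):
                found[zone.name].append(word)
                break
    # Join each zone's words in reading order: top to bottom, then left to right.
    return {
        name: " ".join(w.text for w in sorted(ws, key=lambda w: (w.y0, w.x0)))
        for name, ws in found.items()
    }


# Made-up zones for one boilerplate page layout.
ZONES = [
    Zone("name", 100, 100, 380, 120),
    Zone("birthdate", 400, 100, 520, 120),
]

# Made-up OCR output for that page.
WORDS = [
    Word("Exhibit", 100, 40, 160, 55),          # outside every zone, ignored
    Word("Jane", 102, 104, 140, 116),
    Word("Doe", 145, 104, 175, 116),
    Word("01/01/1998", 405, 104, 480, 116),
]

result = fields_from_zones(WORDS, ZONES)
print(result)
```

With zones like these, only the rectangles that matter need to be recognized, and one zone table per exhibit layout could be chosen by bookmark name.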

Adobe Community