Looking for PDF info that could point to the right data pre-OCR

Question

I am creating an application that will provide the user with a very limited set of specific information from a large PDF file (300 to 400 pages). OCRing the whole file is too time-consuming and gives me too much irrelevant information. I can zero in on the data I need based on two things: the name of the bookmark and the general layout of the pages of that bookmarked exhibit. The pages are not PDF forms; they are basically boilerplate forms that present data in a uniform way. I would like to write code that harvests that text data. Essentially, I want to use the position of the data on the form to tell me what the data is. So, for example, I know that "01/01/1998" is a birthdate based on where it is positioned on a typical page. Is this possible?

Test Screen Name · Answer

You would still need OCR. This is referred to as zoned OCR. Acrobat doesn’t do that.
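To illustrate the idea behind zoned extraction, here is a minimal Python sketch of the position-to-field mapping the question describes. It is only a sketch under assumptions: the `Zone` and `Word` types, the `extract_fields` helper, and every coordinate are hypothetical, and the word boxes are hard-coded here. In practice they would have to come from an OCR engine (or a text extractor, if the pages carry a text layer) run on just the bookmarked pages. Coordinates are assumed to use a top-left origin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A named rectangle on the page template (hypothetical coordinates)."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass(frozen=True)
class Word:
    """One recognized word and its bounding box, as an OCR step might report it."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def extract_fields(words, zones):
    """Assign each word to the first zone that contains its centre point."""
    collected = {zone.name: [] for zone in zones}
    # Rough reading order: top to bottom, then left to right.
    for word in sorted(words, key=lambda w: (w.y0, w.x0)):
        cx = (word.x0 + word.x1) / 2
        cy = (word.y0 + word.y1) / 2
        for zone in zones:
            if zone.contains(cx, cy):
                collected[zone.name].append(word.text)
                break  # words outside every zone are simply ignored
    return {name: " ".join(parts) for name, parts in collected.items()}


# Hypothetical template for one exhibit layout.
ZONES = [
    Zone("birthdate", 400, 100, 520, 120),
    Zone("name", 100, 100, 300, 120),
]

# Stand-in for recognizer output on one page.
words = [
    Word("01/01/1998", 410, 104, 480, 116),
    Word("Doe", 150, 104, 180, 116),
    Word("Jane", 105, 104, 145, 116),
    Word("Exhibit", 100, 40, 160, 55),
]

fields = extract_fields(words, ZONES)
print(fields)
```

Because the zones are defined once per exhibit layout, only the pages under the relevant bookmark need to be recognized, and text outside the zones is discarded rather than processed.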
