Known Participant
February 16, 2018
Question

Looking for PDF info that could point to the right data pre-OCR


I am creating an application that will give the user a small amount of specific information from a large PDF file (300 to 400 pages). OCRing the whole file is too time-consuming and gives me too much irrelevant information. I can zero in on the data I need based on two things: the name of the bookmark and the general layout of the pages in that bookmarked exhibit. The pages are not PDF forms; they are basically boilerplate forms that present data in a uniform way. I would like to write code that harvests that text data. Essentially, I want to use the position of the data on the form to tell me what the data is. For example, I know that "01/01/1998" is a birthdate based on where it is positioned on the page. Is this possible?


2 replies

Legend
February 16, 2018

You would still need OCR. This is referred to as zoned OCR. Acrobat doesn’t do that.
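To illustrate the zoned idea, here is a minimal sketch in Python. It assumes some other tool (an OCR engine or a text extractor, not Acrobat) has already produced words with page coordinates. The `Word` and `Zone` types, the `extract_fields` helper, and all coordinates are hypothetical, made up for this example; real zones would have to be measured from the actual exhibit layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # left edge of the word, in page units
    y: float  # top edge of the word, in page units


@dataclass(frozen=True)
class Zone:
    field: str  # what the data in this rectangle means
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, word: Word) -> bool:
        # A word belongs to the zone if its top-left corner lies inside it.
        return self.x0 <= word.x <= self.x1 and self.y0 <= word.y <= self.y1


def extract_fields(words, zones):
    """Return {field: text}, joining the words found in each zone in reading order."""
    result = {}
    for zone in zones:
        hits = sorted(
            (w for w in words if zone.contains(w)),
            key=lambda w: (w.y, w.x),  # top to bottom, then left to right
        )
        result[zone.field] = " ".join(w.text for w in hits)
    return result


# Hypothetical page content and zone coordinates, for illustration only.
words = [
    Word("Jane", 100, 50),
    Word("Doe", 140, 50),
    Word("01/01/1998", 100, 80),
    Word("Exhibit", 300, 10),  # falls outside every zone, so it is ignored
]
zones = [
    Zone("name", 90, 40, 250, 60),
    Zone("birthdate", 90, 70, 250, 90),
]
fields = extract_fields(words, zones)
print(fields)
```

Because the forms are uniform, one set of zones per exhibit type would be reused on every matching page, and only the pages under the relevant bookmark would need to be processed.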

Known Participant
February 16, 2018

Sorry, I marked this as answered, but it is not.