OCR from the same areas on several pages

New Here ,
Oct 03, 2020

I am hoping to solve the following problem using only the Action Wizard. I am pretty new to Acrobat, so I am sure there are many options I have not yet considered for my task.

I am working with a large series of typewritten forms that were scanned. These forms contain information that should be read semi-automatically. The same information is in the same area on every page, and every page has 6 "areas of interest" that contain said information. The rest of the page differs from page to page, so OCRing entire pages would create different levels of noise depending on the page. That is why I want to OCR only those areas of interest and get the output as plain text. (The data is ultimately destined for Excel, so I will try to get the output there as directly as possible via VBA, although reading from exported files in VBA is possible, too.)


I was able to create an action that lets the user crop every page down to one of the aforementioned areas and then runs OCR and outputs the text automatically. However, this forces the user to wait while a single area is processed and then to repeat the whole process for every other area of the form.

Ideally, the user would select all fields on one page, that pattern would be applied to every page, and the OCR data would then be exported for each field separately.

Alternatively, getting the coordinates of a user's selection would also work, as I could use them in VBA to automate the cropping process.

I haven't yet found suitable commands in Acrobat for either of these two strategies.

Does anyone have an idea about what I can do?

TOPICS
Acrobat SDK and JavaScript, Edit and convert PDFs, Scan documents and OCR

Most Valuable Participant ,
Oct 03, 2020

You can't import or export OCR data from one PDF file to another, if that's what you're planning to do. The only way to do that is to replace the non-OCRed page with one that has been OCRed.

New Here ,
Oct 04, 2020

I don't want to transfer OCR data from one page to another, only the definition of which areas need to be OCR'd, so that OCR runs only there and only the text from those areas is exported.

Most Valuable Participant ,
Oct 04, 2020

The only way to do that is how you already did it, by cropping the page.

Adobe Community Professional ,
Oct 04, 2020

You can create 5 copies of every page. Then crop the pages at the different coordinates. After this, OCR the whole document.

Or redact unwanted areas.
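
For the page-copy approach, the cropping step could be scripted roughly as follows. This is only a sketch under assumptions: it expects the document to already contain one consecutive copy of each page per area of interest (the original plus 5 copies for 6 areas), the `AREAS` names and coordinates are hypothetical placeholders, and the `[left, top, right, bottom]` order passed as `rBox` to `setPageBoxes` should be checked against the Acrobat JavaScript API reference before relying on it.

```javascript
// Hypothetical areas of interest, as [left, top, right, bottom] in points.
// The names and numbers are placeholders, not measurements from real forms.
var AREAS = [
  { name: "area1", box: [50, 750, 300, 700] },
  { name: "area2", box: [320, 750, 560, 700] },
  { name: "area3", box: [50, 690, 300, 640] },
  { name: "area4", box: [320, 690, 560, 640] },
  { name: "area5", box: [50, 630, 300, 580] },
  { name: "area6", box: [320, 630, 560, 580] }
];

// Map a page index in the duplicated document back to the original
// page it came from and the area that copy should be cropped to.
function planForPage(pageIndex, areas) {
  return {
    sourcePage: Math.floor(pageIndex / areas.length),
    area: areas[pageIndex % areas.length]
  };
}

// Set one crop box per page. `doc` is assumed to behave like an Acrobat
// Doc object (numPages, setPageBoxes); inside an action it would be `this`.
function cropDuplicatedPages(doc, areas) {
  if (doc.numPages % areas.length !== 0) {
    throw new Error("Page count is not a multiple of the number of areas");
  }
  for (var p = 0; p < doc.numPages; p++) {
    var plan = planForPage(p, areas);
    doc.setPageBoxes({ cBox: "Crop", nStart: p, nEnd: p, rBox: plan.area.box });
  }
}
```

Inside an action's JavaScript step this would presumably be called as `cropDuplicatedPages(this, AREAS)`, followed by the OCR and text export steps. Keeping the copies of a page next to each other means the exported text stays grouped by original page.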

Most Valuable Participant ,
Oct 04, 2020

And how would you combine those pages back to be a single page?

Adobe Community Professional ,
Oct 04, 2020

"And how would you combine those pages back to be a single page?"

Why a single page?

Most Valuable Participant ,
Oct 04, 2020

I thought that was the intention... If they're just interested in extracting the text, then your suggestion is fine. Another option is to OCR the entire page and then redact the areas you don't want.

New Here ,
Oct 04, 2020

I am indeed interested only in the text.

OCRing the entire page would generate too much noise. Efficiency is essential for the resulting app/workflow that I'm working on.

New Here ,
Oct 04, 2020

This does sound interesting, although to do this with several pages, I think I would need to create 5 copies of the whole PDF, let the user crop a different field in each document, and export each one with a special filename indicating its content. With the help of JavaScript, this might be possible.
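
The "special filename" part might look something like the sketch below. The naming pattern (`<stem>_p<page>_<area>.txt`) and both function names are made up for illustration, and the pattern assumes area names contain no underscores. The parsing function only shows the reverse mapping; the same logic would have to be rewritten in VBA on the Excel side.

```javascript
// Left-pad a number with zeros so exported files sort in page order.
function padNumber(n, width) {
  var s = String(n);
  while (s.length < width) {
    s = "0" + s;
  }
  return s;
}

// Build an export filename that encodes the source PDF, page, and area.
function exportFileName(pdfName, pageNumber, areaName) {
  var stem = pdfName.replace(/\.pdf$/i, "");
  return stem + "_p" + padNumber(pageNumber, 3) + "_" + areaName + ".txt";
}

// Recover the parts from a filename built by exportFileName.
// Returns null when the name does not follow the pattern.
function parseExportFileName(fileName) {
  var m = /^(.*)_p(\d+)_([^_]+)\.txt$/.exec(fileName);
  if (!m) {
    return null;
  }
  return { stem: m[1], page: parseInt(m[2], 10), area: m[3] };
}
```

For example, `exportFileName("forms.pdf", 7, "date")` gives `forms_p007_date.txt`, which the importing code can split back into document, page, and field.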
