Participating Frequently
March 13, 2023
Question

OCR Scanning only the first page of multiple documents with action wizard



Hello,

I have been trying to solve this problem for a couple of weeks using information from the internet, without success.

I have thousands of documents in image-only PDF format. I would like to OCR only the first page of each one and then save that single page as a separate document named "OCR_" + "original name".

I have found the following JavaScript code to save only the first page of a document:

this.extractPages(0, 0, this.path.replace(/\.pdf$/i, "_p1.pdf"));

but I can't figure out how to add a step that runs OCR on only that page before saving it.

 

Could someone help me?

 

Thank you very much in advance.

 


BarlaeDC
Community Expert
March 14, 2023

Hi,

 

I don't think you can manage this in the order you want. What I think you can do, though, is extract the page that you want OCR'd and then run another action to OCR those documents.

This is because the OCR action does not have any settings to control which pages get OCR'd; it is all or nothing.

 

So the full workflow would be:

1 - Run an action to extract page 1 to a new location.

2 - Run an action to OCR the new documents.
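For step 1, the extraction line from the question could be adapted so the extracted page gets the "OCR_" + original name the question asks for, instead of the "_p1.pdf" suffix. This is only a minimal sketch: ocrOutputPath is a hypothetical helper (not part of Acrobat's API), and it assumes Acrobat's device-independent paths with "/" separators, as returned by this.path (e.g. "/C/Scans/invoice.pdf"). It does not perform any OCR; that still happens in the second action.

```javascript
// Hypothetical helper (not an Acrobat API): returns the path for the
// extracted page, in the same folder as the source, named "OCR_" + original name.
// Assumes device-independent paths with "/" separators, e.g. "/C/Scans/invoice.pdf".
function ocrOutputPath(sourcePath) {
  var slash = sourcePath.lastIndexOf("/");          // -1 if there is no folder part
  var folder = sourcePath.substring(0, slash + 1);  // "" when slash is -1
  var fileName = sourcePath.substring(slash + 1);   // the original file name
  return folder + "OCR_" + fileName;
}
```

In the action's Execute JavaScript step, the line from the question would then become this.extractPages(0, 0, ocrOutputPath(this.path)); with the helper defined in the same script. Writing the extracted files to a separate folder (by changing the folder part) may make it easier to point the second OCR action at only the new documents.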

 

Participating Frequently
March 14, 2023

Thank you very much for your answer, Barlae.

I had already thought about doing it in two steps. The problem is that I may have hundreds of thousands of documents, and doing it in two steps could double the processing time. I thought there would be a way to run the OCR scan using JavaScript code.

BarlaeDC
Community Expert
March 16, 2023

Hi,

The OCR function is not really available from scripting at all, despite many requests for it.

 

Here is a slightly crazy idea:

1. Change your flow to delete all but the first page (you might want to copy the documents first).

2. OCR the now one-page document.

 

This action would look something like this: [screenshot of the action setup not preserved]

 
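As a scripted version of step 1 of this idea, an Execute JavaScript step placed before the OCR step could delete every page except the first. This is a minimal sketch under stated assumptions: keepFirstPageOnly is a hypothetical helper name, and it relies only on Acrobat's Doc.numPages and Doc.deletePages (zero-based, inclusive nStart/nEnd). OCR itself still comes from the action's own OCR step, since it is not exposed to scripting. Because this modifies the open document, run it on copies, as suggested above.

```javascript
// Hypothetical helper: removes all pages after the first one (page 0).
// In Acrobat, call keepFirstPageOnly(this) from an "Execute JavaScript"
// step that runs before the OCR step of the action.
// Uses Doc.numPages and Doc.deletePages; page numbers are zero-based and inclusive.
function keepFirstPageOnly(doc) {
  var lastPage = doc.numPages - 1;
  if (lastPage >= 1) {
    // Delete pages 1..lastPage, keeping only page 0.
    doc.deletePages({ nStart: 1, nEnd: lastPage });
  }
  // Report how many pages remain (1 for any non-empty document).
  return doc.numPages;
}
```

Deleting inside the same document avoids writing an intermediate file per PDF, which is the point of this approach compared with extracting first and then running a second action.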

Bernd Alheit
Community Expert
March 14, 2023

Use two actions:

  1. Extract the first pages
  2. Run OCR on the extracted pages