
OCR Scanning only the first page of multiple documents with action wizard

New Here, Mar 13, 2023



Hello,

I have been trying to solve this problem for a couple of weeks using information I found online, without success.

I have thousands of image-only PDF documents. I would like to run OCR on only the first page of each one and then save that single page as a separate document named "OCR_" + original name.

I found the following JavaScript code, which saves only the first page of a document:

this.extractPages(0, 0, this.path.replace(/\.pdf$/i, "_p1.pdf"));

but I can't work out how to add a step that runs OCR on just that page before saving it.


Could someone help me?


Thank you very much in advance.


TOPICS
Create PDFs, How to, PDF forms, Standards and accessibility

Community Expert, Mar 14, 2023


Use two actions:

  1. Extract the first pages
  2. Run OCR on the extracted pages

Community Expert, Mar 14, 2023


Hi,


I don't think you can manage this in the order you want. What you can do, though, is extract the page you want OCR'd and then run another action to OCR those documents, because the OCR action has no settings to control what gets OCR'd; it is all or nothing.



So the full workflow would be:

1. Run an action to extract page 1 to a new location.

2. Run an action to OCR the new documents.
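Step 1 could be an Execute JavaScript step in an Action Wizard action. The sketch below adapts the extractPages call from the question so that the copy is named "OCR_" + original name and, optionally, written to a separate folder, so the OCR action in step 2 only sees the one-page copies. It assumes Acrobat Pro running the script as part of an action (a context where a script may save a file to a path); the helper names ocrCopyPath and extractFirstPageAsOcrCopy and the folder /C/ocr_in are hypothetical, and the output folder should be created beforehand.

```javascript
// Sketch for an Action Wizard "Execute JavaScript" step in Acrobat Pro.
// ocrCopyPath, extractFirstPageAsOcrCopy and "/C/ocr_in" are hypothetical names.

// Build a device-independent output path such as "/C/folder/OCR_name.pdf".
// If outFolder is given, the copy goes there instead of next to the original.
function ocrCopyPath(docPath, outFolder) {
  var slash = docPath.lastIndexOf("/");
  var folder = docPath.slice(0, Math.max(slash, 0));
  var fileName = docPath.slice(slash + 1);
  var target = (outFolder !== undefined) ? outFolder.replace(/\/+$/, "") : folder;
  return (target ? target + "/" : "") + "OCR_" + fileName;
}

// Save page 1 of `doc` as a new file. Page numbers are zero-based,
// so (0, 0) selects only the first page.
function extractFirstPageAsOcrCopy(doc, outFolder) {
  var target = ocrCopyPath(doc.path, outFolder);
  doc.extractPages(0, 0, target);
  return target;
}

// Inside Acrobat, `this` is the document the action is processing.
// Elsewhere (e.g. Node.js) `this` has no extractPages, so nothing runs.
if (typeof this !== "undefined" && this && typeof this.extractPages === "function") {
  extractFirstPageAsOcrCopy(this, "/C/ocr_in"); // hypothetical output folder
}
```

Pointing the second action at that output folder means it OCRs only the extracted pages and never touches the multi-page originals.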


New Here, Mar 14, 2023


Thank you very much for your answer, Barlae.

I had already thought about doing it in two steps. The problem is that I may have hundreds of thousands of documents, and doing it in two steps could double the processing time. I thought there would be a way to run OCR from JavaScript.

Community Expert, Mar 16, 2023


Hi,

The OCR function is not available from JavaScript at all, despite many requests for it.


Here is a slightly crazy idea:

1. Change your workflow to delete all but the first page (you might want to copy the documents first).

2. OCR the now one-page document.


The action would look something like this:

[Screenshot of the proposed action]


New Here, Mar 21, 2023


Thank you, @BarlaeDC.

I think it's a good idea. What would the JavaScript code be to remove all pages except the first one?


Community Expert, Mar 24, 2023


Hi,


You should be able to write the required JavaScript using the Doc.deletePages method documented here: https://opensource.adobe.com/dc-acrobat-sdk-docs/library/jsapiref/doc.html#deletepages
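A minimal sketch of that JavaScript, assuming it runs as an Execute JavaScript step placed before the OCR step in the action, on copies of the documents as suggested above. keepFirstPageOnly is a hypothetical helper name; the Doc members it uses (numPages and deletePages) come from the Acrobat Doc object reference linked above.

```javascript
// Sketch: keep only page 1 of the current document (Acrobat JavaScript).
// keepFirstPageOnly is a hypothetical helper name.
function keepFirstPageOnly(doc) {
  // A PDF must keep at least one page, so one-page files are left alone.
  if (doc.numPages > 1) {
    // Zero-based indices: delete page 2 (index 1) through the last page.
    doc.deletePages(1, doc.numPages - 1);
  }
  return doc.numPages;
}

// Run only when `this` is an Acrobat document (i.e. inside the action step).
if (typeof this !== "undefined" && this && typeof this.deletePages === "function") {
  keepFirstPageOnly(this);
}
```

Because this modifies each document in place, the action should save its output to a different folder (or run on copies); otherwise the multi-page originals are overwritten.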
