Batch conversion of PDFs to machine-readable PDFs

Question

I am trying to convert 10,000 PDF files to machine-readable form using OCR in Adobe Acrobat Pro. But some of the PDFs contain renderable data, and the OCR fails to convert them into machine-readable form. I have seen a solution that converts the PDF into a .tiff file, runs OCR on it, and saves the result as a PDF. This worked, but I cannot repeat it by hand for each of the 10,000 PDF files. Is there any other way to batch process all 10,000 files together, by running an Action Wizard action or something like that?

gary_sc · Accepted Answer

Hi @gary_sc, I appreciate your help, but I have already tried this method. My concern is that whenever a page in a PDF file contains editable text, the OCR does not convert that page into machine-readable format; instead it turns it into a blank page, removing all the text and images. I need a workaround for that issue. I can't go through each PDF document and verify whether everything was converted accurately, so I wanted to create some sort of automated workflow that would convert all the documents accurately.

OK, now I better understand. Have you tried flattening the document before OCR-ing it?

https://www.ca4.uscourts.gov/caseinformationefiling/efiling_cm-ecf/technical-information/flatten-pdf-fillable-form
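
On the point about not being able to open every document to check it: a rough first pass can be scripted outside Acrobat. The Python sketch below is only a heuristic, not an Acrobat feature. It assumes the OCR'd files are saved under the same file names in a separate output folder (the folder layout, the function name `flag_suspect_outputs`, and the 0.5 size threshold are all assumptions made for illustration). It lists source files that have no output at all, and outputs that came out much smaller than their source, which may indicate pages that went blank.

```python
from pathlib import Path


def flag_suspect_outputs(src_dir, out_dir, min_ratio=0.5):
    """Return (missing, shrunk) lists of file names worth checking by hand.

    missing: source PDFs with no same-named file in out_dir.
    shrunk:  outputs smaller than min_ratio times the source size.
    The threshold is an arbitrary assumption; tune it on a few known files.
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    missing, shrunk = [], []
    # Note: "*.pdf" is case-sensitive on some platforms.
    for src in sorted(src_dir.glob("*.pdf")):
        out = out_dir / src.name
        if not out.exists():
            missing.append(src.name)
            continue
        src_size = src.stat().st_size
        if src_size > 0 and out.stat().st_size / src_size < min_ratio:
            shrunk.append(src.name)
    return missing, shrunk
```

A file on the `shrunk` list is not proof of a blank page, since OCR output can legitimately be smaller after recompression, and a file that passes is not proof of success. It only narrows 10,000 documents down to a shorter list to open manually.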

gary_sc · Answer

Hi @BG2022

Not being able to view the quality, nature, or resolution of the scans, it's almost impossible for me to dive too deeply into your issue.

However, I once tried to OCR a 950-page book that had already been converted to PDF. About halfway through, Acrobat locked up. Admittedly, that was about 15 years ago; I now have a lot more RAM and CPU power, and Acrobat is newer too.

Converting to TIF format was a good move, as it will save you several steps. When you dump a bunch of TIF images into Acrobat, it will ask whether you want them combined into one document or saved as separate documents. It will then automatically go ahead and OCR the resulting PDF.

What I would suggest is to pull out 300-400 pages and see how well that works. If it does fine, add another 100 pages. All good? Add 100 more. At some point, you're going to go, "Hmm, better go back 100 pages and leave it at that."
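
The grow-until-it-breaks approach above can be written down as a small Python sketch. Everything here is illustrative: `find_safe_batch_size` and `try_batch` are hypothetical names, and `try_batch` stands in for however you actually run a test batch (most likely by hand in Acrobat) and report whether it completed. The upper `limit` is an assumption added so the loop always terminates.

```python
from typing import Callable


def find_safe_batch_size(
    try_batch: Callable[[int], bool],
    start: int = 300,
    step: int = 100,
    limit: int = 2000,
) -> int:
    """Grow the batch by `step` pages until a run fails.

    Returns the last size that worked, or 0 if even `start` failed.
    """
    last_good = 0
    size = start
    while size <= limit:
        if not try_batch(size):
            break  # this size was too big; fall back to the previous one
        last_good = size
        size += step
    return last_good
```

For example, if Acrobat happened to cope with up to 650 pages, `find_safe_batch_size(lambda pages: pages <= 650)` would try 300, 400, 500, 600, fail at 700, and settle on 600.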

But let me warn you: while Acrobat is doing OCR, your computer is essentially tied up. You may think you can look at your email, but as soon as "that" page is done, Acrobat will jump in front and say, "Page completed; I'm doing the next page!!!"

So, plan on doing other things, long coffee breaks, lunch, dinner, whatever.

Good luck!
