Hello, I appreciate your inquiry.
While I may not have all the details, I noted your mention of a feature in your Brother scanner that enables an OCR step for each document scanned.
I suggest reaching out to the manufacturer directly to explore how to access the scripting commands that could facilitate this process in bulk. Could you please clarify which specific OCR step and Brother scanning device you are referring to?
It would be helpful to have more precise information.
From my understanding, Adobe Acrobat Pro alone does not support the functionality you are seeking. Unfortunately, it lacks the capability to create a batch sequence for analyzing thousands of PDFs simultaneously, and you may encounter limitations based on your operating system as well.
However, you can use Acrobat's Print Production tool by navigating to Preflight => Options => "Browse the internal structure of all document fonts".
Alternatively, you can tailor the Action Wizard to create an automated action: open the Action Wizard tool => select New Action => add, then choose the Preflight tool => next, choose the folder where you placed the PDFs to be analyzed => click Select Folder => Save => add an action name (e.g. "Analyze Font Structure on PDFs") => click Save.
This Preflight tool feature will help you determine if a scanned document contains embedded fonts, which is common when OCR has been applied; if OCR wasn't applied it will show nothing.
Please note that this process must be done individually in Acrobat, as bulk processing is not an option. That said, if you search online, you might discover third-party tools and Python scripts that can perform the bulk operations you need.
However, what you are asking for may be better served by something simple rather than anything complicated. For instance, I have a straightforward batch script that can be used on Windows machines; I tested it with 600 PDF documents, and it processed them in under 30 seconds.
The script carries out several functions.
It designates a source folder located on your Desktop, assuming for this example that you have already created a folder named 'SCANNED_PDFS' there and manually moved into it the PDFs that you would like to process. Upon execution, the script creates a subfolder titled 'OCRed_PDFs'.
Next, it searches each PDF for the text string 'FontDescriptor'. If a document was processed by an OCR tool, the script finds the 'FontDescriptor' string in it and moves that PDF to the 'OCRed_PDFs' subfolder.
If a document is merely a scanned PDF, it will not be moved, since it lacks the FontDescriptor object. This lets you keep the PDFs that were OCR'ed in a separate folder, while the files that are just scans remain intact in their source folder for further OCR processing (which you can do in Acrobat with the Scan & OCR tool or with a third-party command-line batch script).
NOTE:
I am not very savvy with advanced batch scripting, so this script will only move PDF files from the parent folder to the subfolder as long as the PDF file names don't include spaces. If a file name includes a space, the script will ignore that file and not process it.
For further clarification on what this script achieves, please refer to the comparison presented in my slides below, where two PDFs are analyzed using a text editor (Also note that I am not evaluating Accessibility, Tag structure or any kind of XMP Metadata, and much less following any kind of PDF specifications according to ISO standards), focusing solely on text strings:
Here is a copy of the script:
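@ECHO OFF
REM Folder that holds the PDFs to check; replace UserName with your own Windows account name.
set "source=C:\Users\UserName\Desktop\SCANNED_PDFS"
cd /d "%source%"
REM Create the destination subfolder only if it does not exist yet.
if not exist "%source%\OCRed_PDFs" mkdir "%source%\OCRed_PDFs"
REM findstr /M prints only the names of the files that contain the string.
for /f %%A in ('findstr /M "FontDescriptor" *.pdf') DO MOVE "%%A" "%source%\OCRed_PDFs"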
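If your file names do contain spaces, a short Python sketch of the same check might be easier to adapt. The batch script stumbles on those names because `for /f` splits each line on spaces and tabs by default and keeps only the first piece. The version below is only an illustration, not a tested replacement: the function name `move_ocred_pdfs` and the constant `MARKER` are names I made up, it assumes Python 3 is installed, and it does the same plain search for the 'FontDescriptor' string, so it shares the same limits as the batch approach.

```python
import shutil
from pathlib import Path
from typing import List

# Same text string the batch script looks for with findstr.
MARKER = b"FontDescriptor"


def move_ocred_pdfs(source: Path, subfolder: str = "OCRed_PDFs") -> List[Path]:
    """Move every PDF in `source` that contains MARKER into source/subfolder.

    Returns the new paths of the files that were moved.
    """
    target = source / subfolder
    target.mkdir(exist_ok=True)
    moved = []
    # sorted() builds the full file list first, before anything is moved.
    for pdf in sorted(source.glob("*.pdf")):
        if not pdf.is_file():
            continue
        # Raw byte search; file names with spaces are handled like any other.
        if MARKER in pdf.read_bytes():
            destination = target / pdf.name
            shutil.move(str(pdf), str(destination))
            moved.append(destination)
    return moved


# Example call (hypothetical path, adjust to your own folder):
# move_ocred_pdfs(Path.home() / "Desktop" / "SCANNED_PDFS")
```

As with the batch script, PDFs without the string stay in the source folder, so you can run your OCR step on whatever is left there.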