Hello,
I have around 5000 PDF files in various folders/subfolders; most of them are OCRed already, but some are not.
The problem is that when I run the OCR tool on my root folder, it also OCRs the files that are already OCRed, which consumes a lot of time and resources unnecessarily.
So my question is: how can I OCR only the files that are not already OCRed, without having to check each one manually?
Many thanks in advance!
1. Sort the files using a Preflight Profile in an Action that places them in two folders (Success or Error).
In Preflight, search for "mode 3": this Check identifies OCRized files. Embed it in a custom Profile (an Action can only use Profiles, not a Check directly).
2. Use an Action to OCRize the files that are not OCRized yet.
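For anyone who would rather test files outside Acrobat, here is a minimal Python heuristic. It assumes (my understanding, not confirmed in this thread) that the Preflight "mode 3" Check refers to PDF text rendering mode 3, i.e. invisible text, which is how a searchable-image OCR layer is usually drawn. The function name `looks_ocred` is hypothetical, and this is not a real PDF parser: it only looks for the `3 Tr` operator in raw or Flate-compressed streams, so it can give false results on unusual files.

```python
import re
import zlib

# Each stream body: the "stream" keyword plus end-of-line, up to "endstream".
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# "3 Tr" selects text rendering mode 3 (invisible text), typical of an OCR layer.
# The lookbehind avoids matching numbers such as "13 Tr".
_INVISIBLE_TEXT_RE = re.compile(rb"(?<![\d.])3\s+Tr\b")


def looks_ocred(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any stream sets text rendering mode 3."""
    for match in _STREAM_RE.finditer(pdf_bytes):
        body = match.group(1)
        try:
            # Most content streams are FlateDecode (zlib) compressed.
            # decompressobj ignores trailing bytes (e.g. the EOL before endstream).
            data = zlib.decompressobj().decompress(body)
        except zlib.error:
            # Not zlib data (uncompressed or another filter): search the raw bytes.
            data = body
        if _INVISIBLE_TEXT_RE.search(data):
            return True
    return False
```

Usage would be something like `looks_ocred(Path(r"D:\Root_Folder\Folder_1\Subfolder_1\file.pdf").read_bytes())` (hypothetical path). Acrobat's Preflight Check remains the more reliable test; this is only a rough first pass.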
PS: an Action can't move files into the Success/Error folders; it can only copy them, but this isn't a real problem.
Hi JR,
thank you very much for your response.
I created the Profile and Action as you suggested. It created two subfolders: one with the OCRized files and one with those that are not OCRized yet.
But now, how can I OCRize the files in the non-OCRized folder? And after that, how can I move them back to their original folders, overwriting the non-OCRized versions?
Also, I have many folders and subfolders; the PDF files are organized as follows, with each Subfolder_x containing a varying number of PDF files:
Root_Folder\Folder_1\Subfolder_1
Root_Folder\Folder_1\Subfolder_2
Root_Folder\Folder_1\Subfolder_3
etc.
Root_Folder\Folder_2\Subfolder_1
Root_Folder\Folder_2\Subfolder_2
Root_Folder\Folder_2\Subfolder_3
Root_Folder\Folder_2\Subfolder_4
Root_Folder\Folder_2\Subfolder_5
etc.
Root_Folder\Folder_t\Subfolder_1
...
Root_Folder\Folder_t\Subfolder_n
How can I run an Action on Root_Folder so that every Folder_x and its subfolders are processed, and the processed/OCRized files stay in the same subfolders as before?
Like this: [screenshot of the Action setup, not preserved]
Hi JR,
Thanks, but I don't understand how this solves my issue. If I run this Action as is, it runs the OCR tool on all my PDF files, even those that are already OCRized.
I want to OCR only the files that are not OCRized yet, without having to move them back to their original folders manually.
I guess I need a combination of the first "sort" Action you suggested and the "OCR" Action, something like this:
1/ detect non-OCRized files
2/ OCR the detected files
3/ move the detected files back to their original folders
But how can we move the files back to their original folders?
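Outside Acrobat, step 3 is simple if step 1 moved each file into a staging folder that mirrors the original tree (for example `Staging\Folder_1\Subfolder_2\scan.pdf` for `Root_Folder\Folder_1\Subfolder_2\scan.pdf`). That layout is an assumption, and `restore_from_staging` is a hypothetical helper; this is a minimal Python sketch, not an Acrobat feature.

```python
import shutil
from pathlib import Path


def restore_from_staging(staging: Path, root: Path) -> list[Path]:
    """Move every PDF under `staging` back to the same relative path under `root`.

    Assumes `staging` mirrors the original tree, e.g.
    staging/Folder_1/Subfolder_2/scan.pdf -> root/Folder_1/Subfolder_2/scan.pdf.
    An existing file at the destination is overwritten.
    """
    restored = []
    # sorted() builds the full list first, so moving files doesn't disturb the walk.
    for src in sorted(staging.rglob("*")):
        if not (src.is_file() and src.suffix.lower() == ".pdf"):
            continue  # skip folders and non-PDF files
        dest = root / src.relative_to(staging)
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Delete the old (non-OCRed) copy first: on Windows a move onto an
        # existing file can fail instead of overwriting it.
        if dest.exists():
            dest.unlink()
        shutil.move(str(src), str(dest))
        restored.append(dest)
    return restored
```

The relative path is the only thing that has to survive the round trip; as long as the OCR Action saves its output under the same names in the staging folder, nothing else needs to be recorded.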
Sorry, I misunderstood your previous post.
You cannot do that, since Profiles and Actions don't support conditions (if/else).
You would need an Action that runs a JavaScript script.
Thanks JR. But how can I do that?
What kind of script should I use? I've never done that before. Is there any code posted somewhere that I could use? What should the JavaScript check for?
I hope another expert more qualified than me in JavaScript can answer you quickly; otherwise I'll do some research.
Hello,
I did some research and found some people who were using Ghostscript, xpdf, xpdvviewer or a script written in AppleScript here (https://forum.latenightsw.com/t/how-to-detect-whether-a-pdf-has-been-ocrd/1708/5), but I don't think any of this is usable in JavaScript, right?
However, there is perhaps a solution with that script using JavaScript here:
But how do I use JavaScript in an Action in Adobe Acrobat?
You can't OCR with a script.
I would use an external tool to sort and split the files, and move the ones that need OCRing to another folder.
Then run an Action in Acrobat to OCR just the files in that folder.
Then use another stand-alone tool to move those files back to their original locations.
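As one possible "external tool" for the sorting step, here is a minimal Python sketch that moves the PDFs needing OCR out of Root_Folder into a separate folder while keeping their Folder_x\Subfolder_y structure, so they can be put back later by reversing the relative path. The names `stage_for_ocr` and `needs_ocr` are hypothetical; `needs_ocr` is left as a pluggable test because detecting OCR reliably is the hard part, and the staging folder is assumed to be outside Root_Folder.

```python
import shutil
from pathlib import Path
from typing import Callable


def stage_for_ocr(root: Path, staging: Path,
                  needs_ocr: Callable[[Path], bool]) -> list[Path]:
    """Move PDFs under `root` for which needs_ocr(path) is True into `staging`,
    keeping their relative folder structure (Folder_x/Subfolder_y/...)."""
    moved = []
    # sorted() takes a snapshot of the tree before anything is moved.
    for src in sorted(root.rglob("*")):
        if not (src.is_file() and src.suffix.lower() == ".pdf"):
            continue  # skip folders and non-PDF files
        if not needs_ocr(src):
            continue  # already OCRed: leave it where it is
        dest = staging / src.relative_to(root)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

After running the Acrobat OCR Action on the staging folder, moving each file from `staging / rel` back to `root / rel` completes the round trip.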