I have a large PDF library (13k+ files) that I indexed with Acrobat.
There is one PDF per title folder, with 1-5 title folders per author folder.
Most of these files have already been OCRed; some have not. How do I run some type of action to identify which files have not been OCRed, without having to open each one manually?
The Acrobat Catalog index log only logs "extracting" for each file; it doesn't specify whether the file was all image (unrecognized text).
I can't find a way to create an Action that runs through all of the files and recognizes only the ones that have not been recognized yet.
I think you can just use an Action with the Recognize Text command and run it on all your files. It will skip any files that already have "real" text in them.
"only recognizing the ones that have not been recognized yet."
It is easier to do the opposite and detect the documents that have already been OCRed, but it comes to the same thing.
You can use the "Invisible text (text rendering mode 3)" Check to create a Profile, then use this Profile in an Action to sort the files.
This will require running two Actions, though. My solution only requires one.
Plus, you'll need to manually copy the files back to the original folder at the end.
Several steps can be added to a Profile or to an Action, but you're right: I misunderstood and thought it was only necessary to sort the files.
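If a plain list of the unrecognized files is enough, the check can also be done outside Acrobat. The following is only a rough sketch, not an Acrobat feature: it assumes Python with the third-party pypdf package installed, and the function names, the 5-page sample, and the 20-character threshold are my own illustrative choices. It walks the author/title folders, tries to extract text from each PDF, and reports the files that yield almost none.

```python
from pathlib import Path
from typing import Callable, List


def extract_text_with_pypdf(pdf_path: Path, max_pages: int = 5) -> str:
    """Return the text of the first few pages of a PDF.

    Assumption: the third-party 'pypdf' package is installed.
    It is imported here so the rest of the script loads without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    chunks = []
    for index, page in enumerate(reader.pages):
        if index >= max_pages:
            break
        # extract_text() can return an empty result on image-only pages
        chunks.append(page.extract_text() or "")
    return "".join(chunks)


def find_unrecognized(
    root: Path,
    extract: Callable[[Path], str] = extract_text_with_pypdf,
    min_chars: int = 20,
) -> List[Path]:
    """List PDFs under root that appear to contain no recognized text."""
    missing = []
    # Note: the "*.pdf" pattern is case-sensitive on some systems
    for pdf_path in sorted(root.rglob("*.pdf")):
        try:
            text = extract(pdf_path)
        except Exception:
            # Unreadable or encrypted file: flag it for a manual look
            missing.append(pdf_path)
            continue
        # A few stray characters (e.g. a stamp) do not count as OCR
        if len(text.strip()) < min_chars:
            missing.append(pdf_path)
    return missing


if __name__ == "__main__":
    import sys

    # Usage: python find_unrecognized.py "D:/Library"  (hypothetical path)
    for path in find_unrecognized(Path(sys.argv[1])):
        print(path)
```

Because only the first pages are sampled, a file with a scanned cover and real text further in could be flagged by mistake; raising `max_pages` trades speed for accuracy. The resulting list can then be fed to an Action that runs Recognize Text on just those files.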