I work with thousands of documents for court cases and need to be able to find all those that have been fully redacted. What does Adobe do to indicate that a file was redacted, aside from renaming it with "Redacted" at the end (which some individuals turn off), and how can we search for that property? JavaScript seems to find only files that are Marked for Redaction, not those that are fully redacted.
As far as I know, a redacted PDF simply has the changes applied and the original information deleted. There is no way to identify such a file or to search for redactions. Indeed, some people may need to run workflows in which the act of redaction cannot be detected.
The only thing I can think of is to search for files with large rectangular areas that are completely black.
This is of course very problematic, as files can contain large black images without being redacted, or the redaction can have another color, etc., but it's the only way to do it as there's no "this file has been redacted" tag or anything like that.
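To make the "large black rectangle" idea concrete, here is a minimal sketch in plain Python. It assumes each page has already been rendered to a grayscale grid (rows of 0-255 values) by some other tool, which is not shown. The function names, the darkness threshold of 40 and the 1% area cutoff are illustrative assumptions, not anything Acrobat provides, and the caveats above (black images, non-black redactions) still apply.

```python
from typing import List, Sequence


def _largest_in_histogram(heights: List[int]) -> int:
    """Stack-based search for the largest rectangle under a histogram."""
    stack: List[int] = []  # indices of bars with strictly increasing heights
    best = 0
    for i, h in enumerate(heights + [0]):  # trailing 0 flushes the stack
        while stack and heights[stack[-1]] >= h:
            height = heights[stack.pop()]
            left = stack[-1] + 1 if stack else 0
            best = max(best, height * (i - left))
        stack.append(i)
    return best


def largest_dark_rectangle(gray: Sequence[Sequence[int]], threshold: int = 40) -> int:
    """Area in pixels of the largest axis-aligned rectangle whose pixels
    are all darker than `threshold` (0 = black, 255 = white)."""
    if not gray or not gray[0]:
        return 0
    width = len(gray[0])
    heights = [0] * width  # run of dark pixels ending at the current row
    best = 0
    for row in gray:
        for x in range(width):
            heights[x] = heights[x] + 1 if row[x] < threshold else 0
        best = max(best, _largest_in_histogram(heights))
    return best


def looks_redacted(gray: Sequence[Sequence[int]], min_fraction: float = 0.01) -> bool:
    """Heuristic only: True if a solid dark rectangle covers at least
    `min_fraction` of the page. The cutoff is an arbitrary assumption."""
    total = len(gray) * (len(gray[0]) if gray else 0)
    return total > 0 and largest_dark_rectangle(gray) / total >= min_fraction
```

A page flagged by `looks_redacted` is only a candidate for manual review; the check cannot tell a redaction box from a dark photograph or a black table cell.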
Thank you. I expect you would know, as it is your code I've used to find Marked and fully Redacted files together. But how does one search for completely black rectangles without having to open every file? Is that possible?
This won't be possible using a script. Possibly with a plugin, or a stand-alone tool.
The latter can process files without displaying them (of course they have to be opened, at least at the memory level, to read their contents, though).
I am currently working on detecting black rectangles to find redacted files, using Python + OpenCV.
It is still a work in progress, but I think it will work the way I hope.
What do you mean by "fully Redacted"?
I mean redactions made to text with the PDF Redact tool that have been "Applied", not ones that are simply marked for redaction, which only creates a red box through which one can still see the text to be redacted.
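The distinction matters for searching because a pending mark is stored in the PDF as an annotation of subtype `/Redact`, whereas, as noted earlier in this thread, an applied redaction leaves no comparable tag. The sketch below is a rough, standard-library-only check for pending marks. The function names are hypothetical, and a raw byte scan will miss annotation dictionaries stored inside compressed object streams, so a real PDF parser would be needed for reliable results.

```python
import re
from pathlib import Path

# Matches an annotation dictionary entry such as "/Subtype /Redact"
# or "/Subtype/Redact" in the raw, uncompressed bytes of a PDF.
REDACT_MARK = re.compile(rb"/Subtype\s*/Redact\b")


def has_pending_redaction_marks(pdf_bytes: bytes) -> bool:
    """True if the raw bytes contain a Redact annotation subtype.

    Assumption: the annotation dictionary is not hidden inside a
    compressed object stream; if it is, this scan will not see it."""
    return REDACT_MARK.search(pdf_bytes) is not None


def file_has_pending_marks(path: str) -> bool:
    """Convenience wrapper that reads a file from disk and scans it."""
    return has_pending_redaction_marks(Path(path).read_bytes())
```

A `False` result therefore means either "never marked", "already applied", or "marks stored in compressed form", which is why this check cannot identify fully redacted files.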
You need to reverse your process.
Since it's impossible to detect an absence, you should look for a presence.
In other words, look for documents that have not been redacted.
I assume they must contain recurring information.
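As a rough illustration of this "look for a presence" approach, the sketch below takes text that has already been extracted from each file (extraction is not shown) and lists the files in which a recurring pattern still appears. The Social Security number pattern and the function name are purely hypothetical examples; the right pattern depends on what the documents actually contain.

```python
import re
from typing import Dict, List, Pattern

# Hypothetical example of "recurring information": US Social Security
# numbers written as 123-45-6789. Replace with whatever the case files use.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def files_still_containing(
    texts: Dict[str, str], pattern: Pattern[str] = SSN_PATTERN
) -> List[str]:
    """Return the names of files whose extracted text still matches
    `pattern`, i.e. files that do not appear to have been redacted."""
    return sorted(name for name, text in texts.items() if pattern.search(text))


if __name__ == "__main__":
    sample = {
        "exhibit_a.pdf": "Claimant SSN: 123-45-6789",
        "exhibit_b.pdf": "Claimant SSN:",
    }
    print(files_still_containing(sample))
```

Files missing from the returned list are only candidates for "redacted": they may simply never have contained the pattern.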
That only works if you know what was redacted in the first place...