For one of my clients it is very important to remove hidden text, because it may contain information that could reveal personal data. It is possible to sanitize a PDF, but that turns it into an image-only PDF, so you would need to run OCR (Acrobat Capture) afterwards to make it readable and searchable again.
It would be very nice if a preflight could be made that deletes all text elements that are not visible (because they are outside of a masked area, or because they have no color or white as a fill).
Does anyone know if Acrobat Pro has a feature to do that? I cannot find it with the preflight options.
You can create invisible text by placing a PDF onto a page in InDesign and partly cropping it. Export to a new PDF, and the hidden text is still there. Cleanup would get rid of it, but would convert the file to less readable images. Hidden text is very useful for scanned paper when the scanner applies OCR after scanning, and you wouldn't want to get rid of that hidden text.
This has been possible in Acrobat Pro for a while, though I have not really tried it out myself. Please check out this help article (scroll towards the bottom and look for "Find and remove hidden content"):
Removing sensitive content from PDFs in Adobe Acrobat DC
Olaf
Hi Olaf, thanks for your reply.
The redaction option is the problem, actually. It is not precise enough: it removes all kinds of hidden text at once (it doesn't allow specific choices), and after that it converts the PDF into an image. It would be a huge improvement if you could specify that only text hidden by a mask should be removed.
When you scan a paper into a PDF and apply OCR to it, the result also has hidden text. But in this case the hidden text makes the content searchable, so it's okay to leave it. There the redaction tools help you erase text that was visible on paper AND was added as invisible text in the OCR layer.
If you continued the redaction process and deleted all hidden text, you would lose your OCR text again. You don't want to do that either, because then your PDF wouldn't be searchable at all anymore.
Found it! 🙂
I just managed to compose a preflight action that removes objects that are outside the current clipping area. This doesn't remove the invisible OCR text: that text has no color, sits on top of the image of the page, and is NOT inside a clipping area. Text (as well as other objects) that falls outside its clipping area, and is therefore invisible, will be removed.
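For anyone who wants the rule spelled out, here is a minimal Python sketch of the logic as I understand it. It is only an illustration, not how Acrobat's preflight works internally. The names `Rect`, `PageObject` and `is_kept` are made up for this example, and it assumes rectangular clipping areas, whereas real PDFs can use arbitrary clipping paths.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle (hypothetical stand-in for a bounding box or clip)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Rect") -> bool:
        # True when the two rectangles share some area.
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


@dataclass(frozen=True)
class PageObject:
    """Hypothetical page object: a name, its bounding box, and its clip (if any)."""
    name: str
    bbox: Rect
    clip: Optional[Rect]  # None means no clipping area applies to this object


def is_kept(obj: PageObject) -> bool:
    # Objects without a clipping area (such as an OCR text layer) are kept.
    # Clipped objects are kept only if some part of them is inside the clip.
    return obj.clip is None or obj.bbox.intersects(obj.clip)


def remove_clipped_out(objects: List[PageObject]) -> List[PageObject]:
    return [obj for obj in objects if is_kept(obj)]


crop = Rect(0, 0, 100, 100)
objects = [
    PageObject("ocr_text", Rect(10, 40, 90, 50), clip=None),
    PageObject("cropped_text", Rect(150, 150, 200, 160), clip=crop),
    PageObject("visible_text", Rect(10, 10, 90, 20), clip=crop),
]
kept = remove_clipped_out(objects)
print([obj.name for obj in kept])  # ['ocr_text', 'visible_text']
```

The OCR text survives because no clipping area applies to it, while the text that was cropped away in InDesign lies entirely outside its clip and is dropped.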
Dude. That was exactly what I was looking for. Thanks a lot!