Hi Community,
I am trying to identify if a PDF file has undergone an OCR process.
Scenario:
The questioned PDF document is a certificate made up of image parts, such as the signature block, crest, and portions of a border. The text in the document is editable but contains garbled words, similar to what you see when an OCR process fails to identify the characters properly. It seems too obvious to be fraud, but in the cases I receive it is plausible.
Usually, if the document is an image, the image undergoes an OCR process. This is easy to identify because the base document is an image: you can see this in the "Content" tool, or you can select the image and download it, etc.
Two questions I need to answer:
1. When a scanned-image PDF undergoes an OCR process, is it possible for that process to segment the image into portions (signature block, crest, etc.), recognise the text, and discard most of the segmented images, leaving only the signature block, the crest, and garbled text where the characters were not read correctly?
2. Is there a way of examining the internal structure or internal code to identify whether an OCR process has occurred?
Hello @Ben_FDE_2022,
There are a couple of places you can look to see if a scanned document has been edited.
1. File > Properties > Description > Additional Metadata > Advanced > XMP Media Management Properties > xmpMM:History...
2. Edit > Preflight > Options > Browse Internal PDF Structure...
Hope this helps!
Regards,
Mike
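For anyone who wants to check the xmpMM:History entries from item 1 outside Acrobat, here is a minimal Python sketch. It assumes the XMP packet is stored uncompressed in the file, which is common but not guaranteed; if the metadata stream is compressed, this raw scan finds nothing. The function name history_events is my own and not part of any Adobe tool. An empty result does not prove that no OCR took place, since the history is only present when the producing software wrote it, and metadata can be stripped or edited.

```python
import re
import sys
import xml.etree.ElementTree as ET

# Standard XMP/RDF namespace URIs.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XMPMM = "http://ns.adobe.com/xap/1.0/mm/"
STEVT = "http://ns.adobe.com/xap/1.0/sType/ResourceEvent#"

# Assumption: the packet is uncompressed and wrapped in <x:xmpmeta>.
XMPMETA_RE = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)


def history_events(pdf_bytes):
    """Return one dict of stEvt:* fields per xmpMM:History entry."""
    prefix = "{" + STEVT + "}"
    events = []
    for match in XMPMETA_RE.finditer(pdf_bytes):
        try:
            root = ET.fromstring(match.group(0))
        except ET.ParseError:
            continue  # skip packets that are not well-formed XML
        for history in root.iter("{" + XMPMM + "}History"):
            for li in history.iter("{" + RDF + "}li"):
                fields = {}
                # XMP may serialise the fields as attributes ...
                for key, value in li.attrib.items():
                    if key.startswith(prefix):
                        fields[key[len(prefix):]] = value
                # ... or as child elements.
                for child in li:
                    if child.tag.startswith(prefix):
                        fields[child.tag[len(prefix):]] = (child.text or "").strip()
                if fields:
                    events.append(fields)
    return events


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        for event in history_events(handle.read()):
            print(event)
```

Run it with the path to the PDF as the only argument. Each printed entry shows whatever the producing software recorded (typically fields such as action, when, and softwareAgent), which you can then compare with what Acrobat shows under Additional Metadata.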