I am trying to identify if a PDF file has undergone an OCR process.
The questioned PDF document is a certificate made up of image parts such as the signature block, the crest, and portions of a border. The text in the document is editable but contains garbled words, similar to what you get when an OCR process doesn't identify the characters properly. It seems too obvious to be fraud, but in the cases I receive it is plausible.
Usually, if the document is an image, the image undergoes an OCR process. This is easy to identify because the base document is an image. You can see this in the "Content" tool, or by selecting the image and downloading it, etc.
Two questions I need to answer:
1. When a PDF that is a scanned image undergoes an OCR process, is it possible for the process to segment the image into portions (signature block, crest, etc.), recognise the text, and discard most of the segmented images, leaving only the signature block, the crest, and garbled text where it misread the characters?
2. Is there a way of examining the internal structure or internal code to identify whether an OCR process has occurred?
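
To illustrate the kind of internal check question 2 is asking about, here is a minimal sketch in Python using only the standard library. It rests on an assumption: many OCR tools write the recognised text as an invisible layer over the scan, which shows up in the page content as text rendering mode 3 (the `3 Tr` operator). The function name `ocr_hints` and the regular expressions are hypothetical and written just for this example. It is not a real PDF parser, it only handles Flate-compressed streams, and a zero count does not rule out OCR, because some OCR output modes replace the scan with visible, editable text that this check would not flag.

```python
import re
import zlib

# Raw stream bodies sit between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# "3 Tr" sets text rendering mode 3 (text is neither filled nor stroked).
INVISIBLE_TEXT_RE = re.compile(rb"(?<![\d.])3\s+Tr\b")
# Image XObjects are declared with /Subtype /Image.
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def ocr_hints(pdf_bytes: bytes) -> dict:
    """Count image objects and streams that contain invisible text.

    Heuristic only: assumes an OCR layer written in rendering mode 3.
    """
    images = len(IMAGE_RE.findall(pdf_bytes))
    invisible = 0
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates a trailing byte lost to the regex.
            data = zlib.decompressobj().decompress(raw)
            # Image dictionaries may also hide inside compressed object streams.
            images += len(IMAGE_RE.findall(data))
        except zlib.error:
            data = raw  # not Flate-compressed, or uses another filter
        if INVISIBLE_TEXT_RE.search(data):
            invisible += 1
    return {"image_objects": images, "streams_with_invisible_text": invisible}
```

It could be run as `ocr_hints(open("certificate.pdf", "rb").read())`, where `certificate.pdf` is a placeholder file name. Checking the Producer and Creator entries in the document properties is a separate, complementary step that this sketch does not cover.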