Hi,
I am having trouble comparing PDF files.
First, some details about my OS and Acrobat software:-
- Windows 10 Pro, i5, 16GB RAM, 64-bit
- Adobe Acrobat Pro DC, version 2020.012.20041
I am trying to compare the first and second versions of a legal contract. The only differences between the two versions are some additions / deletions of words / paragraphs. After I ran the comparison, the results only showed: whole page deleted and whole page inserted (which can't be right, as only minor changes were made to the text).
Some information about the first and second versions of the document:-
- First Version: this document was sent to me by e-mail by ABC Ltd. Under properties, it shows that the PDF producer is "Adobe Acrobat Pro DC 19.21.20049". When I open this first version, I am able to select and copy text.
- Second Version: ABC Ltd printed this document, signed it, and delivered the hardcopy to me. I scanned this into my PC using my office scanner (RICOH). Under properties, it shows that the PDF producer is "Adobe PSL 1.3e for Canon". When I scanned this document, I adjusted the settings so that the scanned document would be OCR'd. When I open this second version, I am able to select and copy text.
Further information:-
- After I pressed the "Compare Files" button and selected both the Old and New Files, a yellow question mark appeared under the New File. When I put my mouse cursor over it, the following message appeared: "Selected document is a scanned PDF and contains no text. Acrobat will perform image to image comparison only."
- I have tried printing this Second Version using Acrobat by selecting "Adobe PDF" as the printer. Let's call this printout the "new Second Version". Then I tried running a comparison between the First Version and the new Second Version, but I still have the same problem mentioned above.
Please help. Checking and comparing documents is a big part of my daily job, and this is the main reason I purchased Acrobat Pro DC. Thank you very much.
Apparently, I was approached by a scammer after making the above post, asking me to contact the e-mail below for Adobe customer support:-
Adobecare.Experts@protonmail.com
What OCR settings did you use?
I can't remember. I can check tomorrow to confirm. But I had two options: (1) OCR fast, and (2) OCR precision. I chose OCR precision, and it took about 25 minutes for the scan to complete. The document had 50 pages.
Hi Bernd Alheit, I have checked my office scanner. I used the following OCR settings:-
PDF OCR (prioritize precision)
Colour: black & white to grayscale
resolution: 300 x 300 dpi
copy ratio: 100%
Good job spotting the scammer. This message, "Selected document is a scanned PDF and contains no text.", seems clear. There IS no text: OCR was not done on AT LEAST ONE of the documents. If you think the message is wrong, please check by trying to select and copy text in BOTH files. OCR must be done, well and correctly, if you want to compare text.
Thank you for your reply.
I am able to select and copy text in both files. I believe the problem is with the New / Second File (which was scanned). But I believe that the scan was done properly. I set OCR to precision mode, and it took 25 minutes for the scan / OCR to complete.
I have even tried running (1) enhance scan, and (2) text recognition on both files, and then running the comparison again. But I still have the same problem. The comparison couldn't detect any text.
In Acrobat try the OCR option "Editable Text and Images"
Hi Bernd Alheit,
I have tried your settings above as well, and it still doesn't work. I cannot share the whole file (as it contains confidential information). But I can share one page of the New File and Old File. Can you try comparing these two files from your side? Thanks.
1. Old File
2. New File
Thank you.
Hi,
Could you please try comparing the files using the option shown below in the screenshot?
Let us know if you're facing the same issue.
Regards
Adobe Acrobat DC Team
Hi,
Have you tried comparing the files using 'Compare Text only' mode?
Regards
Adobe Acrobat DC Team
Update:
After numerous attempts and hours of research and asking around, I have been getting better comparison results by further adjusting the scan settings.
Just one more question: The New File is a scanned document with binding / punched holes, whereas the Old File has no binding / punched holes. So, when I run a comparison between the old and new files, the holes are detected as text changes. This is confusing, as there are 21 holes on each page and 50 pages, so the comparison shows 1,050 text changes.
Thank you.
Hi,
Are you getting the punch holes as differences in 'Compare Text only' mode as well?
Using the 'Filter' option on the Compare app toolbar might help you in this regard in filtering/ignoring the various types of differences.
PS: The better the quality of the scanned file, the better will be the OCR output. And that should improve the quality of results as well.
Regards
Adobe Acrobat DC Team
Hi,
Yes, I selected "Compare Text only". The compare function treats the binding / punched holes as letters "I", "L", "t", "i".
By "Filter", do you mean the "Settings" where I check various boxes under "Show in Report"? If that's what you are referring to, I checked the box "text" only. Further, I don't see any filter for ignoring punched holes.
Please advise. Thanks.
Hi,
I'm talking about the 'Filter' menu on the Compare toolbar. Please refer to the attached screenshot below:
Also, could you please share some sample files with me so that I can investigate further?
Please follow the steps to share the file using Adobe send - https://cloud.acrobat.com/send
Was this ever resolved? I am going through the same challenge. I have an original lease document, and a scanned and signed (signed with ink) copy that was sent back to me. I need to compare 45 pages but it just shows every page as deleted or inserted.
Yes, it was resolved. For PDF there are different types of comparison. The current comparison is telling you that you have two completely different documents, which is correct. What you want is to compare text, and only text. To do that, the scanned document must be OCR'd. You'll need to check the text after OCR to make sure it's really legible text. Do this by copying and pasting some text into a different document.
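As a rough way to put a number on that copy-and-paste check, here is a minimal Python sketch. It assumes you have pasted some text from the scanned PDF into a string; the function name `text_quality` and the idea of counting "word-like" tokens are illustrative assumptions, not anything Acrobat provides.

```python
import re

# Hypothetical helper: estimates how much of a pasted OCR sample looks like
# ordinary words. This is a heuristic sketch, not an Acrobat feature.
WORD_LIKE = re.compile(r"^[\W_]*[A-Za-z]+(?:['\-][A-Za-z]+)*[\W_]*$")


def text_quality(sample: str) -> float:
    """Return the fraction of whitespace-separated tokens that look like words."""
    tokens = sample.split()
    if not tokens:
        # Nothing was copied at all: treat it as "no usable text".
        return 0.0
    good = sum(1 for token in tokens if WORD_LIKE.match(token))
    return good / len(tokens)


if __name__ == "__main__":
    pasted = "The parties agree to the following terms."
    print(f"word-like tokens: {text_quality(pasted):.0%}")
```

A low score suggests the OCR layer is garbled even though text can be selected. Numbers and clause references (for example "12.3") do not count as word-like here, so a contract will rarely score 100%; treat the result as a rough indicator only.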
Hi, I have found a workaround. It's not the best solution, but we will need to wait until Adobe makes further improvements to the software.
For the scanned and signed copy, make sure that you scan the document using the highest resolution possible. For my scanner, the highest was 600 x 600 dpi. If your scanner has an OCR function, do not use it. Because of the large file size at the highest resolution, the PDF would come out as separate files, so you need to combine / merge these PDF files first. After merging, run OCR in Acrobat. Then, do the comparison. You should be able to compare and find differences. I found that the result was about 95% accurate. If the PDF file has handwritten words / diagrams / pictures, the accuracy would drop to about 90%. If the PDF scan has punched holes, the accuracy would drop dramatically. It is a very tedious process involving a few extra steps, but better than comparing the whole document manually.
There are actually a few articles (that you can google) discussing in great detail how (1) scan resolution, and (2) the font style of a PDF document affect the comparison result. It would be nice if Adobe could include this information in the forums / FAQ.
Hope this helps, and it would be great if you could share a better method!
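If the punched holes still show up as stray letters, one possible way to cross-check outside Acrobat is to copy the text from both PDFs and diff it word by word. The Python sketch below uses the standard `difflib` module. The names `HOLE_CHARS`, `drop_hole_tokens` and `word_changes` are hypothetical, and the character set is only the letters reported earlier in this thread ("I", "L", "t", "i").

```python
import difflib

# Assumption: punched holes were misread as these letters (taken from the thread).
HOLE_CHARS = set("ILti")


def drop_hole_tokens(words, max_len=2):
    """Drop short tokens made up only of hole-like characters."""
    return [w for w in words if not (len(w) <= max_len and set(w) <= HOLE_CHARS)]


def word_changes(old_text, new_text):
    """Return (tag, old_words, new_words) for every non-matching run of words."""
    # The same filter is applied to both sides, so a genuine "I" or "It"
    # that appears in both documents does not create a false difference.
    old_words = drop_hole_tokens(old_text.split())
    new_words = drop_hole_tokens(new_text.split())
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return changes


if __name__ == "__main__":
    old = "The tenant shall pay rent monthly"
    new = "I The tenant shall pay the rent L monthly t"
    for change in word_changes(old, new):
        print(change)
```

The trade-off is that a real edit involving only those short tokens (for example, a deleted "It") would be missed, so this is a rough filter rather than a replacement for reading the flagged passages.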
Thanks for the reply, I will give this a try. Given how much time this takes, it almost seems worth it just to read the whole document page by page instead!