I have multiple duplicate PDF files. They are definitely the same, content-wise.
Acrobat's Compare Files shows no differences. File sizes are the same, and dates (not times) are the same.
But they have different hashes (there is a technical reason for this, but it is not very relevant).
So sizes, contents, metadata and dates are the same.
The usual duplicate-file finder tools take hashes into account, so they are of no use here.
Acrobat can compare one file with another.
Regrettably, I have not found a way to compare all the files within an entire folder based on the above criteria,
thus marking files as (possible) duplicates.
Is there any solution for this?
Other than, of course, going through the entire folder and manually deleting duplicates.
=
There's no way to do this using Adobe Acrobat. Files that are visually identical can be structured very differently. If your only concern is duplicate visual content, my recommendation is to render the PDF pages to images and compare those and then delete the corresponding PDF.
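For what it's worth, here is a minimal sketch of that render-and-compare idea in Python. It is not an Acrobat feature. It assumes the third-party PyMuPDF package (imported as `fitz`) is installed, and the function names `render_fingerprint` and `group_by_fingerprint` are made up for this illustration. It also assumes that two files with identical content render to identical pixels at the same resolution, which should hold for files produced by the same generator but is not guaranteed in general.

```python
import hashlib
from collections import defaultdict


def render_fingerprint(pdf_path, dpi=72):
    """Hash the rendered pixels of every page of one PDF.

    Assumption: PyMuPDF is installed. It is imported lazily so that the
    grouping helper below can be used with any other fingerprint function.
    """
    import fitz  # PyMuPDF

    digest = hashlib.sha256()
    with fitz.open(str(pdf_path)) as doc:
        for page in doc:
            pix = page.get_pixmap(dpi=dpi)
            # Include the page dimensions so that different layouts with
            # the same raw byte stream cannot collide.
            digest.update(f"{pix.width}x{pix.height}".encode())
            digest.update(pix.samples)
    return digest.hexdigest()


def group_by_fingerprint(paths, fingerprint=render_fingerprint):
    """Return lists of paths that share the same fingerprint."""
    groups = defaultdict(list)
    for path in paths:
        groups[fingerprint(path)].append(str(path))
    return [group for group in groups.values() if len(group) > 1]
```

Calling `group_by_fingerprint(sorted(pathlib.Path("some_folder").glob("*.pdf")))` would list the sets of files that look the same when rendered. Nothing is deleted, so the result can be reviewed first.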
Thanks.
I forgot to say that the files were downloaded from a server. They are simple 2-page documents.
As a matter of fact, I might download the very same document 10 times in a row, but each time the hash is different.
1. Within Acrobat->View->Compare documents: Acrobat tells me the documents are the same.
2. A third-party tool that compares contents tells me the files are the same.
Imagine one is downloading the acrobat-xi-pro-accessibility-best-practice-guide.pdf twice.
In that case the hashes are definitely the same.
However, once the document is generated 'on the fly' the hashes are different, i.e. it is the way a document is produced that makes them different.
The reason for posting here is that I am unaware of Acrobat tools that compare documents on contents only
(number of lines, number of characters, or size and number of lines) and ignore the hashes.
The exception is comparing two documents within Acrobat, which, with many documents, is quite a workload.
I know it does not exist, but it would be nice if Acrobat could do a kind of 'batch compare' of the files in a folder in the same way as Compare Documents.
=
later:
hm... I see I am not the only one..
I found your post very confusing until I realised you are using the word "hash" in a completely new way, different from anyone else... especially confusing as PDF files do use hashes, for signatures. Hash function - Wikipedia
I am very sorry for the confusion.
What I meant is that the hashes calculated from the files are different.
(using tools like 'HashMyFiles' from Nirsoft,
or MD5 & SHA-1 Checksum Utility, from MD5 & SHA Checksum Utility | Raymond's WordPress)
Again, sorry for the confusion.
Below is what I meant...
==
Yes, these hashes are different, but these are not used by the compare tool. It is certain that making the same file twice will give a different hash, because each PDF file MUST contain a unique string that is different from all other files (the "document ID") as well as the date and time it was made.
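If the document ID and the timestamps really are the only bytes that differ, one possible workaround is to blank those fields out before hashing. The sketch below uses only the Python standard library. The names `normalized_digest` and `find_probable_duplicates` are made up for this illustration. It assumes the ID is written as two hex strings and that the ID and dates are stored uncompressed. If the generator puts them inside compressed streams, or the files are encrypted or signed, this will not help.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path

# Fields that typically change every time the same PDF is generated.
# Assumption: they appear uncompressed, in these textual forms.
VOLATILE_PATTERNS = [
    # Trailer document ID, e.g. /ID [<ABCD...> <1234...>]
    re.compile(rb"/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]"),
    # Info dictionary dates, e.g. /CreationDate (D:20200101120000Z)
    re.compile(rb"/(CreationDate|ModDate)\s*\(D:[^)]*\)"),
    # XMP metadata dates and IDs
    re.compile(rb"<xmp:(CreateDate|ModifyDate|MetadataDate)>[^<]*</xmp:\1>"),
    re.compile(rb"<xmpMM:(DocumentID|InstanceID)>[^<]*</xmpMM:\1>"),
]


def normalized_digest(data: bytes) -> str:
    """SHA-256 of the file bytes with the volatile fields removed."""
    for pattern in VOLATILE_PATTERNS:
        data = pattern.sub(b"", data)
    return hashlib.sha256(data).hexdigest()


def find_probable_duplicates(folder):
    """Group the *.pdf files in a folder by normalized digest."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).glob("*.pdf")):
        groups[normalized_digest(path.read_bytes())].append(path.name)
    return [names for names in groups.values() if len(names) > 1]
```

Each list returned by `find_probable_duplicates` is a set of probable duplicates. It only reports them, so a spot check with Compare Documents on one pair per group is still possible before deleting anything.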
I believe it is the method of downloading causing different checksums.
As said, if the file is not generated 'on the fly', then the checksums are the same.
=
=
I know that checksums are not considered within Acrobat View->Compare Documents.
This method is fine for comparing two documents, but not for checking a large number of probable duplicates.
Ideally, this method (Compare Documents) could be applied to an entire folder and Acrobat would show which files are duplicates.
Anyway, this is not possible right now, and there is no alternative but to go through the folder(s), sort by size, and check and delete each duplicate file.
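The sort-by-size step at least can be scripted. A minimal sketch in Python (standard library only; the function name `group_by_size` is made up for this illustration) that lists which files share an exact byte size, so only those need to be checked by hand:

```python
from collections import defaultdict
from pathlib import Path


def group_by_size(folder):
    """Map each file size to the *.pdf files in the folder that have it.

    Only sizes shared by two or more files are returned, since a file
    with a unique size cannot be a same-size duplicate of another.
    """
    by_size = defaultdict(list)
    for path in sorted(Path(folder).glob("*.pdf")):
        by_size[path.stat().st_size].append(path.name)
    return {size: names for size, names in by_size.items() if len(names) > 1}
```

Equal size is only a hint, not proof, so each group is a list of candidates to open in Compare Documents rather than a list of files that are safe to delete.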
=
Try Duplicate Files Deleter to find and/or delete them.