
How to find duplicates on size and contents only?

Engaged, Feb 25, 2017

I have multiple duplicate PDF files. They definitely are the same, contents-wise.

Acrobat's Compare Files feature shows no differences. File sizes are the same, and dates (not times) are the same.

But they have different hashes (there is a technical reason for this, but it is not very relevant).

So, sizes, contents, metadata and dates are the same.

The usual duplicate file finder tools take hashes into account, so they are of no use here.

Acrobat can compare 1 file with another one.

Regretfully, I have not found a way to compare all files within an entire folder based on the above criteria, thus marking files as (possible) duplicates.

Is there any solution for this?

Except, of course, going through the entire folder and manually deleting duplicates.
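
For the "size and date" part of the criteria above, a rough first pass could be scripted outside Acrobat. The following is only a minimal sketch, assuming Python is available; the function name `candidate_duplicates` is made up for illustration, and it only flags candidates by size and modification date without looking at the contents.

```python
from collections import defaultdict
from datetime import date
from pathlib import Path


def candidate_duplicates(folder):
    """Group PDF files in `folder` by (size in bytes, modification date).

    Hypothetical helper: it does NOT compare contents, it only lists
    files that share size and date and are therefore worth checking.
    """
    groups = defaultdict(list)
    for path in sorted(Path(folder).glob("*.pdf")):
        stat = path.stat()
        # Use the date only (not the time), as described in the post.
        key = (stat.st_size, date.fromtimestamp(stat.st_mtime))
        groups[key].append(path.name)
    # Only keys shared by two or more files are possible duplicates.
    return [names for names in groups.values() if len(names) > 1]
```

Each returned group would still need a content check (for example with Compare Files) before anything is deleted.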

=

TOPICS: Acrobat SDK and JavaScript, Windows

Views

3.8K

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Community Expert, Feb 26, 2017

There's no way to do this using Adobe Acrobat. Files that are visually identical can be structured very differently. If your only concern is duplicate visual content, my recommendation is to render the PDF pages to images and compare those and then delete the corresponding PDF.
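
The render-and-compare idea could be sketched roughly as follows. This is an illustration under assumptions, not a tested workflow: `render_pages` is a hypothetical callable that you would have to supply using whatever PDF renderer you have, and the names `visual_fingerprint` and `group_by_appearance` are invented here. Exact pixel matching also assumes every file is rendered by the same renderer at the same resolution.

```python
import hashlib
from collections import defaultdict


def visual_fingerprint(path, render_pages):
    """Hash the rendered pixels of every page of one PDF.

    `render_pages` is a hypothetical callable supplied by the caller: it
    takes a file path and yields one bytes object of raw pixels per page.
    """
    digest = hashlib.sha256()
    for pixels in render_pages(path):
        # Prefix each page with its length so page boundaries affect the hash.
        digest.update(len(pixels).to_bytes(8, "big"))
        digest.update(pixels)
    return digest.hexdigest()


def group_by_appearance(paths, render_pages):
    """Return groups of paths whose rendered pages are pixel-identical."""
    groups = defaultdict(list)
    for path in paths:
        groups[visual_fingerprint(path, render_pages)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

Within each group, all but one PDF could then be treated as a visual duplicate.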

Engaged, Feb 26, 2017

Thanks.

I forgot to say that the files were downloaded from a server. They are simple two-page documents.

As a matter of fact, I might download the very same document 10x in a row, but each time the hash is different.

1. Within Acrobat->View->Compare documents: Acrobat tells me the documents are the same.

2. A third-party tool that compares contents tells me the files are the same.

Imagine one is downloading the acrobat-xi-pro-accessibility-best-practice-guide.pdf twice.

http://www.adobe.com/content/dam/Adobe/en/accessibility/products/acrobat/pdfs/acrobat-xi-pro-accessi...

In that case the hashes are definitely the same.

However, once the document is generated 'on the fly', the hashes are different, i.e. it is the way a document is produced that makes them different.

The reason for posting here is that I am unaware of Acrobat tools that allow comparing documents on contents only (number of lines, number of characters, or size and number of lines) while ignoring hashes.

The exception is comparing two documents within Acrobat, which, in the case of many documents, is quite a workload.

I know it does not exist, but it would be nice if Acrobat could do a kind of 'batch compare' of the files in a folder, in the same way as Compare Documents.

=

later:

hm... I see I am not the only one..

Can we have a batch process for comparing pdfs  


LEGEND, Feb 27, 2017

I found your post very confusing until I realised you are using the word "hash" in a completely new way, different from anyone else... especially confusing as PDF files do use hashes, for signatures. See: Hash function - Wikipedia

Engaged, Feb 27, 2017

I am very sorry for the confusion.

What I meant is that the hashes calculated from the files are different (using tools like 'HashMyFiles' from Nirsoft, or MD5 & SHA-1 Checksum Utility from Raymond's WordPress).
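
For reference, the kind of whole-file checksum those tools report can be reproduced with a few lines of Python. This is just a sketch using the standard `hashlib` module; the function name `file_digests` is made up here. Because every byte of the file feeds the hash, a single differing byte anywhere gives a completely different result.

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Return (md5_hex, sha1_hex) computed over every byte of the file."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```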

Again, sorry for the confusion.

Below is what I meant...

==

[Image: SnagIt-26022017 085756.png]

LEGEND, Feb 27, 2017

Yes, these hashes are different, but these are not used by the compare tool. It is certain that making the same file twice will give a different hash, because each PDF file MUST contain a unique string that is different from all other files (the "document ID") as well as the date and time it was made.
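
To illustrate the point about the document ID and dates, one could hash the file after blanking those fields. This is a heuristic sketch only, and the assumptions matter: it assumes the `/ID` array and the `/CreationDate` and `/ModDate` entries appear as plain, uncompressed text in the file, which is not guaranteed (they may sit inside compressed streams, and XMP metadata can carry its own dates and IDs). The names `ID_PATTERN`, `DATE_PATTERN`, and `normalized_digest` are invented for this example.

```python
import hashlib
import re

# Matches a trailer entry such as: /ID [<0A1B...> <2C3D...>]
ID_PATTERN = re.compile(rb"/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]")
# Matches entries such as: /CreationDate (D:20170225101500Z)
DATE_PATTERN = re.compile(rb"/(CreationDate|ModDate)\s*\(D:[^)]*\)")


def normalized_digest(data):
    """SHA-256 of PDF bytes with the document ID and date strings blanked.

    The blanked bytes are only hashed, never saved, so it does not matter
    that the result is no longer a valid PDF.
    """
    data = ID_PATTERN.sub(b"/ID[<><>]", data)
    data = DATE_PATTERN.sub(rb"/\1()", data)
    return hashlib.sha256(data).hexdigest()
```

Two files with equal normalized digests differ at most in those fields; unequal digests prove nothing either way, since the generator may vary other bytes too.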

Engaged, Feb 27, 2017

I believe it is the method of downloading that causes the different checksums.

As said, if the file is not generated 'on the fly', then the checksums are the same.

=

[Image: SnagIt-27022017 105537.png]

=

I know that checksums are not considered within Acrobat View->Compare Documents.

This method is perfect for comparing just two documents, but not for comparing a lot of probable duplicates.

Ideally, this method (Compare Documents) could be applied to an entire folder, and Acrobat would show which files are duplicates.

Anyway, this is not possible right now, and there is no alternative but to go through the folder(s), sort by size, and check and delete each duplicate file.

=

New Here, Jan 29, 2018

Try Duplicate Files Deleter to find and/or delete them.
