Inspiring
February 19, 2022
Answered

Combining PDFs and sorting similar pages

I have two PDF files whose contents partly overlap, i.e. quite a number of pages appear in both files.
I could combine both files into a third one.
But from that point on, how would I proceed to have the duplicate pages removed?
Is there some way of sorting so that duplicates are 'grouped' and can be deleted more conveniently, rather than scrolling through the entire PDF back and forth?

These are fairly large (150-200 pages) PDF files from scanned documents.
Acrobat Pro 2020

 

Thanks.

1 reply

Test Screen Name (Correct answer)
Legend
February 19, 2022

Detecting duplicates is a major challenge. Can't see a simple way.

ALSO, they won't actually be duplicates; one scan might be shifted by 0.1 mm and rotated by 0.5 degrees relative to the other, which would make the contents of the page completely different (as a graphic).
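
To illustrate the point, here is a minimal Python sketch with a made-up 6x4 "scan" stored as rows of '0'/'1' characters. The bitmap and the helper names `page_digest` and `shift_right` are hypothetical and exist only for this example. It shows that shifting the same page by a single pixel is already enough to defeat an exact comparison.

```python
import hashlib

def page_digest(bitmap):
    """Hash a page bitmap given as a list of equal-length '0'/'1' strings."""
    return hashlib.sha256("\n".join(bitmap).encode("ascii")).hexdigest()

def shift_right(bitmap, pixels=1):
    """Shift every row right by `pixels`, padding the left edge with paper ('0')."""
    return ["0" * pixels + row[:len(row) - pixels] for row in bitmap]

# Made-up 6x4 "scan": '1' is ink, '0' is paper (illustrative data only).
scan_a = [
    "011100",
    "010100",
    "011100",
    "000000",
]
scan_b = shift_right(scan_a)  # the same page, scanned one pixel further right

# An exact comparison treats the two scans as unrelated pages.
same_bytes = page_digest(scan_a) == page_digest(scan_b)
print(same_bytes)  # False
```

A real scan adds rotation, noise and compression differences on top of this, so byte-level or hash-level matching of scanned pages is not a workable way to find the duplicates.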

adwul62 (Author)
Inspiring
February 20, 2022

Thank you.

Bad luck then. I was already afraid of this.
It was a long shot. I was hoping that, based on OCR/page recognition and keywords in the pages, the pages could be indexed and sorted by some sort of similarity.

Anyway, thanks again.

 

try67
Community Expert
February 20, 2022

If the results of the OCR are relatively good then it should be possible.

Can you share a sample file with us?
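
As a rough illustration of the OCR-based idea, here is a minimal Python sketch (standard library only). It assumes the recognized text of each page has already been exported as one string per page; how that export is done is outside this sketch. The function names, the sample texts and the 0.75 threshold are assumptions for illustration, not anything Acrobat provides.

```python
import re
from difflib import SequenceMatcher

def normalize(text):
    """Lowercase the OCR text and reduce it to a list of alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def similarity(text_a, text_b):
    """Return a ratio in [0, 1] for how closely two pages' word sequences match."""
    matcher = SequenceMatcher(None, normalize(text_a), normalize(text_b), autojunk=False)
    return matcher.ratio()

def find_probable_duplicates(pages, threshold=0.75):
    """Return (i, j, score) for page pairs whose similarity meets the threshold.

    `pages` is a list of OCR text strings, one per page, in document order.
    The threshold is a guess and would need tuning on real scans.
    """
    matches = []
    for i in range(len(pages)):
        for j in range(i + 1, len(pages)):
            score = similarity(pages[i], pages[j])
            if score >= threshold:
                matches.append((i, j, round(score, 2)))
    return matches

# Made-up sample texts; the third repeats the first with typical OCR slips.
pages = [
    "Invoice 1042. Total due: 250 EUR. Payment within 30 days.",
    "Meeting notes, 3 March. Attendees: Anna, Ben, Carl.",
    "lnvoice 1042. Total due: 250 EUR. Payment within 3O days.",
]

matches = find_probable_duplicates(pages)
for first, second, score in matches:
    print(f"pages {first + 1} and {second + 1} look alike (score {score})")
```

With these sample texts the first and third pages are paired despite the OCR slips, and the meeting notes are left alone. Comparing every page against every other is quadratic, which should still be manageable for a combined file of 300 to 400 pages, but the pairs it reports are only candidates to review by eye before deleting anything.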