I have two PDF files whose contents partly overlap, i.e. quite a number of pages are in both PDF files.
I could combine both files into a third one.
But from that point on, how would I proceed to remove the duplicate pages?
Is there some way of sorting so that duplicates are 'grouped' and can be deleted more conveniently, rather than scrolling back and forth through the entire PDF?
These are fairly large (150-200 pages) PDF files from scanned documents.
Acrobat Pro 2020
Thanks.
Detecting duplicates is a major challenge. I can't see a simple way.
Also, they won't actually be duplicates; one scan might be offset by 0.1 mm and rotated by 0.5 degrees, which would make the page contents completely different (as a graphic).
Thank you.
Bad luck then. I was already afraid of this.
It was a 'long shot'. I was hoping that, based on OCR/page recognition and the keywords found on each page, pages could be indexed and sorted by some sort of similarity.
Anyway, thanks again.
If the results of the OCR are relatively good then it should be possible.
Can you share a sample file with us?
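To give an idea of what "possible" could look like: below is a minimal sketch in Python, outside Acrobat, that compares the OCR text of every page pair and reports the ones that are nearly identical. It assumes the text of each page has already been extracted into a list of strings (one per page); the function names, the sample strings and the 0.9 threshold are all made up for illustration and would need tuning on real scans.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and keep only letters and digits, so OCR noise in
    spacing and punctuation matters less."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def find_near_duplicates(page_texts, threshold=0.9):
    """Return (i, j, score) for page pairs whose text similarity is at
    least `threshold`. Indices are 0-based positions in page_texts."""
    cleaned = [normalize(t) for t in page_texts]
    pairs = []
    for i in range(len(cleaned)):
        for j in range(i + 1, len(cleaned)):
            if not cleaned[i] or not cleaned[j]:
                continue  # skip pages with no recognized text
            # autojunk=False: the default heuristic can distort scores
            # on longer strings such as full pages of text.
            matcher = SequenceMatcher(None, cleaned[i], cleaned[j], autojunk=False)
            score = matcher.ratio()
            if score >= threshold:
                pairs.append((i, j, round(score, 3)))
    return pairs


# Hypothetical OCR output, one string per page of the combined file.
pages = [
    "Invoice 2020-114. Total amount due: 1,250.00 EUR",
    "Meeting notes, 3 March: budget review and planning",
    "lnvoice 2020-114  Total amount due: 1,250.00 EUR.",  # rescan with an OCR slip
]
print(find_near_duplicates(pages))
```

Each reported pair is only a candidate, so it is worth checking visually before deleting anything. Comparing every page with every other page is slow for a few hundred pages of dense text; if that becomes a problem, `SequenceMatcher.quick_ratio()` gives a cheaper upper bound that can be used to skip obviously different pairs first.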