Participating Frequently
July 17, 2024
Question

Which API would be best suited to check whether two PDF files are the same?


I have a use case in which multiple PDF files are uploaded. I want to make sure that no file that duplicates an already-uploaded document gets uploaded again. What would be the best way to achieve this? Thank you.
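A minimal sketch of the bookkeeping side of this use case, assuming that only exact, byte-for-byte duplicates need to be rejected: keep a SHA-256 digest of every file already accepted and check each new upload against that set. This is not an Adobe API, and `DuplicateRegistry` and `check_and_add` are hypothetical names. A PDF that was re-saved or re-exported by another tool will usually have different bytes, so it would not be caught by this check.

```python
import hashlib
from typing import Dict, Optional


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


class DuplicateRegistry:
    """Hypothetical in-memory registry of digests of already-uploaded PDFs.

    Only byte-identical files are detected. A real service would persist
    the digests (for example in a database column with a unique index).
    """

    def __init__(self) -> None:
        # Maps digest -> name of the first upload that had that digest.
        self._seen: Dict[str, str] = {}

    def check_and_add(self, name: str, data: bytes) -> Optional[str]:
        """Register an upload.

        Returns the name of an earlier identical upload, or None if the
        file is new (in which case it is recorded).
        """
        digest = sha256_of_bytes(data)
        if digest in self._seen:
            return self._seen[digest]
        self._seen[digest] = name
        return None
```

Hashing is linear in file size and needs one dictionary lookup per upload, so it is cheap enough to run on every upload before any more expensive comparison.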

    This topic has been closed for replies.


    Joel Geraci
    Community Expert
    July 18, 2024

    We don't have an API for that. However, the safest way to detect duplicate PDF files is to render each page to a high-resolution image and then compare the images page by page. There are many tools that will help you do that.
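A minimal sketch of the page-by-page comparison step, assuming the pages have already been rendered to raw pixel buffers by some tool (one option, not one named in this thread, is PyMuPDF, whose `page.get_pixmap(dpi=...)` returns a pixmap with `width`, `height`, and `samples` attributes). `PageImage`, `pages_differ`, and `documents_match` are hypothetical names, and the `tolerance` parameter is an assumption added to absorb small anti-aliasing differences between renders.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PageImage:
    """One rendered page: raw pixel bytes (e.g. RGB, 3 bytes per pixel)."""
    width: int
    height: int
    samples: bytes


def pages_differ(a: PageImage, b: PageImage, tolerance: int = 0) -> bool:
    """Return True if two rendered pages differ.

    `tolerance` is the largest per-byte (per-channel) difference, 0-255,
    that is still treated as equal.
    """
    if (a.width, a.height) != (b.width, b.height) or len(a.samples) != len(b.samples):
        return True
    if tolerance == 0:
        # Exact comparison of the pixel buffers.
        return a.samples != b.samples
    # Iterating over bytes yields ints, so channel values compare directly.
    return any(abs(x - y) > tolerance for x, y in zip(a.samples, b.samples))


def documents_match(
    doc_a: Sequence[PageImage], doc_b: Sequence[PageImage], tolerance: int = 0
) -> bool:
    """Compare two documents page by page; page counts must be equal."""
    if len(doc_a) != len(doc_b):
        return False
    return not any(pages_differ(pa, pb, tolerance) for pa, pb in zip(doc_a, doc_b))
```

The pure-Python loop used when `tolerance > 0` is slow for high-resolution pages; an array library would be the usual choice in practice. Checking page counts and page sizes first lets most non-duplicates be rejected before any pixels are compared.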

    Participating Frequently
    July 22, 2024

    That sounds quite computationally expensive. Are there any other approaches that you might recommend? Also, for the approach above, are there any tools that you recommend? Thank you, @Joel Geraci


    Participating Frequently
    July 29, 2024

    Just wanted to check in again, @Joel Geraci. Thanks.