
Which API would be best suited to check if two PDF files are the same?

New Here, Jul 17, 2024

I have a use case in which multiple PDF files are uploaded. I want to make sure no duplicate files get uploaded based on all the documents that have already been uploaded. What would be the best way to achieve this? Thank you.

Community Expert, Jul 18, 2024

We don't have an API for that. However, the safest way to detect duplicate PDF files is to convert each page to a high-resolution image and then compare the images page by page. There are many tools that will help you do that.
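The page-by-page comparison described above could be sketched roughly as follows. This is only an illustration, not an Adobe API: the rendering step is left to a caller-supplied function (`render_pages`, a hypothetical name) because no specific tool is named here, and `UploadRegistry` and `page_fingerprints` are likewise made-up helpers. The sketch assumes the renderer returns the raw pixel bytes of each page at a fixed resolution, and it treats two files as duplicates only when every page matches exactly.

```python
import hashlib
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical renderer signature: given a PDF path, yield the raw pixel
# bytes of each page, in order, at a fixed resolution. Which tool does the
# rendering is an assumption left to the caller.
PageRenderer = Callable[[str], Iterable[bytes]]


def page_fingerprints(pdf_path: str, render_pages: PageRenderer) -> Tuple[str, ...]:
    """Return one SHA-256 hex digest per rendered page, in page order."""
    return tuple(
        hashlib.sha256(pixels).hexdigest() for pixels in render_pages(pdf_path)
    )


class UploadRegistry:
    """Remembers the page fingerprints of every accepted upload."""

    def __init__(self, render_pages: PageRenderer) -> None:
        self._render_pages = render_pages
        # Maps a tuple of page digests to the first path that produced it.
        self._seen: Dict[Tuple[str, ...], str] = {}

    def find_duplicate(self, pdf_path: str) -> Optional[str]:
        """Return the path of an earlier upload whose pages all match, else None."""
        return self._seen.get(page_fingerprints(pdf_path, self._render_pages))

    def add(self, pdf_path: str) -> bool:
        """Register the file unless it duplicates an earlier one; True if registered."""
        key = page_fingerprints(pdf_path, self._render_pages)
        if key in self._seen:
            return False
        self._seen[key] = pdf_path
        return True
```

Storing the digests means each upload is rendered once and then compared against earlier uploads with a dictionary lookup. Because the digests are exact, two renderings that differ by even one pixel count as different files.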

New Here, Jul 22, 2024

That sounds quite computationally expensive. Are there any other approaches that you might recommend? Also, for the above approach, are there any tools that you recommend? Thank you @Joel Geraci

 

New Here, Jul 29, 2024

Just wanted to check in again @Joel Geraci. Thanks
