PDF Tools API Information About Document Before OCR

Question

I'm using the PDF Tools API to OCR documents. It doesn't like documents over 100 pages. I catch the exception, split the document, OCR the parts, and then merge them. Is there a way for me to know the number of pages before submitting the first time? Or is try/catch the recommended method?

I also need a way of knowing whether a document is already 100% embedded text with no need for OCR. I currently submit everything for OCR. If a document is already 100% text, the API quickly returns the same document. This is fine, but I've recently been encountering documents that were processed through DocuSign, and the API fails on these documents because they have digital certificates and signatures.

Is there a way to find out through the API whether a document is already 100% text, and/or whether it has digital signatures?

Joel Geraci · Answer

DocuSign (like Adobe Sign and any decent e-signature tool) puts a "tamper-evident seal" on the PDF, which is basically a certifying digital signature, and also sets security on the file to prevent modification. This is why the API fails: the security settings prevent the PDF from being split.

But you are correct, it would be handy to have a "document properties" API that would tell you some of the particulars about the PDF before submitting it to another service. I've submitted this request to the product team.
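Until something like that exists, one option is to plan the split up front rather than relying on try/catch. The following is only a minimal sketch, not part of the PDF Tools API: it assumes you already have the page count from a local PDF library of your choice (not shown), and it treats the 100-page limit mentioned in the question as a configurable assumption. The names `plan_chunks` and `MAX_PAGES` are hypothetical.

```python
from typing import List, Tuple

# Assumption: the 100-page limit reported in the question, not a documented constant.
MAX_PAGES = 100


def plan_chunks(page_count: int, max_pages: int = MAX_PAGES) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) page ranges, each at most max_pages long."""
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    return [
        (start, min(start + max_pages - 1, page_count))
        for start in range(1, page_count + 1, max_pages)
    ]


if __name__ == "__main__":
    # A 250-page document would be submitted as three separate OCR jobs.
    for start, end in plan_chunks(250):
        print(f"OCR pages {start}-{end}")
```

A document at or under the limit yields a single range, so the same code path handles both cases. This does not help with sealed or secured files, which still cannot be split.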
