PDF Tools API Information About Document Before OCR

New Here, May 06, 2021

I'm using the PDF Tools API to OCR documents. It rejects documents over 100 pages, so I catch the exception, split the document, OCR each part, and then merge the results. Is there a way for me to know the number of pages before submitting the first time? Or is try/catch the recommended method?
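For reference, the split step described above can be planned up front once the page count is known. This is only a minimal sketch: `plan_page_ranges` and `MAX_PAGES_PER_REQUEST` are hypothetical names, the 100-page limit is the one observed in this post rather than a documented value, and the page count is assumed to come from whatever local PDF library is already in use.

```python
from typing import List, Tuple

# Limit observed in this post; an assumption, so check it against the current API docs.
MAX_PAGES_PER_REQUEST = 100


def plan_page_ranges(
    total_pages: int, max_pages: int = MAX_PAGES_PER_REQUEST
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges, each at most max_pages long."""
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")

    ranges: List[Tuple[int, int]] = []
    start = 1
    while start <= total_pages:
        # Clamp the final chunk to the last page of the document.
        end = min(start + max_pages - 1, total_pages)
        ranges.append((start, end))
        start = end + 1
    return ranges
```

A document with 250 pages would yield three ranges to split, OCR, and merge in order. A document at or under the limit yields a single range, so it can be submitted as is without relying on the exception.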

 

I also need a way to know whether a document is already 100% embedded text with no need for OCR. I currently submit everything for OCR. If a document is already 100% text, the API quickly returns the same document. This is fine, but I've recently been encountering documents that were processed through DocuSign, and the API pukes on these documents because they have digital certificates and signatures.

Is there a way to find out through the API whether a document is already 100% text and/or whether it has digital signatures?

Adobe Community Professional, May 06, 2021

DocuSign (like Adobe Sign and any decent e-signature tool) puts a "tamper-evident seal" on the PDF, which is basically a certifying digital signature, and also sets security on the file to prevent modification. This is why the API fails: the security settings prevent the PDF from being split.
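Until something like that is available in the API, a rough local pre-check can flag signed files before they are submitted. The sketch below is a heuristic, not part of the PDF Tools API: `looks_signed` is a hypothetical helper that scans the raw bytes for the `/ByteRange` entry that PDF signature dictionaries carry. It can give false positives if that text happens to appear elsewhere in the file, and it says nothing about whether a signature is valid.

```python
import re

# Signature dictionaries record the signed byte spans in a /ByteRange array.
# Matching it in the raw bytes is a heuristic, not a full PDF parse.
_BYTE_RANGE = re.compile(rb"/ByteRange\s*\[")


def looks_signed(pdf_bytes: bytes) -> bool:
    """Return True if the raw PDF bytes appear to contain a digital signature."""
    return _BYTE_RANGE.search(pdf_bytes) is not None


def file_looks_signed(path: str) -> bool:
    """Convenience wrapper that reads the file from disk and applies looks_signed."""
    with open(path, "rb") as handle:
        return looks_signed(handle.read())
```

Files flagged this way could be routed around the OCR call, or handled separately, instead of waiting for the API to raise an error.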

But you are correct, it would be handy to have a "document properties" API that would tell you some of the particulars about the PDF before submitting it to another service. I've submitted this request to the product team.
