Consistent timeout errors from ExportPDF on a large set of different PDF files

Report · Sep 08, 2021

We are using Adobe PDF Services API to convert PDF->DOCX using the ExportPDF API.

Recently we have come across a large number of different files (most of which are coming from the same source) that can never be processed by the service successfully. Every attempt results in a timeout error like the one below (after waiting for about 10 minutes):

Exception encountered while executing operation ServiceApiError: Operation execution has timed out!
    at /......./node_modules/@adobe/pdfservices-node-sdk/src/internal/api/cpf-api.js:58:13
    at new Promise (<anonymous>)
    at /......./node_modules/@adobe/pdfservices-node-sdk/src/internal/api/cpf-api.js:55:11
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5) {
  requestTrackingId: '3eTHHoqqVYBdifdPYD9AexhSbybqSeRB',
  statusCode: 0,
  errorCode: 'UNKNOWN'
}
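
For reference, here is a minimal sketch of how calling code could recognize this particular failure, using only the fields visible in the logged error above. `isExportTimeout` and `describeFailure` are hypothetical helper names, not part of the SDK, and the sketch assumes the text after "ServiceApiError:" is exposed as the error's `message` property.

```javascript
// Hypothetical helpers, not part of @adobe/pdfservices-node-sdk.
// Assumption: the error object carries `message`, `statusCode`, `errorCode`
// and `requestTrackingId`, as the logged output above suggests.
function isExportTimeout(err) {
  if (!err || typeof err !== "object") return false;
  const message = String(err.message || "");
  return (
    /timed out/i.test(message) &&
    err.statusCode === 0 &&
    err.errorCode === "UNKNOWN"
  );
}

// Builds a short log line that keeps the tracking ID for support requests.
function describeFailure(err) {
  const id = err && err.requestTrackingId ? err.requestTrackingId : "n/a";
  const kind = isExportTimeout(err) ? "timeout" : "other failure";
  return `${kind} (requestTrackingId: ${id})`;
}
```

A check like this would let us stop retrying files that hit the timeout, since every retry so far has failed the same way.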

We haven't been able to work out what is wrong with those files. They are text-based PDF files of various sizes; some are as small as 12 pages.

We have tried several other things during investigation:

Used ExportPDF to convert a couple of those files to images - it worked fine.
Used SplitPDF to split an affected file into 1-page PDFs (which worked fine), then ran ExportPDF on the first page and the last page. Even the first page alone caused the same error. However, the last page was processed successfully (it only had a few lines of text, though).
Tried third-party software to see whether it could convert the PDF files to DOCX. The Aspose library failed with an OutOfMemoryException. Foxit just hung and had to be terminated (tried multiple times with the same outcome). Any viewer/editor can open the files without any problems.
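
The page-by-page test above could be automated roughly as follows. This is only a sketch: `exportPage` is a hypothetical stand-in for whatever function submits one single-page PDF to ExportPDF and returns a promise, and `findFailingPages` and the 60-second default are made up for illustration.

```javascript
// Hypothetical sketch: `exportPage(file)` is supplied by the caller and is
// NOT an SDK function; it should return a promise for one page's export.
async function findFailingPages(pageFiles, exportPage, timeoutMs = 60000) {
  const failing = [];
  for (const file of pageFiles) {
    let timer;
    // Local guard so one bad page cannot stall the whole run.
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`gave up on ${file} after ${timeoutMs} ms`)),
        timeoutMs
      );
    });
    try {
      await Promise.race([exportPage(file), timeout]);
    } catch (err) {
      failing.push({ file, reason: err.message });
    } finally {
      clearTimeout(timer);
    }
  }
  return failing;
}
```

Note that the local timeout only stops waiting; it does not cancel the request that is already running on the service side.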

Could somebody please help us understand what the issue could be? Ideally it would be great if Adobe PDF Services API supported such files (whatever the issue with them is), but if at least we could come up with some pre-processing to perform on our side before sending them to the API, that'd already be super helpful.

Thank you.

Report · Sep 08, 2021

Can you share a few of those PDFs?

Report · Sep 08, 2021

Yes, I can, but preferably not publicly. Is there a way to share them directly with you?

Report · Sep 08, 2021

Sure thing, you can email me at jedimaster at adobe dot com. When you do, let me know if it is safe to share it with others internally, or if you want it to stay with me alone. I'm about to end my day so no rush.

Report · Sep 09, 2021

I work with Ray and he shared your PDF files with me. These PDFs are seriously broken. I'm going to recommend to our product team that they add a "preflight" step to the API so that it doesn't even try to process files that are unlikely to work.

If you want the gory details, the files all have unbalanced q and Q operators. This is a very common problem with a lot of non-Adobe PDF creation tools. Acrobat/Reader will compensate for it, but no API tools that I'm aware of will, Adobe's or others. And while Acrobat can view the file, it can't Export to Word either.
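
For context, q saves the current graphics state and Q restores it, so the two should pair up within a page's content stream. Below is a rough sketch of how such an imbalance could be detected on your side. It assumes the content stream has already been decompressed to text by some other tool, it does not handle inline image data (BI ... ID ... EI), and `checkGraphicsStateBalance` is a made-up name rather than anything from the PDF Services API.

```javascript
// Hypothetical helper: counts q (save) and Q (restore) operators in an
// already-decoded PDF content stream passed in as a string.
// Assumption: no inline images (BI ... ID ... EI) are present.
function checkGraphicsStateBalance(content) {
  const isSpace = (c) => " \t\n\r\f\0".includes(c);
  const isDelim = (c) => "()<>[]{}/%".includes(c);
  const n = content.length;
  let depth = 0;      // q operators still waiting for a matching Q
  let maxDepth = 0;   // deepest nesting seen
  let underflows = 0; // Q operators seen with nothing to restore
  let i = 0;

  while (i < n) {
    const c = content[i];
    if (isSpace(c)) { i++; continue; }
    if (c === "%") { // comment: skip to end of line
      while (i < n && content[i] !== "\n" && content[i] !== "\r") i++;
      continue;
    }
    if (c === "(") { // literal string: honor nesting and backslash escapes
      let level = 1;
      i++;
      while (i < n && level > 0) {
        if (content[i] === "\\") { i += 2; continue; }
        if (content[i] === "(") level++;
        else if (content[i] === ")") level--;
        i++;
      }
      continue;
    }
    if (c === "<") {
      if (content[i + 1] === "<") { i += 2; continue; } // dictionary start
      while (i < n && content[i] !== ">") i++;           // hex string
      i++;
      continue;
    }
    if (c === "/") { // name: skip it so /q or /Q is never counted
      i++;
      while (i < n && !isSpace(content[i]) && !isDelim(content[i])) i++;
      continue;
    }
    if (isDelim(c)) { i++; continue; } // remaining delimiters: > [ ] { } )
    const start = i; // operator or number token
    while (i < n && !isSpace(content[i]) && !isDelim(content[i])) i++;
    const token = content.slice(start, i);
    if (token === "q") {
      depth++;
      maxDepth = Math.max(maxDepth, depth);
    } else if (token === "Q") {
      if (depth === 0) underflows++;
      else depth--;
    }
  }
  return {
    unclosed: depth,
    underflows,
    maxDepth,
    balanced: depth === 0 && underflows === 0,
  };
}
```

A file where any page reports `balanced: false` could then be flagged before it is sent to the API. Whether this catches everything wrong with these particular files is not something the sketch can guarantee.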

I'm not sure what to suggest at this point. The files are just broken.
