Doc Gen API generating new PDF that will only open in Preview and Safari but not Chrome or Adobe

Report · Jul 11, 2022

After the new PDF is generated from my DOCX template and JSON data, opening it in Chrome throws an error that says:

"Error. Failed to load PDF document." as well as "Adobe Acrobat Reader could not open 'test.pdf' because it is either not a supported file type or because the file has been damaged (for example, it was sent as an email attachment and wasn't correctly decoded)." if trying to open in Adobe Reader.

However, when opening in Safari or the native Mac Preview app, the file will open.

We are using the Document Generation API (https://documentcloud.adobe.com/document-services/index.html#post-documentGeneration) to pass the template file and JSON data. I have tried using our own DOCX file as well as the template DOCX provided on the Playground site. Both generate with a 201 status code.

Please help!

Report · Jul 12, 2022

Can you share both the Word template and the JSON?

Report · Jul 12, 2022

Yes! Please see the files below - the PDF file won't let me attach (likely because it's corrupted), but I am going to post a public URL with it

Report · Jul 12, 2022

Here is the link to the file: https://drive.google.com/file/d/1F7KaifCaz7bNULU3EL4jroi6JMHujGAX/view?usp=sharing

Report · Jul 12, 2022

Ok - Now I see what's going on. The returned object is multipart form data, not PDF. The PDF is in the multipart form data. See image below. You'll need to parse the response to get the PDF.
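
A quick way to confirm this on your own file is to look at its first bytes. A real PDF starts with `%PDF-`, while a multipart body normally starts with `--` followed by the boundary string. This is only a rough heuristic, and `describePayload` is a made-up helper name, not part of any Adobe SDK:

```javascript
// Classify a downloaded payload by its leading bytes.
// describePayload is a hypothetical helper used only for illustration.
function describePayload(buf) {
  const head = buf.subarray(0, 5).toString('latin1');
  if (head === '%PDF-') return 'pdf';             // raw PDF file
  if (head.startsWith('--')) return 'multipart';  // boundary delimiter line
  return 'unknown';
}
```

Running it on the saved file (for example `describePayload(require('fs').readFileSync('test.pdf'))`) should tell you whether you saved the whole multipart envelope instead of the PDF part.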

Report · Jul 12, 2022

Yes, this is the reason, and I'm inclined to think this is an example of poor API design on Adobe's part. Returning a multipart response is not the best approach here, especially since the endpoint itself is about the status of the job. If the job is complete, why not return a CDN location for the result, or provide a separate endpoint to fetch the result by job ID? It's not easy to integrate with in its current shape. Just an observation.

Report · Jul 20, 2022

I'm not sure what specifically needs to be parsed out to return only the PDF. Is there any documentation or anything to point to? Interesting that the API has you specify an output type but does not truly return that type without parsing the response.

Report · Jul 20, 2022

Parsing a multipart body is a fairly trivial task; the details depend on your existing codebase, infrastructure, tech stack, and what you need to do with the resulting file. I had to build a quick prototype in NodeJS to prove we could use this API, and it works as expected. What's misleading is not the output type, since the file does come back as part of the response, but the claim that the API is easy to integrate with. Multipart responses are extremely rare and inconvenient to deal with. One issue is that "octet-stream" is too generic and does not identify the exact binary format, so the solution ends up quite bespoke. Note that there could be more parts, and the solution needs to be generic enough to deal with that.
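
To make that concrete, here is a minimal sketch of splitting a multipart body by hand in Node and picking out the PDF by its `%PDF-` signature rather than by the generic "octet-stream" content type. It assumes CRLF line endings and that you already extracted the boundary from the response's Content-Type header; `splitMultipart` and `extractPdf` are illustrative names, not a library API, and a maintained parser is probably the safer choice in production:

```javascript
// Split a multipart body (Buffer) into { headers, data } parts.
// Assumes CRLF line endings, as multipart bodies normally use.
function splitMultipart(body, boundary) {
  const delimiter = Buffer.from(`--${boundary}`);
  const headerEnd = Buffer.from('\r\n\r\n');
  const parts = [];
  let start = body.indexOf(delimiter);
  while (start !== -1) {
    const next = body.indexOf(delimiter, start + delimiter.length);
    if (next === -1) break; // no further delimiter: start is the closing one
    // Skip the CRLF after the delimiter and drop the CRLF before the next one.
    const section = body.subarray(start + delimiter.length + 2, next - 2);
    const split = section.indexOf(headerEnd);
    if (split !== -1) {
      parts.push({
        headers: section.subarray(0, split).toString('latin1'),
        data: section.subarray(split + headerEnd.length),
      });
    }
    start = next;
  }
  return parts;
}

// Return the data of the first part that starts with the PDF signature, or null.
function extractPdf(body, boundary) {
  const pdfMagic = Buffer.from('%PDF-');
  const part = splitMultipart(body, boundary)
    .find(p => p.data.subarray(0, pdfMagic.length).equals(pdfMagic));
  return part ? part.data : null;
}
```

Because the loop collects every part, extra parts (such as a JSON status part) are simply ignored when looking for the PDF.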

Report · Jul 20, 2022

Do you have any examples you can share of the NodeJS solution? Or any repositories that have an example?

We are so far down the path now of leveraging this API we can’t really turn back.

Anything you can share or point me towards is both greatly appreciated and helpful!

Cheers,

Report · Jul 20, 2022

There's an option to go with their SDK https://www.npmjs.com/package/@adobe/pdfservices-node-sdk, which I did not try because the documentation is practically nonexistent and I had to move quickly. Note that there are also packages for Java, Python and .NET. The SDK might give you the actual result file as output, but my implementation cannot use the SDK due to our tech stack.

I called the API directly, which I find flexible enough, as all that has to be done is generate a JWT and compose the payload.

With either the generate or the polling endpoint it's a simple POST or GET call respectively. This is just indicative code I compiled from a couple of test scripts I built, so you can work off it.

const { writeFileSync } = require('fs');
const multipart = require('./multipart.js');

const headers = {
    'Authorization': `Bearer ${jwt}`,
    // 'Accept': 'application/json, text/plain, */*', // needed for the generate call
    'x-api-key': `${api_key}`,
    // 'Prefer': 'respond-async, wait=0', // needed for the generate call
};

fetch(url, { // url is either the generate or the poll API endpoint
  method: 'GET', // POST for generate
  headers,
  // body: requestJsonPayload // used for the generate call
})
.then(res => {
    if (!res.ok) {
        // Reject so the catch handler below logs the status and exits non-zero
        throw new Error(`Result status: ${res.status}`);
    }
    return res;
})
.then(async res => {
    // Upstream parse-multipart-data expects the Content-Type string and a Buffer;
    // adjust these two lines if your local multipart.js differs.
    const boundary = multipart.getBoundary(res.headers.get('content-type'));
    const parts = multipart.parse(Buffer.from(await res.arrayBuffer()), boundary);

    for (const part of parts) {
        writeFileSync(`./${part.name}`, part.data); // one of the parts is the PDF file
    }
})
.catch(error => {
    console.error('Error:', error);
    process.exit(1);
});

multipart is actually a fork of this library (https://github.com/nachomazzara/parse-multipart-data) that someone fixed to work properly: (https://github.com/simhnna/parse-multipart-data/tree/do-not-assume-header-order)

Report · Jul 20, 2022

Happy to help more with guiding your solution through, just drop a message.

Report · Jul 26, 2022

Would you be willing to contract with us to help implement a solution for this?

We've been searching for help and still can't quite get it figured out.

Salesforce is our tech stack and we are using Apex to make the request, but we could leverage another platform like GCP or AWS if we need to process the response before sending it back to SFDC.

Let me know your thoughts and thank you!

Report · Jul 27, 2022

Oh yeah, I can see your problem, especially in the SF environment. The way the PDF Services API works makes it hard to implement: it doesn't end with a single API call, because it's generally asynchronous and has to be handled carefully with some sort of queue. Happy to help, drop me a message at < former.token at gmail.com >

Cheers
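
For anyone handling the asynchronous part in a small Node function (for example on AWS or GCP in front of Salesforce), a generic polling loop is one simple alternative to a full queue. This is only a sketch: `pollUntilDone` and the `checkJob` callback are hypothetical names, and how `checkJob` actually calls the polling endpoint and decides the job is finished is left to you:

```javascript
// Poll an asynchronous job until it reports completion.
// checkJob is a caller-supplied (hypothetical) async function that resolves
// to { done: boolean, result?: any }; it is not part of any Adobe SDK.
async function pollUntilDone(checkJob, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const { done, result } = await checkJob();
    if (done) return result;
    if (attempt < maxAttempts) {
      // Wait before the next status check
      await new Promise(resolve => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Job not finished after ${maxAttempts} attempts`);
}
```

The attempt limit keeps a stuck job from polling forever; the caller can catch the error and retry later or report the failure.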