Incorrect contentAnalyzerRequests json values = 400 status "invalid JSON in request"
Working in Postman and going off of the Postman Collection for that particular service, the problem arises for both HTML to PDF and Extract PDF requests. I have a valid auth bearer token and nothing else seems amiss.
My contentAnalyzerRequests value for HTML to PDF:
{\n \"cpf:inputs\": {\n \"params\": {\n \"cpf:inline\": {\n \"print\": {\n \"includeHeaderFooter\": true\n },\n \"pageLayout\": {\n \"pageHeight\": 8.5,\n \"pageWidth\": 11\n },\n \"json\": \"{\\\"k1\\\": \\\"v1\\\", \\\"k2\\\": \\\"v2\\\"}\"\n }\n },\n \"documentIn\": {\n \"cpf:location\": \"InputFile0\",\n \"dc:format\": \"application/zip\"\n }\n },\n \"cpf:engine\": {\n \"repo:assetId\": \"urn:aaid:cpf:Service-e2ee120a2b06427cb449592f5db967e7\"\n },\n \"cpf:outputs\": {\n \"documentOut\": {\n \"cpf:location\": \"multipartLabelOut\",\n \"dc:format\": \"application/pdf\"\n }\n }\n}
My contentAnalyzerRequests value for Extract PDF:
{\n \"cpf:engine\": {\n \"repo:assetId\": \"urn:aaid:cpf:58af6e2c-1f0c-400d-9188-078000185695\"\n },\n \"cpf:inputs\": {\n \"documentIn\": {\n \"cpf:location\": \"InputFile0\",\n \"dc:format\": \"application/pdf\"\n },\n \"params\": {\n \"cpf:inline\": {\n \"elementsToExtract\": [\n \"text\", \"tables\"\n ],\n \"renditionsToExtract\": [ \"tables\", \"figures\"]\n }\n }\n },\n \"cpf:outputs\": {\n \"elementsInfo\": {\n \"cpf:location\": \"jsonoutput\",\n \"dc:format\": \"application/json\"\n },\n \"elementsRenditions\": {\n \"cpf:location\": \"fileoutpart\",\n \"dc:format\": \"text/directory\"\n }\n }\n}
It could be that I don't understand what I'm doing with regard to these values. Do they need to be adjusted based on whatever PDF is being fed to it?
Any chance you can share those code snippets in either a Gist, or, use the Insert Code button when editing your message and then paste it in. That should keep the line breaks and remove the \n.
Unfortunately not in this case. I used the code sample feature when plugging the JSON in, but there isn't a JSON option so I went with HTML. I have uploaded the JSON files, however, for each API service in question. It's there that I copy-pasted the values associated with each contentAnalyzerRequests.
Ah, so the JSON above is the definition for Postman. When you generate the error, Postman should (I believe!) give you access to the JSON body it sent to us. Can you dig that out?
Sorry I meant the Request portion. I believe Postman should show you, for the failed request, the exact JSON it sent to our API. Not the Response, but what it sent.
In Postman, go to Body, look at contentAnalyzerRequests, the value is probably shrunk so double click to open. That should be it.
Yeah, I'm simply not seeing that in the Body section of the response (or any other response section). There isn't a contentAnalyzerRequests in the body, just the JSON that I posted.
I honestly don't know what to say. If you go to the Body tab, do you see *anything*? Here's what I see.
I think I see the source of our confusion. Here is what I have as the value. I had mentioned this earlier but this was taken straight from the Extract PDF postman collection. This is what I input as part of my request. I get nothing back as a response save the error message in JSON.
which I'm grabbing from here
Interesting. Notice how your value has line breaks and escape chars in. It should not. In my screen shot above I didn't show the right thing (sorry!), but look at mine:
Let me grab the latest Extract Postman collection to see if I can reproduce.
So I imported Extract.json, the version you shared above. In Body, in contentAnalyzerRequests, when I look at VALUE, I do *not* see the escaped stuff, etc. How did you add this to Postman? For me, I used Import.
I literally copy pasted that value taken from the Extract collection into the value for contentAnalyzerRequests in the Postman body, but I see what you mean about the line breaks. Those line breaks are built into the Extract collection.
I tried importing from Postman but the schema I input provided in the Extract collection can't be read by Postman. Finally, I just copied the value into Notepad++ and removed all the instances of \n and \ to get this value:
{"cpf:engine":{"repo:assetId":"urn:aaid:cpf:58af6e2c-1f0c-400d-9188-078000185695"},"cpf:inputs":{"documentIn":{"cpf:location":"InputFile0","dc:format":"application/pdf"},"params":{"cpf:inline":{"elementsToExtract":["text","tables"],"renditionsToExtract":["tables","figures"]}}},"cpf:outputs":{"elementsInfo":{"cpf:location":"jsonoutput","dc:format":"application/json"},"elementsRenditions":{"cpf:location":"fileoutpart","dc:format":"text/directory"}}}
And while I do get a different kind of error, it's still 400:
{
"cpf:status": {
"completed": true,
"type": "I/O Validation error.",
"title": "Required multipart field InputFile0 not found in the request.",
"status": 400,
"report": "{\"error_code\":\"MULTIPART_FIELD_MISSING\"}"
},
"cpf:engine": {
"repo:assetId": "urn:aaid:cpf:58af6e2c-1f0c-400d-9188-078000185695"
},
"cpf:inputs": {
"documentIn": {
"cpf:location": "InputFile0",
"dc:format": "application/pdf"
},
"params": {
"cpf:inline": {
"elementsToExtract": [
"text",
"tables"
],
"renditionsToExtract": [
"tables",
"figures"
]
}
}
}
}
So, some field perhaps is missing from the Notepad++ hitjob I did?
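For anyone doing the same cleanup by hand: stripping every \n and \ in a text editor works for the Extract value, but it would likely damage a value that contains a nested escaped string, such as the "json" parameter in the HTML to PDF value at the top of the thread. A minimal sketch of a safer route, assuming the value was copied out of the collection file as an escaped JSON string (the function name `unescape_collection_value` is made up for illustration), is to let a JSON parser do the unescaping:

```python
import json

# A shortened copy of the escaped value as it appears inside the collection
# file. The raw string keeps the backslashes exactly as pasted.
raw = r'{\n \"cpf:engine\": {\n \"repo:assetId\": \"urn:aaid:cpf:58af6e2c-1f0c-400d-9188-078000185695\"\n }\n}'


def unescape_collection_value(escaped: str) -> str:
    """Decode a JSON-string-escaped value and return it as compact JSON.

    Hypothetical helper for illustration, not part of any Adobe or Postman tool.
    """
    # Wrapping the text in quotes makes it a JSON string literal, so the
    # parser resolves \n and \" (and deeper escapes) correctly.
    decoded = json.loads('"' + escaped + '"')
    # Parsing the decoded text confirms it is valid JSON before re-serializing.
    return json.dumps(json.loads(decoded), separators=(",", ":"))


print(unescape_collection_value(raw))
```

The printed result is what would go into the contentAnalyzerRequests field in Postman. Importing the collection file directly avoids the need for this step altogether.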
2 things: "I tried importing from Postman but the schema I input provided in the Extract collection can't be read by Postman." That's odd. Any chance you are running an older version of Postman? Any chance when you downloaded it, the browser messed with it?
Second issue is much simpler - you need to pick a file for InputFile0. See it there as the second option?
You need to actually pick a file.
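To make the MULTIPART_FIELD_MISSING error concrete, here is a rough client-side check. It assumes, based only on the error message above, that every "cpf:location" listed under "cpf:inputs" must match the name of a form field in the multipart body. The function name `missing_input_fields` is hypothetical.

```python
import json


def missing_input_fields(content_analyzer_requests: str, form_fields: set) -> list:
    """Return input locations that have no matching multipart form field.

    Hypothetical pre-flight check; the service's real validation may differ.
    """
    request = json.loads(content_analyzer_requests)
    # Collect the field names the request says it will read its inputs from.
    # Entries such as "params" carry "cpf:inline" instead and are skipped.
    locations = [
        spec["cpf:location"]
        for spec in request.get("cpf:inputs", {}).values()
        if isinstance(spec, dict) and "cpf:location" in spec
    ]
    return [name for name in locations if name not in form_fields]


value = (
    '{"cpf:inputs":{"documentIn":{"cpf:location":"InputFile0",'
    '"dc:format":"application/pdf"},"params":{"cpf:inline":{}}}}'
)

# Only the JSON field was filled in, no file picked for InputFile0.
print(missing_input_fields(value, {"contentAnalyzerRequests"}))
```

If the list comes back empty, every named input has a form field to read from; a non-empty list names the fields that still need a file.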
I used the schema link:
And I have always had a PDF file as the InputFile0. Here is the result:
Attached is the actual pdf for reference
For item one - no. Use the downloaded JSON file (the one you shared above), in Postman, Import, and pick the file.
I had no problem submitting the call to Extract using the PDF you attached below.
Honestly it sounds like the biggest issue is how you are importing the collection into Postman. Can you try the file import? You then need to run the Exchange JWT portion, get your bearer token, and use that (and your client id) in the "Submit" call.
Heyyy we're almost there
Got a 202 "In Progress". Followed your instructions by importing the JSON file itself and not just the schema link, got this:
{
"cpf:status": {
"completed": false,
"type": "",
"title": "In Progress",
"status": 202
},
"cpf:engine": {
"repo:assetId": "urn:aaid:cpf:58af6e2c-1f0c-400d-9188-078000185695"
},
"cpf:inputs": {
"documentIn": {
"cpf:location": "InputFile0",
"dc:format": "application/pdf"
},
"params": {
"cpf:inline": {
"elementsToExtract": [
"text",
"tables"
],
"renditionsToExtract": [
"tables",
"figures"
]
}
}
}
}
Cool! So this is as expected. The API supports letting your code poll for an update. The Postman collection sends a header, Prefer, that has a value, wait=0, which means the service does not wait for the result and responds right away with the 202. In Postman, if you look at the response headers, you'll see a location header. If you make an authenticated call to that value, you can get your bits.
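Outside Postman, the polling described above could look roughly like the sketch below. It assumes the location URL keeps answering 202 while the job is in progress and returns a different status once the result is ready. The helper names, the `x-api-key` header for the client id, and the retry settings are illustrative assumptions, not taken from the collection.

```python
import time
import urllib.request


def fetch_status(url: str, token: str, client_id: str):
    """Make one authenticated GET to the location URL (header names assumed)."""
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "x-api-key": client_id},
    )
    with urllib.request.urlopen(request) as response:
        return response.status, response.read()


def poll_until_done(fetch, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch() until it stops returning 202, then return (status, body)."""
    for _ in range(max_attempts):
        status, body = fetch()
        if status != 202:
            return status, body
        # Still in progress: pause before asking again.
        sleep(interval)
    raise TimeoutError("job still in progress after polling")


# Example usage (placeholders, not real values):
# status, body = poll_until_done(
#     lambda: fetch_status(location_url, bearer_token, client_id)
# )
```

Passing the fetch step in as a callable keeps the loop easy to try out without hitting the network.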

