Incorrect contentAnalyzerRequests json values = 400 status "invalid JSON in request"

Report · Dec 15, 2021

Working in Postman and going off of the Postman Collection for that particular service, the problem arises for both HTML to PDF request and Extract PDF requests. I have valid auth bearer token and nothing else seems amiss.

My contentAnalyzerRequests value for HTML to PDF:

{\n  \"cpf:inputs\": {\n    \"params\": {\n      \"cpf:inline\": {\n        \"print\": {\n          \"includeHeaderFooter\": true\n        },\n        \"pageLayout\": {\n          \"pageHeight\": 8.5,\n          \"pageWidth\": 11\n        },\n        \"json\": \"{\\\"k1\\\": \\\"v1\\\", \\\"k2\\\": \\\"v2\\\"}\"\n      }\n    },\n    \"documentIn\": {\n      \"cpf:location\": \"InputFile0\",\n      \"dc:format\": \"application/zip\"\n    }\n  },\n  \"cpf:engine\": {\n    \"repo:assetId\": \"urn:aaid:cpf:Service-e2ee120a2b06427cb449592f5db967e7\"\n  },\n  \"cpf:outputs\": {\n    \"documentOut\": {\n      \"cpf:location\": \"multipartLabelOut\",\n      \"dc:format\": \"application/pdf\"\n    }\n  }\n}

My contentAnalyzerRequests value for Extract PDF:

{\n    \"cpf:engine\": {\n        \"repo:assetId\": \"urn:aaid:cpf:58af6e2c-1f0c-400d-9188-078000185695\"\n    },\n    \"cpf:inputs\": {\n        \"documentIn\": {\n            \"cpf:location\": \"InputFile0\",\n            \"dc:format\": \"application/pdf\"\n        },\n        \"params\": {\n            \"cpf:inline\": {\n                \"elementsToExtract\": [\n                    \"text\", \"tables\"\n                ],\n                \"renditionsToExtract\": [ \"tables\", \"figures\"]\n            }\n        }\n    },\n    \"cpf:outputs\": {\n        \"elementsInfo\": {\n            \"cpf:location\": \"jsonoutput\",\n            \"dc:format\": \"application/json\"\n        },\n        \"elementsRenditions\": {\n            \"cpf:location\": \"fileoutpart\",\n            \"dc:format\": \"text/directory\"\n        }\n    }\n}

It could be that I don't understand what I'm doing with regard to these values. Do they need to be adjusted based on whatever PDF is being fed to it?

Report · Dec 15, 2021

Any chance you can share those code snippets in either a Gist, or, use the Insert Code button when editing your message and then paste it in. That should keep the line breaks and remove the \n.

Report · Dec 15, 2021

Unfortunately not in this case. I used the code sample feature when plugging the JSON in, but there isn't a JSON option so I went with HTML. I have uploaded the JSON files, however, for each API service in question. It's there that I copy-pasted the values associated with each contentAnalyzerRequests.

Report · Dec 15, 2021

Ah, so the JSON above is the definition for Postman. When you generate the error, Postman should (I believe!) give you access to the JSON body it sent to us. Can you dig that out?

Report · Dec 15, 2021

{
    "cpf:status": {
        "completed": true,
        "type": "Invalid JSON in request",
        "title": "Internal exception occur while fetching response error",
        "status": 400,
        "report": "{\"error_code\":\"INVALID_REQUEST\"}"
    }

This is what I got for the body of the request but I don't think that's what you're asking for.

Report · Dec 15, 2021

Sorry, I meant the Request portion. I believe Postman should show you, for the failed request, the exact JSON it sent to our API. Not the Response, but what it sent.

Report · Dec 15, 2021

In Postman, go to Body, look at contentAnalyzerRequests, the value is probably shrunk so double click to open. That should be it.

Report · Dec 15, 2021

Yeah, I'm simply not seeing that in the Body section of the response (or any other response section). There isn't a contentAnalyzerRequests in the body, just the JSON that I posted.

Report · Dec 15, 2021

Report · Dec 16, 2021

I honestly don't know what to say. If you go to the Body tab, do you see *anything*? Here's what I see.

Report · Dec 16, 2021

I think I see the source of our confusion. Here is what I have as the value. I had mentioned this earlier but this was taken straight from the Extract PDF postman collection. This is what I input as part of my request. I get nothing back as a response save the error message in JSON.

Report · Dec 16, 2021

which I'm grabbing from here

Report · Dec 16, 2021

Interesting. Notice how your value has line breaks and escape chars in it. It should not. In my screenshot above I didn't show the right thing (sorry!), but look at mine:

Let me grab the latest Extract Postman collection to see if I can reproduce.

Report · Dec 16, 2021

So I imported Extract.json, the version you shared above. In Body, in contentAnalyzerRequests, when I look at VALUE, I do *not* see the escaped stuff, etc. How did you add this to Postman? For me, I used Import.

Report · Dec 16, 2021

I literally copy pasted that value taken from the Extract collection into the value for contentAnalyzerRequests in the Postman body, but I see what you mean about the line breaks. Those line breaks are built into the Extract collection.

I tried importing from Postman but the schema I input provided in the Extract collection can't be read by Postman. Finally, I just copied the value into Notepad++ and removed all the instances of \n and \ to get this value:

{"cpf:engine":{"repo:assetId":"urn:aaid:cpf:58af6e2c-1f0c-400d-9188-078000185695"},"cpf:inputs":{"documentIn":{"cpf:location":"InputFile0","dc:format":"application/pdf"},"params":{"cpf:inline":{"elementsToExtract":["text","tables"],"renditionsToExtract":["tables","figures"]}}},"cpf:outputs":{"elementsInfo":{"cpf:location":"jsonoutput","dc:format":"application/json"},"elementsRenditions":{"cpf:location":"fileoutpart","dc:format":"text/directory"}}}

And while I do get a different kind of error, it's still 400:

{
    "cpf:status": {
        "completed": true,
        "type": "I/O Validation error.",
        "title": "Required multipart field InputFile0 not found in the request.",
        "status": 400,
        "report": "{\"error_code\":\"MULTIPART_FIELD_MISSING\"}"
    },
    "cpf:engine": {
        "repo:assetId": "urn:aaid:cpf:58af6e2c-1f0c-400d-9188-078000185695"
    },
    "cpf:inputs": {
        "documentIn": {
            "cpf:location": "InputFile0",
            "dc:format": "application/pdf"
        },
        "params": {
            "cpf:inline": {
                "elementsToExtract": [
                    "text",
                    "tables"
                ],
                "renditionsToExtract": [
                    "tables",
                    "figures"
                ]
            }
        }
    }
}

So, some field perhaps is missing from the Notepad++ hitjob I did?
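One way to check that a manual cleanup like this did not drop or damage anything is to let a JSON parser undo the escaping instead of deleting characters by hand. The sketch below assumes the value was copied from the raw collection file, where it is stored as the inside of a JSON string (hence the \n and \" sequences). The function name and the shortened sample value are hypothetical, for illustration only.

```python
import json


def decode_collection_value(escaped: str) -> dict:
    """Turn an escaped value copied from the raw collection file into a dict."""
    # Wrapping the text in double quotes makes it a JSON string literal again,
    # so the first loads() undoes the escaping (\n, \", \\) ...
    unescaped = json.loads('"' + escaped + '"')
    # ... and the second loads() parses the actual request object.
    return json.loads(unescaped)


# Shortened, hypothetical sample in the same escaped form as the values above.
escaped = (
    r'{\n  \"cpf:inputs\": {\n    \"documentIn\": {\n'
    r'      \"cpf:location\": \"InputFile0\"\n    }\n  }\n}'
)

request = decode_collection_value(escaped)

# Compact form, suitable for pasting into the contentAnalyzerRequests field.
compact = json.dumps(request, separators=(",", ":"))
print(compact)
```

Decoding this way also copes with values that contain a nested JSON string, such as the "json" parameter in the HTML to PDF value, where stripping every backslash by hand would leave invalid JSON.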

Report · Dec 16, 2021

2 things: "I tried importing from Postman but the schema I input provided in the Extract collection can't be read by Postman." That's odd. Any chance you are running an older version of Postman? Any chance when you downloaded it, the browser messed with it?

Second issue is much simpler - you need to pick a file for InputFile0. See it there as the second option?

You need to actually pick a file.
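The MULTIPART_FIELD_MISSING error can be reproduced on paper: each "cpf:location" under "cpf:inputs" names a form field that has to be present in the multipart body alongside contentAnalyzerRequests. A minimal sketch of that check follows. The helper name is hypothetical and this is not part of any Adobe SDK; it only mirrors what the error message describes.

```python
import json


def missing_input_fields(content_analyzer_requests: str, form_fields: set) -> list:
    """Return the input locations that have no matching multipart form field."""
    request = json.loads(content_analyzer_requests)
    inputs = request.get("cpf:inputs", {})
    # Only entries that carry a "cpf:location" refer to an uploaded part;
    # "params" uses "cpf:inline" and is skipped.
    wanted = [
        spec["cpf:location"]
        for spec in inputs.values()
        if isinstance(spec, dict) and "cpf:location" in spec
    ]
    return [name for name in wanted if name not in form_fields]


value = (
    '{"cpf:inputs":{"documentIn":{"cpf:location":"InputFile0",'
    '"dc:format":"application/pdf"},'
    '"params":{"cpf:inline":{"elementsToExtract":["text","tables"]}}}}'
)

# Form with only the JSON field filled in: the file part is reported missing.
without_file = missing_input_fields(value, {"contentAnalyzerRequests"})

# Form with a file chosen for InputFile0: nothing is missing.
with_file = missing_input_fields(value, {"contentAnalyzerRequests", "InputFile0"})

print(without_file, with_file)
```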

Report · Dec 16, 2021

I used the schema link:

https://schema.getpostman.com/json/collection/v2.1.0/collection.json

which is found in the Extract.json Postman collection and got this result:

And I have always had a PDF file as InputFile0. Here is the result:

Attached is the actual pdf for reference

Report · Dec 16, 2021

For item one - no. Use the downloaded JSON file (the one you shared above), in Postman, Import, and pick the file.

I had no problem submitting the call to Extract using the PDF you attached below.

Honestly it sounds like the biggest issue is how you are importing the collection into Postman. Can you try the file import? You then need to run the Exchange JWT portion, get your bearer token, and use that (and your client id) in the "Submit" call.

Report · Dec 16, 2021

Heyyy we're almost there

Got a 202 "In Progress". Followed your instructions by importing the JSON file itself and not just the schema link, got this:

{
    "cpf:status": {
        "completed": false,
        "type": "",
        "title": "In Progress",
        "status": 202
    },
    "cpf:engine": {
        "repo:assetId": "urn:aaid:cpf:58af6e2c-1f0c-400d-9188-078000185695"
    },
    "cpf:inputs": {
        "documentIn": {
            "cpf:location": "InputFile0",
            "dc:format": "application/pdf"
        },
        "params": {
            "cpf:inline": {
                "elementsToExtract": [
                    "text",
                    "tables"
                ],
                "renditionsToExtract": [
                    "tables",
                    "figures"
                ]
            }
        }
    }
}

Report · Dec 16, 2021

Cool! So this is as expected. The API supports letting your code poll for an update. The Postman collection sends a header, Prefer, with a value, wait=0, which (as I understand it) tells the service not to hold the request open until the result is ready, so you get the 202 back right away. In Postman, if you look at the response headers, you'll see a location header. If you make an authenticated call to that value, you can get your bits.
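The polling step described above could look roughly like the sketch below. It rests on several assumptions not confirmed in this thread: that the location URL keeps answering 202 while the job is in progress, that any other status means the job has finished (or failed), and that the client id goes in an x-api-key header. The function names are hypothetical.

```python
import time
import urllib.error
import urllib.request


def http_get(url, token, client_id):
    """Authenticated GET; returns (status_code, body_bytes)."""
    request = urllib.request.Request(url, headers={
        "Authorization": "Bearer " + token,
        "x-api-key": client_id,  # assumption: header used for the client id
    })
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise; hand them back as a normal result instead.
        return err.code, err.read()


def poll(location, fetch, interval=2.0, max_tries=30):
    """Call fetch(location) until the status is no longer 202 (In Progress)."""
    for _ in range(max_tries):
        status, body = fetch(location)
        if status != 202:  # assumption: 202 means "still working"
            return status, body
        time.sleep(interval)
    raise TimeoutError("job still in progress after %d attempts" % max_tries)


# Usage sketch (placeholders; location comes from the Submit response headers):
# status, body = poll(location, lambda url: http_get(url, token, client_id))
```

Passing the fetch function in as a parameter keeps the loop easy to try out with a fake responder before pointing it at the real service.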
