Convert response from PDF Extract API to image and CSV files

New Here, Jun 12, 2021

Hello All,

I am using the PDF Extract API by making REST calls from Postman. The call succeeds, and I can poll and get the contentAnalyserResponse in text format. How do I get the output as separate files that I can download to disk, as images and/or CSV/JSON files for the text and tables? Can I receive the response as one or more objects that can be written to separate files with the proper extensions? I remember reading somewhere, but can no longer find it, that the response can be received as a single downloadable zip file with all the separate files in a fixed folder structure...

If anyone has tried or achieved this, could you please share how you did it?

Thanks and regards,

Adi

TOPICS
PDF Extract API

Adobe Employee, Jun 14, 2021

It should be a multipart form response. I haven't handled that myself in Postman, but does that help?
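With a multipart response, the string that separates the parts (the boundary) is announced in the response's `Content-Type` header, so that is the first thing to read. A minimal Python sketch using the standard library's `email.message.Message`; `parse_content_type` is a hypothetical helper name, and the header value in the example is assumed, reusing the boundary from the sample response posted in this thread.

```python
from email.message import Message
from typing import Optional, Tuple


def parse_content_type(header_value: str) -> Tuple[str, Optional[str]]:
    """Return (media type, boundary) for a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() returns the lowercased "maintype/subtype";
    # get_boundary() returns the unquoted boundary parameter, or None.
    return msg.get_content_type(), msg.get_boundary()


if __name__ == "__main__":
    # Hypothetical header value (boundary taken from the sample response).
    media_type, boundary = parse_content_type(
        'multipart/form-data; boundary="Boundary_86985_669621865_1628280852738"'
    )
    print(media_type)
    print(boundary)
```

If the media type starts with `multipart/`, the body has to be split on that boundary before anything in it can be saved.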

New Here, Aug 07, 2021

Hi, 

I'm at the same spot right now and could use some guidance.

From looking at the JSON, I seem to be getting good results, but now I need to display them properly.

I'm beginning to think that I can't render the response in Postman and need to install the SDK to visualize the extraction. I'm hoping to send the GET responses directly into a database to display on a website.

Any input on next steps would be greatly appreciated.

Adobe Employee, Aug 09, 2021

If you are getting the JSON, then you are good, at least as far as I can help. How you use the JSON depends on what you're building. At that point it's outside the API/SDK, and what you do with the JSON is in your hands. Right?

New Here, Aug 09, 2021

Thanks for the response!

I suspect I'm missing something simple.

I'm getting a lot of black triangles with white question marks.

Is this a character encoding issue? My source PDF is in Icelandic, by the way. A lot of the response seems usable and correct, but I don't know how to get around this...

Here is a short example:

"cpf:outputs":{"elementsRenditions":[{"cpf:location":"fileoutpart0","dc:format":"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"},{"cpf:location":"fileoutpart1","dc:format":"image/png"}],"elementsInfo":{"cpf:location":"jsonoutput","dc:format":"application/json"}}}
--Boundary_86985_669621865_1628280852738
Content-Type: application/octet-stream
Content-Disposition: form-data; name="fileoutpart1"

�PNG
[binary PNG data (IHDR, IDAT chunks) displayed as long runs of � replacement characters]

Adobe Employee, Aug 09, 2021

Ah, so Extract can return the images in the PDF (along with tabular data), so I think what you're seeing is Postman trying, and failing, to render that binary data. Normally you would take the multipart form response and save out the binary files and the JSON. The SDK does all of this for you and makes it much easier, so if you *do* have access to it, I'd suggest using it.
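The black triangles with question marks are most likely U+FFFD, the Unicode replacement character, which a text view substitutes for bytes that are not valid UTF-8. Binary image data shown as text produces exactly this, independent of the PDF's language. A minimal Python illustration using the fixed 8-byte PNG signature; it only demonstrates the effect and is not a description of Postman's internals.

```python
# The 8-byte signature that starts every PNG file (from the PNG specification).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Decoding binary data as UTF-8 with errors="replace" swaps each invalid
# byte sequence for U+FFFD, the character drawn as a black shape with a "?".
as_text = PNG_SIGNATURE.decode("utf-8", errors="replace")
print(ascii(as_text))  # -> '\ufffdPNG\r\n\x1a\n'

# The lossy decode cannot be undone: re-encoding does not restore byte 0x89.
assert as_text.encode("utf-8") != PNG_SIGNATURE
```

This is why copying the text from a response viewer and saving it as `.png` gives a broken file: the original bytes have to be written out untouched.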

Community Beginner, Aug 13, 2021

Hi Raymond,

"Normally you would take the multipart form response and save out the binary files and the JSON. The SDK does all of this for you and makes it much easier."

The SDK has no such option for .NET, so can we save out the binary files and JSON using the REST API?
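Whatever the language, decoding a multipart body by hand comes down to a few byte operations: split on `--` plus the boundary, drop the CRLFs that belong to the delimiters, and separate each part's headers from its content at the first blank line. A minimal Python sketch of that logic (which would port to .NET), assuming CRLF line endings and that every part has headers; `split_multipart` is a hypothetical name, and a real MIME library handles more edge cases.

```python
from typing import Dict, List, Tuple


def split_multipart(body: bytes, boundary: str) -> List[Tuple[Dict[str, str], bytes]]:
    """Split a multipart body into (headers, content) pairs without a MIME library."""
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    # Text before the first delimiter is preamble and is skipped.
    for segment in body.split(delimiter)[1:]:
        if segment.startswith(b"--"):
            break  # "--boundary--" marks the end; anything after is epilogue
        # The CRLF after a delimiter line and the CRLF before the next
        # delimiter belong to the delimiters, not to the part content.
        if segment.startswith(b"\r\n"):
            segment = segment[2:]
        if segment.endswith(b"\r\n"):
            segment = segment[:-2]
        # Part headers end at the first blank line (CRLF CRLF).
        raw_headers, _, content = segment.partition(b"\r\n\r\n")
        headers = {}
        for line in raw_headers.decode("latin-1").split("\r\n"):
            name, sep, value = line.partition(":")
            if sep:
                headers[name.strip().lower()] = value.strip()
        parts.append((headers, content))
    return parts
```

The content is kept as raw bytes, so binary parts such as PNG images can be written straight to disk.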

LEGEND, Aug 14, 2021

Have you looked for a library that decodes MIME multipart data in .NET? The REST API is delivering the data; all you need to do is decode it.
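For comparison, in Python the standard `email` package can act as that MIME decoder. A minimal sketch, assuming you have the raw response body as bytes plus its `Content-Type` header value (with the boundary); `save_multipart_parts` and the output folder are hypothetical, and each file is named after its form-data field, for example `fileoutpart1`.

```python
from email import policy
from email.parser import BytesParser
from email.utils import collapse_rfc2231_value
from pathlib import Path
from typing import List


def save_multipart_parts(body: bytes, content_type: str, out_dir: str) -> List[Path]:
    """Decode a multipart HTTP body and write each part's raw bytes into out_dir."""
    # The email parser needs the Content-Type header (with its boundary) in
    # front of the body, so rebuild a minimal header block.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = BytesParser(policy=policy.default).parsebytes(raw)
    if not message.is_multipart():
        raise ValueError("response body is not multipart")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, part in enumerate(message.iter_parts()):
        # Use the form-data field name as the file name; Path(...).name drops
        # any directory components so a part cannot write outside out_dir.
        name = part.get_param("name", header="content-disposition")
        filename = Path(collapse_rfc2231_value(name)).name if name else ""
        filename = filename or "part%d" % index
        # decode=True returns bytes and keeps binary (PNG, XLSX) data intact.
        data = part.get_payload(decode=True)
        target = out / filename
        target.write_bytes(data)
        written.append(target)
    return written
```

Each part's `Content-Type` header, or the `dc:format` values in the JSON metadata, can then tell you which extension to give each file.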

Community Beginner, Aug 16, 2021

Hi Raymond,

Thanks for the update. I am now able to decode the MIME multipart data and save the files. I have a quick question. I am currently using the following "contentAnalyzerRequests":

{
  "cpf:engine": {
    "repo:assetId": "urn:aaid:cpf:58af6e2c-1f0c-400d-9188-078000185695"
  },
  "cpf:inputs": {
    "documentIn": {
      "cpf:location": "InputFile0",
      "dc:format": "application/pdf"
    },
    "params": {
      "cpf:inline": {
        "elementsToExtract": ["text", "tables"],
        "renditionsToExtract": ["tables", "figures"],
        "tableOutputFormat": "csv"
      }
    }
  },
  "cpf:outputs": {
    "elementsInfo": {
      "cpf:location": "jsonoutput",
      "dc:format": "application/json"
    },
    "elementsRenditions": {
      "cpf:location": "fileoutpart",
      "dc:format": "text/directory"
    }
  }
}

When I decode the response, I get individual files instead of a zip file. What changes do I need to make in contentAnalyzerRequests to get a zip file?

Adobe Employee, Aug 17, 2021

I believe the SDK returns a zip, while with the REST API you get the data as is: individual files. I'm more familiar with the SDKs, but I'm fairly sure I'm right. If for some reason you *want* a zip file, for example to move the results elsewhere, you could zip the files yourself after the operation.
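Zipping the results yourself needs only the standard library. A minimal Python sketch, assuming the decoded parts have already been saved into one folder; `zip_directory` is a hypothetical helper, and the zip should be written outside the source folder so it does not end up inside itself.

```python
import zipfile
from pathlib import Path
from typing import List


def zip_directory(src_dir: str, zip_path: str) -> List[str]:
    """Zip every file under src_dir, storing paths relative to src_dir."""
    src = Path(src_dir)
    names = []
    # ZIP_DEFLATED compresses the entries. zip_path should be outside src_dir,
    # otherwise rglob would pick up the zip file while it is being written.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Forward-slash archive names keep subfolders (e.g. tables/) intact.
                arcname = path.relative_to(src).as_posix()
                zf.write(path, arcname=arcname)
                names.append(arcname)
    return names
```

Keeping relative paths means any folder structure you chose when saving the parts is preserved inside the archive.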

Community Beginner, Aug 17, 2021

Hi Raymond,

Thanks for the update.

LEGEND, Aug 09, 2021

The embedded data is clearly a PNG (the first line says so). Use a MIME parser to extract the separate parts. I have no idea what the PNG would contain.
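Checking those first bytes (the file's "magic number") is also a quick way to guess an extension for each decoded part. A minimal Python sketch using a few well-known signatures; the table is illustrative rather than exhaustive, and `guess_extension` is a hypothetical name. Note that an `.xlsx` file is a ZIP container, so the spreadsheet part in the sample would match the ZIP signature; when the `dc:format` metadata is available, it is the better source.

```python
# Leading bytes ("magic numbers") for a few common file types.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"PK\x03\x04", ".zip"),  # also the container format for .xlsx and .docx
]


def guess_extension(data: bytes, default: str = ".bin") -> str:
    """Guess a file extension from the first bytes of a decoded part."""
    for signature, extension in SIGNATURES:
        if data.startswith(signature):
            return extension
    return default
```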
