Known Participant
January 22, 2024
Question

Is there a way to easily change the SDK such that the table can be returned in polars/pandas format?


Hi,

I am working on a flow in which we upload PDFs to S3 and then ping a Lambda function that uses the Extract API to extract the tables from the PDF.

It works well, but my request is: can the SDK be changed so that it returns polars/pandas dataframes instead of a ZIP of CSVs?

Or could you at least give us some control over it? For example, let us submit pull requests?

I would definitely love this, because right now I need to run it, unzip the result, read all the CSVs into dataframes, and only then do the processing.

And everything seems fragile to me.


Raymond Camden
Community Manager
January 22, 2024

No, this is not possible. In my opinion, we have to settle on a few output options that are flexible enough to cover the most use cases, but we won't ever be able to cover every use case.

As for 'fragile', I'm not sure what you mean. After you get the result from Extract, your code to process the results is... well your code. Build it rock solid and it won't be fragile. 😉
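For the unzip-and-read step described in the question, a minimal sketch might look like the following. It assumes you already hold the result ZIP as bytes and that the tables are stored as `.csv` members of that archive; `tables_from_zip` and `parse_rows` are hypothetical helper names, not part of any SDK.

```python
import csv
import io
import zipfile


def parse_rows(binary_file):
    """Default parser: decode a binary CSV handle into a list of rows."""
    # utf-8-sig also strips a leading byte order mark, if one is present.
    text = io.TextIOWrapper(binary_file, encoding="utf-8-sig", newline="")
    return list(csv.reader(text))


def tables_from_zip(zip_bytes, parse=parse_rows):
    """Return {member name: parsed table} for every CSV in the archive.

    Nothing is written to disk: the archive is opened from memory and
    each CSV member is handed to `parse` as a binary file-like object.
    """
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if name.lower().endswith(".csv"):
                with archive.open(name) as member:
                    tables[name] = parse(member)
    return tables
```

Because `parse` receives a binary file-like object, passing `parse=pandas.read_csv` (or `polars.read_csv`) should give dataframes instead of row lists, assuming the CSVs use the default comma delimiter.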

Known Participant
January 23, 2024

Hi. Okay, thank you for the quick response!

My last question, @Raymond Camden:

Can I use the zipped response from the API directly in memory, without writing to a file? I tried write_steam but could not get it to work.

The only way I found is to modify download_and_save_file() so that it does not create a local file from the FileRef, but instead unzips the stream directly, opens the ZIP in memory, and uses the JSON.
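A minimal sketch of that in-memory approach, under stated assumptions: `write_result` is a hypothetical callable standing in for whatever writes the result ZIP to a writable binary stream, and the member name `structuredData.json` is an assumption about the archive layout that should be checked against the actual output.

```python
import io
import json
import zipfile


def load_json_from_stream(write_result, member="structuredData.json"):
    """Capture a ZIP written to a stream and parse one JSON member.

    `write_result` is a hypothetical callable that writes the ZIP bytes
    into the binary stream it is given; no local file is created here.
    """
    buffer = io.BytesIO()
    write_result(buffer)   # the ZIP bytes land in memory
    buffer.seek(0)         # rewind before reading the archive
    with zipfile.ZipFile(buffer) as archive:
        with archive.open(member) as handle:
            return json.load(handle)
```

This only avoids files in your own code. Whether the SDK itself uses temporary storage before handing over the stream is outside the sketch's control.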

Raymond Camden
Community Manager
January 23, 2024

I don't know which SDK you are using, but in Node there is a writeToStream option. Temporary file storage _is_ used, though. If you absolutely need to avoid that, you need to switch to the REST APIs, which are relatively easy to use.

Known Participant
January 22, 2024

Another question: why does it take about 18-30 seconds for the Adobe API to send us (in the UK) a response? We would appreciate it if the processing time could be lowered.

Raymond Camden
Community Manager
January 22, 2024
Known Participant
January 23, 2024

I changed it to EU.

We now run it in AWS Lambda and I get 43 seconds of processing time, which is very weird.