Unreliable text extraction with Python SDK

Question

Hello,

I have been working with the API for text extraction with Python. I am using it both with poorly structured tables and with very well-defined documents, such as public administration forms, which are quite similar to tables. I decided to work on the JSON files generated by the API. Even with the very well-defined documents I encounter two main problems:

- The API fails to detect 100% of the 'cells', occasionally joining two or more 'cells' into a single one (the 'Text' field of the JSON document).
- The errors are not consistent: processing the same document several times does not produce exactly the same output. Even worse, extracting text from the same document several times seems to increase both the number of errors in the JSON output and their size, to the point of joining the contents of a whole page into a single text cell of the JSON document.

I need to understand the reason for the increase in errors and how to avoid the most serious ones. I need reliability. I have learned to deal with the fusion of two cells, but I need to be sure that the errors will not go further than that.

Thanks,
Pablo

Test Screen Name · Answer

The Acrobat SDK doesn't have a Python API. Which API are you using?
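
Whichever API it turns out to be, it may help to measure the run-to-run differences described in the question before looking for a cause. The following is only a minimal sketch using the Python standard library. It assumes nothing about the JSON layout except that extracted strings sit under a 'Text' key, as the question states. The function names, the `elements` key in the sample data, and the `max_chars` threshold are hypothetical and chosen for illustration.

```python
from difflib import unified_diff


def collect_text(node):
    """Recursively gather every string stored under a 'Text' key, in document order."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Text" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_text(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_text(item))
    return found


def compare_runs(run_a, run_b):
    """Return unified-diff lines between the 'Text' values of two parsed outputs."""
    cells_a = collect_text(run_a)
    cells_b = collect_text(run_b)
    return list(unified_diff(cells_a, cells_b,
                             fromfile="run_a", tofile="run_b", lineterm=""))


def suspicious_cells(run, max_chars=200):
    """Flag cells longer than max_chars (an arbitrary, assumed threshold),
    which may indicate several cells or a whole page merged into one."""
    return [text for text in collect_text(run) if len(text) > max_chars]


# Hypothetical sample data standing in for two parsed JSON outputs
# (in practice these would come from json.load on the generated files).
run_a = {"elements": [{"Text": "Name"}, {"Text": "Surname"}]}
run_b = {"elements": [{"Text": "Name Surname"}]}

for line in compare_runs(run_a, run_b):
    print(line)
```

An empty diff means the two runs produced the same cells in the same order. A non-empty diff shows exactly which cells were merged or split, and `suspicious_cells` gives a rough way to reject outputs where the merging has gone beyond two cells.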
