Extract text from PDF

Question

Requirement:

We receive various native-text PDF forms from the insurance, banking, and life science domains.

We need to extract content that is in tabular form; the tables may have clear borders or be borderless.

We need to extract the content of check boxes.

We need consistent extraction output for both structured and unstructured data.

We need to achieve this programmatically in C#.

It would be great to hear from anyone who has faced a similar situation, or to get any suggestions.

Thanks

Mohan

james123ABC · Answer

Hi Mohan,

There are several programming libraries that provide easy-to-use PDF data extraction tools. For C#, I know of Bytescout and Leadtools. However, I'm not sure whether they offer checkbox detection.

If you don't want to develop the data extraction yourself, you can check out Docparser. It's an app that lets you extract data from PDF files without coding. Once set up, you can import documents and obtain the parsed data through an HTTP API.

Hope that helped!
