Participant
March 6, 2017
Question

Extract text from PDF


Requirement:

We receive a variety of native-text PDF forms from the insurance, banking, and life sciences domains.

We need to extract content that is in tabular form, where the tables may have clear borders or be borderless.

We need to extract the content of check boxes.

We need consistent extraction output for both structured and unstructured data.

We need to achieve this programmatically in C#.

It would be great to hear from anyone who has faced a similar situation or has any suggestions.

Thanks

Mohan


1 reply

james123ABC
Participant
June 1, 2017

Hi Mohan,

There are several programming libraries that offer easy-to-use PDF data extraction tools. For C#, I know of Bytescout and Leadtools. However, I'm not sure whether they offer checkbox detection.

If you don't want to develop the data extraction yourself, you can check out Docparser. It's an app that lets you extract data from PDF files without coding. Once it is set up, you can import documents and obtain the parsed data through an HTTP API.

Hope that helped!