
Extract text from PDF

Mar 05, 2017

Requirements:

We receive various PDF forms with native (non-scanned) text from the insurance, banking, and life-science domains.

We need to extract tabular content, where the tables may either have clear borders or be borderless.
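For borderless tables, one common heuristic (not something prescribed in this thread) is to take the words a PDF text-extraction library reports along with their coordinates, group them into rows by vertical position, and then assign each word to a column by horizontal position. The sketch below assumes that the words are already available as (x, y, text) values with y increasing down the page, which is how many extraction tools report it. Note that PDF user space itself has y increasing upward, so you may need to reverse the sort. The Word type, the tolerances, and the rule that the first row is a single-word-per-column header are all illustrative assumptions. The sketch is written in Python for brevity, and the logic ports directly to C#.

```python
from dataclasses import dataclass


@dataclass
class Word:
    x: float     # left edge of the word (assumed extraction-library units)
    y: float     # vertical position; assumed to increase down the page
    text: str


def group_rows(words, y_tolerance=3.0):
    """Cluster words into rows: a word joins the current row if its y is within y_tolerance."""
    rows = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        if rows and abs(rows[-1][0].y - word.y) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return rows


def build_table(words, y_tolerance=3.0, x_tolerance=2.0):
    """Rebuild a borderless table as a list of rows of cell strings."""
    rows = group_rows(words, y_tolerance)
    if not rows:
        return []
    # Assumption: the first row is a header, and each header word marks where a column starts.
    anchors = [w.x for w in sorted(rows[0], key=lambda w: w.x)]
    table = []
    for row in rows:
        cells = [""] * len(anchors)
        for word in sorted(row, key=lambda w: w.x):
            # Pick the right-most column whose start is at or before this word.
            col = 0
            for i, anchor in enumerate(anchors):
                if anchor <= word.x + x_tolerance:
                    col = i
            cells[col] = (cells[col] + " " + word.text).strip()
        table.append(cells)
    return table


# Hypothetical words, as a text-extraction step might report them.
words = [
    Word(10, 100, "Policy"), Word(120, 100, "Holder"), Word(250, 100, "Premium"),
    Word(10, 115, "P-001"), Word(120, 115.5, "Jane"), Word(160, 115.5, "Doe"),
    Word(250, 115, "1200.00"),
    Word(10, 130, "P-002"), Word(120, 130, "John"), Word(250, 131, "950.50"),
]

for row in build_table(words):
    print(row)
```

Multi-word headers such as "Policy No" would create one anchor per word. A more robust variant would derive column starts from vertical gaps that run through every row. For bordered tables, the ruling lines drawn in the page content can be used to find cell edges instead, but that is not shown here.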

We need to extract the values of checkboxes (checked or unchecked).
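If the forms are fillable (AcroForm) PDFs, the checkbox state is usually stored as field data rather than needing to be detected visually. In the PDF specification, a checkbox is a button field (field type /Btn) whose value is a name object: /Off means unchecked, and the "on" name varies by form (often /Yes). The sketch below assumes that the fields have already been read into a mapping of field name to dictionary with the PDF keys "/FT" and "/V". Libraries such as pypdf expose similar data through PdfReader.get_fields(), but the exact structure here is an assumption. Radio buttons and push buttons are also /Btn fields, and this sketch does not distinguish them. Flattened or scanned forms have no field data and would need a different approach.

```python
def checkbox_states(fields):
    """Return {field_name: checked} for button fields in an AcroForm-style field mapping.

    Assumption: `fields` maps field names to dicts keyed by PDF names such as
    "/FT" (field type) and "/V" (current value).
    """
    states = {}
    for name, field in fields.items():
        if field.get("/FT") != "/Btn":
            continue  # skip text, choice, and signature fields
        value = field.get("/V")
        # "/Off" or a missing value means unchecked; any other name is an "on" state.
        states[name] = value is not None and str(value) != "/Off"
    return states


# Hypothetical field data for a small insurance form.
fields = {
    "Smoker": {"/FT": "/Btn", "/V": "/Yes"},
    "Married": {"/FT": "/Btn", "/V": "/Off"},
    "Consent": {"/FT": "/Btn"},
    "Name": {"/FT": "/Tx", "/V": "Jane Doe"},
}

print(checkbox_states(fields))
```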

The extraction must produce consistent output for both structured and unstructured data.
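One way to get consistent output across differently laid-out forms is to map whatever label/value pairs each extractor finds onto a single fixed schema, so that every document yields the same keys and missing fields show up explicitly. The sketch below assumes that extraction has already produced (label, value) string pairs. The SCHEMA contents, the field names, and the label aliases are hypothetical examples, not taken from any real form.

```python
import json
import re

# Hypothetical target schema: canonical field name -> accepted label variants.
SCHEMA = {
    "policy_number": ["policy no", "policy number", "policy #"],
    "holder_name": ["name", "policy holder", "holder"],
    "premium": ["premium", "annual premium"],
}


def normalize_label(label):
    """Lower-case, drop trailing colons, and collapse whitespace so label variants compare equal."""
    label = label.strip().lower().rstrip(":")
    return re.sub(r"\s+", " ", label).strip()


def to_record(raw_pairs):
    """Map raw (label, value) pairs onto SCHEMA; fields not found stay None."""
    lookup = {
        normalize_label(alias): field
        for field, aliases in SCHEMA.items()
        for alias in aliases
    }
    record = {field: None for field in SCHEMA}
    for label, value in raw_pairs:
        field = lookup.get(normalize_label(label))
        if field is not None and record[field] is None:  # keep the first match
            record[field] = value.strip()
    return record


raw = [("Policy No:", " P-001 "), ("Policy  Holder", "Jane Doe"), ("Agent", "X")]
print(json.dumps(to_record(raw), sort_keys=True))
```

Unrecognized labels (such as "Agent" above) are dropped here. In practice, you might log them so that the alias lists can be extended over time.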

We need to achieve this programmatically in C#.

It would be great to hear from anyone who has faced a similar situation, or to get any suggestions.

Thanks

Mohan

Jun 01, 2017

Hi Mohan,

Several programming libraries provide easy-to-use PDF data extraction tools. For C#, I know of Bytescout and Leadtools. However, I'm not sure whether they offer checkbox detection.

If you don't want to develop the data extraction yourself, you can check out Docparser. It's an app that lets you extract data from PDF files without coding. Once it is set up, you can import documents and retrieve the parsed data through an HTTP API.

Hope that helped!
