Is there an Adobe API/library for object-level extraction of text and table content from PDF?

October 26, 2018
1 reply
942 views

I need to extract text content (within a paragraph or table) from a PDF. The extraction should be in the form of objects so that I can get the exact content from the PDF. So I am looking for an API or library that can be integrated with my application. Please suggest one.

Thanks for your support.

Regards,

Ajay


Test Screen Name

Legend

There are no paragraphs or tables in most PDFs unless they are tagged. But you can get the actual PDF graphical objects. Have you read the PDF Reference?

Do you want this solution for end users who have licensed Acrobat Pro only?

Zealous_Explorer0101

Participant

I want to extract tables from a PDF as JSON objects using the OCR capability of the Adobe SDK. Is extraction of tables supported?

Test Screen Name

Legend

There are no tables in a PDF, only text (with positions) and lines (with shapes). Advanced C++ programmers can write plug-ins to get this info. After that, it is entirely guesswork whether you have a table. (Exception: tagged files; tagging adds a metadata layer describing tables, but extraction is still tough.)
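To illustrate the "guesswork" described in the reply above, here is a minimal Python sketch of one possible heuristic. It assumes you have already obtained text fragments with positions from some PDF library or plug-in (that step is not shown). The `fragments` data, the function names, and the tolerance values are all made up for illustration and are not part of any Adobe API. Real documents with merged cells, wrapped text, or ragged columns will defeat a heuristic this simple.

```python
import json

# Hypothetical input: (x, y, text) fragments in PDF user space,
# where y grows upward, so larger y means higher on the page.
fragments = [
    (72.0, 700.0, "Item"), (200.0, 700.0, "Qty"),
    (72.0, 685.0, "Bolt"), (200.5, 685.4, "12"),
    (72.0, 670.0, "Nut"), (200.0, 670.0, "30"),
]

def group_into_rows(fragments, y_tol=2.0):
    """Cluster fragments into rows when their y positions are within y_tol."""
    rows = []
    for x, y, text in sorted(fragments, key=lambda f: (-f[1], f[0])):
        if rows and abs(rows[-1]["y"] - y) <= y_tol:
            rows[-1]["cells"].append((x, text))
        else:
            rows.append({"y": y, "cells": [(x, text)]})
    # Order the cells of each row left to right.
    return [sorted(r["cells"]) for r in rows]

def guess_table(rows, x_tol=3.0):
    """Return a grid of strings if every row lines up with the first row's
    columns; return None when the layout does not look like a table."""
    if len(rows) < 2 or len(rows[0]) < 2:
        return None
    first_xs = [x for x, _ in rows[0]]
    for cells in rows[1:]:
        xs = [x for x, _ in cells]
        if len(xs) != len(first_xs):
            return None
        if any(abs(a - b) > x_tol for a, b in zip(xs, first_xs)):
            return None
    return [[text for _, text in cells] for cells in rows]

def table_to_json(table):
    """Treat the first row as the header and emit one JSON object per row."""
    header, *body = table
    return json.dumps([dict(zip(header, row)) for row in body])

table = guess_table(group_into_rows(fragments))
if table is not None:
    print(table_to_json(table))
```

The sketch deliberately gives up (returns `None`) as soon as a row does not match the first row's column positions, which reflects how fragile positional table detection is without tagging.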
