
Exclude images and extract table content in PDF to xml (or csv)

New Here ,
Aug 19, 2020


Hi Team,

 

One of my clients has a full Adobe Acrobat Pro DC license. The requirement is to programmatically extract table content from a PDF, excluding images and other content, using C#.NET.

 

I could not find the required information. I would greatly appreciate your help in understanding how I can achieve this.

 

Thanks in advance.

 

New Here ,
Aug 23, 2020


Can someone help me with this? How can I achieve the requirement above? What kind of license do we need?

Most Valuable Participant ,
Aug 24, 2020


There's no such thing as "table content" in a PDF file. There's text, and there are images, graphics and other elements.

If the text is arranged in a "table", it's only because it was placed that way on the page, with graphical elements around it. You can extract it as text, but you will then need a complex algorithm to convert it into a table. There are some libraries that (attempt to) do this for you, but I don't know if any exist for C#.NET...
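To give a feel for what that "complex algorithm" involves, here is a minimal Java 16+ sketch (Java because the libraries mentioned in this thread are Java; the logic ports directly to C#). It assumes a text-extraction step has already produced positioned text fragments. The `TextChunk` record, the row tolerance and the column start positions are hypothetical, caller-supplied inputs, not part of any real library's API. The sketch groups fragments whose y coordinates are close into rows, assigns each fragment to a column by its x coordinate, and writes the result as CSV.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TableFromText {

    // Hypothetical input type: one text fragment and its position on the page.
    // Assumes y grows downward; flip the sort if your extractor uses bottom-up coordinates.
    record TextChunk(String text, double x, double y) {}

    // Groups chunks into rows by y, assigns columns by x, and returns CSV text.
    // columnStarts is a hypothetical, non-empty, ascending list of column left edges.
    static String toCsv(List<TextChunk> chunks, double rowTolerance, double[] columnStarts) {
        List<TextChunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingDouble(TextChunk::y));

        // Start a new row whenever a chunk's y is too far from the row's first chunk.
        List<List<TextChunk>> rows = new ArrayList<>();
        double rowY = 0;
        for (TextChunk c : sorted) {
            if (rows.isEmpty() || Math.abs(c.y() - rowY) > rowTolerance) {
                rows.add(new ArrayList<>());
                rowY = c.y();
            }
            rows.get(rows.size() - 1).add(c);
        }

        StringBuilder csv = new StringBuilder();
        for (List<TextChunk> row : rows) {
            row.sort(Comparator.comparingDouble(TextChunk::x)); // left to right within the row
            String[] cells = new String[columnStarts.length];
            Arrays.fill(cells, "");
            for (TextChunk c : row) {
                int col = columnIndex(c.x(), columnStarts);
                // Fragments landing in the same cell are joined with a space.
                cells[col] = cells[col].isEmpty() ? c.text() : cells[col] + " " + c.text();
            }
            for (int i = 0; i < cells.length; i++) {
                cells[i] = escape(cells[i]);
            }
            csv.append(String.join(",", cells)).append('\n');
        }
        return csv.toString();
    }

    // Index of the last column whose start is <= x (column 0 if x is left of all starts).
    static int columnIndex(double x, double[] columnStarts) {
        int col = 0;
        for (int i = 0; i < columnStarts.length; i++) {
            if (x >= columnStarts[i]) {
                col = i;
            }
        }
        return col;
    }

    // RFC 4180-style quoting: wrap in quotes and double any embedded quotes when needed.
    static String escape(String cell) {
        if (cell.contains(",") || cell.contains("\"") || cell.contains("\n") || cell.contains("\r")) {
            return "\"" + cell.replace("\"", "\"\"") + "\"";
        }
        return cell;
    }

    public static void main(String[] args) {
        // Made-up fragments standing in for what a text extractor might return.
        List<TextChunk> chunks = List.of(
                new TextChunk("Qty", 200, 100.5),
                new TextChunk("Name", 50, 100),
                new TextChunk("Widget, large", 50, 120),
                new TextChunk("3", 200, 121));
        System.out.print(toCsv(chunks, 2.0, new double[] {40, 190}));
    }
}
```

Running `main` prints two CSV lines, `Name,Qty` and `"Widget, large",3`. Real PDFs need much more than this (inferring column boundaries from whitespace gaps or ruling lines, merged cells, cells that wrap onto several lines), which is the part dedicated table-extraction libraries try to automate.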

New Here ,
Aug 24, 2020


Thanks for the response. You mentioned that some libraries are available in languages other than C#. Can you please share that information?

Does an Adobe Acrobat Pro DC license support extracting PDF content to XML/Excel?

try67
Most Valuable Participant ,
Aug 24, 2020


For example, there are some Java libraries, such as Tabula and TrapRange (both based on PDFBox).
