
Unable to correctly extract tables from a PDF document using the PDF Extract API

New Here, Oct 19, 2022

Hello everyone,

Use case: I am using the PDF Extract API to extract the tables from a PDF.

Tech stack: .NET, NuGet package Adobe.PDFServicesSDK version 3.0.0

Problem: In a given table, if all the cells of a particular column are empty, that column gets merged with the next column. (Both the PDF file and the output file are attached.)

Expected output: CSV files

Here is the sample code:

Adobe.PDFServicesSDK.ExecutionContext executionContext = Adobe.PDFServicesSDK.ExecutionContext.Create(credentials);

ExtractPDFOperation extractPdfOperation = ExtractPDFOperation.CreateNew();
FileRef sourceFileRef = FileRef.CreateFromStream(pdfFileStream, "application/pdf");
extractPdfOperation.SetInputFile(sourceFileRef);

// Build ExtractPDF options and set them into the operation.
ExtractPDFOptions extractPdfOptions = ExtractPDFOptions.ExtractPDFOptionsBuilder()
    .AddElementsToExtract(new List<ExtractElementType>(new[] { ExtractElementType.TABLES }))
    .AddTableStructureFormat(TableStructureType.CSV)
    .Build();

extractPdfOperation.SetOptions(extractPdfOptions);

// Execute the operation.
FileRef resultZipFile = extractPdfOperation.Execute(executionContext);
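
For anyone trying to reproduce this, it may help to list which CSV files actually ended up in the result. The following is only a sketch using the standard System.IO.Compression classes. It assumes the zip behind resultZipFile has already been written to a local path. The ExtractResultInspector class and the zipPath parameter are hypothetical names for illustration and are not part of Adobe.PDFServicesSDK.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Compression;

// Hypothetical helper for illustration; not part of Adobe.PDFServicesSDK.
public static class ExtractResultInspector
{
    // Returns the full entry names of all .csv files inside the zip at zipPath.
    // Assumption: the extraction result was already saved to disk as a zip file.
    public static List<string> ListCsvEntries(string zipPath)
    {
        var names = new List<string>();
        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                if (entry.FullName.EndsWith(".csv", StringComparison.OrdinalIgnoreCase))
                {
                    names.Add(entry.FullName);
                }
            }
        }
        return names;
    }
}
```

Each returned name can then be opened with archive.GetEntry(name).Open() to read the table rows.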

 

CSV with the error: 6 columns are expected, but only 5 appear in the CSV.

AnilBDugar_1-1666190867218.png
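
The missing column can also be confirmed programmatically rather than by eye. Below is a minimal sketch that flags rows whose column count differs from the expected one. It assumes the CSV is comma-delimited, uses double quotes around fields that contain commas, and has no records spanning multiple lines. CsvColumnCheck and its methods are hypothetical names, not part of the Adobe SDK.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Adobe.PDFServicesSDK.
public static class CsvColumnCheck
{
    // Counts the fields in one CSV record, treating commas inside double quotes as data.
    // Assumption: a record never spans more than one line.
    public static int CountColumns(string line)
    {
        int columns = 1;
        bool inQuotes = false;
        foreach (char c in line)
        {
            if (c == '"')
            {
                // An escaped quote ("") toggles twice, so the state is unchanged afterwards.
                inQuotes = !inQuotes;
            }
            else if (c == ',' && !inQuotes)
            {
                columns++;
            }
        }
        return columns;
    }

    // Returns the 1-based numbers of the rows whose column count differs from expected.
    public static List<int> RowsWithUnexpectedColumnCount(IEnumerable<string> lines, int expected)
    {
        var badRows = new List<int>();
        int rowNumber = 0;
        foreach (string line in lines)
        {
            rowNumber++;
            if (CountColumns(line) != expected)
            {
                badRows.Add(rowNumber);
            }
        }
        return badRows;
    }
}
```

For the table in this post, calling CsvColumnCheck.RowsWithUnexpectedColumnCount(File.ReadLines(csvPath), 6) on the extracted file (csvPath being wherever that CSV was saved) would list every row that came out with a different number of columns.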

PDF file being parsed:

AnilBDugar_2-1666190939408.png

 

Please help.

Thanks,
AD

TOPICS
.NET SDK , PDF Extract API , PDF Services API

New Here, Oct 19, 2022


Try parsing the attached file and you will be able to reproduce the issue.

Also note that I tried parsing the same file with Amazon Textract, and it works.
