I'm just starting to work with the PDF Services API in my Python project, and I've successfully extracted elements from a PDF, but only a portion of the extracted tables come out as CSV files (the rest are screenshots). Are there limitations on extracting tables into CSV files, with the screenshots serving as a fallback? If not, how can I get exclusively CSVs out of my PDF so I can work with that data?
Posting my code for the PDF Extract options in case that's helpful:
# Build ExtractPDF options and set them into the operation
extract_pdf_options: ExtractPDFOptions = ExtractPDFOptions.builder() \
    .with_elements_to_extract([ExtractElementType.TEXT, ExtractElementType.TABLES]) \
    .with_element_to_extract_renditions(ExtractRenditionsElementType.TABLES) \
    .with_elements_to_extract_renditions([ExtractRenditionsElementType.TABLES,
                                          ExtractRenditionsElementType.FIGURES]) \
    .with_table_structure_format(TableStructureType.CSV) \
    .build()
extract_pdf_operation.set_options(extract_pdf_options)
# Execute the operation.
result: FileRef = extract_pdf_operation.execute(execution_context)
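
To see exactly which tables came out as CSV and which only as images, here is a minimal sketch that inspects the result archive using only the standard library. It assumes the result has already been saved to a zip file on disk and that table outputs sit under a top-level `tables/` folder inside it. The function name `summarize_table_outputs`, the `tables_dir` parameter, and the file name `extract_result.zip` are hypothetical, not part of the SDK.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_table_outputs(zip_path, tables_dir="tables"):
    """Group the files under the tables folder of a result zip by extension.

    Assumption: table outputs live under a top-level folder named `tables_dir`.
    """
    by_ext = defaultdict(list)
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            path = PurePosixPath(name)
            # Skip directory entries and anything outside the tables folder.
            if name.endswith("/") or not path.parts or path.parts[0] != tables_dir:
                continue
            by_ext[path.suffix.lower()].append(name)
    return {ext: sorted(names) for ext, names in by_ext.items()}


if __name__ == "__main__":
    # Hypothetical path: point this at wherever the result zip was saved.
    for ext, names in summarize_table_outputs("extract_result.zip").items():
        print(ext or "(no extension)", len(names), names)
```

Comparing the `.csv` and `.png` counts shows which tables are missing a CSV, which narrows down the pages worth looking at in the source PDF.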