I am using the PDF extraction function described here on my own PDF files: https://developer.adobe.com/document-services/docs/overview/pdf-extract-api/howtos/extract-api/
Whenever I try to extract tables, the program throws this error:
INFO:adobe.pdfservices.operation.pdfops.extract_pdf_operation:All validations successfully done. Beginning ExtractPDF operation execution
ERROR:adobe.pdfservices.operation.internal.api.cpf_api:Failed in parsing Extract Result
Traceback (most recent call last):
  File "C:\Users\camer\anaconda3\lib\site-packages\adobe\pdfservices\operation\internal\service\extract_pdf_api.py", line 52, in download_and_save
    extract_data_parser.parse()
  File "C:\Users\camer\anaconda3\lib\site-packages\adobe\pdfservices\operation\internal\service\extract_data_parser.py", line 180, in parse
    self.ed_zipper.add_rendition_data(rendition_output)
  File "C:\Users\camer\anaconda3\lib\site-packages\adobe\pdfservices\operation\internal\service\extract_data_zipper.py", line 28, in add_rendition_data
    file_name = rdata.file_name + rdata.rendition_extension
TypeError: can only concatenate str (not "NoneType") to str
ERROR:root:Exception encountered while executing operation
Traceback (most recent call last):
  File "C:\Users\camer\anaconda3\lib\site-packages\adobe\pdfservices\operation\internal\service\extract_pdf_api.py", line 52, in download_and_save
    extract_data_parser.parse()
  File "C:\Users\camer\anaconda3\lib\site-packages\adobe\pdfservices\operation\internal\service\extract_data_parser.py", line 180, in parse
    self.ed_zipper.add_rendition_data(rendition_output)
  File "C:\Users\camer\anaconda3\lib\site-packages\adobe\pdfservices\operation\internal\service\extract_data_zipper.py", line 28, in add_rendition_data
    file_name = rdata.file_name + rdata.rendition_extension
TypeError: can only concatenate str (not "NoneType") to str
Extracting just text works fine, but as soon as I add tables to the extraction, it fails. This is a problem because I am using this API almost exclusively for its table extraction. Does anyone know a fix for this?
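For anyone reading the traceback: the last frame shows a string being concatenated with None, which suggests rdata.rendition_extension was never set. Here is a minimal sketch that reproduces the same TypeError outside the SDK. The RenditionData class and the sample file name are hypothetical stand-ins, not the SDK's actual types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RenditionData:
    """Hypothetical stand-in for the SDK's internal rendition object."""
    file_name: str
    rendition_extension: Optional[str]  # None mimics the failing case


def rendition_file_name(rdata: RenditionData) -> str:
    # Same expression as the failing line in the traceback.
    return rdata.file_name + rdata.rendition_extension


# With an extension present, the concatenation works.
ok_name = rendition_file_name(RenditionData("tables/fileoutpart0", ".csv"))

# With the extension missing, the same TypeError as in the log is raised.
try:
    rendition_file_name(RenditionData("tables/fileoutpart0", None))
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

print(ok_name)
print(error_message)
```

So the question is really why the extension ends up as None when table renditions are requested.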
I found a solution! It turns out you need to specify the TableStructureType by adding .with_table_structure_format(TableStructureType.###) to your ExtractPDFOptions.
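For reference, here is a rough sketch of what the options could look like with the table format set. It assumes the module paths of the SDK version shown in the traceback (the adobe.pdfservices.operation.pdfops package) and picks TableStructureType.CSV purely as an example value. The function name build_table_extract_options is hypothetical, and newer SDK releases may organize these classes differently.

```python
def build_table_extract_options():
    """Build ExtractPDFOptions that request tables with an explicit format.

    Sketch only: the import paths assume the SDK layout seen in the traceback.
    """
    # Imported here so this snippet loads even where the SDK is not installed.
    from adobe.pdfservices.operation.pdfops.options.extractpdf.extract_pdf_options import ExtractPDFOptions
    from adobe.pdfservices.operation.pdfops.options.extractpdf.extract_element_type import ExtractElementType
    from adobe.pdfservices.operation.pdfops.options.extractpdf.extract_renditions_element_type import ExtractRenditionsElementType
    from adobe.pdfservices.operation.pdfops.options.extractpdf.table_structure_type import TableStructureType

    return (
        ExtractPDFOptions.builder()
        # Extract both text and table elements.
        .with_elements_to_extract([ExtractElementType.TEXT, ExtractElementType.TABLES])
        # Ask for table renditions (the step that was failing).
        .with_element_to_extract_renditions(ExtractRenditionsElementType.TABLES)
        # The fix: state the table output format explicitly (CSV is an example).
        .with_table_structure_format(TableStructureType.CSV)
        .build()
    )
```

The returned object would then be passed to the extract operation in the usual way, for example with extract_pdf_operation.set_options(build_table_extract_options()).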