File not suitable for content extraction: File contents are too complex for content extraction
Dear Community,
I am going crazy. I am trying to extract content from a few PDF files, and for some of them I get the error: "File not suitable for content extraction: File contents are too complex for content extraction". Unfortunately, this error doesn't help me at all. The files are not large (9 MB), have only 4 pages, and look exactly the same as all the others I am working on. Unfortunately, I am not allowed to share the PDF files. Could someone please give me a hint as to WHAT is too complex?
Best regards
Tommy
My code:
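import logging

from adobe.pdfservices.operation.auth.credentials import Credentials
from adobe.pdfservices.operation.exception.exceptions import ServiceApiException, ServiceUsageException, SdkException
from adobe.pdfservices.operation.execution_context import ExecutionContext
from adobe.pdfservices.operation.io.file_ref import FileRef
from adobe.pdfservices.operation.pdfops.extract_pdf_operation import ExtractPDFOperation
from adobe.pdfservices.operation.pdfops.options.extractpdf.extract_pdf_options import ExtractPDFOptions
from adobe.pdfservices.operation.pdfops.options.extractpdf.extract_element_type import ExtractElementType

try:
    # Initial setup, create credentials instance.
    credentials = Credentials.service_principal_credentials_builder() \
        .with_client_id('XXX') \
        .with_client_secret('XXX') \
        .build()

    # Create an ExecutionContext using credentials and create a new operation instance.
    execution_context = ExecutionContext.create(credentials)
    extract_pdf_operation = ExtractPDFOperation.create_new()

    # Set operation input from a source file.
    source = FileRef.create_from_local_file("XXXX6bfcd6f.pdf")
    extract_pdf_operation.set_input(source)

    # Build ExtractPDF options and set them into the operation.
    extract_pdf_options: ExtractPDFOptions = ExtractPDFOptions.builder() \
        .with_element_to_extract(ExtractElementType.TEXT) \
        .with_include_styling_info(True) \
        .build()
    extract_pdf_operation.set_options(extract_pdf_options)

    # Execute the operation.
    result: FileRef = extract_pdf_operation.execute(execution_context)

    # Save the result to the specified location.
    result.save_as("ExtractTextInfoWithStylingInfoFromPDF.zip")
except (ServiceApiException, ServiceUsageException, SdkException):
    logging.exception("Exception encountered while executing operation")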
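
Since the files cannot be shared, a crude local comparison between a failing file and a working one might at least show where they differ. The sketch below uses only the standard library and counts a few ordinary PDF syntax markers in the raw bytes. The helper name `pdf_markers` and the choice of markers are illustrative assumptions; nothing here is known to reflect what the service actually treats as "too complex". Objects stored inside compressed object streams are not visible to a raw scan, so the counts can undercount.

```python
import re
import sys
from pathlib import Path

# Ordinary PDF syntax markers. Assumption: these are merely plausible things
# to compare; which of them (if any) the Extract service weighs is unknown.
MARKERS = {
    "pages": re.compile(rb"/Type\s*/Page\b"),      # \b excludes /Pages
    "images": re.compile(rb"/Subtype\s*/Image\b"),
    "fonts": re.compile(rb"/Type\s*/Font\b"),      # \b excludes /FontDescriptor
    "streams": re.compile(rb"endstream"),
}


def pdf_markers(data: bytes) -> dict:
    """Count marker occurrences in raw PDF bytes (hypothetical helper)."""
    counts = {name: len(pattern.findall(data)) for name, pattern in MARKERS.items()}
    counts["size_bytes"] = len(data)
    return counts


if __name__ == "__main__":
    # Usage: python compare_pdfs.py failing.pdf working.pdf
    for path in sys.argv[1:]:
        print(path, pdf_markers(Path(path).read_bytes()))
```

Running it on one failing and one working file and comparing the two lines of output would show, for example, whether the failing file carries far more images or streams than its look-alike.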
