
Getting an "Unable to extract content. Internal error" when trying to extract text from a PDF

Jan 18, 2023

Hello!

I am trying to use the Adobe PDF Extract API to extract the text from one particular PDF. I cannot attach the PDF here for security reasons. The same code works fine with multiple other PDF files. The error output is pasted below:

INFO:adobe.pdfservices.operation.pdfops.extract_pdf_operation:All validations successfully done. Beginning ExtractPDF operation execution
ERROR:root:Exception encountered while executing operation
Traceback (most recent call last):
  File "\PDFServicesSDK-Python (Extract)Samples\adobe-dc-pdf-services-sdk-extract-python-samples\src\extractpdf\extract_txt_from_pdf.py", line 48, in <module>
    result: FileRef = extract_pdf_operation.execute(execution_context)
  File "C:\Python310\lib\site-packages\adobe\pdfservices\operation\pdfops\extract_pdf_operation.py", line 150, in execute
    raise se
  File "C:\Python310\lib\site-packages\adobe\pdfservices\operation\pdfops\extract_pdf_operation.py", line 140, in execute
    download_uri = ExtractPDFService.extract_pdf(execution_context, self._source_file_ref, self.get_options(),
  File "C:\Python310\lib\site-packages\adobe\pdfservices\operation\internal\service\extract_pdf_service.py", line 44, in extract_pdf
    raise e
  File "C:\Python310\lib\site-packages\adobe\pdfservices\operation\internal\service\extract_pdf_service.py", line 38, in extract_pdf
    status_poll_response = PlatformApi.status_poll(context, location, x_request_id)
  File "C:\Python310\lib\site-packages\adobe\pdfservices\operation\internal\api\platform_api.py", line 80, in status_poll
    response = polling2.poll(
  File "C:\Python310\lib\site-packages\polling2.py", line 201, in poll
    if check_success(val):
  File "C:\Python310\lib\site-packages\adobe\pdfservices\operation\internal\api\platform_api.py", line 64, in is_correct_response
    raise ServiceApiException(job_error_response.get('message'), ResponseUtil.
adobe.pdfservices.operation.exception.exceptions.ServiceApiException: description =ERROR - Unable to extract content. Internal error; requestTrackingId=4deed61b-5d05-4748-bfa3-737810c47075; statusCode=500; errorCode=ERROR
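
The final line of the traceback packs several fields into one string, and the requestTrackingId is the part a support request would most likely need. A minimal sketch of splitting that line into its parts, assuming the `key=value; key=value` layout seen above holds in general. `parse_service_error` is a hypothetical helper written for illustration, not part of the Adobe SDK:

```python
def parse_service_error(message: str) -> dict:
    """Split 'key=value; key=value' pairs from an error message into a dict.

    Hypothetical helper: assumes fields are separated by ';' and that the
    first '=' in each field separates the key from the value.
    """
    fields = {}
    for part in message.split(";"):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments that contain no '='
            fields[key.strip()] = value.strip()
    return fields


# The message text copied from the traceback above.
line = (
    "description =ERROR - Unable to extract content. Internal error; "
    "requestTrackingId=4deed61b-5d05-4748-bfa3-737810c47075; "
    "statusCode=500; errorCode=ERROR"
)
info = parse_service_error(line)
print(info["requestTrackingId"], info["statusCode"])
```

The helper works on the plain message string, so it does not depend on how the SDK exposes these values on the exception object.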

 

One thing I have noticed: when I open this particular PDF in Acrobat, I get the following error. After dismissing it, I can use the file in Acrobat without any problems:

The font 'OPSUFont0' contains bad /Widths.

 

Any help with this issue would be greatly appreciated.

TOPICS
Bug, PDF Extract API
