Facing trouble with Adobe Acrobat PDF to Excel conversion using Python

Community Beginner ,
Jun 16, 2020

I have developed a script that automatically converts PDF files to Excel using Adobe Acrobat Pro. Acrobat exposes the AVDoc object through its SDK, which we use to open each file and save it as Excel. I convert files in bulk, say 100 files in one loop. Below is the code snippet I am using:

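import logging
import os

import winerror
import win32com.client
import win32com.client.makepy
from win32com.client.dynamic import ERRORS_BAD_CONTEXT

LOGGER = logging.getLogger(__name__)

# Register E_NOTIMPL once, not on every loop iteration.
ERRORS_BAD_CONTEXT.append(winerror.E_NOTIMPL)

win32com.client.makepy.GenerateFromTypeLibSpec('Acrobat')
adobe = win32com.client.DispatchEx('AcroExch.App')
avDoc = win32com.client.DispatchEx('AcroExch.AVDoc')

# Assume files is a list of strings (.pdf file paths).
for file in files:
    src = os.path.abspath(file)
    # Swap only the extension; str.replace would touch every ".pdf" in the path.
    excel_file = os.path.splitext(src)[0] + ".xlsx"
    pdDoc = None  # stays None if Open fails, so finally does not raise NameError
    try:
        if not avDoc.Open(src, src):
            raise RuntimeError("Acrobat could not open " + src)
        pdDoc = avDoc.GetPDDoc()
        jObject = pdDoc.GetJSObject()
        jObject.SaveAs(excel_file, "com.adobe.acrobat.xlsx")
    except Exception:
        LOGGER.exception("exception occurred in reading and converting %s", src)
    finally:
        if pdDoc is not None:
            pdDoc.Close()
        avDoc.Close(True)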

I have read the Adobe SDK documentation at https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/iac_api_reference.pdf, but it did not help with my case: control passes to the Acrobat application and, when something goes wrong, Acrobat needs manual intervention.
For example:
If the source file path is incorrect, Acrobat displays an error dialog box saying "Could not open the file". My process is then stuck in the Acrobat call while Acrobat waits for a button click on that dialog box (i.e. click Close to continue). Until I click the button manually, control is not returned. I could not find anything in the SDK document on how to solve this programmatically.
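
For the specific case of an incorrect source path, one partial workaround is to check each file in Python before handing it to Acrobat, so the "Could not open the file" dialog is less likely to appear. The following is a minimal sketch, not something taken from the SDK document: the helper name `split_openable_pdfs` is hypothetical, and looking for `%PDF-` in the first 1024 bytes is only a rough sanity check. It cannot rule out other dialogs, such as those for damaged or password-protected files.

```python
import os


def split_openable_pdfs(paths):
    """Split paths into (ok, rejected) before any of them reach Acrobat.

    ok       -> absolute paths that exist, are readable, and look like PDFs
    rejected -> original paths that are missing, unreadable, or not PDFs
    """
    ok, rejected = [], []
    for path in paths:
        full = os.path.abspath(path)
        try:
            # Reading the header proves the file exists and can be opened.
            with open(full, "rb") as fh:
                header = fh.read(1024)
        except OSError:
            rejected.append(path)
            continue
        # Rough check only: a PDF normally carries "%PDF-" near its start.
        if b"%PDF-" in header:
            ok.append(full)
        else:
            rejected.append(path)
    return ok, rejected
```

The loop would then iterate over `ok` only, and `rejected` can be logged for later review.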

As the attached image shows, the dialog box is waiting for the OK button to be pressed; until I press it manually, my caller process (Python) is halted. (Attachment: adobe-issue.png)

 

I need a solution where I can kill the Acrobat process, or use some timeout method to return to the Python process, if the Acrobat process gets stuck for any reason.
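
One way to get such a timeout is to run each conversion in its own child Python process and let the parent enforce a deadline, because a blocked COM call cannot be interrupted from inside the same thread. The sketch below makes several assumptions that are not from the original script: `convert_one.py` is a hypothetical worker script that wraps the single-file COM conversion, `Acrobat.exe` is assumed to be the process image name on the machine, and the 120-second limit is arbitrary. `taskkill` is the standard Windows command for ending a process by image name.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd in a child process; return 'ok', 'failed', or 'timeout'."""
    try:
        # subprocess.run kills the child itself when the timeout expires.
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if completed.returncode == 0 else "failed"


def kill_acrobat(image_name="Acrobat.exe"):
    """Force-kill Acrobat and its child processes (Windows only).

    The image name is an assumption; check Task Manager for the real one.
    """
    if sys.platform == "win32":
        subprocess.run(["taskkill", "/F", "/T", "/IM", image_name], check=False)


def convert_all(files, worker_script="convert_one.py", timeout_s=120):
    """Convert each file in its own process so one hang cannot stall the batch."""
    results = {}
    for pdf in files:
        status = run_with_timeout([sys.executable, worker_script, pdf], timeout_s)
        if status == "timeout":
            # The worker is already dead; Acrobat may still show its dialog.
            kill_acrobat()
        results[pdf] = status
    return results
```

Starting a new process per file adds overhead, and killing Acrobat discards whatever it had open, so this trades speed for the guarantee that the batch keeps moving.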

Thanks in advance.

Topics

Crash or freeze, Edit and convert PDFs

Most Valuable Participant ,
Jun 17, 2020

Acrobat is an interactive tool with VERY limited automation. So when you run your batch you need to stay present, ready to deal with any messages. 

Community Beginner ,
Jun 18, 2020

OK, so can you suggest any other Adobe tool that can help in my case: converting PDF to Excel programmatically with complete automation (no manual intervention)?

Most Valuable Participant ,
Jun 18, 2020

What sort of volumes? Is it for server use?

Community Beginner ,
Jun 18, 2020

Yes, we have more than 30k files per day to process on our server.

Most Valuable Participant ,
Jun 18, 2020

OK, I can tell you that Acrobat is not just technically unsuitable; using it on a server is also a breach of the license. For all server development you need to find software with server licenses (or you run the risk of being billed for each server user, or worse...)

 

Adobe's server product is Adobe Experience Manager (AEM), formerly known as LiveCycle. It has many components and many target markets. The component name PDF Generator is misleading: I believe it also includes conversion from PDF to other formats. https://www.adobe.io/apis/experiencecloud/aem.html

 

Price is on application, you have to talk to a salesman. We don't know much about it here, unfortunately.
