I have a huge number of PDFs to convert to MS Word. I am using the Action Wizard (batch processing) with the Save to Word action.
However, on some PDFs Adobe stops working, for different reasons:
1- A pop-up message appears, such as: "Save As failed to process this document. No file was created." The user has to click OK to continue processing the batch.
2- Rarely, a print page pops up and the user has to click Cancel to continue processing the batch.
3- Adobe crashes entirely and the user has to reopen the application and restart the action.
The following link contains 50 PDFs that trigger these bugs. I wonder if you could fix those bugs and make batch processing never stop. In other words, the Action Wizard should skip all pop-up messages and avoid crashes. At the end, it can report the buggy PDFs that it skipped. It is very annoying to run the wizard and come back after 4 hours expecting it to be finished, only to be greeted by a "click OK to continue".
Acrobat is an interactive tool for light-duty automation. It is absolutely unsuitable for "huge" volumes of files unless you run maybe 30-50 files at a time and restart Acrobat after each batch.
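The "run 30-50 and restart" advice amounts to splitting the file list into small chunks. A minimal sketch of that splitting step follows; the chunk size of 40 is an arbitrary value inside the suggested range, and how Acrobat is restarted between chunks is left out because it is not covered in this thread.

```python
from itertools import islice


def chunked(paths, size=40):
    """Yield successive lists of at most `size` paths from any iterable."""
    it = iter(paths)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

Each yielded chunk would be handed to one batch run, with the application restarted before the next chunk.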
1- In fact, most of the PDFs in the link cause stops/crashes regardless of their position in the batch (or even their size). In other words, non-corrupted PDFs cause stops/crashes for unknown reasons. You can try it yourself on the PDFs I provided at this link.
2- As for batch processing, in my experiments it can handle up to 3600 PDFs without problems, but with a batch size of 4500, open_file causes a serious slowdown. As a programmer, I find this very odd, because the size of the file list and the read/write operations are independent. In other words, you can easily maintain a list of 1M paths and open and write documents sequentially without exhausting the I/O.
3- Compared with other available toolkits, Adobe gives the best output in document conversion. Unfortunately, the team decided, for some reason, to limit the automation capabilities of the software. A piece of advice from a machine learning/natural language processing/data science perspective: you could do a lot more if you opened up to the Big Data community. Companies have dumps of millions of PDFs and receive thousands of PDFs each day. Unfortunately, they can do nothing with them because open source software is terrible ... Big noisy data isn't that useful in data science.
An automated, highly parallelized web service for converting PDFs isn't a bad idea.
4- I still can't figure out why batch processing stops. The solution is no more complicated than:
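error_list = []
for pdf_file in todo_list:
    try:
        # convert_pdf_to_word is a placeholder for the actual conversion step
        convert_pdf_to_word(pdf_file)
    except Exception:
        # if an error occurs, record the file and move on to the next one
        error_list.append(pdf_file)
        continue
print("Dear user, these PDFs caused some issues:")
for pdf_file in error_list:
    print(pdf_file)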
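The loop above only helps when a failed conversion raises an exception. A hang on a dialog or a crash of the whole application (cases 1 to 3 in the first post) cannot be caught that way, so a more defensive sketch runs each conversion in its own process with a timeout. This assumes some command-line converter exists; `converter_cmd` is a hypothetical placeholder, not a real Adobe tool, and the 300-second timeout is an arbitrary choice.

```python
import subprocess
import sys


def convert_batch(pdf_paths, converter_cmd, timeout_s=300):
    """Run converter_cmd once per PDF and return a list of (pdf, reason) failures.

    converter_cmd is a hypothetical command prefix (a list of strings);
    the PDF path is appended as its last argument.
    """
    failed = []
    for pdf in pdf_paths:
        try:
            result = subprocess.run(
                [*converter_cmd, str(pdf)],
                capture_output=True,
                timeout=timeout_s,  # a hung converter is killed, not waited on
            )
            if result.returncode != 0:
                failed.append((pdf, f"exit code {result.returncode}"))
        except subprocess.TimeoutExpired:
            failed.append((pdf, "timed out"))
        except OSError as exc:
            failed.append((pdf, f"could not start converter: {exc}"))
    return failed


if __name__ == "__main__":
    # Stand-in converter for demonstration: fails when the name contains "bad".
    fake = [sys.executable, "-c",
            "import sys; sys.exit(1 if 'bad' in sys.argv[1] else 0)"]
    for pdf, reason in convert_batch(["good.pdf", "bad.pdf"], fake):
        print(f"{pdf}: {reason}")
```

Because every file runs in a separate process, a crash or hang on one PDF only costs that file, and the batch continues with the next one.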
The behaviour you describe would be expected in a high volume tool. Acrobat is for people automating a handful of tasks, and you are pushing it beyond tested limits or even comprehension. Many automation tasks fail because of leaks of some kind which don't show up in UI use.
Adobe have highly automated tools. It's a different market. LiveCycle PDF Generator (despite the name), part of Adobe Experience Manager, and the engine in ExportPDF.
By the way, that's just my opinion (from 20 years of hearing from people trying to fight Acrobat with unrealistic automation expectations: you NEED a non-GUI tool, even if the best tool happens to have a GUI). If you want to convince Adobe I suggest you use their wishlist (which is also a bug reporter, you can decide which it is): Feature Request/Bug Report Form
Do you mean that there is another Adobe product where a console command like:
>>> adobe_convert.exe input_dir output_dir
will do the job?
If yes, could you please provide me with a link to that product? It seems that I was struggling with the wrong product.
Adobe Experience Manager (formerly LiveCycle) has a Java and watched folder interface among others. Not something we know much about here, and online info seems hard to get: you have to call their specialist sales team. It is aimed at enterprises not individuals.
Thanks for the information ... From a quick search on Google, the price of AEM seems to be around $2,000,000 ... This is too much for PDF to MS Word conversion.
The price is by negotiation. I know that much. By repute some very large enterprises have paid as much as $1m, but a more typically revealed cost is a mere 5-6 figures... I don't know; I like to see a price list myself.