PowerShell and Adobe Acrobat
Dear all,
I have a lot of PDF files from which I want to extract information. In particular, I want to extract the IDs that are printed in bold. Each PDF document contains a list of events, and the events that interest me are the ones printed in bold. Doing this by hand is no problem for a single file, but since I have several hundred documents, I would rather not do it manually.
I own Adobe Acrobat 9 Pro. I have heard about the SDK, and I would like to automate this task, possibly using PowerShell.
I have heard that this can also be achieved with other programming languages.
Where can I find more information on this?
Thank you very much in advance
Erik
All Acrobat developers need the SDK. There is no command-line interface, but there is a powerful combination of OLE and JavaScript. If you know the text is always in the same place, you could consider extracting each word on every page and checking its location. These interfaces would not give you font names, though (bold is NOT a style in PDF). A C++ plug-in might be able to do this, but developing it is unlikely to save you time compared with processing a few hundred documents by hand.
But get the SDK, it's free and essential!
By the way, I don't think the Acrobat 9 SDK is available any more. Adobe only supports development on supported versions of Acrobat.
It would be enough if I could save the files automatically as HTML. The HTML output is nicely formatted, exactly the way I need it, and the bold information is preserved.
All I would need is a script that opens each PDF document and saves it as HTML, and does this for all files in the directory.
I get several hundred of these PDF files per month. Until now we have been doing this work by hand, and I want to automate the tedious part of my job.
Thank you very much
E.
You can do this with an action, I think, with no need for programming or scripting. I may be wrong; that might not be in 9. I think in 9 it was Advanced > Batch Processing.
Thank you very much,
this worked fine for me. I tried it out just now.
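For anyone who finds this thread later: once the PDFs have been batch-saved as HTML, the remaining step of pulling out the bold IDs could be scripted. Below is a minimal sketch in Python using only the standard library. It assumes the exported HTML marks bold text with <b> or <strong> tags; the export may instead express bold through CSS, in which case the tag check would need adapting. The names BoldTextCollector, extract_bold and extract_bold_from_folder are made up for this illustration.

```python
from html.parser import HTMLParser
from pathlib import Path


class BoldTextCollector(HTMLParser):
    """Collects the text found inside <b> and <strong> elements."""

    # Assumption: bold is marked with these tags, not with CSS styles.
    BOLD_TAGS = {"b", "strong"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0    # number of bold elements currently open
        self.buffer = []  # text pieces of the bold run being read
        self.found = []   # completed bold runs, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.BOLD_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.BOLD_TAGS and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join the pieces and collapse runs of whitespace.
                text = " ".join("".join(self.buffer).split())
                if text:
                    self.found.append(text)
                self.buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def extract_bold(html_text):
    """Return a list of the bold text runs in one HTML string."""
    parser = BoldTextCollector()
    parser.feed(html_text)
    parser.close()
    return parser.found


def extract_bold_from_folder(folder):
    """Map each *.html file name in `folder` to its bold text runs."""
    results = {}
    for path in sorted(Path(folder).glob("*.html")):
        html_text = path.read_text(encoding="utf-8", errors="replace")
        results[path.name] = extract_bold(html_text)
    return results
```

Calling extract_bold_from_folder on the directory that holds the exported files returns a dictionary keyed by file name, which can then be written to a CSV or filtered further if only part of each bold run is the ID.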

