I have 231,353 cast vote record PDF files. Short of manually opening each file in Acrobat and exporting it, is there a way to convert these files to text or CSV, either as individual files or as one large file? Or is there an API I can use to access the data in the PDF files? I would prefer to use Perl.
Thank you.
[Moved from the non-technical Lounge to an Acrobat forum... Mod]
[Here is the list of all Adobe forums... https://forums.adobe.com/welcome]
That's way too much for Acrobat to be able to handle in a single process.
I don't know about Perl, but I've developed tools in Java that can process large numbers of files in this way.
If you're interested in purchasing such a tool, feel free to contact me privately via try6767 at gmail.com.
You can try to set up a 'watch' folder. I did this many years ago for a similar need, but not with nearly as many documents.
I'm not positive this will still work, but worth a search:
Acrobat doesn't support watched folders of any kind. Distiller does, though.
The Distiller documentation only mentions converting PostScript to PDF. Since the files I have are PDF, I don't see how Distiller would help...
Ah, you're right, sorry. I must have been processing PS files. It was a long time ago... and it didn't address your question about converting to a different format.
How about this thread?
Looks like
codeproject.com/Articles/7056/Code-to-extract-plain-text-from-a-PDF-file
will get me on my way. I suspect some assembly will be required.
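For anyone landing here with the same problem, here is a rough sketch of the "assembly" part: a batch loop that walks a folder tree and converts each PDF to a text file outside of Acrobat. It is not the CodeProject code. It assumes the Poppler `pdftotext` command-line tool is installed and on the PATH, which nobody in this thread mentioned, and the names `convert_tree` and `pdftotext_convert` are made up for illustration. It is in Python rather than Perl; the same loop would carry over to Perl with a `system` call. The converter is passed in as a function, so a different extractor can be swapped in.

```python
import subprocess
from pathlib import Path


def pdftotext_convert(pdf_path, txt_path):
    """Convert one PDF to text by calling Poppler's pdftotext.

    Assumption: pdftotext is installed and on the PATH.
    -layout tries to preserve the page's column layout.
    """
    subprocess.run(
        ["pdftotext", "-layout", str(pdf_path), str(txt_path)],
        check=True,  # raise CalledProcessError on a non-zero exit code
    )


def convert_tree(src_dir, dst_dir, convert=pdftotext_convert):
    """Convert every *.pdf under src_dir into a .txt file under dst_dir.

    Files that already have a .txt output are skipped, so an interrupted
    run over a very large set can simply be restarted.
    Returns (number converted, list of (pdf_path, error) for failures).
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    converted, failed = 0, []
    for pdf in sorted(src_dir.rglob("*.pdf")):
        txt = dst_dir / pdf.relative_to(src_dir).with_suffix(".txt")
        if txt.exists():
            continue  # already done in an earlier run
        txt.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name first so a crash never leaves a
        # half-written file that looks finished.
        part = txt.with_name(txt.name + ".part")
        try:
            convert(pdf, part)
            part.replace(txt)
            converted += 1
        except (subprocess.CalledProcessError, OSError) as exc:
            part.unlink(missing_ok=True)
            failed.append((pdf, exc))  # record the failure and keep going
    return converted, failed
```

Calling `convert_tree("pdfs", "text")` (both folder names are placeholders) would leave one `.txt` per PDF, mirroring the source folder structure. Turning the text into CSV still depends on how the cast vote records are laid out, so that parsing step is left out here. If the PDFs are scanned images rather than text, this approach will produce empty output and OCR would be needed instead.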