Is there a command that comes with Acrobat Reader to export PDF to text?
I have an application that searches files for content, and it would be better if PDF files could be searched too. Formatting of the output is not important, except that words should be separated by at least one blank.
I have MiKTeX 2.9, where I found the commands (bat files) pdf2ps and ps2ascii, used like this:
pdf2ps pdfinfile tmpPsfile
ps2ascii tmpPsfile txtoutfile
But alternatives would be interesting.
Stig Rosenlund
stig.ingvar.rosenlund@gmail.com
stig.rosenlund@sverige.nu
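For readers who want to drive the two commands above from a script, here is a minimal sketch in Python. It assumes pdf2ps and ps2ascii are on the PATH (for example through MiKTeX or Ghostscript); the function names `build_commands` and `pdf_to_text` are hypothetical and not part of any of the tools mentioned in this thread.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_commands(pdf_path, txt_path, ps_path):
    """Return the two command lines from the post as argument lists."""
    return [
        ["pdf2ps", str(pdf_path), str(ps_path)],
        ["ps2ascii", str(ps_path), str(txt_path)],
    ]


def pdf_to_text(pdf_path, txt_path):
    """Run pdf2ps and then ps2ascii, using a temporary PostScript file."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        ps_path = Path(tmp_dir) / "tmp.ps"
        for command in build_commands(pdf_path, txt_path, ps_path):
            # shutil.which honours PATHEXT on Windows, so it should also
            # locate the .bat wrappers (assumption: they are on the PATH).
            executable = shutil.which(command[0])
            if executable is None:
                raise FileNotFoundError(f"{command[0]} was not found on PATH")
            # check=True raises CalledProcessError if a step fails.
            subprocess.run([executable, *command[1:]], check=True)
```

Calling `pdf_to_text("pdfinfile.pdf", "txtoutfile.txt")` would then run the same two steps as the commands in the post, with the intermediate PostScript file cleaned up automatically.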
No, Reader can't do it, but plenty of other applications can, including ones that can be used from the command-line.
If you're interested I could develop for you (for a fee) a custom-made tool that will export the textual contents of a PDF file (or files) to a text file, or even just search the file for a specific term and then do something with it if a match is found.
You can contact me via [try6767 at gmail.com] to discuss it further.
Reader includes an IFilter, which is part of Microsoft's text extraction infrastructure. There is no documentation from Adobe because it is a Microsoft standard.
Thanks. What are the commands, executable from the command prompt, that use this IFilter? What are their arguments and results? My search application is part of my programming language Rapp. I need commands that are available without downloading more special programs. I recommend that Rapp users download MiKTeX, so pdf2ps and ps2ascii are available if they have done so. Commands already present in Windows when Reader is installed would also work. If those are faster than pdf2ps and ps2ascii, I would use them.
This is not part of the command-line world. Microsoft declared the command line dead 25 years ago. It has been slow in dying, but for the real meat in Windows you don't look for command lines but for COM interfaces.
OK. Using this COM interface is beyond my skills. But I have discovered that ps2ascii also works directly on PDF files, and that has simplified my application.
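The single-step approach can be sketched in Python as follows. This is only an illustration under stated assumptions: that ps2ascii is on the PATH, and that it writes the extracted text to standard output when no output file is given (which is how the Ghostscript script behaves, but worth checking for the MiKTeX bat file). The helper names `normalize`, `contains_term` and `pdf_contains` are hypothetical.

```python
import re
import shutil
import subprocess


def normalize(text):
    """Collapse every run of whitespace into a single blank."""
    return re.sub(r"\s+", " ", text).strip()


def contains_term(text, term):
    """Case-insensitive search in whitespace-normalized text."""
    return normalize(term).lower() in normalize(text).lower()


def pdf_contains(pdf_path, term):
    """Extract text from a PDF with ps2ascii and search it for a term."""
    executable = shutil.which("ps2ascii")
    if executable is None:
        raise FileNotFoundError("ps2ascii was not found on PATH")
    # Assumption: with only an input file, ps2ascii prints to stdout.
    result = subprocess.run(
        [executable, str(pdf_path)],
        capture_output=True,
        text=True,
        errors="replace",  # tolerate bytes that do not decode cleanly
        check=True,
    )
    return contains_term(result.stdout, term)
```

Normalizing whitespace before searching means a phrase is still found when the extracted text breaks it across lines or pads it with extra blanks.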