Utility to separate text from images?
We have a business process in which we import very large numbers of PDF files. Most of the files are all text, so the file sizes are small. Some of the files have embedded images, and those file sizes grow quickly. A 30-50 MB file is not uncommon.
Is there a utility or tool available that we could incorporate into the import process that would separate a PDF into, for example, two files, with one containing the text and the other containing the images? I suppose we could also run it as a post-import process, by pointing it at a specified list of recently arrived files.
The keys are maintaining fidelity to the original text, and not needing human handling of each file.
Is there any tool or utility available that would do this?
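To make the goal concrete, here is a rough sketch of the kind of unattended post-import step I have in mind. It assumes the Poppler command-line tools `pdftotext` and `pdfimages` are installed and on the PATH. The function names, the `.txt` output and the `-img` prefix are made up for illustration. Note that this produces a plain-text file rather than a text-only PDF, so it may not meet the fidelity requirement as it stands.

```python
import subprocess
from pathlib import Path


def build_commands(pdf_path, out_dir):
    """Return the two Poppler command lines for one PDF (nothing is run here)."""
    pdf = Path(pdf_path)
    out = Path(out_dir)
    text_file = out / (pdf.stem + ".txt")        # hypothetical naming scheme
    image_prefix = out / (pdf.stem + "-img")     # pdfimages appends -000, -001, ...
    return [
        # -layout asks pdftotext to keep the physical page layout as far as it can
        ["pdftotext", "-layout", str(pdf), str(text_file)],
        # -all writes embedded images in their native format where possible
        ["pdfimages", "-all", str(pdf), str(image_prefix)],
    ]


def split_pdfs(pdf_paths, out_dir):
    """Run both commands for every PDF in the list, with no per-file handling."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for pdf_path in pdf_paths:
        for cmd in build_commands(pdf_path, out_dir):
            # check=True raises if a tool fails, so bad files are not skipped silently
            subprocess.run(cmd, check=True)
```

Calling `split_pdfs(list_of_recent_files, "split_output")` would cover the post-import case, where the list of recently arrived files comes from the import job.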
Thanks,
Steve
