We have a business process in which we import very large numbers of PDF files. Most of the files are all text, so the file sizes are small. Some of the files have embedded images, and those file sizes grow quickly. A 30-50MB file is not uncommon.
Is there a utility or tool available that we could incorporate into the import process that would separate a PDF into, for example, two files, with one containing the text and the other containing the images? I suppose we could also run it as a post-import process, by pointing it at a specified list of recently arrived files.
The keys are maintaining fidelity to the original text, and not needing human handling of each file.
Is there any tool or utility available that would do this?
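For what it's worth, here is a rough sketch of how the post-import variant might be scripted in Python. It assumes the third-party pypdf library (with Pillow installed for image extraction); the function names `select_large_pdfs` and `split_pdf` and the 10 MB cutoff are made up for illustration. Note that it writes the text as a plain .txt file and the images as separate image files rather than producing two PDFs, and plain-text extraction does not preserve layout, so fidelity would need checking against real files.

```python
from pathlib import Path

# Hypothetical cutoff: only files at least this large get split.
SIZE_THRESHOLD_BYTES = 10 * 1024 * 1024


def select_large_pdfs(paths, threshold=SIZE_THRESHOLD_BYTES):
    """Return the .pdf paths whose size on disk is at least `threshold` bytes."""
    selected = []
    for p in map(Path, paths):
        if p.suffix.lower() == ".pdf" and p.stat().st_size >= threshold:
            selected.append(p)
    return selected


def split_pdf(pdf_path, out_dir):
    """Write <stem>.txt plus one file per embedded image; return the image count."""
    # Imported here so the file-selection helper works without pypdf installed.
    from pypdf import PdfReader  # third-party: pip install pypdf pillow

    pdf_path = Path(pdf_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    reader = PdfReader(str(pdf_path))
    text_parts = []
    image_count = 0
    for page_number, page in enumerate(reader.pages, start=1):
        # extract_text() can return an empty string for image-only pages.
        text_parts.append(page.extract_text() or "")
        for image in page.images:
            image_count += 1
            # image.name already carries a file extension chosen by pypdf.
            target = out_dir / f"{pdf_path.stem}_p{page_number}_{image.name}"
            target.write_bytes(image.data)

    text_file = out_dir / f"{pdf_path.stem}.txt"
    text_file.write_text("\n".join(text_parts), encoding="utf-8")
    return image_count
```

A nightly job could then pass the list of recently arrived files to `select_large_pdfs` and call `split_pdf` on each result, with no one handling individual files.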
I would also like to know more about this kind of tool. Most pages on my business website have more than 1000 words of text with embedded images. In addition, the images have titles as well as alt text. I need a tool that keeps everything intact and lets me separate the text and images in one click. My other concern is the blog page. I need my site to recognize a PDF that I upload and fill in the Shopify form with the content, images, links, and everything else in place.