As part of my attempts to batch convert PDFs to text, I have run into a strange error where Acrobat XI shows the following message:
When I click OK, the text file is generated and contains line breaks, but no text.
As a solution to get the text out, I can save it as accessible text; however, for my needs, it is crucial to be able to save it as plain text.
Here is a comparison of a document that does work, saved as plain text and as accessible text.
I want to avoid using accessible text because it introduces CRs and LFs, which plain text does not.
I am simply after the text. It is fine that the figures would be converted to a number, as the message box says, but currently I get nothing out.
I have a VBA script that does the batch conversion, but the conversion fails for certain PDFs like the one above (attached). The call that fails is `jsObj.SaveAs textPath, "com.adobe.acrobat.plain-text"`
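Because the failed conversion still writes a file, a batch run has no obvious way to tell which PDFs came out empty. Below is a minimal sketch of a check that could run after each `SaveAs`. The function name `IsBlankTextFile` is made up for illustration, and it assumes the export is a single-byte text file without a byte order mark (a BOM would count as content).

```vba
' Hypothetical helper: returns True when the exported file is missing,
' zero-length, or contains nothing but whitespace and line breaks.
Function IsBlankTextFile(ByVal textPath As String) As Boolean
    Dim fileNum As Integer
    Dim contents As String

    IsBlankTextFile = True
    If Dir(textPath) = "" Then Exit Function        ' file was never written
    If FileLen(textPath) = 0 Then Exit Function     ' file is zero bytes

    ' Read the whole file into a string.
    fileNum = FreeFile
    Open textPath For Binary Access Read As #fileNum
    contents = Space$(LOF(fileNum))
    Get #fileNum, , contents
    Close #fileNum

    ' Strip the characters an "empty" export can still contain.
    contents = Replace(contents, vbCr, "")
    contents = Replace(contents, vbLf, "")
    contents = Replace(contents, vbTab, "")
    contents = Replace(contents, Chr$(12), "")      ' form feed between pages
    contents = Replace(contents, " ", "")

    IsBlankTextFile = (Len(contents) = 0)
End Function
```

In the batch loop this could be called as `If IsBlankTextFile(textPath) Then Debug.Print "Empty output: " & textPath`, so the problem PDFs are at least listed for separate handling.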
If anyone can think of a workaround or explain why this fails, that would be useful. Acrobat appears to be the only program I have found that generates plain text documents in this way, and it's especially useful for my purpose because the sentences aren't broken. It is a real shame that it fails at the final hurdle with some of the PDFs I need to convert.
Hi there, I'm not sure I quite understand what you mean. Initially I found that if you convert the PDF to a doc file without the images being converted, then save that as a PDF and repeat the exercise, it works, as essentially it doesn't need to perform the tagging.
I have found a reasonable solution (though it may not be perfect); the answer is posted here:
The script is in VBA and uses the somewhat dated Acrobat XI and Word, but I think it proves this can be done (at least reasonably well). It works by using Word to treat the text between line breaks as whole sentences. The reason for not loading the PDF directly into Word is that Word will occasionally recognise passages of text as an image. So I use Acrobat to generate a Word doc from the PDF, then use Word's plain text feature to generate the text file.
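For anyone who just wants the gist, here is a minimal sketch of that two-step route. It is not the linked script itself. The procedure name and the three path parameters are made up for illustration, and it assumes Acrobat and Word are both installed and that Word is 2010 or later (older versions have `SaveAs` rather than `SaveAs2`).

```vba
' Sketch only: PDF -> .doc via Acrobat, then .doc -> plain text via Word.
' PdfToTextViaWord, pdfPath, docPath and textPath are illustrative names.
Sub PdfToTextViaWord(ByVal pdfPath As String, ByVal docPath As String, _
                     ByVal textPath As String)
    Const wdFormatText As Long = 2   ' Word's plain-text format, declared for late binding

    Dim avDoc As Object, pdDoc As Object, jsObj As Object
    Dim wordApp As Object, wordDoc As Object

    ' Step 1: let Acrobat write the PDF out as a Word document.
    Set avDoc = CreateObject("AcroExch.AVDoc")
    If Not CBool(avDoc.Open(pdfPath, "")) Then
        Err.Raise vbObjectError + 1, , "Acrobat could not open " & pdfPath
    End If
    Set pdDoc = avDoc.GetPDDoc
    Set jsObj = pdDoc.GetJSObject
    jsObj.SaveAs docPath, "com.adobe.acrobat.doc"
    avDoc.Close True                 ' True = close without prompting to save

    ' Step 2: let Word save that document as plain text.
    Set wordApp = CreateObject("Word.Application")
    Set wordDoc = wordApp.Documents.Open(docPath)
    wordDoc.SaveAs2 textPath, wdFormatText
    wordDoc.Close False              ' False = do not save changes to the .doc
    wordApp.Quit
End Sub
```

A call would look like `PdfToTextViaWord "C:\in\a.pdf", "C:\tmp\a.doc", "C:\out\a.txt"`, with the paths being placeholders. Starting and quitting Word for every file is slow, so a real batch run would probably create `wordApp` once and reuse it.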