PDF to plain text: "Some difficult pages were encountered"
- September 4, 2022
While trying to batch convert PDFs to text, I have run into a strange error where Acrobat XI shows the following message:

When I click OK, the text file is generated, but it contains only line breaks and no text.
As a workaround to get the text out, I can save the PDF as accessible text; however, for my needs it is crucial to be able to save it as plain text.
Here is a comparison of a document that does work, saved first as plain text and then as accessible text.
Plain text:

Accessible text:

I want to avoid using accessible text because it introduces CRs and LFs, which plain text does not.
I am simply after the text. As the message box says, the figures would be converted to a number, and that is fine, but currently I get nothing out at all.
I have a VBA script that runs this as a batch conversion, but the conversion fails for certain PDFs like the one above (attached). The call that does the export is `jsObj.SaveAs textPath, "com.adobe.acrobat.plain-text"`.
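For readers following along, a batch loop of this kind might look like the sketch below. It is not the original script, only a minimal illustration that assumes the Acrobat interapplication objects (`AcroExch.PDDoc`, `GetJSObject`) are available and that the output is ANSI or UTF-8 text without a byte order mark. The names `ConvertFolderToPlainText` and `IsBlankTextFile` are hypothetical. The added check only flags PDFs whose exported text file contains nothing but whitespace and line breaks; it does not fix the failed export.

```vba
' Hypothetical helper: True when the file holds only whitespace and line breaks.
' Assumes ANSI or BOM-less UTF-8 output; a missing file is created empty and counts as blank.
Function IsBlankTextFile(ByVal filePath As String) As Boolean
    Dim f As Integer, content As String
    f = FreeFile
    Open filePath For Binary Access Read As #f
    content = Space$(LOF(f))
    If LOF(f) > 0 Then Get #f, , content
    Close #f
    content = Replace(content, vbCr, "")
    content = Replace(content, vbLf, "")
    content = Replace(content, vbTab, "")
    IsBlankTextFile = (Len(Trim$(content)) = 0)
End Function

' Hypothetical batch loop: exports every PDF in folderPath as plain text
' and logs the ones whose output came back empty.
Sub ConvertFolderToPlainText(ByVal folderPath As String)
    Dim pdDoc As Object, jsObj As Object
    Dim fileName As String, pdfPath As String, textPath As String

    Set pdDoc = CreateObject("AcroExch.PDDoc")
    fileName = Dir(folderPath & "\*.pdf")
    Do While Len(fileName) > 0
        pdfPath = folderPath & "\" & fileName
        ' Assumes a four-character ".pdf" extension.
        textPath = Left$(pdfPath, Len(pdfPath) - 4) & ".txt"
        If pdDoc.Open(pdfPath) Then
            Set jsObj = pdDoc.GetJSObject
            jsObj.SaveAs textPath, "com.adobe.acrobat.plain-text"
            pdDoc.Close
            If IsBlankTextFile(textPath) Then Debug.Print "Empty output: " & pdfPath
        Else
            Debug.Print "Could not open: " & pdfPath
        End If
        fileName = Dir
    Loop
End Sub
```

Calling `ConvertFolderToPlainText "C:\PDFs"` (an example path) from the Immediate window would list the affected files, which at least separates the PDFs that export cleanly from the ones that trigger the message.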
If anyone can think of a workaround or explain why this fails, that would be very useful. Acrobat appears to be the only program I have found that generates plain text documents in this way, and it is especially useful for my purpose because the sentences aren't broken across lines. It is a real shame that it fails at the final hurdle with some of the PDFs I need to convert.
