Saving PDF as Plain text

Report · Sep 05, 2018

Hello. I am attempting to save all PDFs within a folder as text. To do this, I am running an Action that calls a JavaScript using this.saveAs.

When the script attempts to save as plain text, some PDFs fail because they cannot be tagged. Can the JavaScript be written so that it first tries to save as plain text and, if that fails, saves as accessible text instead?

this.saveAs("**filepath**" + this.documentFileName + "_accessformat.txt", "com.adobe.acrobat.accesstext");

this.saveAs("**filepath**" + this.documentFileName + "_accessformat.txt", "com.adobe.acrobat.plain-text");

Report · Sep 05, 2018

You can try it like this:

try {
    // First attempt: plain text
    this.saveAs("**filepath**" + this.documentFileName + "_accessformat.txt", "com.adobe.acrobat.plain-text");
} catch (e) {
    // Fallback if the plain-text export throws: accessible text
    this.saveAs("**filepath**" + this.documentFileName + "_accessformat.txt", "com.adobe.acrobat.accesstext");
}

Report · Sep 05, 2018

You'll probably also want to confirm that there is any text at all by using...

this.getPageNumWords(n) // n is the zero-based page number

... on each page.

Some PDF files are image only.
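
A minimal sketch of that check combined with the try/catch fallback, assuming it runs in an Action where the open document is passed in as doc. The names hasAnyText and saveAsText are hypothetical helpers for illustration, not part of the Acrobat API; the only document members used are numPages, getPageNumWords, documentFileName and saveAs.

```javascript
// Hypothetical helper (not an Acrobat API function): returns true if any
// page of the given document contains at least one word.
function hasAnyText(doc) {
    for (var p = 0; p < doc.numPages; p++) {
        // getPageNumWords takes a zero-based page number
        if (doc.getPageNumWords(p) > 0) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: skips documents with no words, otherwise tries
// plain text first and falls back to accessible text.
// Returns the conversion ID that was used, or null if the document was skipped.
function saveAsText(doc, basePath) {
    if (!hasAnyText(doc)) {
        return null; // likely an image-only PDF, nothing to export
    }
    var target = basePath + doc.documentFileName + "_accessformat.txt";
    try {
        doc.saveAs(target, "com.adobe.acrobat.plain-text");
        return "com.adobe.acrobat.plain-text";
    } catch (e) {
        doc.saveAs(target, "com.adobe.acrobat.accesstext");
        return "com.adobe.acrobat.accesstext";
    }
}
```

Inside the Action this would be called as saveAsText(this, "**filepath**"). Whether to skip image-only files or still write an empty file is a matter of preference; returning null here is just one way to make the skipped case visible to the caller.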

Report · Sep 05, 2018

In that case it will just output an empty file. It shouldn't cause an error...

Report · Sep 05, 2018

I'd call an unnecessary empty file an error. If you saw a 0k text file in the output, wouldn't you go find the input PDF and check it out for yourself?

Report · Sep 06, 2018

Not really. An empty text file is not a corrupt file (like an empty PDF). I will just assume the PDF has no text.

In fact, it's better than no file at all, because I at least would know that the file was processed...