Check whether a document is searchable

Report · Oct 04, 2019

Hello, I've been trying to check whether a PDF is searchable and, if it's not, automatically run an OCR action on the document. So far I've found two ways of doing it, neither of them efficient. My question is about the property search.available. I read in the Acrobat JavaScript Scripting Guide that it can determine whether searching is possible. But when I used it on a PDF that is not searchable, the result was confusing: it returned "true" even though the document clearly could not be searched. I'm very new to JavaScript; can you show me how to use it correctly? Or do you have any idea how to check whether a document is searchable, other than activating the Read Out Loud function or searching for specific word(s)?

Thank you very much! 🙂

Report · Oct 04, 2019

If you check out the JavaScript API Reference you'll see that search.available doesn't do that at all.

"Returns true if the Search plug-in is loaded and query capabilities are possible. A script author should check this Boolean before performing a query or other search object manipulation."

So it's checking if a plug-in is loaded, not looking at the document. Search, indeed, is different from Find; it's about using indexes that might exist for fast searching. This is far from obvious in the current version.
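To illustrate what the quoted description means, here is a minimal sketch of search.available used as a guard before a query. The function name runQueryIfPossible and its parameters are made up for this example; search.query and the "ActiveDoc" scope are taken from the search object as described in the API Reference, so check them against your Acrobat version.

```javascript
// Hypothetical helper: runs a Search plug-in query only if the plug-in is usable.
// searchObj is expected to be Acrobat's global "search" object.
function runQueryIfPossible(searchObj, text) {
  // available only says that the Search plug-in is loaded;
  // it says nothing about whether the open document contains text.
  if (!searchObj.available) {
    return false;
  }
  // Query the currently active document for the given text.
  searchObj.query(text, "ActiveDoc");
  return true;
}
```

In Acrobat this would be called as runQueryIfPossible(search, "some word"). A return value of true only means the query was issued, not that the PDF has any text in it.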

"Searchable" is a word often used, but it has no particular meaning for PDFs. Certainly, PDFs don't have any information in them saying they are "searchable". Here are some possible meanings:

* A file might be called "searchable" if it contains text that extracts to words in your own language.

* A file might be called "searchable" if it contains any text at all, even if it is garbage.

* A file might be called "searchable" if it contains images of text, on which OCR would succeed (since Acrobat Pro may do that if searching).

What you can readily do is use doc.getPageNumWords against each page to see if there is any text (the second case); the others are not something you could do in JavaScript.
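A rough sketch of that page-by-page check follows. The function name findPagesWithoutText is made up for illustration; numPages and getPageNumWords are the Doc object members mentioned above. It only detects the second case (any text at all), so a page full of garbage text still counts as having words.

```javascript
// Hypothetical helper: returns the 0-based numbers of pages with no extractable words.
// doc is expected to be an Acrobat Doc object (for example "this" in the console).
function findPagesWithoutText(doc) {
  var emptyPages = [];
  for (var p = 0; p < doc.numPages; p++) {
    // getPageNumWords returns the number of words found on page p.
    if (doc.getPageNumWords(p) === 0) {
      emptyPages.push(p);
    }
  }
  return emptyPages;
}
```

In the Acrobat JavaScript console, findPagesWithoutText(this) should give the list of pages with no words. An empty list means every page has at least some text; a non-empty list tells you which pages are candidates for OCR.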

Report · Oct 05, 2019

There is no guarantee that, even if a page contains searchable words, every word you can see on that page can be searched. A PDF can consist of both text and images, and an image can show words, but those words are still just part of the image.