Participant
October 4, 2019
Question

Check whether a document is searchable


Hello, I've been trying to check whether a PDF is searchable and, if it's not, automatically run an OCR action on the document. So far I've found two ways of doing it, neither of them efficient. My question is about the function search.available. I read in the Acrobat JavaScript Scripting Guide that the function can determine whether searching is possible. But when I used it on a PDF that is not searchable, the result was confusing: it returned "true" even though the document clearly could not be searched. I'm very new to JavaScript; can you show me how to use the function correctly? Or do you have any idea how to check whether a document is searchable, other than activating the Read Out Loud function or searching for specific word(s)?

 

Thank you very much! 🙂

2 replies

Inspiring
October 5, 2019

There is no guarantee that, even if there are searchable words on a page, all of the words shown on that page can be searched. A PDF can consist of text and images, and the images can contain words, but those words are still just part of an image.

Participating Frequently
April 10, 2025

Hello.

I am facing the same conundrum now. I know I have a few thousand personal medical documents, but only a year or so ago did I discover a feature of my Brother scanner that lets me include an OCR step for every document I scan.

This means some documents have some extra information on top of the picture and some don't.

As you can imagine, I would like to know which documents contain only a picture and which can be searched for OCR-ized content. As a side note, I have another app that lets me re-process any .pdf and perform this OCR step individually, and I would like to use it, but first I must find out what I am dealing with here. As one can imagine, opening one document at a time to see whether I can search for any of the words inside is really not reasonable.

 

I'm looking forward to getting ideas or methods to achieve this goal, thank you in advance.

 

I'm not a programmer so unless I'm provided with a finished and tested program I can't create one or use any of the functions discussed above.

Participating Frequently
April 11, 2025

Anybody, anything?

Legend
October 4, 2019

If you check out the JavaScript API Reference you'll see that search.available doesn't do that at all.

"Returns true if the Search plug-in is loaded and query capabilities are possible. A script author should check this Boolean before performing a query or other search object manipulation."

So it's checking if a plug-in is loaded, not looking at the document. Search, indeed, is different from Find; it's about using indexes that might exist for fast searching. This is far from obvious in the current version.
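
To illustrate what the quoted description is for, here is a minimal sketch of guarding a query with search.available. The helper name queryActiveDoc and its searchObj parameter are made up for this example; inside Acrobat you would pass the built-in search object. It assumes search.query accepts the query text followed by "ActiveDoc" to limit the query to the current document.

```javascript
// Hypothetical helper (not part of the Acrobat API): run a query only when
// the Search plug-in is loaded. `searchObj` stands in for Acrobat's global
// `search` object.
function queryActiveDoc(searchObj, text) {
    if (!searchObj.available) {
        // Plug-in not loaded, so no query is possible. This says nothing
        // about whether the document itself contains text.
        return false;
    }
    // Assumption: "ActiveDoc" restricts the query to the open document.
    searchObj.query(text, "ActiveDoc");
    return true;
}

// In the Acrobat JavaScript console this might be called as:
//   queryActiveDoc(search, "invoice");
```

Note that a true result only means the query was issued. It tells you nothing about whether the document contains text.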

 

"Searchable" is a word often used, but it has no particular meaning for PDFs. Certainly, PDFs don't have any information in them saying they are "searchable". Here are some possible meanings:

* A file might be called "searchable" if it contains text that extracts to words in your own language.

* A file might be called "searchable" if it contains any text at all, even if it is garbage.

* A file might be called "searchable" if it contains images of text, on which OCR would succeed (since Acrobat Pro may do that if searching).

 

What you can readily do is use doc.getPageNumWords against each page to see if there is any text (the second case); the others are not something you could do in JavaScript.
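
As a rough sketch of that suggestion, the function below loops over the pages and collects those for which getPageNumWords reports zero words. The function name pagesWithoutText is invented here; numPages and getPageNumWords are the Doc members it relies on. It only covers the second meaning above (any text at all), so a page with garbage text would still count as having text.

```javascript
// Hypothetical helper: return the 0-based numbers of pages that contain
// no extractable words. `doc` is an Acrobat Doc object, for example
// `this` when run from the JavaScript console.
function pagesWithoutText(doc) {
    var emptyPages = [];
    for (var p = 0; p < doc.numPages; p++) {
        // getPageNumWords returns the number of words found on page p.
        if (doc.getPageNumWords(p) === 0) {
            emptyPages.push(p);
        }
    }
    return emptyPages;
}

// In the Acrobat JavaScript console this might be used as:
//   var empty = pagesWithoutText(this);
//   console.println(empty.length === 0
//       ? "Every page has some text."
//       : "Pages with no text (0-based): " + empty.join(", "));
```

If the returned list includes every page, the file is most likely a pure scan and a candidate for OCR. If it includes only some pages, the document is mixed.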