
Check whether a document is searchable

New Here, Oct 04, 2019


Hello, I've been trying to check whether a PDF is searchable and, if it's not, automatically run an OCR action on the document. So far I've found two ways of doing it, neither of them efficient. My question is about the search.available property. I read in the Acrobat JavaScript Scripting Guide that it can determine whether searching is possible. But when I used it on a PDF that is not searchable, the result was confusing: it returned "true" even though the document clearly could not be searched. I'm very new to JavaScript, can you guys show me how to use it correctly? Or do you have any idea how to check whether a document is searchable, other than activating the Read Out Loud function and searching for specific word(s)?

 

Thank you very much! 🙂

TOPICS
How to , Scan documents and OCR , Standards and accessibility

LEGEND, Oct 04, 2019


If you check out the JavaScript API Reference you'll see that search.available doesn't do that at all.

"Returns true if the Search plug-in is loaded and query capabilities are possible. A script author should check this Boolean before performing a query or other search object manipulation."

So it's checking if a plug-in is loaded, not looking at the document. Search, indeed, is different from Find; it's about using indexes that might exist for fast searching. This is far from obvious in the current version.

 

"Searchable" is a word often used, but it has no particular meaning for PDFs. Certainly, PDFs don't have any information in them saying they are "searchable". Here are some possible meanings:

* A file might be called "searchable" if it contains text that extracts to words in your own language.

* A file might be called "searchable" if it contains any text at all, even if it is garbage.

* A file might be called "searchable" if it contains images of text, on which OCR would succeed (since Acrobat Pro may do that if searching).

 

What you can readily do is use doc.getPageNumWords against each page to see if there is any text (the second case); the others are not something you could do in JavaScript.
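A minimal sketch of that per-page check, assuming it is run inside Acrobat where the document object exposes numPages and getPageNumWords. The function name pagesWithoutText is made up for illustration, and the check only covers the second meaning above (any text at all, even garbage):

```javascript
// Hypothetical helper: returns the 0-based indices of pages on which
// no words can be extracted. `doc` is expected to be an Acrobat Doc
// object (for example `this` in the JavaScript console or in an Action).
function pagesWithoutText(doc) {
  var emptyPages = [];
  for (var p = 0; p < doc.numPages; p++) {
    // getPageNumWords returns the number of words found on page p
    if (doc.getPageNumWords(p) === 0) {
      emptyPages.push(p);
    }
  }
  return emptyPages;
}

// Assumed usage inside Acrobat (not runnable outside it):
//   var missing = pagesWithoutText(this);
//   if (missing.length > 0) {
//     console.println("Pages without text (0-based): " + missing.join(", "));
//   }
```

A non-empty result only says those pages have no extractable words, so they are candidates for OCR. An empty result does not prove that the extracted text is meaningful.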

 

 

LEGEND, Oct 05, 2019


There is no guarantee that, even if a page contains searchable words, every word shown on that page can be searched. A PDF can consist of both text and images, and an image can contain words, but those words are still just part of the image.
