
Searching text phrases without the Search plugin

Explorer, Apr 16, 2018

I am looking for a smooth way to search for text phrases, not just single words, in a PDF document using the API. I am not interested in using the Search plugin. I have started to put together a function using the word finder and a vector of words, but I am a little surprised that there is no better way of doing this. Or is there?

TOPICS
Acrobat SDK and JavaScript

Community Expert, Apr 16, 2018

How could it be better than having the WordFinder? It's super fast and provides tons of info. If it weren't for the WordFinder, you'd be parsing content streams. You are on the best route.

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often

Explorer, Apr 16, 2018

OK, thanks. Well, it is easy and fast to search word by word, as long as you do not want to search for a sentence or a short run of text. I have not found a better way than to split the sentence at the spaces between the words and then search for each individual word, in the correct order, to match what I am looking for. It kind of feels like a hack, so I was thinking there was a better way of doing it that I was not aware of.

Community Expert, Apr 16, 2018

There are no hacks, just better and worse algorithms for searching. The way I do this, and I've done this multiple times, is to create a list (a C++ std::list or an MFC CStringList) of the words that make up the phrase. Then search for words that match the first item. On a match, divert into a sub-loop that checks each remaining item in the list against the next word on the page. It's easy to add simple variations like case-insensitive and partial-word matching. I'll typically use the same incrementer in the sub-loop so the phrase matches don't overlap, but you could do it differently depending on the search requirements. I have also done this by pre-processing the text on the page into distinct blocks of text to ensure the search only happens within a text block. Of course, this technique will not catch phrases broken across pages, which requires identifying paragraphs, headers and footers.
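
A minimal sketch of the first-word-then-sub-loop idea, under some loud assumptions: the page's words have already been pulled out of the WordFinder into a plain std::vector<std::string> in reading order (the extraction step is not shown), and the names wordsEqual and findPhrase are hypothetical helpers made up for this illustration, not part of the Acrobat SDK. Case-insensitive matching here only handles ASCII.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: compare two words, optionally ignoring ASCII case.
bool wordsEqual(const std::string& a, const std::string& b, bool ignoreCase) {
    if (!ignoreCase) return a == b;
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}

// Hypothetical helper: return the index of the first word of every
// non-overlapping phrase match. `pageWords` is ASSUMED to hold the words
// of one page in reading order (e.g. copied out of the WordFinder).
std::vector<std::size_t> findPhrase(const std::vector<std::string>& pageWords,
                                    const std::vector<std::string>& phrase,
                                    bool ignoreCase = false) {
    std::vector<std::size_t> hits;
    if (phrase.empty() || phrase.size() > pageWords.size()) return hits;

    std::size_t i = 0;
    while (i + phrase.size() <= pageWords.size()) {
        // Sub-loop: check each phrase word against the following page words.
        std::size_t k = 0;
        while (k < phrase.size() &&
               wordsEqual(pageWords[i + k], phrase[k], ignoreCase)) {
            ++k;
        }
        if (k == phrase.size()) {
            hits.push_back(i);
            i += phrase.size();  // skip past the match so hits cannot overlap
        } else {
            ++i;                 // no match starting here; try the next word
        }
    }
    return hits;
}
```

Calling findPhrase(words, {"the", "quick"}, true) would return the starting word index of each match, which could then be mapped back to the corresponding WordFinder word to get its location. Advancing by one instead of phrase.size() after a hit would allow overlapping matches, if the search requirements call for that.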

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often

Explorer, Apr 16, 2018

Very good! That means I am on the right track. This sounds very similar to the way I was approaching it too. I am splitting the search phrase into words and storing them in a vector using Boost:

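#include <boost/algorithm/string.hpp>  // boost::split, boost::is_any_of
std::vector<std::string> searchWords;
// token_compress_on merges adjacent separators so repeated spaces do not produce empty words
boost::split(searchWords, phrase, boost::is_any_of("\t "), boost::token_compress_on);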

Now I am using a standard for() loop to run through the words from the WordFinder, looking for a match on the first word. When one is found, I iterate through the vector to check whether the following words match as well.
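
For anyone who would rather not pull in Boost for the split step, here is a small standard-library sketch. The function name splitWords is a hypothetical helper made up for this example. Note one difference from the Boost call above: stream extraction splits on any whitespace (including newlines), not only spaces and tabs, and it never yields empty words.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a search phrase into words on whitespace.
// Leading, trailing and repeated whitespace is skipped, so the result
// contains no empty strings.
std::vector<std::string> splitWords(const std::string& phrase) {
    std::vector<std::string> words;
    std::istringstream in(phrase);
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}
```

The resulting vector can be used in the same way as searchWords in the loop described above.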

Thanks!
