rombanks
Inspiring
February 24, 2016
Answered

Identifying blank body on a page


Hello fellows,

I wonder if it's possible to check whether a page contains text in the body area. As far as I can see, JavaScript cannot distinguish between the header, footer, and body areas. Is this true?

Thank you in advance for your response!

Correct answer, by Test Screen Name

Hi guys,

Thank you for your response!

What I am trying to do is filter out the pages that contain the word "Part". The script should not run the action on those pages.

As you said, testing for the presence of the word "Part" is not a good solution, because the action is also applied when other words are detected. I guess the solution is to create an array of all the words on the page and check whether it contains the word "Part". Am I right?

Thanks!


You can do that, but adding an intermediate step such as copying the words into an array just adds overhead to an already slow task.

In pseudocode:

set a flag variable to 0
if there are fewer than 42 words,
  step through each word
    if the word starts with "Part" (or whatever), set the flag to 1

Now, when the loop is finished, if the flag is 1, do your action.
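A minimal sketch of that pseudocode in Acrobat JavaScript might look like the following. It assumes the Acrobat Doc methods getPageNumWords and getPageNthWord; the function name pageHasPartWord and its maxWords/prefix parameters are hypothetical, and the document is passed in explicitly (inside Acrobat you would pass `this`). The early `break` is an addition to the pseudocode: once one match is found there is no reason to keep scanning the page.

```javascript
// Sketch of the flag-based pseudocode, assuming Acrobat's Doc API.
// pageHasPartWord, maxWords and prefix are hypothetical names for this example.
function pageHasPartWord(doc, pageNum, maxWords, prefix) {
  var flag = 0; // set a flag variable to 0
  var numWords = doc.getPageNumWords(pageNum);

  // Only step through the words when the page has fewer than maxWords words.
  if (numWords < maxWords) {
    for (var i = 0; i < numWords; i++) {
      // Third argument (bStrip = true) strips punctuation from the returned word.
      var word = doc.getPageNthWord(pageNum, i, true);
      // "Starts with" test; note that this also matches words such as "Partial".
      if (word.indexOf(prefix) === 0) {
        flag = 1;
        break; // one match is enough, so skip the rest of the page
      }
    }
  }
  return flag === 1;
}

// Usage inside Acrobat, where `this` is the document:
// for (var p = 0; p < this.numPages; p++) {
//   if (pageHasPartWord(this, p, 42, "Part")) {
//     // do your action on page p
//   }
// }
```

Following the pseudocode, the action runs when the function returns true; adjust that condition to match the pages you actually want to act on.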


try67
Community Expert
February 24, 2016

It is possible, but it's not easy. You can use the getPageNthWordQuads method to get the exact location of each word on the page. Then compare each location to the area you're interested in and check whether they overlap. If no word overlaps that area, you can conclude that there's no text in it.
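The overlap test described above can be sketched as follows. This assumes the Acrobat Doc methods getPageNumWords and getPageNthWordQuads, that each quad is 8 numbers (four x/y corner pairs), that the region is given as [left, top, right, bottom] in PDF points with y growing upward (the layout Acrobat uses for page boxes), and that the page is not rotated, so the quads and the page box share one coordinate space. The helper names quadToBox, boxesOverlap and regionHasText, and the 72-point header/footer margins in the usage comment, are hypothetical.

```javascript
// Sketch: does any word on a page overlap a given rectangle?
// Assumptions: rect is [left, top, right, bottom] in PDF points (y grows upward),
// the page is unrotated, and each quad is [x1, y1, x2, y2, x3, y3, x4, y4].
// quadToBox, boxesOverlap and regionHasText are hypothetical helper names.

function quadToBox(quad) {
  var xs = [quad[0], quad[2], quad[4], quad[6]];
  var ys = [quad[1], quad[3], quad[5], quad[7]];
  // Take min/max so the result does not depend on the corner order.
  return {
    left: Math.min.apply(null, xs),
    right: Math.max.apply(null, xs),
    bottom: Math.min.apply(null, ys),
    top: Math.max.apply(null, ys)
  };
}

function boxesOverlap(box, rect) {
  // Two rectangles overlap when they overlap on both the x axis and the y axis.
  return box.left < rect[2] && box.right > rect[0] &&
         box.bottom < rect[1] && box.top > rect[3];
}

function regionHasText(doc, pageNum, rect) {
  var numWords = doc.getPageNumWords(pageNum);
  for (var i = 0; i < numWords; i++) {
    // The method returns a list of quads, so check each one.
    var quads = doc.getPageNthWordQuads(pageNum, i);
    for (var q = 0; q < quads.length; q++) {
      if (boxesOverlap(quadToBox(quads[q]), rect)) {
        return true; // at least one word overlaps the region
      }
    }
  }
  return false; // no word overlaps the region, so it contains no text
}

// Usage inside Acrobat (72-point header/footer margins are an assumption):
// var crop = this.getPageBox("Crop", 0);                // [left, top, right, bottom]
// var body = [crop[0], crop[1] - 72, crop[2], crop[3] + 72];
// if (!regionHasText(this, 0, body)) console.println("Body area of page 1 is blank");
```

Taking the min/max of all four corners keeps the check independent of the order in which the quad lists its corners.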

rombanks
Author
Inspiring
February 24, 2016

Hi try67,

Thank you for your prompt response! I checked the definition of this method and I don't see how it can be used in this case.

The method's parameters are 0-based indices (page number and word number). How can they help me identify a location on a page?

In addition, if I wanted to test whether a specific word (text string) is present on a page, how would I do that?

Thanks again!

rombanks
Author
Inspiring
February 25, 2016

Your comparison already does that, since it's case-sensitive.
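The point about case sensitivity can be illustrated with a few plain string comparisons, on strings like those returned by getPageNthWord. The helper names below are hypothetical and only show how JavaScript compares strings; the last one is needed only if you want "part" or "PART" to match as well.

```javascript
// Illustration of case-sensitive vs. case-insensitive matching.
// isWord, startsWithExact and startsWithIgnoreCase are hypothetical helper names.

// Strict equality: the whole word must be exactly the target ("Partial" and "part" fail).
function isWord(word, target) {
  return word === target;
}

// Prefix test: case-sensitive, so "Part1" matches "Part" but "part1" does not.
function startsWithExact(word, prefix) {
  return word.indexOf(prefix) === 0;
}

// Case-insensitive prefix test: lower-cases both sides before comparing.
function startsWithIgnoreCase(word, prefix) {
  return word.toLowerCase().indexOf(prefix.toLowerCase()) === 0;
}
```

If only the capitalized "Part" should count, the first two helpers already behave that way without any extra work.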


When I remove the [0-9] wildcard, the script takes ages to run (on a 2,500-page document). However, it seems to ignore the instruction to search for "Part" on a page (and, if it is found, skip that page), and it still executes the action that comes after the if condition even when the number of words is less than 42. Any ideas why this happens?

Thank you!