Participating Frequently
May 8, 2023
Question

Getting a list of page numbers of search text


Hi,

 

New to this forum and to JavaScript.  I found a possible answer-adjacent thread under "Get Page number of the search text".  I honestly don’t know what I'm looking at when I'm looking at the JavaScript.  And maybe there's a native tool in Acrobat that I don’t know about.

I'm using  Adobe Acrobat Pro - Continuous Release, which I guess means I'm up to date.

 

I've attached a sample PDF as I cannot attach the real deal because this is a work thing, and huge.  

 

I have a 600+ page PDF that's basically a mail-merge document - you know, 600 or so pages of the same form repeated for hundreds of recipients.  Generally, the form is 1 page per, but sometimes 2 or 3 pages.  So, it's not a matter of "page 1 is this person, page 2 is that person..." and so on.  But since there's repeatable, identifiable information that indicates a new form, I figure I can use that as a locator to identify where all the recipients' pages start - and then get a result like an index.  

 

Please keep in mind this is NOT A PDF FORM form, it just looks like that - generated by our software and output into a PDF.  So, when I use the word FORM, I do not mean a PDF form.  Thanks 🙂

 

Take the example PDF.  The output would look like 

Mike Michaelson - page 1

Rich Richardson - page 3

Tom Tomlinson - page 5

Vinnie The Stitch - page 6

 

Now, I do a bit of programming, not JavaScript.  So, I can kind of follow along with the logic of it - but the syntax, forget about it.  I can think of two ways to do this

 

  1.  Compare the 600-page PDF against the 600 or so names, finding a per-name result and associated page numbers.  That's going to be the most accurate but hardest to set up and a super time-consuming process.
  2. My PDF has the benefit of already being sorted alphabetically by name (kind of like my example PDF).  This means I don’t have to search by name; I can instead search by a key phrase - something that I know is going to be on the first page of every form, such as the phrase "Persons Name".  Every time I find "Persons Name" I know that's a new form.

 

Now, #2 sounds like a far easier implementation and a far faster process.  The output would instead be:

 

1

3

5

6

 

And since the PDF is already alphabetized, this works!  I can copy/paste that output and line it up against my list of alphabetized names on my word doc or spreadsheet 1:1.  Yeah, it's a bit of manual work in the end, but I'm not looking for a scalpel here - just a hammer for now.
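The 1:1 line-up described above can also be done in a few lines of JavaScript instead of by hand. This is only a rough sketch: pairNamesWithPages is a made-up helper name, and it assumes the list of names is in the same order as the forms in the PDF and that there is exactly one page number per name.

```javascript
// Hypothetical helper (not part of Acrobat): pairs an alphabetized list of
// names with the list of start pages, in order.
function pairNamesWithPages(names, pages) {
    // If the counts differ, the 1:1 assumption is broken, so stop early
    // rather than produce a misaligned index.
    if (names.length !== pages.length) {
        throw new Error("Got " + names.length + " names but " + pages.length + " pages");
    }
    return names.map(function (name, i) {
        return name + " - page " + pages[i];
    });
}

// Names and page numbers from the example PDF in this thread.
var exampleIndex = pairNamesWithPages(
    ["Mike Michaelson", "Rich Richardson", "Tom Tomlinson", "Vinnie The Stitch"],
    [1, 3, 5, 6]
);
```

With the example data, exampleIndex holds the four lines shown earlier, starting with "Mike Michaelson - page 1". The length check matters: a single missed or extra hit would shift every name after it.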

 

Thanks.

 

This topic has been closed for replies.

2 replies

AKEDM (Author)
Participating Frequently
May 9, 2023

What I'm asking seems like a pretty basic thing.  It's a shame that Acrobat doesn't allow that basic function. 

 

TRY67, to your answer - thank you.  And your first steps got me in the right direction - I added to it.

 

Here's how to get a file (i.e., a list) of the page numbers where repeated text is found in a PDF:

Feel free to use my example Test File.PDF in this thread. 

Note, this is for the current version of Acrobat Pro as of May 9, 2023.

 

  1.  Load your PDF into Acrobat
  2.  Make sure your text is searchable.  That is, make sure it's actual text (searchable) and not a picture of text (not searchable).  
  3.  Select the REDACT tool.  (Note, nothing will actually be redacted/removed; hence, the oddity of  choosing REDACT for what is basically a "find all" function, but it's the only tool I know that does what I need.  Thanks to TRY67 for introducing me to it.)
  4.  Select REDACT TEXT & IMAGES
  5.  Select FIND TEXT & REDACT
  6.  You may get a warning pop-up that basically restates step 2
  7.  The side-window that appears is titled Search.  It is the "find all" tool.  Type what you want to find into the WHAT WORD OR PHRASE WOULD YOU LIKE TO SEARCH FOR? field.
  8.  Select SEARCH & REMOVE TEXT.  (Again, the text will not be removed.)
  9.  Above the list of results will be a disk icon.  Click on it and save as CSV (spreadsheet friendly).
  10.  Open that file in whatever spreadsheet you want, and there's the list of page numbers along with additional information.
  11.  At this point you can X out of the Search window to cease the redact/removal process.  Essentially, you've used the REDACT tool as a find-all instead.

 

If you're using my Test File.PDF, search for Persons Name and you should get pages 1, 3, 5 and 6 in the resultant CSV file.

 

 

try67
Community Expert
May 8, 2023

You can't get the results of an Advanced Search command directly into a script, if that's what you mean.

You would have to scan the file page by page, word by word, which can be tricky if you have such long files.
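For reference, a page-by-page, word-by-word scan might look roughly like the sketch below. It relies on the Acrobat document properties and methods numPages, getPageNumWords and getPageNthWord; the function name findPhrasePages is made up for illustration, and the sketch assumes the phrase comes out of the PDF as consecutive words with the same spelling and capitalization. It has not been tuned for very long files.

```javascript
// Sketch only. In the Acrobat JavaScript console, pass `this` as `doc`.
// Returns the 1-based page numbers on which `phrase` occurs at least once.
function findPhrasePages(doc, phrase) {
    var target = phrase.split(/\s+/);          // e.g. ["Persons", "Name"]
    var pages = [];
    for (var p = 0; p < doc.numPages; p++) {   // Acrobat pages are 0-based
        // Collect every word on this page.
        var count = doc.getPageNumWords(p);
        var words = [];
        for (var w = 0; w < count; w++) {
            // Third argument true: strip punctuation and whitespace.
            words.push(doc.getPageNthWord(p, w, true));
        }
        // Look for the target words in sequence.
        for (var s = 0; s + target.length <= words.length; s++) {
            var match = true;
            for (var j = 0; j < target.length; j++) {
                if (words[s + j] !== target[j]) { match = false; break; }
            }
            if (match) {
                pages.push(p + 1);             // report 1-based page number
                break;                         // one hit per page is enough
            }
        }
    }
    return pages;
}
```

In the console, something like console.println(findPhrasePages(this, "Persons Name").join("\n")) should print one page number per line.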

My recommendation would be to use the Search & Redact tool to look for the "Persons Name" strings.

It will add a Redact annotation over all the instances of that text, and will do it much quicker than a script can.

You then look for these annotations, and where you find them you know that text is located. You can then delete them and move forward with the rest of your code.
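As a rough sketch of that second step, a script could collect the page numbers of the Redact annotations and optionally delete them afterwards. It uses the Acrobat document method getAnnots and the annotation properties type and page plus the destroy method; the function name redactPages is made up for illustration. It assumes the only Redact annotations in the file are the ones the search just added, since any that were already there would be counted too.

```javascript
// Sketch only. In the Acrobat JavaScript console, pass `this` as `doc`.
// Returns the sorted, de-duplicated 1-based page numbers that carry at
// least one Redact annotation. If removeAfter is true, each Redact
// annotation is deleted once it has been recorded.
function redactPages(doc, removeAfter) {
    var annots = doc.getAnnots();      // null when the file has no annotations
    var pages = [];
    if (!annots) {
        return pages;
    }
    var seen = {};
    for (var i = 0; i < annots.length; i++) {
        var a = annots[i];
        if (a.type !== "Redact") {
            continue;                  // ignore comments, highlights, etc.
        }
        var pageNum = a.page + 1;      // annotation pages are 0-based
        if (!seen[pageNum]) {
            seen[pageNum] = true;
            pages.push(pageNum);
        }
        if (removeAfter) {
            a.destroy();               // removes the mark; nothing gets redacted
        }
    }
    pages.sort(function (x, y) { return x - y; });
    return pages;
}
```

In the console, console.println(redactPages(this, true).join("\n")) should print the page list and clean up the marks in one go.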

It will mean adding another step to your workflow (as this command can't be scripted, either), but it will save you a lot of headaches and frustrations in the long run.