Find Specific Text String and Automatically Create Bookmarks Based on That String

New Here, Apr 03, 2020

Is it possible in Acrobat to automatically create/insert bookmarks when a particular string (ex: Order #) is encountered?  I am trying to create individual work order files from one large PDF file (using Split Document into multiple files using bookmarks), but I need to create bookmarks each time the string "Order #" is encountered.  Because the pages vary based on the work order specs (drawings, material needed, instructions, etc.), this text string is not located in a predictable spot on each page.  Once the "Order #" is found, I need to insert a bookmark that includes "Order #" and the next 9 characters that come after it. I know how to do it manually, but there could be 100 or more orders in one file.  Any help is greatly appreciated...thanks!

TOPICS
Acrobat SDK and JavaScript, How to

Most Valuable Participant, Apr 03, 2020

Yes, if the text can be identified based on a specific pattern then it should be possible, but will require a custom-made script.

By the way, a script can just split the file directly. There's no need to create bookmarks and then use the Split Document command based on that...

 

I've developed many similar tools for my clients and would be happy to create one for you as well (for a fee, of course).

You can contact me privately via [try6767 at gmail.com] to discuss it further.

Adobe Community Professional, Apr 03, 2020

See the search and highlight Action here. If it can be used to find your words, then it's a short trip to splitting the PDF. 

https://www.acrobatusers.com/actions-exchange/

 

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often

New Here, Apr 13, 2020

Thank you for getting back to me so quickly. 🙂 

 

I have been able to use the Action you referred me to, but instead of having it search for just "Order #", I need it to also highlight the 9 characters afterwards (1 space and 8 digits that refer to each work order) so that when I split the file into multiple PDFs, each one has "Order #" plus the 8-digit work order number.  How do I do this?

 

Thanks again!

Adobe Community Professional, Apr 13, 2020

That's more complicated. To do that you need to either write a custom JavaScript search, or specify a custom redaction search pattern.

The custom redaction search pattern is easier:

https://blogs.adobe.com/acrolaw/2011/05/creating_and_using_custom_redact/

 

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often

New Here, Apr 21, 2020

I have tried for over a week to figure out how to either create a JavaScript (which I am completely new to) or a custom redaction pattern, but I just end up getting more confused.  I need to do one of the following:

1) automatically insert a bookmark at each text string ("Order # "), which doesn't seem like it would be a difficult task, or

2) use the find, highlight and extract JavaScript, which I can get to highlight "Order # " plus the 8-digit number that follows, but which will not do the extract portion to individual files, or lastly

3) create a custom redaction pattern. I have located the XML file and added a new Entry 6, but I can't figure out how to make it search for the text string "Order # 12345678" and either insert a bookmark or extract the pages from this point to the next occurrence of "Order #".

Please believe that it is not for lack of trying, but I really need to get this figured out and need to know which method is the easiest to pursue, and the finishing step to achieve it.  I guess I really need to get a "Javascript for Dummies" book, since it isn't as easy as VBA to pick up on.  Thank you again for any input. 🙂
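
Option 1 in the list above could be sketched roughly as follows. This is a minimal sketch, not a tested Action: it assumes the label reads "Order #" followed by optional whitespace and an 8-digit number, and that joining the unstripped words of a page reproduces that text closely enough (how Acrobat splits "#" into words is worth checking in the console first). `findOrderNumber` and `pageText` are hypothetical helper names, not part of the Acrobat API.

```javascript
// Hypothetical helper: returns the 8-digit number after "Order #", or null.
function findOrderNumber(text) {
    var m = /Order\s*#\s*(\d{8})/.exec(text);
    return m ? m[1] : null;
}

// Hypothetical helper: rebuilds a page's text from its words.
// Passing false as the third argument asks Acrobat not to strip
// punctuation and whitespace from each word.
function pageText(doc, p) {
    var parts = [];
    var n = doc.getPageNumWords(p);
    for (var i = 0; i < n; i++) {
        parts.push(doc.getPageNthWord(p, i, false));
    }
    return parts.join("");
}

// Runs only inside Acrobat (for example from the JavaScript console).
if (typeof app !== "undefined") {
    var added = 0;
    var lastOrder = null;
    for (var p = 0; p < this.numPages; p++) {
        var order = findOrderNumber(pageText(this, p));
        // Add one bookmark per order, on the first page where it appears.
        if (order !== null && order !== lastOrder) {
            this.bookmarkRoot.createChild("Order # " + order,
                "this.pageNum = " + p + ";", added);
            added++;
            lastOrder = order;
        }
    }
    console.println("Bookmarks added: " + added);
}
```

With bookmarks in place, the built-in Split Document command can then split on them, which is the workflow described in the original question.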

Adobe Community Professional, Apr 22, 2020

If you are getting the find and highlight to work then you are very close. Did you look in the console to see if there were any errors? Did you try the other Action that extracts commented pages? 

 

If you need help with this (and have a budget) then contact me through www.windjack.com. I can get the Action customized for exactly what you want. 

 

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often

New Here, May 04, 2020

So I've had to change my approach somewhat because the only consistent location (mostly, anyway) that has the work order number is the last word on each page.  I understand that I need to use getPageNumWords(p) to get the number of words on page p and then use getPageNthWord(p, n), where n = getPageNumWords(p) - 1.  There will, however, be some pages that do not have the WO number on them, so I would like those to default to the WO number from the page before.  Using the Extract example, I came up with the following (please bear in mind that this is my first attempt at JavaScript code):

WoNo = 0
var NumWrds = 0
var finalpage = 0
var count = 0
cPath: this.path

//For each page in document, check whether specific words meet criteria
for (var p = 0; p < this.numPages; p++) {
  NumWrds = getPageNumWords(p)
  WoNo = getPageNthWord(p, NumWrds - 1)
  if (this.getPageNthWord(p, NumWrds - 1) == WoNo) {
      count++;
      finalpage = p;}
  else
    { WoNo = getPageNthWord (p-1,NumWrds - 1);
      finalpage = p;}
      //Find page position of next break point
      for (var p2 = p + 1; p2 < this.numPages; p2++) {
        if (this.getPageNthWord(p2, NumWrds) == WoNo) {
            this.extractPages({
              nStart: p,
              nEnd: p2-1,
              cPath: WoNo + " " + ".pdf"
            });
            console.println("Extracted " + WoNo + " " + " pp " + p + " to " + p2)
            break
          }
        }
}

//Save final section after last time run through
this.extractPages({
  nStart: finalpage,
  nEnd: this.numPages - 1,
  cPath: count + " " + WoNo + " " + ".pdf"
});
console.println("Extracted " + WoNo + " " + " pp " + finalpage + " to " + (this.numPages - 1))

 

The results are going to the first "null" page that doesn't have a WO # on it and then extracts all the WO pages after it (which do have WO #'s on them) for 34 additional pages.  The file is 116 pages.  What am I doing wrong?!  Please help...

Adobe Community Professional, May 04, 2020

The following condition is always true:

if (this.getPageNthWord(p, NumWrds - 1) == WoNo) {

What do you want to compare here?

Adobe Community Professional, May 05, 2020

For your first attempt at JS, you've written a lot of advanced code. I'd suggest backing off a bit and doing some testing in the Console window to get a handle on your process. 

So as Bernd says, the comparison is meaningless because it compares a value to itself. 

What you need to do is test the result for a valid order number format. Use a regular expression:

https://www.pdfscripting.com/public/Pattern-Matching-with-Regular-Expressions.cfm

 

You also need to verify that the order number is really the last word, and the format it is in when acquired, i.e. punctuation, white space, etc. Use the Console. 

 

 

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often
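
The advice above (check the format of the last word, then test it against a pattern) could look something like this. It is a minimal sketch, assuming the order number is exactly eight digits as described earlier in the thread; `isOrderNumber` is a hypothetical helper name. The console part prints the last word of the current page both unstripped and stripped, wrapped in brackets so stray spaces or punctuation are visible.

```javascript
// Hypothetical helper: true when `word` is exactly eight digits once
// surrounding whitespace is trimmed. The eight-digit format is an assumption.
function isOrderNumber(word) {
    if (typeof word !== "string") {
        return false;
    }
    var cleaned = word.replace(/^\s+|\s+$/g, "");
    return /^\d{8}$/.test(cleaned);
}

// Console check (Acrobat only): inspect the last word of the current page.
if (typeof app !== "undefined") {
    var n = this.getPageNumWords(this.pageNum);
    if (n > 0) {
        // Third argument false keeps punctuation and whitespace;
        // true (the default) strips them.
        var raw = this.getPageNthWord(this.pageNum, n - 1, false);
        var stripped = this.getPageNthWord(this.pageNum, n - 1, true);
        console.println("raw: [" + raw + "] stripped: [" + stripped +
            "] valid: " + isOrderNumber(stripped));
    }
}
```

Running this on a few pages with and without an order number should show whether the last word really is the number and in what form it arrives.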

Adobe Community Professional, Apr 04, 2020

If I understand correctly, there is no point in creating bookmarks since what you want to do is extract pages based on their content.
In this case you are lucky because it is precisely the subject of this thread which provides several versions of scripts to do this.

 

Google translate is your friend: https://abracadabrapdf.net/forum/index.php/topic,3410.0.html

New Here, May 05, 2020

Is it not possible just to do a nested calculation with getPageNumWords and getPageNthWord to get the last word on the page, for example, getPageNthWord(p, getPageNumWords(p) - 1)?  Then, if the result does not resemble 2#######, the value of the previous page is used?

Adobe Community Professional, May 05, 2020

You can do what you want to do with a script. No Problem.

But a calculation script is not the correct location for this type of code. This needs to be either a batch or folder level script. 

 

Like I said earlier, you need to do a bit of code testing in the console window.

Run this code in the console while a page with the order number is displayed:

this.getPageNthWord(this.pageNum, this.getPageNumWords(this.pageNum) - 1)

 

What is the exact text that is returned?

When you can verify this, you can then create a regular expression to identify the order number. 

And then we can help you to design a complete script to perform this task.

 

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often

New Here, May 07, 2020

If I use this.getPageNthWord(this.pageNum, this.getPageNumWords(this.pageNum) - 1), I get exactly what I need, the work order number, EXCEPT on a few pages that do not have the WO number anywhere on the page (drawings, maps, etc.).  In the case where the page does not have a WO number, I would like to use the WO number from the previous page, since these unmarked pages come after the main WO pages.  I'm thinking an IF...ELSE statement could handle this, but I'm not sure what the exact code needs to be.  Thank you for taking the time to help me.

Adobe Community Professional, May 07, 2020

So, the idea is to acquire the last word on the page and then test it with a regular expression to determine whether or not it is a WO number. 

 

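var rgWONum = /...../; // placeholder: fill in the pattern for a WO number
var cWONum = null, cLastWord;
for (var pg = 0; pg < this.numPages; pg++)
{
    // Last word on the page (the method name is case-sensitive: getPageNumWords)
    cLastWord = this.getPageNthWord(pg, this.getPageNumWords(pg) - 1);
    // Keep the previous WO number when this page's last word is not one
    if (rgWONum.test(cLastWord))
        cWONum = cLastWord;
}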

 

There's the basic script, you'll need to fill out the regular expression. Here's an article on the topic:

 https://www.pdfscripting.com/public/Pattern-Matching-with-Regular-Expressions.cfm

 

 

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often
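
One way the basic loop above might be extended into a full split is sketched below. This is an illustration under assumptions, not the Action's actual code: the pattern assumes an 8-digit number starting with 2 (the "2#######" format mentioned earlier in the thread), pages for one order are assumed to be contiguous, and `buildOrderRanges` is a hypothetical helper. As far as I know, `extractPages` with a `cPath` only works from the console or a batch/Action context, which matches the earlier advice about where this code should run.

```javascript
// Hypothetical helper: turns per-page last words into contiguous page ranges.
// A page whose last word fails the pattern inherits the previous page's order.
function buildOrderRanges(lastWords, pattern) {
    var ranges = [];
    var current = null;
    for (var p = 0; p < lastWords.length; p++) {
        var word = lastWords[p];
        if (typeof word === "string" && pattern.test(word)) {
            current = word;
        }
        if (current === null) {
            continue; // pages before the first order number are skipped
        }
        var last = ranges[ranges.length - 1];
        if (last && last.order === current) {
            last.end = p; // same order: extend the current range
        } else {
            ranges.push({ order: current, start: p, end: p });
        }
    }
    return ranges;
}

// Runs only inside Acrobat (console or Action).
if (typeof app !== "undefined") {
    var rgWONum = /^2\d{7}$/; // assumed format: 2 followed by seven digits
    var words = [];
    for (var pg = 0; pg < this.numPages; pg++) {
        var n = this.getPageNumWords(pg);
        words.push(n > 0 ? this.getPageNthWord(pg, n - 1) : null);
    }
    // Save next to the source file; this.path is a device-independent path.
    var folder = this.path.replace(/[^\/]+$/, "");
    var ranges = buildOrderRanges(words, rgWONum);
    for (var i = 0; i < ranges.length; i++) {
        this.extractPages({
            nStart: ranges[i].start,
            nEnd: ranges[i].end,
            cPath: folder + ranges[i].order + ".pdf"
        });
        console.println("Extracted " + ranges[i].order + " pp " +
            (ranges[i].start + 1) + " to " + (ranges[i].end + 1));
    }
}
```

If the same order number shows up again later in the file, this sketch would produce a second range with the same file name and overwrite the first, so non-contiguous orders would need extra handling.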

New Here, May 07, 2020

So if the WO is always 8 digits that start with a 2, it would be /2\ddddddd/?

 

Most Valuable Participant, May 07, 2020

No. It would be:

/^2\d{7}$/

New Here, May 08, 2020

The work order in the last word is always preceded by the date (mm/dd/yyyy) the report was created, so would I still need the ^ before the 2, since it's not the beginning of the line?  Wow, I was way off on my first guess...thanks for helping me!
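
On the ^ question: in a JavaScript regular expression, ^ and $ anchor to the start and end of the string being tested, not to the printed line on the page. Since getPageNthWord returns a single word, the date in front of it should not matter as long as the returned word is only the number. The sketch below compares the anchored pattern with an unanchored one and with a pattern for a whole line; `describe` and the sample strings are made up for illustration.

```javascript
var anchored = /^2\d{7}$/;      // the whole string must be the WO number
var unanchored = /2\d{7}/;      // matches anywhere, even inside a longer number
var endOfLine = /\b(2\d{7})$/;  // WO number as the last token of a longer string

// Hypothetical helper: reports how each pattern treats the given text.
function describe(text) {
    var m = endOfLine.exec(text);
    return {
        anchored: anchored.test(text),
        unanchored: unanchored.test(text),
        captured: m ? m[1] : null
    };
}

// A single word, as returned by getPageNthWord: all three patterns accept it.
var single = describe("20123456");

// A longer digit run: only the unanchored pattern (wrongly) accepts it.
var tooLong = describe("9920123456");

// A whole line with the date in front: the anchored pattern rejects it,
// but the end-of-line pattern still captures the number.
var wholeLine = describe("05/08/2020 20123456");
```

Evaluating `single`, `tooLong`, or `wholeLine` in the console shows the difference; the anchors are what stop a partial match inside a longer number.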

New Here, Aug 05, 2020

I have finally created a custom redaction pattern that I can use to find and mark for redaction the work order numbers that I need to be used as the new file names.  My question now is how do I extract each group of pages with the same work order into multiple files and name the new files after each work order.  The Find, Highlight and Extract action only puts them into one file, and I need them to create a new file per work order number.  Thanks for any help you can give me.

Adobe Community Professional, Aug 05, 2020

The next step after running the redaction search is to loop through all the redact annots. Use the annot rectangle to find the text at that location, i.e. the order number. Then find all the pages associated with this number and extract them to a separate file.  Repeat until you've run through the annots.

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often
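
The first half of the step described above (loop through the redact annotations and read the text under each one) might be sketched like this. It assumes the annotation `rect` and the word quads share the same page coordinates and that pages are not rotated; `quadBounds`, `quadInRect`, and `textInRect` are hypothetical helpers, not Acrobat API calls.

```javascript
// Hypothetical helper: bounding box [xMin, yMin, xMax, yMax] of one quad
// (eight numbers: four x,y corner pairs, in any corner order).
function quadBounds(q) {
    var xs = [q[0], q[2], q[4], q[6]];
    var ys = [q[1], q[3], q[5], q[7]];
    return [Math.min.apply(null, xs), Math.min.apply(null, ys),
            Math.max.apply(null, xs), Math.max.apply(null, ys)];
}

// Hypothetical helper: true when the centre of the quad lies inside rect.
function quadInRect(q, rect) {
    var b = quadBounds(q);
    var cx = (b[0] + b[2]) / 2;
    var cy = (b[1] + b[3]) / 2;
    var x0 = Math.min(rect[0], rect[2]), x1 = Math.max(rect[0], rect[2]);
    var y0 = Math.min(rect[1], rect[3]), y1 = Math.max(rect[1], rect[3]);
    return cx >= x0 && cx <= x1 && cy >= y0 && cy <= y1;
}

// Hypothetical helper: words on page p whose first quad falls inside rect.
function textInRect(doc, p, rect) {
    var found = [];
    var n = doc.getPageNumWords(p);
    for (var i = 0; i < n; i++) {
        var quads = doc.getPageNthWordQuads(p, i);
        if (!quads || quads.length === 0) {
            continue;
        }
        // Accept either a list of quads or a single flat quad.
        var q = (typeof quads[0] === "number") ? quads : quads[0];
        if (quadInRect(q, rect)) {
            found.push(doc.getPageNthWord(p, i));
        }
    }
    return found.join(" ");
}

// Runs only inside Acrobat: list the text under each redaction mark.
if (typeof app !== "undefined") {
    var annots = this.getAnnots() || [];
    for (var a = 0; a < annots.length; a++) {
        if (annots[a].type === "Redact") {
            console.println("Page " + (annots[a].page + 1) + ": " +
                textInRect(this, annots[a].page, annots[a].rect));
        }
    }
}
```

Once each marked page reports its order number, the pages can be grouped by number and passed to extractPages, one call per order.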
