Is it possible in Acrobat to automatically create/insert bookmarks when a particular string (ex: Order #) is encountered? I am trying to create individual work order files from one large PDF file (using Split Document into multiple files using bookmarks), but I need to create bookmarks each time the string "Order #" is encountered. Because the pages vary based on the work order specs (drawings, material needed, instructions, etc.), this text string is not located in a predictable spot on each page. Once the "Order #" is found, I need to insert a bookmark that includes "Order #" and the next 9 characters that come after it. I know how to do it manually, but there could be 100 or more orders in one file. Any help is greatly appreciated...thanks!
Yes, if the text can be identified based on a specific pattern then it should be possible, but will require a custom-made script.
By the way, a script can just split the file directly. There's no need to create bookmarks and then use the Split Document command based on that...
I've developed many similar tools for my clients and would be happy to create one for you as well (for a fee, of course).
You can contact me privately via [try6767 at gmail.com] to discuss it further.
See the search and highlight Action here. If it can be used to find your words, then it's a short trip to splitting the PDF.
https://www.acrobatusers.com/actions-exchange/
Thank you for getting back to me so quickly. 🙂
I have been able to use the Action you referred me to, but instead of having it search for just "Order #", I need it to also highlight the 9 characters afterwards (1 space and 8 numbers that refer to each work order) so that when I split the file into multiple PDFs, each one has "Order #" plus the 8-digit work order number. How do I do this?
Thanks again!
That's more complicated. To do that you need to either write a custom JavaScript search, or specify a custom redaction search pattern.
The custom redaction search pattern is easier:
https://blogs.adobe.com/acrolaw/2011/05/creating_and_using_custom_redact/
I have tried for over a week to figure out how to create either a JavaScript (which I am completely new to) or a custom redaction pattern, but I just end up getting more confused. I need to either:
1) automatically insert a bookmark at each text string ("Order # "), which doesn't seem like it would be a difficult task, or
2) use the find, highlight and extract JavaScript, which I can get to highlight "Order # " plus the 8-digit number that follows, but which will not do the extract portion to individual files, or lastly
3) create a custom redaction pattern: I have located the XML file and added a new Entry 6, but can't figure out how to make it search for the text string "Order # 12345678" and either insert a bookmark or extract the pages from this point to the next occurrence of "Order #".
Please believe that it is not for lack of trying, but I really need to get this figured out and need to know which method is the easiest to pursue, and the finishing step to achieve it. I guess I really need to get a "Javascript for Dummies" book, since it isn't as easy as VBA to pick up on. Thank you again for any input. 🙂
If you are getting the find and highlight to work then you are very close. Did you look in the console to see if there were any errors? Did you try the other Action that extracts commented pages?
If you need help with this (and have a budget) then contact me through www.windjack.com. I can get the Action customized for exactly what you want.
So I've had to change my approach somewhat because the only consistent location (mostly, anyway) that has the work order number is the last word on each page. I understand that I need to use getPageNumWords(p) to get the number of words on page p and then use getPageNthWord(p, n), where n = getPageNumWords(p) - 1. There will, however, be some pages that do not have the WO number on them, so I would like them to default to the WO number on the page before. Using the Extract example I came up with the following (please bear in mind that this is my first attempt at JavaScript code):
WoNo = 0
var NumWrds = 0
var finalpage = 0
var count = 0
cPath: this.path
//For each page in document, check whether specific words meet criteria
for (var p = 0; p < this.numPages; p++) {
    NumWrds = getPageNumWords(p)
    WoNo = getPageNthWord(p, NumWrds - 1)
    if (this.getPageNthWord(p, NumWrds - 1) == WoNo) {
        count++;
        finalpage = p;
    }
    else {
        WoNo = getPageNthWord (p-1,NumWrds - 1);
        finalpage = p;
    }
    //Find page position of next break point
    for (var p2 = p + 1; p2 < this.numPages; p2++) {
        if (this.getPageNthWord(p2, NumWrds) == WoNo) {
            this.extractPages({
                nStart: p,
                nEnd: p2-1,
                cPath: WoNo + " " + ".pdf"
            });
            console.println("Extracted " + WoNo + " " + " pp " + p + " to " + p2)
            break
        }
    }
}
//Save final section after last time run through
this.extractPages({
    nStart: finalpage,
    nEnd: this.numPages - 1,
    cPath: count + " " + WoNo + " " + ".pdf"
});
console.println("Extracted " + WoNo + " " + " pp " + finalpage + " to " + (this.numPages - 1))
The results are going to the first "null" page that doesn't have a WO # on it and then extracts all the WO pages after it (which do have WO #'s on them) for 34 additional pages. The file is 116 pages. What am I doing wrong?! Please help...
The following condition is always true:
if (this.getPageNthWord(p, NumWrds - 1) == WoNo) {
What are you trying to compare here?
For your first attempt at JS, you've written a lot of advanced code. I'd suggest backing off a bit and doing some testing in the Console window to get a handle on your process.
So as Bernd says, the comparison is meaningless because it compares a value to itself.
What you need to do is test the result for a valid order number format. Use a regular expression:
https://www.pdfscripting.com/public/Pattern-Matching-with-Regular-Expressions.cfm
You also need to verify that the order number is really the last word, and the format it is in when acquired, i.e. punctuation, white space, etc. Use the Console.
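As a rough illustration of that kind of Console check, the sketch below shows one way to make hidden punctuation or white space in a word visible. `describeWord` is a hypothetical helper, not part of the Acrobat API. In Acrobat the input would come from `getPageNthWord`, whose optional third argument (`bStrip`) can be set to `false` to get the word without punctuation and white space stripped.

```javascript
// Hypothetical helper: lists each character of a word with its character code,
// so trailing spaces, line breaks, or punctuation become visible.
function describeWord(word) {
    var parts = [];
    for (var i = 0; i < word.length; i++) {
        parts.push("'" + word.charAt(i) + "'=" + word.charCodeAt(i));
    }
    return "length " + word.length + ": " + parts.join(" ");
}

// In the Acrobat Console the word would be read from the current page, e.g.:
//   var p = this.pageNum;
//   var raw = this.getPageNthWord(p, this.getPageNumWords(p) - 1, false);
//   console.println(describeWord(raw));
```

If the reported length is larger than the number of visible characters, the extra character codes show what the regular expression has to allow for (or what needs trimming first).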
If I understand correctly, there is no point in creating bookmarks since what you want to do is extract pages based on their content.
In this case you are lucky because it is precisely the subject of this thread which provides several versions of scripts to do this.
Google translate is your friend: https://abracadabrapdf.net/forum/index.php/topic,3410.0.html
Is it not possible just to do a nested calculation with getPageNumWords and getPageNthWord to get the last word on the page, for example, getPageNthWord(p, (getPageNumWords(p) - 1))? Then, if the result does not resemble 2#######, the value of the previous page is used?
You can do what you want to do with a script. No Problem.
But a calculation script is not the correct location for this type of code. This needs to be either a batch or folder level script.
Like I said earlier, you need to do a bit of code testing in the console window.
Run this code on the console, when a page with the order number is displayed.
this.getPageNthWord(this.pageNum, (this.getPageNumWords(this.pageNum) - 1))
What is the exact text that is returned?
When you can verify this, you can then create a regular expression to identify the order number.
And then we can help you to design a complete script to perform this task.
If I use this.getPageNthWord(this.pageNum, (this.getPageNumWords(this.pageNum) - 1)) I get exactly what I need, the work order number, EXCEPT on a few pages that do not have the WO number anywhere on the page (drawings, maps, etc.). In the case where the page does not have a WO number, I would like to use the WO number from the previous page, since these unmarked pages come after the main WO pages. I'm thinking an IF...ELSE statement could handle this, but I'm not sure what the exact code needs to be. Thank you for taking the time to help me.
So, the idea is to acquire the last word on the page and then test it with a regular expression to determine whether or not it is a WO number.
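var rgWONum = /...../;  // fill in the pattern that matches a WO number
var cWONum = null, cLastWord;
for (var pg = 0; pg < this.numPages; pg++)
{
    // Last word on the page (word indexes are zero-based)
    cLastWord = this.getPageNthWord(pg, this.getPageNumWords(pg) - 1);
    // Keep the previous WO number when this page does not end with one
    if (rgWONum.test(cLastWord))
        cWONum = cLastWord;
}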
There's the basic script, you'll need to fill out the regular expression. Here's an article on the topic:
https://www.pdfscripting.com/public/Pattern-Matching-with-Regular-Expressions.cfm
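One possible way to build on that loop is to turn the per-page result into page ranges, one per work order, which can then be extracted. The sketch below is only an illustration under assumptions: `groupPagesByOrder` is a hypothetical helper, the list of last words is assumed to have been collected first, and the pattern `/^\d{8}$/` in the usage comment simply assumes 8-digit order numbers as described earlier in the thread.

```javascript
// Hypothetical helper: turns one "last word" per page into page ranges.
// lastWords[i] is the last word of page i (or null if the page has no words);
// rgWONum is the order-number pattern (without the "g" flag).
function groupPagesByOrder(lastWords, rgWONum) {
    var groups = [];
    var current = null;
    for (var pg = 0; pg < lastWords.length; pg++) {
        var word = lastWords[pg];
        var isOrder = typeof word === "string" && rgWONum.test(word);
        if (isOrder && (current === null || current.woNum !== word)) {
            // A new order number starts a new group.
            current = { woNum: word, nStart: pg, nEnd: pg };
            groups.push(current);
        } else if (current !== null) {
            // Same number, or no number on this page: stay with the previous order.
            current.nEnd = pg;
        }
        // Pages before the first recognised order number are skipped.
    }
    return groups;
}

// Possible use from the Acrobat Console (untested sketch):
//   var lastWords = [];
//   for (var pg = 0; pg < this.numPages; pg++) {
//       var n = this.getPageNumWords(pg);
//       lastWords.push(n > 0 ? this.getPageNthWord(pg, n - 1) : null);
//   }
//   var groups = groupPagesByOrder(lastWords, /^\d{8}$/);
//   for (var i = 0; i < groups.length; i++) {
//       this.extractPages({ nStart: groups[i].nStart, nEnd: groups[i].nEnd,
//                           cPath: groups[i].woNum + ".pdf" });
//   }
```

Keeping the grouping separate from the extraction makes it easy to print the ranges in the Console and check them before any files are written. Note that `extractPages` with a `cPath` generally has to run in a privileged context such as the Console or an Action.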
So if the WO is always 8 digits that start with a 2, it would be /2\ddddddd/?
No. It would be:
/^2\d{7}$/
The work order in the last word is always preceded by the date (mm/dd/yyyy) the report was created, so would I need the ^ before the 2, since it's not the beginning of the line? Wow, I was way off on my first guess...thanks for helping me!
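To see what the anchors change, here is a small plain-JavaScript comparison of the two patterns from the posts above. It assumes the value being tested is a single word, as returned by `getPageNthWord`; in that case `^` and `$` refer to the start and end of that word, not to the line of text on the page. The sample strings are made up for illustration.

```javascript
var loose = /2\ddddddd/;   // "2", one digit, then six literal "d" characters
var strict = /^2\d{7}$/;   // exactly 8 characters: "2" followed by 7 digits

// Made-up sample words: a valid number, one that is too short,
// one with an extra leading digit, and one that only the loose pattern accepts.
var samples = ["20123456", "2012345", "120123456", "25dddddd"];
var results = [];
for (var i = 0; i < samples.length; i++) {
    results.push(samples[i] + " loose=" + loose.test(samples[i]) +
                 " strict=" + strict.test(samples[i]));
}
// In Acrobat, console.println(results.join("\n")) would list the outcome per sample.
```

Without the anchors, a pattern such as `/2\d{7}/` would also accept longer words that merely contain a matching run of digits, which is why the anchored form is the safer test for a word that should be exactly the order number.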
I have finally created a custom redaction pattern that I can use to find and mark for redaction the work order numbers that I need to be used as the new file names. My question now is how do I extract each group of pages with the same work order into multiple files and name the new files after each work order. The Find, Highlight and Extract action only puts them into one file, and I need them to create a new file per work order number. Thanks for any help you can give me.
The next step after running the redaction search is to loop through all the redact annots. Use the annot rectangle to find the text at that location, i.e. the order number. Then find all the pages associated with this number and extract them to a separate file. Repeat until you've run through the annots.
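A minimal sketch of the first part of that loop might look like the following. It is an untested illustration, not a finished tool: the helper names are hypothetical, it assumes redaction marks are reported by `getAnnots` with `type` equal to "Redact", and it assumes the annotation `rect` and the word quads from `getPageNthWordQuads` are in the same coordinate space. The `doc` parameter stands for the document object (`this` in the Console).

```javascript
// Bounding box [xMin, yMin, xMax, yMax] of a quads list (array of 8-number arrays).
function quadsBounds(quads) {
    var b = [Infinity, Infinity, -Infinity, -Infinity];
    for (var q = 0; q < quads.length; q++) {
        for (var i = 0; i < quads[q].length; i += 2) {
            b[0] = Math.min(b[0], quads[q][i]);
            b[1] = Math.min(b[1], quads[q][i + 1]);
            b[2] = Math.max(b[2], quads[q][i]);
            b[3] = Math.max(b[3], quads[q][i + 1]);
        }
    }
    return b;
}

// Hypothetical helper: text of the words on page pg whose centre lies inside rect.
function wordsInRect(doc, pg, rect) {
    var xMin = Math.min(rect[0], rect[2]), xMax = Math.max(rect[0], rect[2]);
    var yMin = Math.min(rect[1], rect[3]), yMax = Math.max(rect[1], rect[3]);
    var found = [];
    var n = doc.getPageNumWords(pg);
    for (var w = 0; w < n; w++) {
        var b = quadsBounds(doc.getPageNthWordQuads(pg, w));
        var cx = (b[0] + b[2]) / 2, cy = (b[1] + b[3]) / 2;
        if (cx >= xMin && cx <= xMax && cy >= yMin && cy <= yMax) {
            found.push(doc.getPageNthWord(pg, w));
        }
    }
    return found.join(" ");
}

// Hypothetical helper: maps page number -> text under the first Redact annot on that page.
function orderNumbersFromRedactions(doc) {
    var byPage = {};
    for (var pg = 0; pg < doc.numPages; pg++) {
        var annots = doc.getAnnots({ nPage: pg }) || []; // getAnnots returns null when there are none
        for (var a = 0; a < annots.length; a++) {
            if (annots[a].type === "Redact") {
                byPage[pg] = wordsInRect(doc, pg, annots[a].rect);
                break;
            }
        }
    }
    return byPage;
}
```

With the page-to-order-number map in hand, consecutive pages that share a number (plus any unmarked pages that follow them) can be passed to `extractPages` as one `nStart`/`nEnd` range per work order.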