Hi everyone,
I am trying to extract pages from a large document based on certain keywords. If a keyword is found on a page, that page number is pushed to an array, and the array is then used to create a new document. However, my script behaves very inconsistently and does not seem able to create multiple new documents. Please note that I found almost all of this script online; someone else wrote it, and I am trying to adapt it to my purposes.
// Iterates over all pages, looks for a given string, and extracts all
// pages on which that string is found to a new file.
var pageArray = [];
var pageA = [];
var stringToSearchFor = "keyword1";
var stringToSearch = "keyword2";
for (var p = 0; p < this.numPages; p++) {
// iterate over all words
for (var n = 0; n < this.getPageNumWords(p); n++) {
if (this.getPageNthWord(p, n) == stringToSearchFor) {
pageArray.push(p);
break;
}
else if (this.getPageNthWord(p,n) == stringToSearch) {
pageA.push(p);
break;
}
}
}
console.println("Test 2 of pageArray " + pageArray);
if (pageArray.length > 0) {
// extract all pages that contain the string into a new document
var d = app.newDoc(); // this will add a blank page - we need to remove that once we are done
for (var n = 0; n < pageArray.length; n++) {
d.insertPages( {
nPage: d.numPages-1,
cPath: this.path,
nStart: pageArray[n],
nEnd: pageArray[n]
} );
console.println(n + " pageArray " + pageArray);
}
// remove the first page
d.deletePages(0);
}
if (pageA.length > 0) {
// extract all pages that contain the string into a new document
var q = app.newDoc(); // this will add a blank page - we need to remove that once we are done
for (var n = 0; n < pageA.length; n++) {
q.insertPages( {
nPage: q.numPages-1,
cPath: this.path,
nStart: pageA[n],
nEnd: pageA[n]
} );
console.println(n + " pageA " + pageA);
}
console.println(pageA);
// remove the first page
}
Thanks!
-Forrest
Is the issue that some pages that contain both words only appear in one of the final files?
By the way, you're missing the command to delete the first page of the second file, after generating it.
Thanks for the quick response. Unfortunately, no. I am using this script to sort invoices, so the keyword I am searching for is the vendor's name, and two vendors' names will not appear on the same page.
And thanks for pointing that out; I had done that as a troubleshooting measure. Oddly enough, the script seems to work for certain words but not others, even though I can find both words by searching the document (Cmd+F). Very confusing.
I should also point out that I added the console.println() calls to check that the arrays have values, which both of them do. So I think the issue may have something to do with the newDoc creation?
You seem to be describing different kinds of issues. One is with the detection of the keywords, another with the extraction of the pages to the new file (if I understood correctly). These are unrelated issues. You should focus on each one of them separately and try to solve it.
Start by disabling the extraction process. Print to the console the list of pages for each search term. If they are not correct, investigate further. If a page that is supposed to appear in the list doesn't, go back to that page and print out all the words in it, and try to find out what the issue is.
This is how you debug code: You focus on a specific issue and eliminate causes until you find the cause of the problem, and then look for a solution for it. Then you move on to the next issue.
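As a rough sketch of that debugging step, the helpers below collect the words Acrobat reports for a page and look for a phrase word by word. The function names are hypothetical, and `doc` is assumed to be the Acrobat Doc object (for example `this` in the JavaScript console). The sketch also assumes the vendor name may span several words: getPageNthWord returns one word at a time and strips punctuation by default, so a single comparison against a multi-word or punctuated name would never match.

```javascript
// Sketch only: `doc` is assumed to be an Acrobat Doc object.
// pageWords and findPagesWithPhrase are hypothetical helper names.

// Collect every word Acrobat reports for one page (0-based page number).
function pageWords(doc, pageNum) {
    var words = [];
    var count = doc.getPageNumWords(pageNum);
    for (var i = 0; i < count; i++) {
        words.push(doc.getPageNthWord(pageNum, i));
    }
    return words;
}

// Return the 0-based numbers of all pages that contain `phrase`.
// The phrase is compared word by word and case-insensitively, so it
// should be written without punctuation (e.g. "acme supply").
function findPagesWithPhrase(doc, phrase) {
    var target = phrase.toLowerCase().replace(/^\s+|\s+$/g, "").split(/\s+/);
    var hits = [];
    for (var p = 0; p < doc.numPages; p++) {
        var words = pageWords(doc, p).map(function (w) {
            return String(w).toLowerCase();
        });
        for (var i = 0; i + target.length <= words.length; i++) {
            var matched = true;
            for (var j = 0; j < target.length; j++) {
                if (words[i + j] !== target[j]) {
                    matched = false;
                    break;
                }
            }
            if (matched) {
                hits.push(p);
                break; // one hit per page is enough
            }
        }
    }
    return hits;
}
```

In the console, something like `console.println(pageWords(this, 3).join(" | "))` would show exactly how the words on the fourth page are split, which should make it clear why a given keyword is or is not detected.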
I'm seeing a potential bug in your code that might cause all kinds of strange behaviors and that will be very difficult to spot if you don't know to look for it.
You should not use the "this" keyword after you create a new document, as it will probably point to that document instead of to the original one. Instead you should keep a separate reference to the original file, something like this as the first line of your code:
var originalDoc = this;
Then replace all instances of "this" in your code with "originalDoc".
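One way to apply this, as a minimal sketch, is to move the extraction into a function that receives the original document explicitly, so nothing inside it depends on what "this" points to. The function name is hypothetical, and `acroApp` simply stands for Acrobat's global `app` object, passed in as a parameter for clarity.

```javascript
// Sketch only: `acroApp` stands for Acrobat's global `app` object and
// `sourceDoc` for the original document, captured before any new document
// is created. extractPagesToNewDoc is a hypothetical name.
function extractPagesToNewDoc(acroApp, sourceDoc, pages) {
    if (pages.length === 0) {
        return null; // nothing to extract, so do not create an empty file
    }
    var target = acroApp.newDoc(); // the new file starts with one blank page
    for (var i = 0; i < pages.length; i++) {
        target.insertPages({
            nPage: target.numPages - 1, // append after the current last page
            cPath: sourceDoc.path,      // always read from the original file
            nStart: pages[i],
            nEnd: pages[i]
        });
    }
    target.deletePages(0); // drop the blank page that newDoc() added
    return target;
}
```

With `var originalDoc = this;` as the first line of the script, the two files could then be created with `extractPagesToNewDoc(app, originalDoc, pageArray)` and `extractPagesToNewDoc(app, originalDoc, pageA)`. As far as I know, app.newDoc() only works in a privileged context such as the console or a batch action, so this is assumed to run from one of those.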
Thanks again for the suggestion, try67! Unfortunately, I still cannot get the script to work: sometimes it creates a new document for one of the words, but never for both, and it does not create either one consistently.
// Iterates over all pages, looks for a given string, and extracts all
// pages on which that string is found to a new file.
var pageArray = [];
var pageA = [];
var originalDoc = this;
var stringToSearchFor = "keyword1";
var stringToSearch = "keyword2";
for (var p = 0; p < originalDoc.numPages; p++) {
// iterate over all words
for (var n = 0; n < originalDoc.getPageNumWords(p); n++) {
if (originalDoc.getPageNthWord(p, n) == stringToSearchFor) {
pageArray.push(p);
break;
}
else if (originalDoc.getPageNthWord(p,n) == stringToSearch) {
pageA.push(p);
break;
}
}
}
console.println("Test 2 of pageArray " + pageArray);
console.println("Test 1 of pageA " + pageA);
if (pageArray.length > 0) {
// extract all pages that contain the string into a new document
var d = app.newDoc(); // this will add a blank page - we need to remove that once we are done
for (var n = 0; n < pageArray.length; n++) {
d.insertPages( {
nPage: d.numPages-1,
cPath: originalDoc.path,
nStart: pageArray[n],
nEnd: pageArray[n]
} );
console.println(n + " pageArray " + pageArray);
}
// remove the first page
d.deletePages(0);
}
if (pageA.length > 0) {
// extract all pages that contain the string into a new document
var q = app.newDoc(); // this will add a blank page - we need to remove that once we are done
for (var n = 0; n < pageA.length; n++) {
q.insertPages( {
nPage: q.numPages-1,
cPath: originalDoc.path,
nStart: pageA[n],
nEnd: pageA[n]
} );
}
console.println(pageA);
}
To help you further I'll need to see the actual file.
On Sep 20, 2016 1:11 AM, "forresth46081687" <forums_noreply@adobe.com>