Hello. I'm a beginner in JavaScript, and I have Adobe Acrobat X Pro.
I want to search for a specific string within the PDF and save the sequence of numbers that comes after that string to use as my file name. Then I want to check whether the following pages have that exact same sequence, and if they do, extract all the pages with that sequence into one PDF. After extracting the pages for the first sequence, I want to keep looking for new sequences.
For example,
page 1 NO: 0158K
page 2 NO: 0158K
page 3 NO: 0158K
page 4 NO: 9090V
page 5 NO: 223M
page 6 NO: 223M
Using this example, pages 1, 2, and 3 would be extracted into one pdf together. Page 4 would be extracted by itself, and pages 5 and 6 would be extracted into one pdf.
I kind of have an idea of how to do this, but I'm not quite sure how to implement it, or combine some of the code that I found.
So far I think I have to use an array to hold all the pages with the same number sequence, and then extract them once all of the pages with that sequence are located. I was thinking of using something similar to the code from this forum https://forums.adobe.com/message/7931552#7931552 with a few modifications, like having an if statement in the nested for loop to look for the number sequence.
So far this is what I have...
var pageArray=[];
for (var p = 0; p < this.numPages; p++) {
    for(var n = 0; n<this.getPageNumWords(p); n++){
        if(this.getPageNthWord(p,n)=="PPNO"){
            dataCode=this.getPageNthWord(p,n+1)
            pageArray.push(p);
            break;
        }
    }
    for (var p2=p+1; p2 < this.numPages; p2++){
        for (var n2=0; n2<this.getPageNumWords(p2); n2++){
            if(this.getPageNthWord(p2, n2)=="PPNO"){
                if(this.getPageNthWord(p2, n2+1)==dataCode){
                    repeat++;
                    break;
                }
                else{
                    if (pageArray.length > 0) {
                        var d = app.newDoc();
                        for (var x=0; x<pageArray.length; x++){
                            d.insertPages( {
                                nPage:d.numPages-1,
                                cPath: dataCode + ".pdf",
                                nStart: pageArray[x],
                                nEnd:pageArray[x]
                            } );
                        }
                        d.deletePages(0);
                    }
                }
            }
        }
        break;
    }
}
but after I ran this code, all I got was a new pdf with a blank page.
Hi,
I haven't had a chance to test your code properly, but looking at it there are a couple of things that don't look right. I will list them so you can see if you agree, make the changes, and then we can see where we stand.
1. In the document you have the text "NO", but in the code you compare against "PPNO". I'm guessing that is just a typo from when you wrote the forum post, but I thought I should mention it.
2. When you go to add the pages you use the following code:
for (var x=0; x<pageArray.length; x++){
    d.insertPages( {
        nPage:d.numPages-1,
        cPath: dataCode + ".pdf",
        nStart: pageArray[x],
        nEnd:pageArray[x]
    });
}
There are a couple of issues here. First, cPath is set to dataCode + ".pdf", but cPath should be the device-independent path to the file you want to get the pages from, not the file you are placing the pages into. So this should be the full path to the original file.
Second, you are passing nStart and nEnd as the same page. This is not necessary: if you want just one page, pass only nStart and that will be the only page that is included.
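To make the parameter roles concrete, here is a minimal sketch of a helper that appends one page from a source file to a target document. The name appendPage is made up for illustration; insertPages and its nPage, cPath and nStart parameters are the ones discussed above.

```javascript
// Hypothetical helper (not part of the Acrobat API): copy one page from the
// file at sourcePath to the end of targetDoc.
function appendPage(targetDoc, sourcePath, pageIndex) {
    targetDoc.insertPages({
        nPage: targetDoc.numPages - 1, // insert after the current last page
        cPath: sourcePath,             // the file the page is read FROM
        nStart: pageIndex              // nEnd omitted, so only this page is copied
    });
}
```

Inside the loop from the original post, this would be called as appendPage(d, this.path, pageArray[x]).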
Hope this helps
Malcolm
Hello. Yeah, the "NO" is a typo that I made when I posted to the forum.
As for the second part, should that section of code end up being like this then?
for (var x=0; x<pageArray.length; x++){
    d.insertPages( {
        nPage:d.numPages-1,
        cPath: this.path,
        nStart: pageArray[x]
    } );
}
However, I ran the code with this corrected portion, and I didn't get a new pdf at all.
You mean after using
var d = app.newDoc();
you can't see the new document?
Yes, I just replaced that portion of the code from my original code. And no there was no new document.
It looks like the app.newDoc(); line is never executed.
Hi,
Using the following code I am able to get 2 documents created.
PPNO: 0158K
PPNO: 9090V
are both created as separate files.
// Using the active document (I only have one document open, which made testing easier)
var curDoc = app.activeDocs[0];
var pageArray=[];
var repeat = 0;
var dataCode = "";
for (var p = 0; p < curDoc.numPages; p++)
{
    for(var n = 0; n< curDoc.getPageNumWords(p); n++)
    {
        if(curDoc.getPageNthWord(p,n)=="PPNO")
        {
            dataCode=curDoc.getPageNthWord(p,n+1) ;
            pageArray.push(p);
            break;
        }
    }
    for (var p2=p+1; p2 < curDoc.numPages; p2++)
    {
        for (var n2=0; n2<curDoc.getPageNumWords(p2); n2++)
        {
            if(curDoc.getPageNthWord(p2, n2)=="PPNO")
            {
                // This if is why we only get two files as a result,
                // because we can only get to the else if we don't match, but for the last
                // number in the document we will never have a page that doesn't match
                if(curDoc.getPageNthWord(p2, n2+1)==dataCode)
                {
                    repeat++;
                    break;
                }
                else
                {
                    if (pageArray.length > 0)
                    {
                        var d = app.newDoc();
                        for (var x=0; x<pageArray.length; x++)
                        {
                            d.insertPages(
                            {
                                nPage:d.numPages-1,
                                // changed to use the curDoc
                                cPath: curDoc.path,
                                // as we are importing 1 page at a time.
                                nStart: pageArray[x]
                            });
                        }
                        d.deletePages(0);
                    }
                    // reset so we get only the new pages.
                    pageArray = [];
                }
            }
        }
        break;
    }
}
There are a couple of changes to the code. The main ones are the changes I mentioned; the other is to reset the page array, so that the pages found on the first run through the loop are not included again on the second.
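As the comment in the code notes, the last group is never written, because the export only happens when a different code is found. One possible way around that, sketched here as plain JavaScript with no Acrobat calls, is to work out the page ranges first and close the final range when the end of the list is reached. The function name groupConsecutivePages and the shape of the returned objects are my own assumptions, not part of the Acrobat API.

```javascript
// Hypothetical helper: `codes` holds one code per page (index = 0-based page
// number). Returns one { code, nStart, nEnd } object per run of equal codes.
function groupConsecutivePages(codes) {
    var groups = [];
    if (codes.length === 0) {
        return groups;
    }
    var start = 0;
    for (var i = 1; i <= codes.length; i++) {
        // Close the current run when the code changes OR when we reach the end,
        // so the final group is not lost.
        if (i === codes.length || codes[i] !== codes[start]) {
            groups.push({ code: codes[start], nStart: start, nEnd: i - 1 });
            start = i;
        }
    }
    return groups;
}

// With the sample document from the first post this yields three groups:
// pages 0-2 (0158K), page 3 (9090V) and pages 4-5 (223M).
var sampleGroups = groupConsecutivePages(
    ["0158K", "0158K", "0158K", "9090V", "223M", "223M"]);
```

Each group's nStart and nEnd could then be passed to a single insertPages call instead of inserting one page at a time.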
Hope this helps.
Malcolm
Thanks, the code helped a lot. The only problem I still have is that the new PDFs are not saved; they only open as temporary documents, which is why their names end in "temp". I also can't figure out how to customize the names of the new PDFs. The reason I put cPath: dataCode + ".pdf" in my original post was that I wanted each new PDF to be named after its dataCode, for example 0158K and 9090V.
Hi,
You can just call
d.saveAs ( "/path/to/save/location/" + dataCode + ".pdf");
just after the d.deletePages(0); line.
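If the codes can contain characters that are awkward in file names, the path could be built by a small helper first. This is only a sketch: buildSavePath is a made-up name, the folder is the same placeholder as above, and the choice to replace everything except letters, digits, underscore and hyphen is an assumption about what your codes look like.

```javascript
// Hypothetical helper: build a device-independent save path from a folder
// and a data code, replacing characters that could be unsafe in a file name.
function buildSavePath(folder, code) {
    var safeCode = String(code).replace(/[^A-Za-z0-9_-]/g, "_");
    var base = folder.replace(/\/+$/, ""); // drop any trailing slashes
    return base + "/" + safeCode + ".pdf";
}

// In Acrobat this would be used as:
// d.saveAs(buildSavePath("/path/to/save/location/", dataCode));
```

Note that the Acrobat JavaScript reference lists saveAs as a security-restricted method (console, batch or menu events), so a script run from another context may need a trusted function.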
Hope this helps
Malcolm
Okay thank you. And sorry I have one last question. I realized that this code would skip over the last few pages of a pdf if all of the dataCodes matched. I tried to add something at the end of the code...
curDoc.extractPages({
nStart: finalpage,
nEnd: curDoc.numPages - 1,
cPath: dataCode + ".pdf"
});
in order to account for the last few pages. (I declared finalpage as a variable and set it equal to p in the first nested loop.) It is similar to the code from the thread "Split large pdf on repeated text pattern, and save new pdf with custom filename". However, I don't think that part of my code is ever reached, because apart from the new PDFs that were already being created, nothing else is produced.
Hi,
I have refactored the code a little to solve the problem, based on the sample document. Comments are in the code so you can see what I have done. As always, if you have any questions, just ask.
var curDoc = app.activeDocs[0];
var pageArray = [];
var dataCode = "";
var startPageNumber = 0;
var lastPageNumber = curDoc.numPages - 1;
var exportFile = false;
var d;
// This part collects the code that follows "PPNO" on each page, as before
for (var p = 0; p < curDoc.numPages; p++)
{
    for (var n = 0; n < curDoc.getPageNumWords(p); n++)
    {
        if (curDoc.getPageNthWord(p, n) == "PPNO")
        {
            dataCode = curDoc.getPageNthWord(p, n + 1);
            pageArray.push(dataCode);
            break;
        }
    }
}
// code of the first page of the current group; set here, once pageArray has been filled
var startPage = pageArray[0];
// This bit has been refactored to stop the need to go through all the pages again
// it also uses the ability of insertPages to insert more than one page at a time.
for (var i = 1; i < pageArray.length; i++)
{
    var endPageNumber = i - 1;
    // if we have a match, AND we are not the last page, keep going
    if ((startPage === pageArray[i]) && (i !== lastPageNumber))
    {
        exportFile = false;
    }
    // if we are the last page, we don't care about a match anymore
    // (this assumes the last page belongs to the current group, as in the sample document)
    else if (i === lastPageNumber)
    {
        // catch if we are at the end of the document
        exportFile = true;
        endPageNumber = i;
    }
    // we are not the last page, and we are not a match for the pages we are looking for
    else
    {
        // catch when we have passed the current page
        exportFile = true;
    }
    // once we have some files to process.
    if (exportFile)
    {
        d = app.newDoc();
        // call insert pages once with the page range to insert.
        d.insertPages(
        {
            nPage: d.numPages - 1,
            cPath: curDoc.path,
            nStart: startPageNumber,
            nEnd: endPageNumber
        });
        // remove initial page
        d.deletePages(0);
        // set up for the next run
        startPage = pageArray[i];
        startPageNumber = i;
    }
}
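One thing to be aware of: the loop above treats the index into pageArray as a page number, which only holds if every page contains "PPNO". A minimal sketch of a collection step that keeps the two aligned is below. The name collectPageCodes and the use of null for pages without a marker are assumptions for illustration; getPageNumWords and getPageNthWord are the same calls used above.

```javascript
// Hypothetical helper: return one entry per page, so that index === page number.
// Pages where `marker` is not found (or has no word after it) get null.
function collectPageCodes(doc, marker) {
    var codes = [];
    for (var p = 0; p < doc.numPages; p++) {
        var code = null;
        var wordCount = doc.getPageNumWords(p);
        // stop one word early so that n + 1 is always a valid word index
        for (var n = 0; n < wordCount - 1; n++) {
            if (doc.getPageNthWord(p, n) === marker) {
                code = doc.getPageNthWord(p, n + 1);
                break;
            }
        }
        codes.push(code);
    }
    return codes;
}
```

In the script above this would be called as collectPageCodes(curDoc, "PPNO"); how to treat the null entries (skip them, or attach them to the previous group) depends on your documents.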
Hope this helps
Malcolm