Split PDF based on content into different pdfs with custom file name but not all page have the identifier

Report · Aug 06, 2019

Hi would really appreciate advice and expertise on a task I will like to do described below but given that I have very little expertise in Javascript but have Acrobat Pro DC, I am stuck.

I have a report that contains multiple reports in a single pdf which I want to split it into individual reports. The cover page of each report has a report number XPD Report - nnn. I want to be able to search for a specific string ("XPD Report") within the pdf, and then save the sequence of numbers that come after that string for my file name ("001","002"...). I would want to extract the cover page and the succeeding pages of the XPD report and include it into one pdf until it finds another unique XPD Report.

For example,

page 1 XPD Report - 001

page 2

page 3 XPD Report - 002

page 4

page 5 XPD Report - 003

page 6

page 7

Using this example, pages 1 and 2 would be extracted into one pdf together. Page 3 and 4 would be extracted by itself, and pages 5, 6 and 7 would be extracted into one pdf.

I was trying to replicate as much as possible what I could find of the codes in the forum link below and running as an Action in Action Wizard, but to no avail.

https://forums.adobe.com/thread/2502247?q=Split%20PDF%20based%20on%20content%20but%20not

var curDoc = app.activeDocs[0];
var pageArray=[];
var repeat = 0;
var dataCode = "";
var startPage = pageArray[0];
var startPageNumber = 0;
var lastPageNumber = curDoc.numPages;
lastPageNumber--;
// This part gets all the page numbers from the document as before
for (var p = 0; p < curDoc.numPages; p++)
{
for(var n = 0; n< curDoc.getPageNumWords(p); n++)
{
if(curDoc.getPageNthWord(p,n)=="XPD REPORT -")
{
dataCode=curDoc.getPageNthWord(p,n+1) ;
pageArray.push(dataCode);
break;
}
}
}
// This bit has been refactored to stop the need to go through all the pages again
// it also uses the ability of insertPages to insert more than one page at a time.
for ( var i = 1; i < pageArray.length; i++)
{
var endPageNumber = i - 1;
// if we have a match, AND we are not the last page, keep going
if (( startPage === pageArray) && ( i !== lastPageNumber))
{
exportFile = false
}
// if we are the last page, we don't care about a match anymore.
else if ( i === lastPageNumber)
{
// catch if we are at the end of the document
exportFile = true;
endPageNumber = i;
}
// we are not the last page, and we are not a match for the pages we are looking for
else
{
// catch when we have passed the current page
exportFile = true;
}
// once we have some files to process.
if ( exportFile)
{
d = app.newDoc();
// call insert pages once with the page range to insert.
d.insertPages (
{
nPage: d.numPages -1,
cPath: curDoc.path,
nStart: startPageNumber,
nEnd : endPageNumber,
});
// remove initial page
d.deletePages(0);
// set up for the next run
startPage = pageArray;
startPageNumber = i;
}
}

Report · Aug 06, 2019

XPD Report are 2 words in the PDF file. You must test for "XPD" and "Report".

Adobe Community

Split PDF based on content into different pdfs with custom file name but not all page have the identifier