Split PDF based on content into separate PDFs with custom file names when not every page has the identifier

New Here ,
Aug 06, 2019

Hi, I would really appreciate advice on the task described below. I have Acrobat Pro DC but very little experience with JavaScript, and I am stuck.

I have a single PDF that contains multiple reports, and I want to split it into individual reports. The cover page of each report carries a report number of the form "XPD Report - nnn". I want to search the PDF for the string "XPD Report" and use the sequence of digits that follows it as the file name ("001", "002", ...). Each output PDF should contain the cover page and all the pages that follow it, up to but not including the next cover page with a new XPD Report number.

For example,

page 1    XPD Report - 001

page 2   

page 3    XPD Report - 002

page 4   

page 5    XPD Report - 003

page 6   

page 7

In this example, pages 1 and 2 would be extracted together into one PDF, pages 3 and 4 into a second PDF, and pages 5, 6 and 7 into a third.

I tried to adapt the code from the forum thread linked below and run it as an Action in the Action Wizard, but to no avail.
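The grouping rule in this example can be sketched in plain JavaScript. This is only an illustration under stated assumptions: `buildRanges`, the `{ page, id }` shape, and the `.pdf` naming are hypothetical names introduced here, and page indexes are 0-based as in Acrobat's JavaScript API.

```javascript
// Hypothetical helper: turn detected cover pages into page ranges.
// `covers` is an array of { page, id } sorted by page, where `page` is the
// 0-based index of a cover page and `id` is the digits after "XPD Report -".
// `numPages` is the total number of pages in the combined document.
function buildRanges(covers, numPages) {
  var ranges = [];
  for (var i = 0; i < covers.length; i++) {
    // A report ends on the page before the next cover page,
    // or on the last page of the document if there is no next cover.
    var end = (i + 1 < covers.length) ? covers[i + 1].page - 1 : numPages - 1;
    ranges.push({
      nStart: covers[i].page,
      nEnd: end,
      name: covers[i].id + ".pdf" // assumed naming scheme
    });
  }
  return ranges;
}

// The seven-page example above, with 0-based page indexes.
var ranges = buildRanges(
  [{ page: 0, id: "001" }, { page: 2, id: "002" }, { page: 4, id: "003" }],
  7
);
```

Pages without an identifier never start a new range, so they simply stay attached to the most recent cover page.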

https://forums.adobe.com/thread/2502247?q=Split%20PDF%20based%20on%20content%20but%20not

  var curDoc = app.activeDocs[0];
  var pageArray = [];
  var repeat = 0;
  var dataCode = "";
  var startPage = pageArray[0];
  var startPageNumber = 0;
  var lastPageNumber = curDoc.numPages;
  lastPageNumber--;

  // This part gets all the page numbers from the document as before
  for (var p = 0; p < curDoc.numPages; p++)
  {
      for (var n = 0; n < curDoc.getPageNumWords(p); n++)
      {
          if (curDoc.getPageNthWord(p, n) == "XPD REPORT -")
          {
              dataCode = curDoc.getPageNthWord(p, n + 1);
              pageArray.push(dataCode);
              break;
          }
      }
  }

  // This bit has been refactored to stop the need to go through all the pages again
  // it also uses the ability of insertPages to insert more than one page at a time.
  for (var i = 1; i < pageArray.length; i++)
  {
      var endPageNumber = i - 1;

      // if we have a match, AND we are not the last page, keep going
      if ((startPage === pageArray[i]) && (i !== lastPageNumber))
      {
          exportFile = false;
      }
      // if we are the last page, we don't care about a match anymore.
      else if (i === lastPageNumber)
      {
          // catch if we are at the end of the document
          exportFile = true;
          endPageNumber = i;
      }
      // we are not the last page, and we are not a match for the pages we are looking for
      else
      {
          // catch when we have passed the current page
          exportFile = true;
      }
      // once we have some files to process.
      if (exportFile)
      {
          d = app.newDoc();
          // call insert pages once with the page range to insert.
          d.insertPages(
          {
              nPage: d.numPages - 1,
              cPath: curDoc.path,
              nStart: startPageNumber,
              nEnd: endPageNumber
          });
          // remove initial page
          d.deletePages(0);
          // set up for the next run
          startPage = pageArray[i];
          startPageNumber = i;
      }
  }
TOPICS
Acrobat SDK and JavaScript

Adobe Community Professional ,
Aug 06, 2019

"XPD Report" is two words in the PDF file. You must test for "XPD" and "Report" separately.
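As a rough sketch of that two-word test: `getPageNthWord` returns one word at a time, so a comparison against the whole string "XPD REPORT -" can never match. The comparison is also case-sensitive, and the cover page reads "Report", not "REPORT". The helpers below, `findReportId` and `pageWords`, are hypothetical names introduced for illustration. Whether the "-" comes back as its own word depends on how Acrobat splits and strips the text, so the sketch skips anything that is not digits rather than assume either way.

```javascript
// Hypothetical helper: given the words of one page in reading order,
// return the report number that follows "XPD" "Report", or null if absent.
function findReportId(words) {
  for (var n = 0; n + 1 < words.length; n++) {
    if (words[n] === "XPD" && words[n + 1] === "Report") {
      // Look at the next two words for the number; the "-" separator
      // may or may not be returned as a word, so skip non-digit words.
      for (var k = n + 2; k < words.length && k <= n + 3; k++) {
        if (/^\d+$/.test(words[k])) {
          return words[k];
        }
      }
    }
  }
  return null;
}

// Hypothetical helper: collect the words of page p from an Acrobat Doc
// object using getPageNumWords and getPageNthWord.
function pageWords(doc, p) {
  var words = [];
  var count = doc.getPageNumWords(p);
  for (var n = 0; n < count; n++) {
    words.push(doc.getPageNthWord(p, n));
  }
  return words;
}
```

Inside an Action, `findReportId(pageWords(this, p))` could be called for each page `p`. A non-null result would mark that page as a cover page and supply the number for the file name.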
