Delete Pages By Text Search

Report · Mar 15, 2023

Hello everyone, I hope all is well. I have a 200-page document in which the pages for different classes are mixed together, and I need to separate them by class. I have a list of names for the morning class and another for the night class, so I would like to enter a list of names and have the matching pages deleted, so that I can split the document quickly.

Example:
- Alice Says
- Francisco Campos
- etc...
- etc...

I would like Acrobat to find the names above automatically and delete the pages they appear on. How can that be done? I assume it is only possible with a script?

Report · Mar 15, 2023

You are correct that it can only be done with a script. However, it's not particularly easy.

I would suggest doing this.

Create a new 'Action' with the Action Wizard.
Add "Search and Remove Text" from the "Protection" category. This is where you enter your list of names. This step marks the entered names with redact annotations. It won't remove the text.
Add "Execute JavaScript" from the "More Tools" category. Then enter the following script, which finds the page numbers of all the redact annotations and then deletes those pages.

(getAnnots() || []).filter(a=>a.type == "Redact").map(a=>a.page).sort().filter((ele,i,arr)=> arr.indexOf(ele) == i).reverse().forEach(a=>this.deletePages(a))

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often

Report · Mar 15, 2023

Hello Thom, how are you? Thank you very much for the answer; your method has really helped me a lot. The first step, "Search and Remove Text", is working 100%. The script step, however, is not deleting some of the marked pages for some reason.

If you would like to test with the file I'm using, I can send you the file and the list of names so you can check whether your script deletes all of the marked pages.

Even so, this has already gotten me 80% of the way there. Thanks again!

Report · Mar 15, 2023

Please look in the console window (Ctrl-J) and see if there are any errors reported there.

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often

Report · Mar 15, 2023

Arg!! I missed a piece. The sort function is performing a lexical sort on the page numbers, rather than a numerical sort.

Here's the corrected code. I also included explicit references to the doc object on which the script is operating.

(event.target.getAnnots() || []).filter(a=>a.type == "Redact").map(a=>a.page).sort((a,b)=>a-b).filter((ele,i,arr)=> arr.indexOf(ele) == i).reverse().forEach(a=>event.target.deletePages(a))
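
To illustrate the difference between the two sorts, here is a small sketch that runs in any JavaScript console and needs no document. The page numbers are made-up sample values, not taken from the poster's file.

```javascript
// Sample 0-based page numbers in no particular order (invented for illustration).
var pages = [10, 9, 2, 100, 1];

// With no compare function, sort() converts each element to a string
// and compares the strings character by character.
var lexical = pages.slice().sort();

// A compare function returning a - b orders the elements by numeric value.
var numeric = pages.slice().sort(function (a, b) { return a - b; });

console.log(lexical); // [1, 10, 100, 2, 9]
console.log(numeric); // [1, 2, 9, 10, 100]
```

After a lexical sort, reversing the list does not give a strictly descending order, which is what the deletion step relies on.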

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often

Report · Mar 16, 2023

Does this code take into account the situation where there are multiple results on one page? It needs to be filtered to only unique values.

Report · Mar 16, 2023

Yes, it filters out multiple hits on a page. And I tested it and it works.

The error you are seeing (invalid input to deletePages) is from pages not being properly sorted, so the pages are deleted out of order, resulting in a later page being hit after earlier pages are removed.
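
The effect of deletion order can be reproduced without Acrobat. The sketch below models a document as a plain array, with `splice` standing in for `deletePages`; the helper name `deleteInOrder` and the `RangeError` are assumptions made for the illustration, not Acrobat behavior.

```javascript
// Model a document as an array of page labels; the index is the current
// 0-based page number, which shifts down whenever an earlier page is removed.
function deleteInOrder(pageCount, targets) {
  var doc = [];
  for (var i = 0; i < pageCount; i++) doc.push("p" + i);
  targets.forEach(function (n) {
    if (n < 0 || n >= doc.length) throw new RangeError("invalid page " + n);
    doc.splice(n, 1); // stand-in for deletePages(n)
  });
  return doc;
}

// Goal: remove original pages 1 and 4 from a 5-page document.
var descending = deleteInOrder(5, [4, 1]);

var ascending;
try {
  ascending = deleteInOrder(5, [1, 4]);
} catch (e) {
  ascending = e.message; // page 4 no longer exists after page 1 is removed
}

console.log(descending); // ["p0", "p2", "p3"]
console.log(ascending);  // "invalid page 4"
```

Deleting from the highest page number down means no remaining target ever changes position.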

This error is completely unrelated to using the equality comparison. Changing it to the identity comparison doesn't do anything. You saw this error because the code was not updated, not because there was a problem with the code. You can test this by changing it back and restarting Acrobat.

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often

Report · Mar 16, 2023

Hi Thom, how are you? I tested this code, but it still gives an error.

Report · Mar 16, 2023

Thom, I believe I managed to correct the code

Here is the correct code:

(event.target.getAnnots() || []).filter(a => a.type === "Redact")
.map(a => a.page)
.sort((a, b) => a - b)
.filter((ele, i, arr) => arr.indexOf(ele) === i)
.reverse()
.forEach(a => event.target.deletePages(a));
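
For anyone who wants to check the page-selection logic outside Acrobat, here is a rough sketch of the same chain run against mock data. The objects in `mockAnnots` are hypothetical stand-ins that model only the `type` and `page` properties, and `pagesToDelete` is a name invented for this example.

```javascript
// Hypothetical stand-ins for annotation objects; only type and page are modeled.
var mockAnnots = [
  { type: "Redact", page: 12 },
  { type: "Highlight", page: 3 },
  { type: "Redact", page: 2 },
  { type: "Redact", page: 12 }, // second hit on the same page
  { type: "Redact", page: 7 }
];

// Same steps as the script above, minus the final deletePages call.
function pagesToDelete(annots) {
  return (annots || [])
    .filter(function (a) { return a.type === "Redact"; })            // redactions only
    .map(function (a) { return a.page; })                            // keep page numbers
    .sort(function (a, b) { return a - b; })                         // numeric order
    .filter(function (p, i, arr) { return arr.indexOf(p) === i; })   // drop repeats
    .reverse();                                                      // highest page first
}

console.log(pagesToDelete(mockAnnots)); // [12, 7, 2]
console.log(pagesToDelete(null));       // []
```

The two hits on page 12 collapse into a single entry, so each page is deleted only once.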

Report · May 07, 2024

I want to do a similar thing, but instead of deleting the pages I want to extract them as one separate file. I changed the last line of code from "deletePages" to "extractPages". However, it creates a separate file for each page that's extracted, and I just need one file for all of them. BTW, I do not know ANYTHING about JavaScript. -Thanks!

Report · May 07, 2024

An easier way to do that is to delete the pages that don't have the search term, so you end up with a document containing only the pages that do have it. Then you can save it under a new name, or print it, etc.

Report · May 07, 2024

So how do I change the code to delete the pages that don't have the search term?

Report · May 07, 2024

I like to use a different kind of code style, so I re-wrote it entirely:

this.syncAnnotScan();
for (var p=this.numPages-1; p>=0; p--) {
	var matchFound = false;
	var annots = this.getAnnots({nPage: p});
	if (annots!=null) {
		for (var i=annots.length-1; i>=0; i--) {
			var annot = annots[i];
			if (annot.type=="Redact") {
				matchFound = true;
				annot.destroy();			
			}
		}
	}
	if (!matchFound) {
		if (this.numPages==1) app.alert("Error! No matches were found in the entire file.");
		else this.deletePages(p, p);
	}
}

The code above will also remove the Redaction comments, so they don't appear in the final outcome.

Edited: Code fixed
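
As a rough sketch of the selection logic only, the function below lists which pages would be deleted, given the page count and the pages that contain a match. It is plain JavaScript with no Acrobat calls; `pagesWithoutMatch` and its inputs are hypothetical names used for illustration.

```javascript
// Hypothetical helper: given the page count and the 0-based pages that
// contain a match, return the pages to delete, highest page first.
function pagesWithoutMatch(numPages, matchedPages) {
  var keep = {};
  matchedPages.forEach(function (p) { keep[p] = true; });

  var result = [];
  // Walk backwards so the result is already in safe deletion order.
  for (var p = numPages - 1; p >= 0; p--) {
    if (!keep[p]) result.push(p);
  }
  return result;
}

console.log(pagesWithoutMatch(6, [1, 4, 4])); // [5, 3, 2, 0]
```

If no page matches at all, the list contains every page, which is the case the alert in the script above guards against.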

Report · May 07, 2024

That's great, it worked! Thanks!

Report · May 08, 2024

After some other work I went back and tried the script on my file, and it didn't work right. It deleted a few pages, but others were still there even though they did not contain the criteria I redacted. This is a file of about 2000 pages, of which about 500 contain the criteria I searched for. Yesterday I tried it on a sample file of about 150 pages and thought it worked, but the redacted parts were only on the first few pages.

Report · May 08, 2024

Are the results of the Search & Remove Text command correct, though, before running the script?

Report · May 08, 2024

Yes, after the Search & Remove Text command ran, it had redacted the correct pages. The script should then have deleted about 1500 pages, but it deleted only 18. None of the redacted pages were deleted; it just didn't delete all of the other pages.

Report · May 08, 2024

Sorry, there was a small mistake in the code. I fixed it above. Try it again with the new code.

Report · May 08, 2024

That did it. Thank you!
