Delete Pages By Text Search

Community Beginner, Mar 15, 2023

Hello everyone, hope all is well. I have a 200-page document in which pages from different classes are mixed together, and I need to separate them. I have the list of names for the morning class and the list of names for the night class, so I want to enter one list of names and have the matching pages deleted, so that I can separate the classes quickly.

Example:
- Alice Says
- Francisco Campos
- etc...
- etc...

I want the pages containing the names above to be found and deleted automatically. How would I do that? I believe it can only be done with a script, correct?

Topics: How to, JavaScript

Community Expert, Mar 15, 2023

You are correct that it can only be done with a script. However, it's not particularly easy.

I would suggest doing this:

  1. Create a new 'Action' with the Action Wizard.
  2. Add "Search and Remove Text" from the "Protection" category. This is where you enter your list of names. This step marks the entered names with redact annotations; it does not remove the text.
  3. Add "Execute JavaScript" from the "More Tools" category. Then enter the following script, which finds the page numbers of all the redact annotations and deletes those pages.

(getAnnots() || []).filter(a=>a.type == "Redact").map(a=>a.page).sort().filter((ele,i,arr)=> arr.indexOf(ele) == i).reverse().forEach(a=>this.deletePages(a))

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often
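The script is a chain of standard JavaScript array methods, so its page-selection logic can be tried outside Acrobat. This is a minimal sketch, assuming each annotation is mocked as a plain `{ type, page }` object with a zero-based `page` (as Acrobat's annotation `page` property is); `pagesToDelete` and the mock data are hypothetical, and the `deletePages` call is left out because it only exists inside Acrobat. The sketch uses a numeric compare function and strict `===` comparisons.

```javascript
// Hypothetical helper: given annotation-like objects, return the pages that
// contain at least one Redact annotation, unique and in descending order.
function pagesToDelete(annots) {
  return (annots || [])
    .filter(a => a.type === "Redact")                // keep only redaction marks
    .map(a => a.page)                                // zero-based page of each mark
    .sort((a, b) => a - b)                           // numeric, not lexical, order
    .filter((ele, i, arr) => arr.indexOf(ele) === i) // drop repeated pages
    .reverse();                                      // highest page first
}

// Mock data standing in for the result of getAnnots()
const mockAnnots = [
  { type: "Redact", page: 10 },
  { type: "Redact", page: 2 },
  { type: "Highlight", page: 5 },
  { type: "Redact", page: 10 },
  { type: "Redact", page: 9 },
];

console.log(pagesToDelete(mockAnnots)); // [ 10, 9, 2 ]
```

Inside Acrobat, each number in the result would be passed to `deletePages` in that order; deleting the highest page first keeps the lower page numbers valid.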

Community Beginner, Mar 15, 2023

Hello Thom, how are you? Thank you so much for the answer; your method has helped me a lot. The first step, Search and Remove Text, works 100%, but the script leaves some of the marked pages undeleted for some reason.

If you want to test with the file I'm using, I can send you the file and the list of names so you can check whether your script deletes 100% of them.

But even so, it has already gotten me 80% of the way there. Thanks again!

Community Expert, Mar 15, 2023

Please look in the console window (Ctrl+J) and see whether any errors are reported there.

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often

Community Expert, Mar 15, 2023

Argh! I missed a piece. The sort function is performing a lexical sort on the page numbers rather than a numerical sort.

Here's the corrected code. I also included explicit references to the doc object on which the script is operating:

(event.target.getAnnots() || []).filter(a=>a.type == "Redact").map(a=>a.page).sort((a,b)=>a-b).filter((ele,i,arr)=> arr.indexOf(ele) == i).reverse().forEach(a=>event.target.deletePages(a))

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often
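The lexical-sort problem can be reproduced in any JavaScript engine: with no compare function, `Array.prototype.sort` converts the elements to strings and compares them character by character. The page numbers below are made up for illustration.

```javascript
// Zero-based page numbers, as they might come from the redact annotations
const pages = [2, 10, 9, 100];

// Default sort: elements are compared as strings ("10" < "100" < "2" < "9")
const lexical = [...pages].sort();

// Numeric sort: the compare function returns a negative, zero, or positive number
const numeric = [...pages].sort((a, b) => a - b);

console.log(lexical); // [ 10, 100, 2, 9 ]
console.log(numeric); // [ 2, 9, 10, 100 ]

// Reversed for deletion, only the numeric version runs strictly from last page to first
console.log([...lexical].reverse()); // [ 9, 2, 100, 10 ]
console.log([...numeric].reverse()); // [ 100, 10, 9, 2 ]
```

In the reversed lexical order, page 100 would be deleted after pages 9 and 2, by which point it has shifted down by two, so the call either removes the wrong page or fails with an out-of-range page number.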

Community Expert, Mar 16, 2023

Does this code take into account the situation where there are multiple results on one page? The page numbers need to be filtered down to unique values.

Community Expert, Mar 16, 2023

Yes, it filters out multiple hits on the same page, and I tested it: it works.

The error you are seeing (invalid input to deletePages) comes from the pages not being properly sorted. The pages are then deleted out of order, so after earlier pages are removed, a later page number is hit that has shifted or no longer exists.

This error is completely unrelated to using the equality comparison (==). Changing it to the identity comparison (===) doesn't do anything. You saw this error because the code was not updated, not because there was a problem with the code. You can test this by changing it back and restarting Acrobat.

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often
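The effect of deleting out of order can be modeled with a plain array of page labels, where `splice` stands in for `deletePages`. This is only an analogy under that assumption: `simulateDelete` is a hypothetical helper, and the `RangeError` it throws merely plays the role of Acrobat's invalid-input error.

```javascript
// Simulate deleting zero-based page indices from a document modeled as an array.
// Returns the remaining pages, or throws if an index no longer exists
// (standing in for Acrobat rejecting an out-of-range page number).
function simulateDelete(pageCount, indices) {
  const doc = Array.from({ length: pageCount }, (_, i) => "p" + i);
  for (const idx of indices) {
    if (idx < 0 || idx >= doc.length) {
      throw new RangeError("page " + idx + " is out of range");
    }
    doc.splice(idx, 1); // remove one page; later pages shift down by one
  }
  return doc;
}

// Delete pages 1 and 4 from a 5-page document, highest first
console.log(simulateDelete(5, [4, 1])); // [ 'p0', 'p2', 'p3' ]

try {
  simulateDelete(5, [1, 4]); // after removing page 1, only indices 0-3 remain
} catch (e) {
  console.log(e.message); // page 4 is out of range
}
```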

Community Beginner, Mar 16, 2023

Hi Thom, how are you? I tested this code, but it still gives an error:

[Screenshot of the error: CleberRafael_0-1678971179283.png]

Community Beginner, Mar 16, 2023

Thom, I believe I managed to correct the code.

Here is the corrected code:

(event.target.getAnnots() || []).filter(a => a.type === "Redact")
.map(a => a.page)
.sort((a, b) => a - b)
.filter((ele, i, arr) => arr.indexOf(ele) === i)
.reverse()
.forEach(a => event.target.deletePages(a));

Community Beginner, May 07, 2024

I want to do something similar, but instead of deleting the pages I want to extract them into one separate file. I changed "deletePages" in the last line of the code to "extractPages". However, that creates a separate file for each extracted page, and I need one file containing all of them. BTW, I do not know ANYTHING about JavaScript. Thanks!

Community Expert, May 07, 2024

An easier way to do that is to delete the pages that don't have the search term. You'll end up with a document containing only the pages that do have it, which you can then save under a new name, print, etc.

Community Beginner, May 07, 2024

So how do I change the code to delete the pages that don't have the search term?

Community Expert, May 07, 2024

I prefer a different coding style, so I rewrote it entirely:

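// Make sure Acrobat has scanned all the annotations in the file
this.syncAnnotScan();
// Work from the last page to the first, so deleting a page never shifts a page still to be checked
for (var p=this.numPages-1; p>=0; p--) {
	var matchFound = false;
	var annots = this.getAnnots({nPage: p});
	if (annots!=null) {
		for (var i=annots.length-1; i>=0; i--) {
			var annot = annots[i];
			if (annot.type=="Redact") {
				// The page has a match: keep it, but remove the redaction mark
				matchFound = true;
				annot.destroy();
			}
		}
	}
	if (!matchFound) {
		// If only this page is left, nothing matched anywhere (a PDF can't have zero pages)
		if (this.numPages==1) app.alert("Error! No matches were found in the entire file.");
		else this.deletePages(p, p);
	}
}
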
The code above also removes the redaction marks, so they don't appear in the final result.

Edited: Code fixed
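The page selection behind this approach can be sketched as a set complement on plain data: collect the pages that have a Redact annotation, then list every other page from last to first. `pagesWithoutMatches` and the mock `{ type, page }` objects are hypothetical stand-ins for Acrobat's annotations; unlike the script, this sketch returns an empty list when nothing matched rather than deleting pages down to one and showing an alert.

```javascript
// Hypothetical helper: zero-based pages that have NO Redact annotation,
// in descending order so they could be deleted from the end of the file first.
function pagesWithoutMatches(pageCount, annots) {
  const matched = new Set(
    annots.filter(a => a.type === "Redact").map(a => a.page)
  );
  if (matched.size === 0) return []; // nothing matched: select nothing
  const result = [];
  for (let p = pageCount - 1; p >= 0; p--) {
    if (!matched.has(p)) result.push(p);
  }
  return result;
}

// Mock annotations: two marks on page 1, one on page 4, a non-redact note on page 2
const annots = [
  { type: "Redact", page: 1 },
  { type: "Redact", page: 1 },
  { type: "Redact", page: 4 },
  { type: "Highlight", page: 2 },
];

console.log(pagesWithoutMatches(6, annots)); // [ 5, 3, 2, 0 ]
```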

Community Beginner, May 07, 2024

That's great, it worked!  Thanks!

Community Beginner, May 08, 2024

After doing some other work, I went back and tried the script on my file, and it didn't work correctly. It deleted a few pages, but others were still there even though they did not contain the criteria I redacted. The file is about 2,000 pages, with about 500 pages containing the criteria I searched for. Yesterday I tried it on a sample file of about 150 pages and thought it worked, but in that file the redacted parts were only on the first few pages.

Community Expert, May 08, 2024

Are the results of the Search & Remove Text command correct, though, before running the script?

Community Beginner, May 08, 2024

Yes. The Search & Remove Text command redacted the correct pages, and the script should then have deleted about 1,500 pages, but it deleted only 18. None of the redacted pages were deleted; it just didn't delete all of the other pages.

Community Expert, May 08, 2024

Sorry, there was a small mistake in the code. I fixed it above. Try it again with the new code.

Community Beginner, May 08, 2024

That did it.  Thank you!
