
Export highlights from a PDF file

Community Beginner, Apr 10, 2023

Is there really no way to export or extract highlighted annotations of a PDF file as a simple text file? I find it absolutely ridiculous that there is no simple way to do this despite PDFs being around for decades! I am using Acrobat Pro and still no easy way to do this.

TOPICS
Create PDFs

Views

5.8K

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Community Expert, Apr 10, 2023

Correct. You can create a comment summary in Acrobat, but that's the best it has.

 

Bluebeam Revu will let you export comments as a CSV file.

Community Expert, Apr 10, 2023

Hi,

Last year, I wrote a script that you might be interested in!

Change the extension of the attached file from .txt to .js, place the file in the JavaScript folder of your Acrobat installation, then restart the application.

You will get a new "* b2Tools *" item in your "Edit" menu.

Capture_d’écran_2023-04-10_à_18_46_36.png
Select "Comments Summary"...

Capture d’écran 2023-04-10 à 18.46.51.png

Choose what you want, then "OK".

Try it and let me know...
@+

Community Beginner, Apr 10, 2023

Thank you heaps, but sorry, I just don't understand where or how I should run this script.

Community Expert, Apr 11, 2023

Hi,

After changing the file extension from .txt to .js (from b2T-Comments report.txt to b2T-Comments report.js), you must place this file into the JavaScript folder of your Acrobat application.

If you don't know where this folder is, you can use the attached "Show_me_the_path.pdf" file, which will help you find it.

Then restart your Acrobat application and follow the earlier instructions, which should meet your need.
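If the helper PDF is not at hand, the folder can usually be looked up from Acrobat's JavaScript console (Ctrl+J on Windows, Cmd+J on macOS). The sketch below is only an illustration: it assumes the standard `app.getPath(category, folder)` call of the Acrobat JavaScript API, and the wrapper name `listJavaScriptFolders` is made up for this example. It takes the `app` object as a parameter so that it can also be tried outside Acrobat with a stand-in object.

```javascript
// Hypothetical helper: asks the given object for its "user" and "app"
// JavaScript folders. Inside Acrobat, pass the built-in `app` object.
function listJavaScriptFolders(appObject) {
  var categories = ["user", "app"];
  var found = [];
  for (var i = 0; i < categories.length; i++) {
    try {
      // Assumed API: app.getPath(cCategory, cFolder)
      found.push(categories[i] + ": " + appObject.getPath(categories[i], "javascript"));
    } catch (e) {
      // The call can fail, for example when the folder does not exist yet.
      found.push(categories[i] + ": (not available)");
    }
  }
  return found;
}

// In the Acrobat JavaScript console one would run:
// console.println(listJavaScriptFolders(app).join("\n"));
```

Either of the reported folders should work for a folder-level script such as the renamed .js file; the "user" folder normally does not require administrator rights.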

Capture d’écran 2023-04-11 à 18.54.12.png
Capture d’écran 2023-04-11 à 18.54.46.png

Let me know.

@+

Community Beginner, Apr 11, 2023

Thanks very much for freely sharing your script. I was able to run the script displayed in Adobe Acrobat DC which is great. However, there are still a couple of issues:

Issue 1 (minor) - It does NOT generate a simple text file with the highlighted text. It generates only either a PDF, OR a console window with the highlighted text. 

Issue 2 (major) - The highlighted text that is extracted to the PDF or the console window is wrapped in extra unwanted information such as date, time, page, paragraph, username and colour of the highlight. I had 21 highlighted comments and each one is sandwiched between this unwanted information, so I have to either copy and paste each extracted comment by hand or go around deleting the unwanted information manually. This takes the same amount of time as manually copy-pasting each comment directly from the original PDF.

I am simply after a way to extract all the highlighted text into a textfile, clean and tidy, with no extra information.

Community Expert, Apr 12, 2023

Hi,

I'm sorry if my utility is not exactly what you expected, but it was developed for another request and it took hours of programming.
If you only need to extract 21 comments, I think doing that manually will take less time than developing a similar utility adapted to your request.

FYI, I don't think this utility generates the PDF file and the console output without also generating the txt file. You probably just don't know where to find it: look in the Attachments panel.

Capture_d’écran_2023-04-12_à_10_02_26.png

@+

Community Beginner, Apr 12, 2023

  • The screenshot below shows the error message I get if I deselect the PDF and console options and leave only the text file option selected.

Grasshopper_nz_0-1681290393334.png

 

  • The screenshot below was taken after generating the extraction into the console. Unfortunately, the 'attachment' section in Acrobat simply does not display a text file at my end.

Grasshopper_nz_1-1681290538914.png

     

  • Oh no, this is not just for 1 PDF file! I have just started a PhD study and I will have in excess of 300 PDF files minimum and each PDF with up to 30 highlights. It would be really useful to have a utility where one can extract just the highlights as text with no metadata information as such.


I do appreciate the time you've invested in making this programme and for your detailed responses; thank you very much!

Community Expert, Apr 12, 2023

That's indeed a bug... I will take a look at my script and get back to you!

@+

Community Beginner, Apr 12, 2023

Thanks!

Community Expert, Apr 12, 2023

In fact, that was not a bug but a requested behaviour: the txt file is only imported when the new PDF summary file is generated.

I've just made a revision that allows the txt file to be attached to the current PDF file, with or without saving.

Capture d’écran 2023-04-12 à 21.00.48.png

But this revision (attached) still gives all the previous information for each comment.

Alternatively, I've also just written the script below, which you can run as an Action Wizard action and which extracts only the highlighted text.

var version="04/23";
// Start timing
d0=new Date();
debut=util.printd("dd/mm/yyyy à HH:MM",d0);
// Here we go!
console.show();
console.clear();
var lesTirets="––––––––––––––";
var lesProprietes=["quads","contents"];
var possible=1;
var highlightedPage=new Array(this.numPages);
this.syncAnnotScan();
var annots=this.getAnnots();
if (annots!=null) {
	var cT=0;
	for (var i=0; i<annots.length; i++) {
		if (annots[i].type=="Highlight" || annots[i].type=="Underline" || annots[i].type=="Squiggly" || annots[i].type=="StrikeOut" || annots[i].type=="Redact") {
			if (annots[i].type!="StrikeOut" && !possible) possible=1;
			var laPage=annots[i].page;
			if (typeof highlightedPage[laPage]==="undefined") highlightedPage[laPage]=new Array();
			highlightedPage[laPage].push(i.toString());
			for (var prop=0; prop<lesProprietes.length; prop++) {
				if (typeof eval("annots[i]."+lesProprietes[prop])=="string" || lesProprietes[prop]=="quads") {
					highlightedPage[laPage].push(eval("annots[i]."+lesProprietes[prop]));
				}
			}
			highlightedPage[laPage].push("-");
		}
	}
	var incr=lesProprietes.length+2; // 1 for the page number + 1 for AV/AP (before/after)
	for (var i=highlightedPage.length-1; i>=0; i--) {
		if (typeof highlightedPage[i]==="undefined") {
			highlightedPage.splice(i,1);
		} else {
			highlightedPage[i].unshift(i);
		}
	}
	reponses=highlightedPage.slice(0);
	for (var j=0; j<reponses.length; j++) {
		reponses[j]=highlightedPage[j].slice(0);
		for (k=2; k<reponses[j].length; k++) reponses[j][k]=highlightedPage[j][k].slice(0);
	}
	for (var j=0; j<reponses.length; j++) {
		for (k=2; k<reponses[j].length; k+=incr) reponses[j][k]=[];
	}
	//
	for (var j=0; j<highlightedPage.length; j++) {
		var p=highlightedPage[j][0];
		console.clear();
		console.println("Process starting: "+debut);
		console.println(lesTirets);
		console.println("Processing page "+(p+1));
		// Maximum and minimum Y on the page
		var max=[];
		var min=[];
		for (k=2; k<highlightedPage[j].length; k+=incr) {
			r=highlightedPage[j][k][0];
			r=r.toString();
			r=r.split(",");
			max.push(r[1]);
			min.push(r[7]);
		}
		max.sort(function(a,b){return b-a});
		min.sort(function(a,b){return a-b});
		var yMax=Number(max[0]);
		var yMin=Number(min[0]);
		// Check each word
		var nbMots=this.getPageNumWords(p);
		var mT=0;
		for (var i=0; i<nbMots; i++) {
			var leMot=this.getPageNthWord(p,i,true);
			var q=this.getPageNthWordQuads(p,i);
			m=(new Matrix2D).fromRotated(this,p);
			mInv=m.invert();
			r=mInv.transform(q);
			r=r.toString();
			r=r.split(",");
			var xGmot=Number(r[0]);
			var yGmot=Number(r[1]);
			var xDmot=Number(r[6]);
			var yDmot=Number(r[7]);
			if (yGmot>yMax+1) continue;
			else if (yGmot<yMin-1 && mT) break;
			else {
				for (k=2; k<highlightedPage[j].length; k+=incr) {
					for (m=0; m<highlightedPage[j][k].length; m++) {
						r=highlightedPage[j][k][m];
						r=r.toString();
						r=r.split(",");
						var xG=Number(r[0]);
						var yG=Number(r[1]);
						var xD=Number(r[6]);
						var yD=Number(r[7]);
						if (xGmot>xG-1 && yGmot<yG+1 && xGmot<xD && yDmot>yD-1) {
							mT++;
							reponses[j][k].push(this.getPageNthWord(p,i,false));
						}
					}
				}
			}
		}
	}
	console.clear();
	console.println("Process starting: "+debut);
	console.println(lesTirets);
	console.println("Building the result");
	var leTexte="";
	for (var j=0; j<reponses.length; j++) {
		var surPage=Math.floor((reponses[j].length-1)/incr)+cT;
		var texteChamp="";
		// Page
		if (leTexte!="") {
			leTexte+="\r";
			texteChamp+="\r";
		}
		for (k=2; k<reponses[j].length; k+=incr) {
			var lesMots=reponses[j][k].toString();
			var lesMots=lesMots.replace(/^\s+|\s+$/g,""); // g flag so both ends are trimmed
			var lesMots=lesMots.replace(/ ,/g," ");
			var lesMots=lesMots.replace(/-,/g,"-");
			var lesMots=lesMots.replace(/\(,/g,"\(");
			var lesMots=lesMots.replace(/\",/g,"\"");
			var lesMots=lesMots.replace(/\[,/g,"\[");
			var lesMots=lesMots.replace(/\n,/g,"\n");
			var lesMots=lesMots.replace(/¡,/g,"¡");
			var lesMots=lesMots.replace(/¿,/g,"¿");
			var adjectif=""; // Redact
			// Text
			leTexte+="\r"+lesMots+"";
			// Comment
			var laReponse=reponses[j][k+1];
			leTexte+="\r";
		}
	}
	// End timing
	console.clear();
	console.println("Process starting: "+debut);
	df=new Date();
	fin=util.printd("dd/mm/yyyy à HH:MM",df);
	console.println("Process ending: "+fin);
	temps=(df.valueOf()-d0.valueOf())/1000/60;
	var lesMinutes=parseInt(temps);
	var lesSecondes=(temps-lesMinutes)*60;
	var lesSecondes=parseInt(lesSecondes*10)/10;
	var leTemps="";
	if (lesMinutes>0) {
		if (lesMinutes==1) {
			var leTemps="1 minute";
		} else {
			var leTemps=lesMinutes+" minutes";
		}
	}
	if (lesSecondes>0) {
		if (lesSecondes<2) {
			var leTemps=leTemps+" "+lesSecondes+" second";
		} else {
			var leTemps=leTemps+" "+lesSecondes+" seconds";
		}
	}
	var leTemps=leTemps.replace(/^\s+|\s+$/gm,"");
	if (leTemps.length>0) {
		console.println("Process duration: "+leTemps+"\r\r");
	}
	console.println(leTexte);
	var leFichier="Comments of "+util.printd("dd-mm-yy - HH:MM", new Date()).replace(/:/,"h");
	var leRapport=leFichier+".txt";
	this.createDataObject(leRapport, "©™Σ","text/html; charset=utf-16"); //
	var oFile=util.streamFromString(leTexte);
	this.setDataObjectContents(leRapport, oFile);
	// Final message
	var ouverture="You can import the attached .txt file into a spreadsheet using Unicode UTF-8 format.";
	if (annots.length-cT==1) app.alert("One comment has been detailed.\r\r"+ouverture,3);
	else app.alert((annots.length-cT)+" comments have been detailed.\r\r"+ouverture,3);
}
if (annots==null) app.alert("There are no comments in this document.",3)

Capture_d’écran_2023-04-12_à_21_09_22.png

This script is extracted from the utility, so some lines may be unnecessary...
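The core idea of the script above, matching each word's quad against the quads of the highlight annotations, can be sketched in isolation. This is a simplified illustration in plain JavaScript, not the author's code: it tests whether the centre of a word's bounding box falls inside a highlight's bounding box (the original compares corner coordinates instead), and the names `quadBounds`, `wordIsHighlighted` and `highlightedText` are hypothetical. Each quad is assumed to be an array of eight numbers (four x/y corner pairs); in Acrobat, `getPageNthWordQuads` and the annotation `quads` property each return a list of such quads.

```javascript
// Bounding box of one quad: [x1, y1, x2, y2, x3, y3, x4, y4].
function quadBounds(quad) {
  var xs = [quad[0], quad[2], quad[4], quad[6]];
  var ys = [quad[1], quad[3], quad[5], quad[7]];
  return {
    left: Math.min.apply(null, xs),
    right: Math.max.apply(null, xs),
    bottom: Math.min.apply(null, ys),
    top: Math.max.apply(null, ys)
  };
}

// True when the centre of the word lies inside any highlight quad,
// allowing `tolerance` points of slack on every side (default 1).
function wordIsHighlighted(wordQuad, highlightQuads, tolerance) {
  var slack = (typeof tolerance === "number") ? tolerance : 1;
  var w = quadBounds(wordQuad);
  var cx = (w.left + w.right) / 2;
  var cy = (w.bottom + w.top) / 2;
  for (var i = 0; i < highlightQuads.length; i++) {
    var h = quadBounds(highlightQuads[i]);
    if (cx >= h.left - slack && cx <= h.right + slack &&
        cy >= h.bottom - slack && cy <= h.top + slack) {
      return true;
    }
  }
  return false;
}

// Joins the highlighted words, in the order given, into one clean string.
// `words` is a list of { text, quad } objects (a made-up shape for this sketch).
function highlightedText(words, highlightQuads) {
  var kept = [];
  for (var i = 0; i < words.length; i++) {
    if (wordIsHighlighted(words[i].quad, highlightQuads)) {
      kept.push(words[i].text);
    }
  }
  return kept.join(" ");
}

// Made-up coordinates: three words on one line, the first two highlighted.
var sampleWords = [
  { text: "quick", quad: [10, 20, 40, 20, 10, 10, 40, 10] },
  { text: "brown", quad: [45, 20, 75, 20, 45, 10, 75, 10] },
  { text: "fox", quad: [80, 20, 100, 20, 80, 10, 100, 10] }
];
var sampleHighlight = [[9, 21, 76, 21, 9, 9, 76, 9]];
console.log(highlightedText(sampleWords, sampleHighlight)); // prints "quick brown"
```

The example runs in any ordinary JavaScript engine; inside Acrobat, `console.println` would take the place of `console.log`. A centre-point test was chosen because it does not depend on the order in which a quad lists its corners.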

Let me know if you don't know how to use the Action Wizard.

@+

Community Expert, Apr 10, 2023

It can be done using a script, like this (paid-for) tool I developed many years ago exactly for this purpose: http://try67.blogspot.com/2008/11/acrobat-create-comments-summary-txt-pdf.html

 

Community Beginner, Apr 10, 2023

Cheers for that.

 

To be honest, after obtaining the professional Adobe Acrobat DC version I am not really inclined to pay more. This is a very simple function that I'd expect Adobe to provide to its clients. Literally tens of thousands of people (e.g., everyone in academia) would benefit from it.

Community Expert, Apr 10, 2023

What exactly would you want to be in this simple text file? Can you give an example of what it might look like? 

Community Beginner, Apr 10, 2023

If I highlight three different sentences in the text of the PDF (as a comment or annotation), I just want to export those highlights so I can save them elsewhere as study notes, instead of having to open every single PDF file to look for the highlighted passages.

Community Expert, Apr 11, 2023

Can you elaborate? Are you asking to have the text under the highlight exported or the content of the annotation? Do you need page numbers? Your request isn't clear enough to take action.

Community Beginner, Apr 11, 2023

I only want the highlighted text to be extracted into a clean textfile. Nothing else.

Community Beginner, May 06, 2024

Extracting highlighted text from PDF files was a nightmare for me. I tried doing it with code, but later discovered readoku.com, where you can export highlights to Word, Excel, JSON and CSV formats. I mention it in case anyone is still searching for a time-saving way.

IMG_2973.jpeg
