Is there really no way to export or extract highlighted annotations of a PDF file as a simple text file? I find it absolutely ridiculous that there is no simple way to do this despite PDFs being around for decades! I am using Acrobat Pro and there is still no easy way to do this.
Correct. You can create a comment summary in Acrobat, but that's the best it has.
Bluebeam Revu will let you export comments as a CSV file.
Hi,
Last year, I wrote a script that you might be interested in!
Change the extension of the attached file from .txt to .js, place the file in the JavaScript folder of your Acrobat installation, then restart the application.
You will get a new "* b2Tools *" item in your "Edit" menu.
Select "Comments Summary"...
Choose what you want, then "OK".
Try it and let me know...
@+
Thank you heaps, but sorry, I just don't understand where or how I should run this script.
Hi,
After changing the file extension from .txt to .js (from b2T-Comments report.txt to b2T-Comments report.js), you must place this file into the JavaScript folder of your Acrobat application.
If you don't know where this folder is, you can use the attached "Show_me_the_path.pdf" file, which will help you find it.
Then restart Acrobat and follow the previous instructions, which should meet your needs.
Let me know.
@+
Thanks very much for freely sharing your script. I was able to run the script displayed in Adobe Acrobat DC which is great. However, there are still a couple of issues:
Issue 1 (minor) - It does NOT generate a simple text file with the highlighted text. It only generates either a PDF OR a console window with the highlighted text.
Issue 2 (major) - The highlighted text that is extracted to the PDF or the console window is wrapped in extra unwanted information like date/time/page/paragraph/username/colour of the highlight etc. I had 21 highlighted comments and each one is sandwiched between extra unwanted information. So I have to either copy-paste each extracted comment manually or go through deleting the unwanted info by hand. This takes as long as copy-pasting each comment directly from the original PDF.
I am simply after a way to extract all the highlighted text into a text file, clean and tidy, with no extra information.
Hi,
I'm sorry if my utility is not exactly what you expected, but it was developed for another request and it took hours of programming.
If you only need to extract 21 comments, I think it will take less time to do that manually than to develop a similar utility adapted to your request.
FYI, I don't think this utility generates the PDF file and the console output without also generating the .txt file. You probably just don't know where to find it: look in the Attachments panel.
@+
I do appreciate the time you've invested in making this programme and for your detailed responses; thank you very much!
That's indeed a bug... I'll have a look at my script and get back to you!
@+
Thanks!
In fact, that was not a bug but a requirement of the original request: the .txt file was only to be attached when the new PDF summary file is generated.
I've just made a revision that attaches the .txt file to the current PDF file, with or without saving.
However, this revision (attached) still gives all the previous information for each comment.
Otherwise, I've also just written the script below, which you can run as an Action Wizard action and which extracts only the highlighted text.
var version="04/23";
// Start timer
d0=new Date();
debut=util.printd("dd/mm/yyyy HH:MM",d0);
// Here we go!
console.show();
console.clear();
var lesTirets="––––––––––––––";
var lesProprietes=["quads","contents"];
var possible=1;
var highlightedPage=new Array(this.numPages);
this.syncAnnotScan();
var annots=this.getAnnots();
if (annots!=null) {
var cT=0;
for (var i=0; i<annots.length; i++) {
if (annots[i].type=="Highlight" || annots[i].type=="Underline" || annots[i].type=="Squiggly" || annots[i].type=="StrikeOut" || annots[i].type=="Redact") {
if (annots[i].type!="StrikeOut" && !possible) possible=1;
var laPage=annots[i].page;
if (typeof highlightedPage[laPage]==="undefined") highlightedPage[laPage]=new Array();
highlightedPage[laPage].push(i.toString());
for (var prop=0; prop<lesProprietes.length; prop++) {
if (typeof annots[i][lesProprietes[prop]]=="string" || lesProprietes[prop]=="quads") {
highlightedPage[laPage].push(annots[i][lesProprietes[prop]]);
}
}
highlightedPage[laPage].push("-");
}
}
var incr=lesProprietes.length+2; // +1 for the annotation index, +1 for the "-" separator
for (var i=highlightedPage.length-1; i>=0; i--) {
if (typeof highlightedPage[i]==="undefined") {
highlightedPage.splice(i,1);
} else {
highlightedPage[i].unshift(i);
}
}
reponses=highlightedPage.slice(0);
for (var j=0; j<reponses.length; j++) {
reponses[j]=highlightedPage[j].slice(0);
for (k=2; k<reponses[j].length; k++) reponses[j][k]=highlightedPage[j][k].slice(0);
}
for (var j=0; j<reponses.length; j++) {
for (k=2; k<reponses[j].length; k+=incr) reponses[j][k]=[];
}
//
for (var j=0; j<highlightedPage.length; j++) {
var p=highlightedPage[j][0];
console.clear();
console.println("Process starting: "+debut);
console.println(lesTirets);
console.println("Processing page "+(p+1));
// Highest and lowest Y of all highlight quads on the page
var max=[];
var min=[];
for (k=2; k<highlightedPage[j].length; k+=incr) {
for (var n=0; n<highlightedPage[j][k].length; n++) {
r=highlightedPage[j][k][n].toString().split(",");
max.push(r[1]);
min.push(r[7]);
}
}
max.sort(function(a,b){return b-a});
min.sort(function(a,b){return a-b});
var yMax=Number(max[0]);
var yMin=Number(min[0]);
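// Check every word on the page against the highlight quads.
// Note: Matrix2D is not a core Acrobat JavaScript object; this assumes it is defined
// by another folder-level script (presumably part of the full b2Tools utility).
var nbMots=this.getPageNumWords(p);
var mT=0;
// The rotation matrix depends only on the page, so it is computed once per page
var laMatrice=(new Matrix2D).fromRotated(this,p);
var mInv=laMatrice.invert();
for (var i=0; i<nbMots; i++) {
var leMot=this.getPageNthWord(p,i,true);
var q=this.getPageNthWordQuads(p,i);
r=mInv.transform(q);
r=r.toString();
r=r.split(",");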
var xGmot=Number(r[0]);
var yGmot=Number(r[1]);
var xDmot=Number(r[6]);
var yDmot=Number(r[7]);
if (yGmot>yMax+1) continue;
else if (yGmot<yMin-1 && mT) break;
else {
for (k=2; k<highlightedPage[j].length; k+=incr) {
for (m=0; m<highlightedPage[j][k].length; m++) {
r=highlightedPage[j][k][m];
r=r.toString();
r=r.split(",");
var xG=Number(r[0]);
var yG=Number(r[1]);
var xD=Number(r[6]);
var yD=Number(r[7]);
if (xGmot>xG-1 && yGmot<yG+1 && xGmot<xD && yDmot>yD-1) {
mT++;
reponses[j][k].push(this.getPageNthWord(p,i,false));
}
}
}
}
}
}
console.clear();
console.println("Process starting: "+debut);
console.println(lesTirets);
console.println("Building the result");
var leTexte="";
for (var j=0; j<reponses.length; j++) {
var surPage=Math.floor((reponses[j].length-1)/incr)+cT;
var texteChamp="";
// Page
if (leTexte!="") {
leTexte+="\r";
texteChamp+="\r";
}
for (k=2; k<reponses[j].length; k+=incr) {
var lesMots=reponses[j][k].toString()
.replace(/^\s+|\s+$/g,"")
.replace(/ ,/g," ")
.replace(/-,/g,"-")
.replace(/\(,/g,"(")
.replace(/\",/g,"\"")
.replace(/\[,/g,"[")
.replace(/\n,/g,"\n")
.replace(/¡,/g,"¡")
.replace(/¿,/g,"¿");
var adjectif=""; // Redact
// Text
leTexte+="\r"+lesMots;
// Comment
var laReponse=reponses[j][k+1];
leTexte+="\r";
}
}
// End timer
console.clear();
console.println("Process starting: "+debut);
df=new Date();
fin=util.printd("dd/mm/yyyy HH:MM",df);
console.println("Process ending: "+fin);
temps=(df.valueOf()-d0.valueOf())/1000/60;
var lesMinutes=parseInt(temps);
var lesSecondes=(temps-lesMinutes)*60;
var lesSecondes=parseInt(lesSecondes*10)/10;
var leTemps="";
if (lesMinutes>0) {
if (lesMinutes==1) {
var leTemps="1 minute";
} else {
var leTemps=lesMinutes+" minutes";
}
}
if (lesSecondes>0) {
if (lesSecondes<2) {
var leTemps=leTemps+" "+lesSecondes+" second";
} else {
var leTemps=leTemps+" "+lesSecondes+" seconds";
}
}
var leTemps=leTemps.replace(/^\s+|\s+$/gm,"");
if (leTemps.length>0) {
console.println("Process duration: "+leTemps+"\r\r");
}
console.println(leTexte);
var leFichier="Comments of "+util.printd("dd-mm-yy - HH:MM", new Date()).replace(/:/,"h");
var leRapport=leFichier+".txt";
this.createDataObject(leRapport, "©™Σ", "text/plain; charset=utf-8"); // placeholder content, overwritten below
var oFile=util.streamFromString(leTexte);
this.setDataObjectContents(leRapport, oFile);
// Final message
var ouverture="You can import the attached .txt file into a spreadsheet using Unicode UTF-8 format.";
if (annots.length-cT==1) app.alert("One comment has been detailed.\r\r"+ouverture,3);
else app.alert((annots.length-cT)+" comments have been detailed.\r\r"+ouverture,3);
}
if (annots==null) app.alert("There are no comments in this document.",3)
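The core of the script above is the geometric test in its word loop: a word is kept when its quad lies inside a highlight quad, with a 1-point tolerance. Below is a minimal sketch of that test in plain JavaScript, runnable outside Acrobat (for example in Node.js) so it can be checked in isolation. It assumes the words and the highlight quads are already in the same coordinate space, which the script achieves with the page rotation matrix; the `{ text, quad }` word shape and the helper names `verticalBand`, `wordInQuad` and `wordsUnderHighlights` are hypothetical. The vertical band is computed from every quad of every highlight, so multi-line highlights are covered, and `some()` collects a word at most once per highlight even when it touches several quads.

```javascript
// Sketch of the word/highlight matching step, in plain JavaScript.
// A quad is [x1, y1, x2, y2, x3, y3, x4, y4]; like the script, we read
// q[0]/q[1] as the left/top corner and q[6]/q[7] as the right/bottom corner.

// Top and bottom of the band covered by every quad of every highlight.
function verticalBand(highlights) {
  let yMax = -Infinity;
  let yMin = Infinity;
  for (const quads of highlights) {
    for (const q of quads) {
      yMax = Math.max(yMax, q[1]);
      yMin = Math.min(yMin, q[7]);
    }
  }
  return { yMax, yMin };
}

// The script's containment test, with a 1-point tolerance: the word's
// left edge lies inside the quad horizontally and its top/bottom edges
// lie within the quad's vertical extent.
function wordInQuad(word, quad, tol = 1) {
  const left = word[0], top = word[1], bottom = word[7];
  return (
    left > quad[0] - tol &&
    top < quad[1] + tol &&
    left < quad[6] &&
    bottom > quad[7] - tol
  );
}

// For each highlight (an array of quads), collect the text of the words
// whose quads fall inside it. `words` is a hypothetical array of
// { text, quad } objects in reading order.
function wordsUnderHighlights(words, highlights) {
  const { yMax, yMin } = verticalBand(highlights);
  const result = highlights.map(() => []);
  for (const word of words) {
    const top = word.quad[1];
    if (top > yMax + 1 || top < yMin - 1) continue; // outside every highlight
    highlights.forEach((quads, h) => {
      if (quads.some((q) => wordInQuad(word.quad, q))) result[h].push(word.text);
    });
  }
  return result;
}

// Usage with made-up coordinates (PDF user space, y grows upward).
const highlights = [
  [[72, 700, 200, 700, 72, 688, 200, 688]],  // one-line highlight
  [[72, 660, 300, 660, 72, 648, 300, 648],   // two-line highlight
   [72, 646, 120, 646, 72, 634, 120, 634]],
];
const words = [
  { text: "Hello ", quad: [72, 699, 100, 699, 72, 689, 100, 689] },
  { text: "world", quad: [105, 699, 140, 699, 105, 689, 140, 689] },
  { text: "ignored", quad: [210, 699, 250, 699, 210, 689, 250, 689] },
  { text: "multi ", quad: [72, 659, 110, 659, 72, 649, 110, 649] },
  { text: "line", quad: [72, 645, 100, 645, 72, 635, 100, 635] },
];
console.log(wordsUnderHighlights(words, highlights));
```

In Acrobat, `words` would come from `getPageNthWord` and `getPageNthWordQuads`, and `highlights` from each annotation's `quads` property.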
This script is extracted from the utility, so some lines may not be useful...
Let me know if you don't know how to use the Action Wizard.
@+
It can be done using a script, like this (paid-for) tool I developed many years ago exactly for this purpose: http://try67.blogspot.com/2008/11/acrobat-create-comments-summary-txt-pdf.html
Cheers for that.
To be honest, after buying the professional Adobe Acrobat DC version I am not really inclined to pay more. This is a very simple function I'd expect Adobe to provide to its clients; literally tens of thousands of people (e.g., everyone in academia) would benefit from it.
What exactly would you want to be in this simple text file? Can you give an example of what it might look like?
If I highlight three different sentences in the text of the PDF (as a comment or annotation), I just want to export these comments/annotations so I can save them elsewhere as study notes instead of having to open every single PDF file looking for the highlighted parts.
Can you elaborate? Are you asking to have the text under the highlight exported or the content of the annotation? Do you need page numbers? Your request isn't clear enough to take action.
I only want the highlighted text to be extracted into a clean textfile. Nothing else.
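Once the words under each highlight are known, producing the clean output asked for here is just a join. A minimal sketch in plain JavaScript (no Acrobat API involved), assuming the words of each highlight have already been collected as strings in reading order; `highlightsToPlainText` is a hypothetical helper name. Trimming each word and joining with single spaces avoids gluing the last word of one line to the first word of the next, at the cost of leaving a space after a hyphenated line break ("multi- line").

```javascript
// Turn the words collected for each highlight into plain text:
// one highlight per line, with no page numbers, dates, authors or colours.
function highlightsToPlainText(highlights) {
  return highlights
    .map((words) =>
      words
        .map((w) => w.trim())           // drop whitespace kept around each word
        .filter((w) => w.length > 0)
        .join(" ")
    )
    .filter((line) => line.length > 0)  // skip highlights with no matched words
    .join("\n");
}

// Usage: the resulting string can be written to a .txt file as-is.
const text = highlightsToPlainText([["Hello ", "world"], ["multi ", "line"], []]);
console.log(text);
```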
Extracting highlighted text from PDF files was a nightmare for me. I tried doing it with code but later discovered readoku.com, where you can export highlights to Word, Excel, JSON and CSV formats. In case anyone is still searching for a time-saving way...