I have thousands of academic papers with many comments and text highlights made with Adobe Acrobat. Now I want to export those into a Word/Excel file, so that I have an overview of all the comments and highlighted text that I have made in all those years reading these papers.
I have not found a way to do this, either for a single PDF document or for a whole folder of PDFs.
Edit: I managed to do it for the text comments yesterday using ChatGPT and Python, but the highlights are missing and I couldn't get ChatGPT to make that part work. This is the code that works for the comments. Does anyone know whether a simple tweak would make it work for highlights as well?
import os
import re
import docx
from PyPDF2 import PdfReader


def get_comments(file_path):
    """Return the text of every sticky-note (/Text) annotation in a PDF."""
    comments = []
    with open(file_path, 'rb') as f:
        pdf = PdfReader(f)
        for page in pdf.pages:
            # Pages without annotations have no /Annots entry
            if '/Annots' not in page:
                continue
            for annot in page['/Annots']:
                annot_obj = annot.get_object()
                # Skip annotations that lack a subtype or text, so one
                # incomplete annotation does not hide the rest of the page
                if '/Subtype' not in annot_obj or '/Contents' not in annot_obj:
                    continue
                if annot_obj['/Subtype'] == '/Text':
                    comments.append(str(annot_obj['/Contents']))
    return comments


def main():
    # Get all PDF files in current directory
    pdf_files = [f for f in os.listdir('.') if f.lower().endswith('.pdf')]
    # Create new Word document
    doc = docx.Document()
    # Loop through PDF files
    for pdf_file in pdf_files:
        # Get comments from PDF file
        comments = get_comments(pdf_file)
        # Add title to document
        title = re.sub(r'\.pdf$', '', pdf_file, flags=re.IGNORECASE)
        doc.add_heading(title, level=1)
        # Add comments to document
        for comment in comments:
            doc.add_paragraph(comment)
    # Save document once, after all files have been processed
    doc.save('output.docx')


if __name__ == '__main__':
    main()
Did you enable the option to copy the selected text into the comments themselves?
If not, it's much more complicated. You would need to do it retroactively, and then you'll be able to generate a comments summary. For example, you can use this (paid-for) tool I've developed to do it:
https://www.try67.com/tool/acrobat-retroactively-copy-highlighted-text-into-comments
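For anyone curious what the retroactive step roughly involves: a highlight annotation stores the highlighted region as /QuadPoints (eight numbers per quadrilateral), not the text itself, so the text has to be recovered from the page by position. Below is a minimal Python sketch of just the geometry. It assumes you already have word boxes from a text-extraction library that reports positions, expressed in the same coordinate system as the quad points (some libraries flip the y-axis, so a conversion may be needed). Both function names are made up for illustration, and this is not how the tool linked above works.

```python
def quad_points_to_rects(quad_points):
    """Turn a flat /QuadPoints array (8 numbers per quad) into bounding boxes.

    Returns a list of (x0, y0, x1, y1) tuples, one per quadrilateral.
    """
    rects = []
    for i in range(0, len(quad_points) - 7, 8):
        xs = [float(v) for v in quad_points[i:i + 8:2]]      # x of the 4 corners
        ys = [float(v) for v in quad_points[i + 1:i + 8:2]]  # y of the 4 corners
        rects.append((min(xs), min(ys), max(xs), max(ys)))
    return rects


def words_in_rects(words, rects):
    """Join the words whose centre point falls inside any rectangle.

    `words` is an iterable of (x0, y0, x1, y1, text) tuples, assumed to be
    in reading order and in the same coordinate system as `rects`.
    """
    picked = []
    for x0, y0, x1, y1, text in words:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if any(rx0 <= cx <= rx1 and ry0 <= cy <= ry1
               for rx0, ry0, rx1, ry1 in rects):
            picked.append(text)
    return " ".join(picked)


if __name__ == "__main__":
    # Hypothetical data: one highlight quad covering the first two words
    quad = [10, 20, 50, 20, 10, 10, 50, 10]
    words = [(12, 11, 20, 19, "deep"),
             (22, 11, 40, 19, "learning"),
             (60, 11, 80, 19, "other")]
    print(words_in_rects(words, quad_points_to_rects(quad)))
```

Testing the word's centre point rather than its full box is a deliberate simplification: highlights often clip a word's edges slightly, and a strict containment test would drop those words.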
If you want to create a summary of multiple files you would need to use a script as a part of an Action in Acrobat (or via a stand-alone tool) to process all the files, copy the contents of the comments from them, and then generate a single output file when done.
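If the highlighted text was copied into the comments (or once that has been done retroactively), a stand-alone script along the following lines might be enough for the multi-file summary. This is only a sketch, assuming the same PyPDF2 version as the script posted above; it treats /Highlight annotations like /Text ones and writes everything to one CSV file that Excel can open. The names collect_notes, summarize_folder and the output columns are my own choices, not anything built into Acrobat or PyPDF2.

```python
import csv
import os

# Annotation subtypes to keep, mapped to a label for the output
WANTED = {"/Text": "comment", "/Highlight": "highlight"}


def collect_notes(annots):
    """Return (kind, text) pairs for annotations that carry text.

    `annots` is an iterable of dict-like annotation objects. Highlights
    whose text was never copied into /Contents are skipped.
    """
    notes = []
    for annot in annots:
        if "/Subtype" not in annot or "/Contents" not in annot:
            continue
        kind = WANTED.get(str(annot["/Subtype"]))
        text = str(annot["/Contents"])
        if kind and text:
            notes.append((kind, text))
    return notes


def summarize_folder(folder, out_csv):
    """Write one CSV row per comment or highlight found in the folder's PDFs."""
    # Third-party import kept local so collect_notes works without PyPDF2
    from PyPDF2 import PdfReader

    with open(out_csv, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "page", "kind", "text"])
        for name in sorted(os.listdir(folder)):
            if not name.lower().endswith(".pdf"):
                continue
            reader = PdfReader(os.path.join(folder, name))
            for page_no, page in enumerate(reader.pages, start=1):
                if "/Annots" not in page:
                    continue
                objs = [a.get_object() for a in page["/Annots"]]
                for kind, text in collect_notes(objs):
                    writer.writerow([name, page_no, kind, text])
```

Calling summarize_folder('.', 'summary.csv') would process the current directory. Highlights with an empty /Contents simply do not appear in the output, which is a quick way to tell whether the copy-text option was enabled when they were made.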