
How to export all highlights and comments from multiple PDFs into a Word/Excel file?

New Here, Apr 21, 2023

I have thousands of academic papers with many comments and text highlights made in Adobe Acrobat. I now want to export those into a Word/Excel file, so that I have an overview of all the comments and highlighted text I have made over the years of reading these papers.

 

I have not found a way to do that, either for a single PDF document or for a whole folder of PDFs.

TOPICS
How to
New Here, Apr 21, 2023

Edit: I managed to do it for the text comments yesterday using ChatGPT and Python, but the highlights are missing and I couldn't get ChatGPT to make that part work. This is the code that works for the comments. Maybe someone knows whether a simple tweak is enough to make it work for highlights as well.

 

import os
import re

import docx
from PyPDF2 import PdfReader


def get_comments(file_path):
    """Return the text of all sticky-note (/Text) comments in one PDF."""
    comments = []
    with open(file_path, 'rb') as f:
        pdf = PdfReader(f)
        for page in pdf.pages:
            # Pages without annotations have no '/Annots' entry
            if '/Annots' not in page:
                continue
            for annot in page['/Annots']:
                annot_obj = annot.get_object()
                try:
                    if annot_obj['/Subtype'] == '/Text':
                        comments.append(str(annot_obj['/Contents']))
                except KeyError:
                    # This annotation has no text; skip it and keep going
                    continue
    return comments


def main():
    # Get all PDF files in the current directory (sorted for a stable order)
    pdf_files = sorted(f for f in os.listdir('.') if f.lower().endswith('.pdf'))

    # Create a new Word document
    doc = docx.Document()

    # Loop through the PDF files
    for pdf_file in pdf_files:
        # Get the comments from this PDF file
        comments = get_comments(pdf_file)

        # Use the file name without its extension as the section heading
        title = re.sub(r'\.pdf$', '', pdf_file, flags=re.IGNORECASE)
        doc.add_heading(title, level=1)

        # Add each comment as its own paragraph
        for comment in comments:
            doc.add_paragraph(comment)

    # Save the document once all files have been processed
    doc.save('output.docx')


if __name__ == '__main__':
    main()

Community Expert, Apr 21, 2023

Did you enable the option to copy the selected text into the comments themselves?

If not, it's much more complicated. You would need to do it retroactively, and then you'll be able to generate a comments summary. For example, you can use this (paid-for) tool I've developed to do it:

https://www.try67.com/tool/acrobat-retroactively-copy-highlighted-text-into-comments
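For the other case, where the option was already enabled when the highlights were made, the copied text should be stored in the highlight annotation's own /Contents entry, so the Python script posted earlier in this thread would mostly need to accept the /Highlight subtype alongside /Text. Below is a minimal sketch of that filter. The function name collect_annotation_text and its subtypes parameter are made up for illustration, and it is written against plain dict-like objects so it can be tried without a PDF at hand. Highlights made without the option have no text in /Contents and are simply skipped here.

```python
def collect_annotation_text(annots, subtypes=('/Text', '/Highlight')):
    """Return (subtype, text) pairs for annotations that carry text.

    `annots` is any iterable of dict-like annotation objects, for example
    [a.get_object() for a in page['/Annots']] in the script above.
    """
    found = []
    for annot in annots:
        # Item access (not .get) so that PDF dictionary objects resolve
        # indirect references the same way the original script relies on.
        subtype = annot['/Subtype'] if '/Subtype' in annot else None
        if subtype not in subtypes:
            continue
        text = str(annot['/Contents']) if '/Contents' in annot else ''
        text = text.strip()
        if text:
            # Highlights without copied text end up here with text == ''
            # and are left out of the result.
            found.append((str(subtype), text))
    return found
```

Inside get_comments, the inner loop could then be replaced by a single call on the resolved annotation objects of each page, assuming the highlights really do carry their text.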

If you want to create a summary of multiple files you would need to use a script as a part of an Action in Acrobat (or via a stand-alone tool) to process all the files, copy the contents of the comments from them, and then generate a single output file when done.
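The multi-file workflow described above (process every file, copy out the comment contents, write one output file at the end) can also be sketched outside Acrobat. The following is a stand-alone Python illustration, not an Acrobat Action. The name summarize_folder and its extract parameter are hypothetical: extract stands for any function that takes a PDF path and returns a list of comment strings, such as get_comments from the earlier post. The output is a CSV file, which Excel can open directly.

```python
import csv
import os


def summarize_folder(folder, extract, out_csv):
    """Write one CSV row per comment found in the PDFs under `folder`.

    `extract` is a callable: PDF path -> list of comment strings.
    Returns the number of comment rows written.
    """
    rows = 0
    # 'utf-8-sig' adds a byte order mark so Excel detects the encoding.
    with open(out_csv, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(['file', 'comment'])
        for root, dirs, files in os.walk(folder):
            dirs.sort()  # visit subfolders in a stable order
            for name in sorted(files):
                if not name.lower().endswith('.pdf'):
                    continue
                path = os.path.join(root, name)
                try:
                    comments = extract(path)
                except Exception as exc:
                    # One unreadable PDF should not stop the whole run.
                    comments = [f'ERROR: {exc}']
                for comment in comments:
                    writer.writerow([os.path.relpath(path, folder), comment])
                    rows += 1
    return rows
```

A call such as summarize_folder('papers', get_comments, 'summary.csv') would then produce a single sheet with the file name in the first column and the comment text in the second. The folder and file names in that call are placeholders.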
