August 16, 2023

PDF to Word Doc Extraction (Tech Comparison?)


Question: For converting technical PDFs to MS Word, is there any kind of comparison between the different extraction methods?

My PDFs are complex: text, tables, and images, and they are not always well formatted.

The extraction methods:

- MS Word: Import PDF (local)
- Acrobat Pro: Export PDF to Word (local)
- Adobe PDF Extract API (cloud)

 

I work with proprietary/ITAR-controlled documents, so cloud-based conversion is probably not an option.

Any opinions are welcome, since the Python/Unix-based extraction methods (PyPDF, pdftotext, others) don't capture inline tables well. And converting tables to images and then OCR'ing them with machine learning is just terrible.

Once the content is in Word, the text, table, and image objects are (more) easily extractable.

For me, all this eventually gets stored in a dataframe.
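Since everything ends up in a dataframe anyway, one option after the Word conversion is to read the .docx directly. A .docx file is a ZIP archive whose body lives in `word/document.xml` and whose images are stored under `word/media/`, so tables can be pulled out with only the Python standard library. This is a minimal sketch under those assumptions: the function names are hypothetical, it reads only top-level tables in the document body, and it does not expand merged cells (`w:gridSpan` / `w:vMerge`), so rows in such tables may have uneven lengths.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml in .docx files
W_NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}


def cell_text(cell):
    """Join the text runs (w:t) of a table cell, one line per paragraph."""
    paragraphs = []
    for p in cell.findall(".//w:p", W_NS):
        paragraphs.append("".join(t.text or "" for t in p.findall(".//w:t", W_NS)))
    return "\n".join(paragraphs).strip()


def extract_docx_tables(path_or_file):
    """Hypothetical helper: return each top-level body table as a list of rows of cell strings.

    Merged cells are NOT expanded; nested tables are flattened into their parent cell's text.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find("w:body", W_NS)
    tables = []
    for tbl in body.findall("w:tbl", W_NS):
        rows = [[cell_text(tc) for tc in tr.findall("w:tc", W_NS)]
                for tr in tbl.findall("w:tr", W_NS)]
        tables.append(rows)
    return tables


def list_docx_images(path_or_file):
    """Hypothetical helper: list embedded media files (images live under word/media/)."""
    with zipfile.ZipFile(path_or_file) as zf:
        return [name for name in zf.namelist() if name.startswith("word/media/")]
```

Each table comes back as a list of rows, which can be handed to pandas, e.g. `pd.DataFrame(rows[1:], columns=rows[0])` when the first row is a header and every row has the same number of cells. The python-docx library offers a higher-level route to the same data through `Document(path).tables`, at the cost of an extra dependency.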

 

Much appreciation in advance.

 
