PDF to Word Doc Extraction (Tech Comparison?)

August 16, 2023

Question: Is there a comparison of the different extraction methods for converting technical PDFs to MS Word?

My PDFs are complex: text, tables, and images, not always well formatted.

The extraction methods:

- MS Word -- Import PDF (Local)

- Acrobat Pro -- Export PDF to Word (Local)

- Adobe extraction API (Cloud)

I work with proprietary/ITAR-controlled documents, so cloud-based conversion is probably not an option.

Any opinions are welcome. Python/Unix-based extraction methods (PyPDF/PdfToText/Other) don't capture inline tables well, and converting tables to images and then OCR'ing them with machine learning is just terrible.

Once the extraction is in Word, all text, table, and image objects are (more) extractable.

For me, all this eventually gets stored in a dataframe.
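
To make that last step concrete, here is a minimal sketch of how a converted .docx could be walked locally (nothing leaves the machine) using only the Python standard library. It relies on a .docx being a ZIP archive with the body in word/document.xml and embedded images under word/media/. The function names `docx_to_records` and `list_media`, and the record layout (`order`, `kind`, `content`), are my own assumptions for illustration, not part of any Word or Adobe tooling. It ignores headers, footers, and text boxes, and it concatenates multi-paragraph cells without a separator.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _text(node):
    """Join the text of every <w:t> run found under node."""
    return "".join(t.text or "" for t in node.iter(W + "t"))


def docx_to_records(source):
    """Return top-level paragraphs and tables in document order.

    source is a path or a binary file-like object for a .docx file.
    The record layout is hypothetical, chosen to load easily into a dataframe.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find(W + "body")
    records = []
    for order, child in enumerate(body):
        if child.tag == W + "p":
            text = _text(child)
            if text.strip():  # skip empty paragraphs
                records.append(
                    {"order": order, "kind": "paragraph", "content": text}
                )
        elif child.tag == W + "tbl":
            # One list per table row, one string per cell
            rows = [
                [_text(tc) for tc in tr.findall(W + "tc")]
                for tr in child.findall(W + "tr")
            ]
            records.append({"order": order, "kind": "table", "content": rows})
    return records


def list_media(source):
    """Return the archive names of embedded images (stored under word/media/)."""
    with zipfile.ZipFile(source) as zf:
        return [n for n in zf.namelist() if n.startswith("word/media/")]
```

If pandas is installed, `pandas.DataFrame(docx_to_records("converted.docx"))` should give one row per paragraph or table (the file name is a placeholder). Keeping `order` makes it possible to tell which paragraph sits directly above an inline table.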

Much appreciation in advance
