
PDF to Word Doc Extraction (Tech Comparison?)

New Here, Aug 16, 2023

Question: For converting technical PDFs to MS Word, is there a comparison of the different extraction methods?

My PDFs are complex: text, tables, and images, not always well formatted.

The extraction methods:

- MS Word -- Import PDF (Local)

- Acrobat Pro -- Export PDF to Word (Local)

- Adobe extraction API (Cloud)
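In case it helps frame the comparison, here is a rough sketch of how the methods could be scored locally: convert the same PDF with each method, pull the plain text out of each result, and compare it to a hand-checked reference text. The function names `normalize` and `rank_extractions` are made up for illustration, and character-level similarity is only one possible metric (it says nothing about table structure or images).

```python
import difflib


def normalize(text):
    """Collapse runs of whitespace so layout differences do not dominate the score."""
    return " ".join(text.split())


def rank_extractions(reference, candidates):
    """Score each candidate text against the reference (0.0 to 1.0), best first.

    `candidates` maps a method name to the text that method produced.
    """
    ref = normalize(reference)
    scores = {
        name: difflib.SequenceMatcher(
            None, ref, normalize(text), autojunk=False
        ).ratio()
        for name, text in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Everything runs on the local machine with the standard library only, so nothing leaves the workstation.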

 

I work with proprietary/ITAR-rated documents, so cloud-based conversion is probably not an option.

Any opinions are welcome. Python/Unix-based extraction tools (PyPDF, pdftotext, others) don't capture inline tables well, and converting tables to images and then OCR'ing them with machine learning gives terrible results.

Once the extraction is in Word, all text, table, and image objects are (more) extractable.

For me, all of this eventually gets stored in a dataframe.
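To illustrate the Word-to-dataframe step, here is a minimal sketch that reads tables straight out of a .docx using only the Python standard library (a .docx is a zip archive whose main body lives in `word/document.xml`). The function name `docx_tables` is hypothetical, and the sketch assumes simple tables: merged cells, content controls wrapping rows, and images are not handled.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_tables(docx):
    """Return each table in a .docx as a list of rows, each row a list of cell text.

    `docx` may be a file path or a binary file-like object.
    Nested tables come back as separate entries after their parent table.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    tables = []
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.findall(W + "tr"):
            cells = []
            for tc in tr.findall(W + "tc"):
                # Join the text runs of each paragraph that sits directly in the cell
                paragraphs = [
                    "".join(t.text or "" for t in p.iter(W + "t"))
                    for p in tc.findall(W + "p")
                ]
                cells.append("\n".join(paragraphs))
            rows.append(cells)
        tables.append(rows)
    return tables
```

Assuming pandas is installed and the first row of a table is its header, each result could then be loaded with something like `pandas.DataFrame(rows[1:], columns=rows[0])`.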

 

Many thanks in advance.

 

TOPICS
Acrobat SDK and JavaScript, Mac, Windows