Gross inaccuracies in converting PDF to Word

New Here, Jul 11, 2020


I have about 200 typewritten pages that I want to convert to Word and then insert one by one into a book I am writing. Sometimes the conversion is accurate with only the spacing incorrect. Sometimes about 1/3 is inaccurate (e.g., 'Z' for 'l', or '.' inserted in every third word). Sometimes the export is total garbage. Also, I cannot change the font or margins on any exported page. Is this the best one can expect from Export PDF?

I am using Windows 8.1.  Does this make a difference?  Is there anything I should be doing differently?

 

TOPICS
Edit and convert PDFs

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Jul 11, 2020


Most likely, the pages you are converting were scanned into PDF and contain no actual text. From your description, the PDF file holds raster images of the pages, and it is those images, not text, that are being converted to Word. You posted this in the Acrobat Reader community. If you actually have Acrobat (Standard or Pro), it includes an OCR feature that converts the images to text, which you can run before exporting. Note, however, that depending on the quality of the original “typewritten” pages and of the scan, there may still be some inaccuracies.

 

No, Windows 8.1 isn't the issue here!

 

- Dov Isaacs, former Adobe Principal Scientist (April 30, 1990 - May 30, 2021)

New Here, Jul 12, 2020


Thank you for such a prompt reply.  I personally typed all those pages on an IBM Selectric or a Panasonic E-2020 electronic typewriter using a Courier 10 type ball.  Aren't they text?    The originals are clear and clean, as are the PDFs.   I have Acrobat Reader DC and pay annually for the Export capability.  As it stands now, I can retype anew in Word faster than I can make all the corrections necessary in the exported version of the PDF.   But I shudder at the thought of having to retype 200 pages!!  Do you have any other suggestions?

Jul 12, 2020


OK. That answers the question. The original typed pages were scanned; they weren't computer-based text. I assume that you used a scanner that directly produces PDF and that you opened the resulting PDF file in Acrobat Reader. Depending upon your scanner's software, the PDF may contain (1) simply raster images, (2) text produced by the scanner doing OCR (Optical Character Recognition), or (3) a hidden layer of such OCR'd text beneath a raster image representation. In the first case, the PDF export to Word may be configured to do OCR as best it can, but depending upon the quality of the scan (including scan resolution, contrast, etc.), text size, font, etc., certain characters may not be correctly interpreted, such as an ‘l’ (a lowercase L) versus a ‘1’ (a one).
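To make the kind of misreads described here easier to find in exported text, a rough Python sketch like the following could flag suspicious words for manual review. It is not part of Acrobat or Export PDF; the function name `flag_suspect_tokens` and both patterns are assumptions chosen for illustration. It flags words that mix letters and digits (an ‘l’ read as a ‘1’) and words with a period between two letters (stray dots inside a word). It will also flag legitimate items such as "e.g." or "1st", so it is a review aid, not a corrector.

```python
import re

# Hypothetical heuristics for typical OCR misreads; tune them for your own text.
LETTER_DIGIT_MIX = re.compile(r"[A-Za-z]\d|\d[A-Za-z]")   # e.g. "fami1y"
INTERNAL_PERIOD = re.compile(r"[A-Za-z]\.[A-Za-z]")       # e.g. "wa.lked"


def flag_suspect_tokens(text):
    """Return the whitespace-separated words that look like OCR misreads."""
    suspects = []
    for token in text.split():
        if LETTER_DIGIT_MIX.search(token) or INTERNAL_PERIOD.search(token):
            suspects.append(token)
    return suspects


if __name__ == "__main__":
    sample = "The fami1y wa.lked home at 10 pm."
    for word in flag_suspect_tokens(sample):
        print("check:", word)
```

Running it on text copied from an exported page gives a list of words to check against the typed original, which may be faster than proofreading every line.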

 

Regrettably, that is why “typing” then scan/OCR is never, ever as reliable as originally typing into an electronic document, such as a Word document.

 

I don't know what you used to scan the pages, but the only other suggestion would be to try rescanning at a higher resolution (at least 600 dpi, if not 1200 dpi) with a higher-quality professional scanner, possibly one that does OCR internally as it creates a PDF file.
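As a rough illustration of why resolution matters, the arithmetic can be sketched in Python. The figures below assume 10 characters per inch (the usual meaning of a 10-pitch Courier element) and a US Letter page of 8.5 x 11 inches; both are assumptions, as are the function names.

```python
def pixels_per_character(dpi, chars_per_inch=10):
    """Horizontal pixels available to each typed character cell."""
    return dpi / chars_per_inch


def page_pixels(dpi, width_in=8.5, height_in=11.0):
    """Pixel dimensions of a scanned page (US Letter assumed)."""
    return (round(width_in * dpi), round(height_in * dpi))


if __name__ == "__main__":
    for dpi in (300, 600, 1200):
        print(dpi, "dpi:", pixels_per_character(dpi), "px per character,",
              page_pixels(dpi), "px per page")
```

Under these assumptions, each character cell is 30 pixels wide at 300 dpi and 60 pixels wide at 600 dpi, which gives the OCR engine more detail for telling similar shapes apart.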

 

 

- Dov Isaacs, former Adobe Principal Scientist (April 30, 1990 - May 30, 2021)
