Known Participant
July 27, 2014
Question

Why is a docx converted from a PDF file missing more than a page from the source file?


While trying to work with data from several PDF files, I subscribed to Adobe's online service to convert source.pdf to output.docx. The resulting output.docx is missing more than a page of data that I need, and I don't understand why.

I then tried converting source.pdf to output.xls, and the data in output.xls was repeated. For example, the date value 04/22 was converted to:

  

04/2204/2204/2204/2204/22


Could anyone here help me with these issues?


    1 reply

    Inspiring
    July 27, 2014

    Hi rivalg,

    Do you know how that PDF was created? The quality of the conversion depends largely on the quality of the PDF. It sounds as though you may be experiencing some font-related issues. As a test, you can try converting the file from within Adobe Reader with OCR disabled, as described here: How to disable Optical Character Recognition

    Note, however, that if the PDF was created from a scanned document and doesn't contain selectable text, you won't be able to select that text in a Word document converted with OCR disabled. (OCR converts scanned or image text into selectable, editable text.) Nonetheless, it's a good test.

    As for the .xls output, it sounds as though the PDF isn't tagged to show the various cells in the table. Again, it depends on what created the PDF. Not all PDFs are created equal, alas.
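
    As a stopgap for values that come out repeated, such as 04/2204/2204/2204/2204/22, a cleanup pass along the lines below might help. This is only a sketch in Python, not an Adobe feature: `collapse_repeats` and `min_unit_len` are hypothetical names, and it assumes the cell values have already been read out of the spreadsheet as strings. It can also wrongly shorten values that legitimately repeat (for example "abcabc"), so check the results by eye.

```python
import re


def collapse_repeats(value: str, min_unit_len: int = 3) -> str:
    """Collapse a string made only of back-to-back copies of one unit.

    min_unit_len (an illustrative safeguard) is the shortest unit that
    will be collapsed, so values like "1111" are left alone by default.
    """
    # Lazy group: find the shortest unit of at least min_unit_len
    # characters that, repeated two or more times, spans the whole string.
    pattern = re.compile(r"(.{%d,}?)\1+" % min_unit_len)
    match = pattern.fullmatch(value)
    return match.group(1) if match else value


if __name__ == "__main__":
    for cell in ["04/2204/2204/2204/2204/22", "04/22", "1111"]:
        print(repr(cell), "->", repr(collapse_repeats(cell)))
```

    This only tidies the symptom after conversion; it does not recover the missing page or fix the underlying PDF.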


    Please try converting with OCR disabled, and see how that goes. If we need to, we can take it from there.


    Best,

    Sara

    rivalgAuthor
    Known Participant
    July 27, 2014

    Hi Sara,

    The PDF was created by a financial institution and downloaded to my OS X 10.9.4 device; I'm using Adobe Reader XI v11.0.7 to open it. The converter worked fine with the first file, and I got all the data, but with the second one I did not. Both files are from the same source. I will try disabling OCR.

    Thanks