
Why a docx converted from a PDF file did not convert 1+ page from the source file?

Explorer ,
Jul 26, 2014

While trying to work with data from several PDF files, I subscribed to Adobe's online service to convert source.pdf to output.docx. Output.docx is missing more than a page of data that I need, and I don't understand why this is happening.

I then tried converting source.pdf to output.xls, but the data in output.xls was repeated. For example, the date value 04/22 was converted to:

  

04/2204/2204/2204/2204/22


Could anyone here help me with these issues?

Jul 27, 2014

Hi rivalg,

Do you know how that PDF was created? The quality of the conversion depends largely on the quality of the PDF. It sounds as though you may be experiencing some font-related issues. As a test, you can try converting the file from within Adobe Reader with OCR disabled, as described here: How to disable Optical Character Recognition

Note, however, that if the PDF was created from a scanned document and doesn't contain selectable text, you won't be able to select that text in the converted Word document. (OCR converts scanned/image text to selectable/editable text.) Nonetheless, it's a good test.

As for the .xls output, it sounds as though the PDF isn't tagged to mark the individual cells in the table. Again, it depends on what created the PDF. Not all PDFs are created equal, alas.
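If the converted spreadsheet is otherwise usable, the repeated values can be cleaned up after the fact. The sketch below is only a workaround under stated assumptions, not an Adobe feature: it assumes each damaged cell is an exact repetition of the original value (as in the 04/22 example), and `collapse_repeats` is a hypothetical helper name.

```python
def collapse_repeats(value: str) -> str:
    """Return the shortest unit that, repeated a whole number of times,
    reproduces `value`. Values that are not exact repetitions are
    returned unchanged.
    """
    n = len(value)
    # Try every candidate unit length, shortest first.
    for size in range(1, n // 2 + 1):
        if n % size == 0 and value[:size] * (n // size) == value:
            return value[:size]
    return value


# The cell from the question: "04/22" repeated five times.
print(collapse_repeats("04/2204/2204/2204/2204/22"))
```

Be aware that a legitimately repetitive value such as "1111" would also collapse (to "1"), so this should only be applied to the columns that show the problem.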


Please try converting with OCR disabled, and see how that goes. If we need to, we can take it from there.


Best,

Sara

Explorer ,
Jul 27, 2014

Hi Sara,

The PDF was created by a financial institution and downloaded to my OS X 10.9.4 device; I am using Adobe Reader XI version 11.0.7 to access it. The converter worked fine with the first file: I got all the data. With the second one, I did not. Both files are from the same source. I will try disabling OCR.

Thanks

Explorer ,
Jul 27, 2014

Sara,

I converted the file again after disabling OCR, but the issue was not resolved.

R.

Explorer ,
Jul 27, 2014

Sara,

FYI - another PDF-to-Word converter I found on the net converted all of the data from the same file, including the data your converter missed. However, I can't use it because its output.docx has all of the data protected: I can't access the data I need to work with.

R.
