In trying to work with data from several PDF files, I subscribed to Adobe's online facility to convert source.pdf to output.docx. Output.docx is missing more than a page of data I need. I don't understand why this is happening.
I then tried to convert source.pdf to output.xls, and the data in output.xls was repeated. For example, the date value 04/22 was converted to:
04/2204/2204/2204/2204/22 |
Could anyone here help me with these issues?
Hi rivalg,
Do you know how that PDF was created? The quality of the conversion depends largely on the quality of the PDF. It sounds as though you may be experiencing some font-related issues. As a test, you can try converting the file from within Adobe Reader with OCR disabled, as described here: How to disable Optical Character Recognition
Note, however, that if the PDF was created from a scanned document and doesn't contain selectable text, you won't be able to select that text in the converted Word document. (OCR converts scanned/image text to selectable/editable text.) Nonetheless, it's a good test.
As for the .xls output, it sounds as though the PDF isn't tagged to show the various cells in the table. Again, it depends on what created the PDF. Not all PDFs are created equally, alas.
Please try converting with OCR disabled, and see how that goes. If we need to, we can take it from there.
Best,
Sara
Hi Sara,
The PDF was created by a financial institution and downloaded to my OS X 10.9.4 device. I am using Adobe Reader XI version 11.0.7 to access the file. The converter worked fine with the first file, and I got all the data, but with the second one I did not. Both files are from the same source. I will try disabling OCR.
Thanks
Sara,
I converted the file after disabling OCR, but the issue was not resolved.
R.
Sara,
FYI: another PDF-to-Word converter I found on the net converted all of the data from the same file that your converter missed. However, I can't use it because its output.docx has all the data protected, so I can't access the data I need to work with.
R.