New Participant
January 18, 2019
Question

How to convert PDF documents into .txt files?

Hi,

I am working on a research project in Machine Learning, where my dataset is a large collection of PDF files. I need to convert these PDF files into a format which I can use as input in a Text Classification model, such as .txt files. I have been unable to find a tool which does this well, and would be delighted if I could be pointed in the right direction.

Thanks!

3 replies

New Participant
October 6, 2019

I use Soda PDF Premium for this conversion. I am able to convert printed (text-based) PDF files of over 100 pages; it does not work with scanned PDF files.

You must use the Premium version to do this conversion.

You can download the trial version and see whether it works for your files.

Brainiac
January 19, 2019

The chances are many tools will do a similar job, and the limitation is in the PDF itself.

If the issue is only word separators (missing or extra spaces between words), a different tool might give a different result.

If no text comes out at all, or only gobbledegook, no conversion tool will fix it.

Retyping might be a large and time-consuming part of your project.
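
To find out which files fall into the "no text or gobbledegook" group before anyone starts retyping, a rough check like the one below can be run on whatever text a converter produces. This is only a sketch: the function name `looks_like_text` and both thresholds are assumptions made up for illustration, and the word pattern is tuned for English, so it would need adjusting for other languages.

```python
import re

# A "word-like" token: letters, optional inner hyphen/apostrophe, optional trailing punctuation.
WORD = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*[.,;:!?)]*")

def looks_like_text(extracted: str, min_chars: int = 200, min_word_ratio: float = 0.6) -> bool:
    """Return True if the extracted string roughly resembles English prose.

    min_chars and min_word_ratio are illustrative guesses, not tested values.
    """
    stripped = extracted.strip()
    if len(stripped) < min_chars:
        # Little or no text came out, as happens with scanned (image-only) PDFs.
        return False
    tokens = stripped.split()
    wordlike = sum(1 for token in tokens if WORD.fullmatch(token))
    return wordlike / len(tokens) >= min_word_ratio
```

Files that fail the check are the ones to look at by hand or to put through text recognition; files that pass may still have smaller problems, such as merged words.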

New Participant
January 20, 2019

Thank you for clearing that up!

try67
Adobe Expert
January 21, 2019

If you had Acrobat then you could export the pages as images, then create a new PDF file from those images and run Recognize Text on it. If the results are good you'll end up with a document that has readable text and that can be exported to a text file...

Brainiac
January 18, 2019

I doubt any tool will do it "well" unless you are very lucky, because PDFs don't always convert or even contain text. Do you have any paid-for Adobe services or products related to PDF?

New Participant
January 19, 2019

I do not, but I wanted some advice as to whether there are any paid-for services which work relatively well, and whether there is an option to test them out before paying for them. The PDFs in question are all research reports, so all of them contain a significant amount of text, and as you rightly pointed out, they have not been converting well using the free tools available online.

jane-e
Adobe Expert
January 19, 2019

Hi,

You can get a free trial of Acrobat Pro to see if it works:

Download Adobe Acrobat free trial | Acrobat Pro DC

And you can subscribe for a month for $25. Note that there is a lower price for an annual plan, paid monthly.

Plans and pricing | Adobe Acrobat DC

Part of whether it works depends on how the PDF was made. Adobe created the PDF format but gave it away, and PDFs are now made by software from lots of vendors other than Adobe. Some are poorly made.

Using Acrobat and a well-made PDF, you can also convert to Word and Excel.
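
For a large collection like the one in the question, converting files one at a time in a desktop application may be impractical. As one possible alternative to the products mentioned in this thread, the sketch below calls the open-source `pdftotext` command-line tool (part of Poppler) on every PDF in a folder. It assumes `pdftotext` is installed and on the PATH; the function names and folder names are hypothetical, and like any converter it will only return what text the PDF actually contains.

```python
import shutil
import subprocess
from pathlib import Path

def build_command(pdf_path: Path, out_dir: Path) -> list[str]:
    """Build the pdftotext call that writes <out_dir>/<name>.txt as UTF-8."""
    txt_path = out_dir / (pdf_path.stem + ".txt")
    return ["pdftotext", "-enc", "UTF-8", str(pdf_path), str(txt_path)]

def convert_folder(pdf_dir: Path, out_dir: Path) -> list[Path]:
    """Convert every *.pdf in pdf_dir; return the .txt paths that were written."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        cmd = build_command(pdf_path, out_dir)
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            written.append(Path(cmd[-1]))
        else:
            # Report the failure and carry on with the remaining files.
            print(f"skipped {pdf_path.name}: {result.stderr.strip()}")
    return written
```

A call such as `convert_folder(Path("reports"), Path("reports_txt"))` would then leave one .txt file per successfully converted PDF. Scanned PDFs will still come out empty and would need text recognition first.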