How to convert PDF documents into .txt files?

New Here ,
Jan 18, 2019

Hi,

I am working on a research project in Machine Learning, where my dataset is a large collection of PDF files. I need to convert these PDF files into a format which I can use as input in a Text Classification model, such as .txt files. I have been unable to find a tool which does this well, and would be delighted if I could be pointed in the right direction.

Thanks!

Community Guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more

Most Valuable Participant ,
Jan 18, 2019

I doubt any tool will do it "well" unless you are very lucky, because PDFs don't always convert or even contain text. Do you have any paid-for Adobe services or products related to PDF?

New Here ,
Jan 19, 2019

I do not, but I wanted some advice on whether there are any paid-for services that work relatively well, and whether there is an option to test them before paying. The PDFs in question are all research reports, so all of them contain a significant amount of text, and as you rightly pointed out, they have not been converting well with the free tools available online.

Adobe Community Professional ,
Jan 19, 2019

Hi,

You can get a free trial of Acrobat Pro to see if it works:

Download Adobe Acrobat free trial | Acrobat Pro DC

You can also subscribe for a single month for $25. Note that there is a lower price for an annual plan, paid monthly.

Plans and pricing | Adobe Acrobat DC

Whether it works depends partly on how the PDF was made. Adobe created PDF but gave the format away, and PDFs are now made by lots of vendors other than Adobe. Some are poorly made.

Using Acrobat and a well-made PDF, you can also convert to Word and Excel.

New Here ,
Jan 20, 2019

Thanks a lot, I'll try these out!

Most Valuable Participant ,
Jan 19, 2019

The chances are many tools will do a similar job, and the limitation is in the PDF itself.

If the issue is missing word separators, a different tool might give a different result.

If no text comes out, or only gobbledegook, nothing will fix it.

Retyping might be a large and time-consuming part of your project.
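
To sort a large batch into those cases without opening every file, a rough check like the one below could be run on whatever text a converter produces. This is only a sketch: the function name `text_quality`, the labels it returns, and both thresholds are illustrative guesses of mine, not tuned values, and the check will miss garbling that happens to consist of ordinary letters.

```python
def text_quality(text: str) -> str:
    """Classify extracted text as 'empty', 'garbled', 'run-together' or 'ok'.

    The 0.8 and 12 thresholds are illustrative assumptions, not tuned values.
    """
    stripped = text.strip()
    if not stripped:
        # Nothing came out: probably a scanned PDF with no text layer.
        return "empty"

    visible = [c for c in stripped if not c.isspace()]
    readable = sum(c.isalnum() or c in ".,;:!?'\"()-" for c in visible)
    if readable / len(visible) < 0.8:
        # Mostly symbols or control characters: likely a broken font encoding.
        return "garbled"

    words = stripped.split()
    average_length = sum(len(w) for w in words) / len(words)
    if average_length > 12:
        # Very long "words" suggest the spaces were lost during extraction.
        return "run-together"

    return "ok"


if __name__ == "__main__":
    for sample in ["", "@#$%^&* ~~~|||", "Thisisasentencewithoutanyspaces", "A normal sentence."]:
        print(repr(sample), "->", text_quality(sample))
```

Files labelled "run-together" are the ones worth retrying with another converter; "empty" and "garbled" files are the ones where the problem is most likely in the PDF itself.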

New Here ,
Jan 20, 2019

Thank you for clearing that up!

Most Valuable Participant ,
Jan 21, 2019

If you had Acrobat then you could export the pages as images, then create a new PDF file from those images and run Recognize Text on it. If the results are good you'll end up with a document that has readable text and that can be exported to a text file...
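
Without Acrobat, the same idea (turn the pages into images, run text recognition on them, save the result as text) can be sketched with command-line tools. Note that this swaps in different software from what is described above: it assumes the `pdftoppm` utility from Poppler and the `tesseract` OCR engine are installed and on the PATH. The function names and the `page` file prefix are my own, chosen for illustration.

```python
import subprocess
from pathlib import Path
from typing import List


def rasterize_command(pdf: Path, prefix: Path, dpi: int = 300) -> List[str]:
    # pdftoppm writes one PNG per page, named <prefix>-<page number>.png
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def ocr_command(image: Path) -> List[str]:
    # tesseract appends ".txt" to the output base: page-01.png -> page-01.txt
    return ["tesseract", str(image), str(image.with_suffix(""))]


def ocr_pdf(pdf: Path, work_dir: Path) -> str:
    """Rasterize every page of `pdf`, OCR each image, return the joined text."""
    work_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_command(pdf, work_dir / "page"), check=True)

    pages = []
    # pdftoppm zero-pads page numbers, so a lexical sort should keep page order.
    for image in sorted(work_dir.glob("page-*.png")):
        subprocess.run(ocr_command(image), check=True)
        pages.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)


if __name__ == "__main__":
    # Print the commands only, so this runs even without the tools installed.
    print(rasterize_command(Path("report.pdf"), Path("page")))
    print(ocr_command(Path("page-01.png")))
```

As with the Acrobat route, the quality of the result depends on the page images, so it is worth checking a few files by eye before running a whole collection through it.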

ccchuynh
New Here ,
Oct 06, 2019

I use Soda PDF Premium for this conversion. I am able to convert PDFs of over 100 pages that were created by printing to PDF (it does not work with scanned PDF files).

You must use the Premium version to do this conversion.

You can download the trial version and see whether it works for your files.
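
For a large collection of text-based PDFs like these, a scripted batch run may be easier than converting files one at a time. The sketch below is not Soda PDF or Acrobat: it assumes the `pdftotext` command-line utility from Poppler is installed, and the function and folder names are made up for illustration. It converts every PDF in a folder and returns the ones that produced no text, which are probably scans.

```python
import subprocess
from pathlib import Path
from typing import List


def pdftotext_command(pdf: Path, txt: Path) -> List[str]:
    # "-enc UTF-8" asks pdftotext to write the output file as UTF-8.
    return ["pdftotext", "-enc", "UTF-8", str(pdf), str(txt)]


def convert_folder(pdf_dir: Path, txt_dir: Path) -> List[Path]:
    """Convert every *.pdf in pdf_dir to a .txt file in txt_dir.

    Returns the PDFs whose output was empty (likely scanned documents).
    """
    txt_dir.mkdir(parents=True, exist_ok=True)
    no_text = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        txt = txt_dir / (pdf.stem + ".txt")
        subprocess.run(pdftotext_command(pdf, txt), check=True)
        content = txt.read_text(encoding="utf-8", errors="replace")
        if not content.strip():
            no_text.append(pdf)
    return no_text


if __name__ == "__main__":
    # Print the command only, so this runs even without pdftotext installed.
    print(pdftotext_command(Path("report.pdf"), Path("report.txt")))
```

A call such as `convert_folder(Path("pdfs"), Path("txt"))` would leave one .txt file per PDF in the second folder, ready to be fed to a text classification model.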
