Convert PDF to plain TXT

Report · Aug 17, 2023

Hi,

With Adobe Acrobat Pro, I am able to convert a PDF into a plain TXT format.

I would like to do this conversion via an API. In the Adobe PDF Services documentation, there is only an option to convert to RTF, not TXT.

I was wondering if there was a way to convert a PDF to plain TXT using an API service.

Thanks!

Report · Aug 17, 2023

You can use the Extract API to get a JSON representation of the PDF then filter it to get only the text elements. From there you can output plain text.
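
As a rough illustration of that filtering step, here is a minimal Python sketch. It assumes the Extract result has already been downloaded and unzipped, and that its JSON file has a top-level "elements" list whose text-bearing entries carry a "Text" field. Those field names, the function names, and the sample data are assumptions for illustration, so check them against your own output.

```python
import json


def extract_json_to_text(data: dict) -> str:
    """Join the text of every element that has one, in document order."""
    lines = []
    for element in data.get("elements", []):  # "elements" is an assumed key
        text = element.get("Text")  # non-text elements (figures etc.) have none
        if text and text.strip():
            lines.append(text.strip())
    return "\n".join(lines)


def extract_file_to_text(json_path: str) -> str:
    """Load an Extract JSON file from disk and convert it to plain text."""
    with open(json_path, encoding="utf-8") as handle:
        return extract_json_to_text(json.load(handle))


# Hand-written sample shaped like the assumed Extract output.
sample = {
    "elements": [
        {"Path": "//Document/H1", "Text": "Invoice "},
        {"Path": "//Document/Figure"},
        {"Path": "//Document/P", "Text": "Total due: 42 USD "},
    ]
}

plain_text = extract_json_to_text(sample)
print(plain_text)
```

For a real document you would call extract_file_to_text on the JSON file from the Extract result and write the returned string to a .txt file.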

Report · Aug 17, 2023

Thanks for your reply!

When I convert the PDF to text via Acrobat Pro, it orders the text in a way that is useful to me -- the rows of the tables are formatted in paragraphs of text. I've attached a sample output PDF and TXT file. I don't think it would be possible to retain this format when using JSON.

This seems like a really roundabout way to complete a simple task. Why does the API not support plain text? It would be really useful to us.

Report · Aug 17, 2023

The usefulness of the JSON really depends on your goals. I find the output from Extract far more useful than plain text because I can easily format it into whatever I need. Also, tables are represented as tables in the JSON, similar to how HTML represents them, and Extract can also output them as either .csv or .xlsx.

Report · Aug 14, 2024

I have a similar problem:
I just want to do from my C# application what Acrobat does with its "Save as..." function.

Just load the PDF and save it as .txt.

How can I do this?

Report · Aug 14, 2024

Joel already answered. Extract gives you a JSON representation of the PDF. You can work with the results from that to generate a txt version of the PDF. It can get complex, for example, rendering tables, but it's possible.
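
To give an idea of the table part, here is a small Python sketch (not C#) that turns each table row into one paragraph of text. It assumes each element has a "Path" such as "//Document/Table/TR[2]/TD/P" and a "Text" field. The path shape, the function name, and the sample data are assumptions for illustration, so adjust the pattern to whatever your Extract output actually contains.

```python
import re

# Captures the path up to and including the table row, e.g.
# "//Document/Table/TR[2]" from "//Document/Table/TR[2]/TD/P".
ROW_PATTERN = re.compile(r"^(.*?/Table(?:\[\d+\])?/TR(?:\[\d+\])?)/")


def render_rows_as_paragraphs(elements: list) -> str:
    """Merge cells of the same table row into one paragraph.

    Elements outside a table each become their own paragraph.
    """
    blocks = []  # list of (row_key_or_None, [texts])
    for element in elements:
        text = (element.get("Text") or "").strip()
        if not text:
            continue  # skip elements without text
        match = ROW_PATTERN.match(element.get("Path", ""))
        row_key = match.group(1) if match else None
        if row_key is not None and blocks and blocks[-1][0] == row_key:
            blocks[-1][1].append(text)  # same row as the previous cell
        else:
            blocks.append((row_key, [text]))  # new row or non-table text
    return "\n\n".join(" ".join(texts) for _, texts in blocks)


# Hand-written sample shaped like the assumed Extract output.
sample_elements = [
    {"Path": "//Document/P", "Text": "Price list "},
    {"Path": "//Document/Table/TR/TH/P", "Text": "Item "},
    {"Path": "//Document/Table/TR/TH[2]/P", "Text": "Price "},
    {"Path": "//Document/Table/TR[2]/TD/P", "Text": "Widget "},
    {"Path": "//Document/Table/TR[2]/TD[2]/P", "Text": "4.00 "},
]

rendered = render_rows_as_paragraphs(sample_elements)
print(rendered)
```

This only handles simple tables. Nested tables, merged cells, and multi-column layouts would need more work.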

Report · Aug 15, 2024

Can you give me a link to an example (ideally in C#) of how to do this?
