
Convert PDF to plain TXT

New Here, Aug 17, 2023


Hi,

With Adobe Acrobat Pro, I am able to convert a PDF into a plain TXT format.

 

I would like to do this conversion via an API. In the Adobe PDF Services documentation, there is only an option to convert to RTF, not TXT. 

 

I was wondering if there was a way to convert a PDF to plain TXT using an API service.

 

Thanks!

Views

1.1K
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Community Expert, Aug 17, 2023


You can use the Extract API to get a JSON representation of the PDF, then filter it to keep only the text elements. From there you can output plain text.
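The filtering step could look roughly like the sketch below. It assumes the Extract JSON has a top-level `elements` list whose entries carry a `Text` field when they contain text, so check that against your own output. The function name and the sample data are made up for illustration, and Python is used only to keep the sketch short.

```python
import json


def extract_plain_text(structured: dict) -> str:
    """Join the text of every element that carries a "Text" field."""
    lines = []
    for element in structured.get("elements", []):
        text = element.get("Text")
        if text:  # elements without "Text" (assumed: figures, containers) are skipped
            lines.append(text.strip())
    return "\n".join(lines)


# Hypothetical sample in the assumed shape, not real Extract output.
sample = json.loads("""
{"elements": [
  {"Path": "//Document/H1", "Text": "Invoice "},
  {"Path": "//Document/Figure"},
  {"Path": "//Document/P", "Text": "Total due: 42 USD "}
]}
""")

plain_text = extract_plain_text(sample)
print(plain_text)
```

For a real file you would load the JSON from disk with `json.load` and write `plain_text` to a .txt file.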

New Here, Aug 17, 2023


Thanks for your reply!

 

When I convert the PDF to text via Acrobat Pro, it orders the text in a way that is useful to me -- the rows of the tables are formatted in paragraphs of text. I've attached a sample output PDF and TXT file. I don't think it would be possible to retain this format when using JSON. 

 

This seems like a really roundabout way to complete a simple task. Why does the API not support plain text? It would be really useful to us.

Community Expert, Aug 17, 2023


The usefulness of the JSON really depends on your goals. I find the output from Extract to be far more useful than plain text because I can easily format it into whatever I need. Also, tables are represented as tables in the JSON, similar to how HTML represents them, and Extract can also output them as either .csv or .xlsx.
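If the goal is the "one paragraph per table row" layout mentioned earlier in the thread, the .csv table output is one possible starting point. This is only a sketch: the function name and the sample CSV are hypothetical, and it assumes the CSV is ordinary comma-separated text with one table row per line.

```python
import csv
import io


def csv_rows_to_paragraphs(csv_text: str) -> str:
    """Turn each non-empty CSV row into one paragraph of space-joined cells."""
    reader = csv.reader(io.StringIO(csv_text))
    paragraphs = []
    for row in reader:
        cells = [cell.strip() for cell in row if cell.strip()]
        if cells:  # skip rows where every cell is blank
            paragraphs.append(" ".join(cells))
    return "\n\n".join(paragraphs)


# Hypothetical table content, not taken from a real export.
sample_csv = "Item,Qty,Price\nWidget,2,9.99\n,,\nGadget,1,24.50\n"
print(csv_rows_to_paragraphs(sample_csv))
```

Whether this matches what Acrobat's own text export produces for a given PDF is something you would have to compare by hand.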

New Here, Aug 14, 2024


I have a similar problem:
I just want to do from my C# application what Acrobat does with its "Save as..." function.

Just load the PDF and save it as .txt.

How can I do this?

Adobe Employee, Aug 14, 2024


Joel already answered. Extract gives you a JSON representation of the PDF. You can work with the results from that to generate a txt version of the PDF. It can get complex, for example, rendering tables, but it's possible. 
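As a rough illustration of the table part, the sketch below groups table cells by row and emits each row as one paragraph. It rests on assumptions you should verify against your own Extract output: that each element has a `Path` such as `//Document/Table/TR[2]/TD[3]/P`, that a missing index means the first item, and that elements arrive in reading order. The function name, the regular expression, and the sample data are all hypothetical, and Python is used only for brevity.

```python
import re

# Assumed path shape: .../Table[n]/TR[m]/TD[k]/... (TH for header cells).
ROW_RE = re.compile(r"/Table(?:\[(\d+)\])?/TR(?:\[(\d+)\])?/T[HD]")


def render_text(elements):
    """Emit non-table text as-is and each table row as one paragraph."""
    blocks = []      # output blocks, in the order first seen
    row_blocks = {}  # (table index, row index) -> list of cell texts
    for el in elements:
        text = (el.get("Text") or "").strip()
        if not text:
            continue
        match = ROW_RE.search(el.get("Path", ""))
        if match is None:
            blocks.append([text])  # ordinary paragraph or heading
            continue
        key = (int(match.group(1) or 1), int(match.group(2) or 1))
        if key not in row_blocks:
            row_blocks[key] = []
            blocks.append(row_blocks[key])  # same list object, filled as cells arrive
        row_blocks[key].append(text)
    return "\n\n".join(" ".join(block) for block in blocks)


# Hypothetical elements in the assumed shape, not real Extract output.
elements = [
    {"Path": "//Document/P", "Text": "Price list "},
    {"Path": "//Document/Table/TR/TH/P", "Text": "Item "},
    {"Path": "//Document/Table/TR/TH[2]/P", "Text": "Price "},
    {"Path": "//Document/Table/TR[2]/TD/P", "Text": "Widget "},
    {"Path": "//Document/Table/TR[2]/TD[2]/P", "Text": "9.99 "},
]
print(render_text(elements))
```

Nested tables, merged cells, and multi-column layouts are not handled here, which is where the complexity mentioned above comes in.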

New Here, Aug 15, 2024


Can you give me a link to an example (preferably in C#) of how to do this?
