Participant
August 17, 2023
Question

Convert PDF to plain TXT


Hi,

With Adobe Acrobat Pro, I am able to convert a PDF into a plain TXT format.

 

I would like to do this conversion via an API. In the Adobe PDF Services documentation, there is only an option to convert to RTF, not TXT. 

 

I was wondering if there was a way to convert a PDF to plain TXT using an API service.

 

Thanks!


    1 reply

    Joel Geraci
    Community Expert
    August 17, 2023

    You can use the Extract API to get a JSON representation of the PDF then filter it to get only the text elements. From there you can output plain text. 
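A rough sketch of that filtering step in Python, for illustration only. It assumes the Extract result has already been downloaded and parsed into a dict, and that it holds a top-level "elements" list whose text-bearing entries carry a "Text" key. The function name `extract_plain_text` and the sample data are made up for this example.

```python
def extract_plain_text(structured: dict) -> str:
    """Join the text of every element that has any, in document order."""
    lines = []
    for element in structured.get("elements", []):
        # Figures and other non-text elements are assumed to have no "Text" key.
        text = element.get("Text")
        if text and text.strip():
            lines.append(text.strip())
    return "\n".join(lines)


# Hypothetical stand-in for the parsed Extract JSON.
sample = {
    "elements": [
        {"Path": "//Document/H1", "Text": "Invoice "},
        {"Path": "//Document/Figure"},
        {"Path": "//Document/P", "Text": "Total due: 42"},
    ]
}

print(extract_plain_text(sample))
```

In practice the dict would come from `json.load` on the JSON file inside the Extract result, and the returned string could be written straight to a .txt file.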

    peach11Author
    Participant
    August 17, 2023

    Thanks for your reply!

     

    When I convert the PDF to text via Acrobat Pro, it orders the text in a way that is useful to me -- the rows of the tables are formatted in paragraphs of text. I've attached a sample output PDF and TXT file. I don't think it would be possible to retain this format when using JSON. 

     

    This seems like a really roundabout way to complete a simple task. Why does the API not support plain text? It would be really useful to us.

    Joel Geraci
    Community Expert
    August 17, 2023

    The usefulness of the JSON really depends on your goals. I find the output from Extract to be far more useful than plain text because I can easily format it into whatever I need. Also, tables are represented as tables in the JSON, similar to how HTML represents them, and Extract can also output them as either .csv or .xlsx files.
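As one possible way to get the "each table row as a paragraph" layout from the JSON, here is a hedged Python sketch. It assumes each element has a "Path" string in which table cells sit under segments like `/Table/TR[2]/TD[3]`, and it groups consecutive text elements that share the same row prefix. The path shapes, the `ROW_KEY` pattern, and the `rows_as_paragraphs` name are assumptions for illustration, so check them against your own Extract output.

```python
import re
from itertools import groupby

# Captures the path up to and including the table row, e.g. "//Document/Table/TR[2]".
ROW_KEY = re.compile(r"^(.*?/Table(?:\[\d+\])?/TR(?:\[\d+\])?)")


def row_key(element: dict):
    """Return the row prefix of an element's path, or None outside a table row."""
    match = ROW_KEY.match(element.get("Path", ""))
    return match.group(1) if match else None


def rows_as_paragraphs(structured: dict) -> str:
    """Emit one paragraph per table row and one per non-table text element."""
    with_text = [
        el for el in structured.get("elements", [])
        if (el.get("Text") or "").strip()
    ]
    paragraphs = []
    for key, group in groupby(with_text, key=row_key):
        cells = [el["Text"].strip() for el in group]
        if key is None:
            paragraphs.extend(cells)            # ordinary text: one paragraph each
        else:
            paragraphs.append(" ".join(cells))  # table row: cells joined on one line
    return "\n\n".join(paragraphs)


# Hypothetical stand-in for the parsed Extract JSON.
sample = {
    "elements": [
        {"Path": "//Document/P", "Text": "Price list"},
        {"Path": "//Document/Table/TR/TD/P", "Text": "Item"},
        {"Path": "//Document/Table/TR/TD[2]/P", "Text": "Price"},
        {"Path": "//Document/Table/TR[2]/TD/P", "Text": "Apple"},
        {"Path": "//Document/Table/TR[2]/TD[2]/P", "Text": "1.50"},
    ]
}

print(rows_as_paragraphs(sample))
```

Whether this matches the ordering Acrobat Pro produces for a given PDF is not guaranteed. The point is only that the row structure in the JSON leaves room to shape the text output as needed.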