Participant
September 9, 2021
Answered

Is there a way to Export PDF to HTML using the export API?


I am working on converting a PDF into an HTML document. Using the Adobe export API I can convert my PDF into .docx or .png files, but I couldn't find any documentation on converting to HTML. The Adobe DC application can convert a PDF to HTML, so I was hoping this would also be possible through the API. Is there any documentation, or any other way, to convert a PDF to HTML?

Thank you!

Correct answer by Joel Geraci (Community Expert), September 9, 2021

Extract will produce a JSON file with an elements property. The elements property is an array of objects, each corresponding to a "chunk" of text in the PDF. Two properties in each element are of interest when creating the HTML: Path and Text. The Path will end in one of the element types listed here. Most of them are also HTML element names that you'll be familiar with. The Text is... you guessed it... the text of that element.
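As a rough illustration of the Path/Text idea above, here is a minimal Python sketch. It assumes the Extract JSON has already been loaded into a dict, that paths look like "//Document/H1" or "//Document/P[2]", and that a small hand-written mapping (TAG_FOR_TYPE, my own invention, not part of the API) is enough. The sample data is made up. Anything the mapping doesn't know, including list items and text inside table cells, simply falls back to a paragraph here, so treat it as a starting point rather than a full converter.

```python
import html
import re

# Hypothetical mapping from the last segment of an element's Path to an HTML tag.
# Extend it as needed; unknown types fall back to <p>.
TAG_FOR_TYPE = {
    "Title": "h1",
    "H1": "h1",
    "H2": "h2",
    "H3": "h3",
    "H4": "h4",
    "H5": "h5",
    "H6": "h6",
    "P": "p",
}


def element_type(path):
    """Return the last path segment without its index: '//Document/P[2]' -> 'P'."""
    last = path.rstrip("/").split("/")[-1]
    return re.sub(r"\[\d+\]$", "", last)


def elements_to_html(extract_json):
    """Turn each element that has Text into one HTML tag, in document order."""
    parts = []
    for element in extract_json.get("elements", []):
        text = element.get("Text")
        if not text:
            continue  # e.g. figures or containers with no text of their own
        tag = TAG_FOR_TYPE.get(element_type(element.get("Path", "")), "p")
        parts.append(f"<{tag}>{html.escape(text.strip())}</{tag}>")
    return "\n".join(parts)


# Made-up sample in the shape described above (not real Extract output).
sample = {
    "elements": [
        {"Path": "//Document/H1", "Text": "Quarterly Report "},
        {"Path": "//Document/P[2]", "Text": "Sales & marketing summary."},
        {"Path": "//Document/Figure"},
    ]
}

print(elements_to_html(sample))
```

The text is passed through html.escape so that characters such as & and < in the PDF text don't break the generated markup.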

 

Tables and Lists get interesting, but those are still pretty easy to read. That said, you can extract the tables as CSV and then easily render those to HTML. There are a number of code samples that will do that for you.
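For the CSV route, a minimal Python sketch using only the standard library might look like the following. It assumes the table CSV has already been read into a string and that the first row is a header row; that second point is an assumption about your data, so pass header=False if it doesn't hold. The function name csv_to_html_table is just an illustrative name.

```python
import csv
import html
import io


def csv_to_html_table(csv_text, header=True):
    """Render CSV text as a plain HTML table, escaping every cell."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]  # skip blank lines
    lines = ["<table>"]
    for index, row in enumerate(rows):
        # Assumption: the first row holds column headings when header=True.
        cell_tag = "th" if header and index == 0 else "td"
        cells = "".join(
            f"<{cell_tag}>{html.escape(value)}</{cell_tag}>" for value in row
        )
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Made-up CSV content standing in for one extracted table.
print(csv_to_html_table("Name,Qty\nBolt,4\nNut,12\n"))
```

Using csv.reader rather than splitting on commas keeps quoted cells that contain commas or line breaks intact.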

 

I'll be sharing some code that I have that does this soon.

2 replies

Participant
December 5, 2022

Is there any information on whether PDF to HTML conversion will be added to the API in the future? I need the output to match what I get when converting with Adobe Acrobat, including layout, tables, fonts, etc.

Joel Geraci
Community Expert
September 9, 2021

The Extract API will output JSON which can easily be used to generate HTML on the fly.

Participant
September 9, 2021

Thanks for the reply. Actually that is a great approach, extracting the JSON and generating the HTML from it. I will take a look at the JSON output and try to use it to generate the HTML.

Could you suggest any resources or a process for doing this? I searched online, and all I could find was how to convert JSON into HTML tables.

Thanks.
