
Is there a way to Export PDF to HTML using the export API?

Community Beginner, Sep 09, 2021

I am working on converting a PDF into an HTML document. Using the Adobe Export API, I could convert my PDF into .docx or .png files, but I couldn't find any documentation on converting to HTML. In the Adobe DC application I was able to convert the PDF to HTML, and I was hoping the same would be possible through the API. Is there any documentation, or any other way, to convert a PDF to HTML?

Thank you!

Topics: PDF Extract API, PDF Services API

Community Expert, Sep 09, 2021

The Extract API will output JSON which can easily be used to generate HTML on the fly.

Community Beginner, Sep 09, 2021

Thanks for the reply. That is a great approach: extract the JSON and generate the HTML from it. I will take a look at the JSON output and try to use it to generate the HTML.

Could you suggest any resources or a process for doing this? When I searched online, all I could find was how to convert JSON into HTML tables.

Thanks.

Community Expert, Sep 09, 2021 (marked as the correct answer)

Extract will produce JSON with an elements property. The elements property is an array of objects, each corresponding to a "chunk" of text in the PDF. Two properties of each element are of interest when creating the HTML: Path and Text. The Path will end in one of the element types listed here; most of them are also HTML element names that you'll be familiar with. The Text is... you guessed it... the text of that element.
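As a rough illustration of the Path/Text approach above, here is a minimal Python sketch. It assumes the Extract JSON has already been loaded (for example with json.load) into a dict whose "elements" list holds objects with "Path" and "Text" keys, as described in the post. The TAG_MAP table, the helper names, and the fallback to <p> are hypothetical choices for illustration, and tables and lists are simply flattened into paragraphs.

```python
import html
import re

# Hypothetical mapping from the last segment of an element's Path to an HTML tag.
# Only a few element types are covered; anything unmapped falls back to <p>.
TAG_MAP = {
    "Title": "h1",
    "H1": "h1", "H2": "h2", "H3": "h3",
    "H4": "h4", "H5": "h5", "H6": "h6",
    "P": "p",
}


def last_segment(path: str) -> str:
    """Return the final element type of a Path, dropping any [n] index."""
    segment = path.rstrip("/").split("/")[-1]
    return re.sub(r"\[\d+\]$", "", segment)


def elements_to_html(data: dict) -> str:
    """Render each element that has Text as one HTML block, in array order."""
    parts = []
    for element in data.get("elements", []):
        text = (element.get("Text") or "").strip()
        if not text:
            continue  # skip elements with no text (e.g. figures)
        tag = TAG_MAP.get(last_segment(element.get("Path", "")), "p")
        parts.append(f"<{tag}>{html.escape(text)}</{tag}>")
    return "\n".join(parts)


if __name__ == "__main__":
    sample = {"elements": [  # made-up elements for illustration
        {"Path": "//Document/H1", "Text": "Quarterly report "},
        {"Path": "//Document/P[2]", "Text": "Revenue grew & costs fell."},
    ]}
    print(elements_to_html(sample))
    # <h1>Quarterly report</h1>
    # <p>Revenue grew &amp; costs fell.</p>
```

Escaping the text with html.escape matters because PDF text can contain characters such as & or < that would otherwise break the generated markup.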

 

Tables and lists get interesting, but they are still pretty easy to read. That said, you can extract the tables as CSV and then easily render those to HTML. There are a number of code samples that will do that for you.
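For the CSV route, a minimal sketch using only Python's standard csv and html modules might look like this. It assumes each table is available as CSV text and that the first row holds the column headings; csv_to_html_table is a hypothetical helper name, not part of any Adobe SDK.

```python
import csv
import html
import io


def csv_to_html_table(csv_text: str, header: bool = True) -> str:
    """Render CSV text as an HTML <table>; optionally treat the first row as headings."""
    # csv.reader handles quoted fields such as "1,50"; blank lines are dropped.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    lines = ["<table>"]
    for i, row in enumerate(rows):
        tag = "th" if header and i == 0 else "td"
        cells = "".join(f"<{tag}>{html.escape(cell)}</{tag}>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_html_table('Item,Price\nPen,"1,50"\n'))
    # <table>
    #   <tr><th>Item</th><th>Price</th></tr>
    #   <tr><td>Pen</td><td>1,50</td></tr>
    # </table>
```

To render a file instead of a string, read it with open(path, newline="") and pass the contents in; the csv module documentation recommends newline="" so that quoted fields containing line breaks are parsed correctly.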

 

I'll be sharing some code that I have that does this soon.

Community Beginner, Sep 15, 2021

I used the plain text output from the Extract API (without any styling information) and was able to convert it into HTML using the element properties. In that JSON output, list elements are treated as heading tags, so I didn't have to worry about them, but tables took some time to figure out. I couldn't understand how to use the font data or the background color information because of their format. That said, so far my HTML looks far better than the output of any other software or package I have used.

 

Could you share the code you mentioned, so that I can understand this better and improve my approach? Thanks.
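Since tables were the tricky part in the post above, one possible approach, sketched here under stated assumptions, is to rebuild each table from the Path of its cell text. The sketch assumes table text arrives with Paths shaped like //Document/Table[t]/TR[r]/TD[c]/P (TH for header cells), where an omitted [n] means the first one; that shape is an assumption to check against your own JSON. Cells are emitted in the order they appear in the elements array, and row or column spans are ignored. The function names are hypothetical.

```python
import html
import re
from collections import defaultdict

# Assumed Path shape for table text: .../Table[t]/TR[r]/TD[c]/... (TH for header
# cells); an omitted [n] is treated as [1]. Check this against your own JSON.
CELL_RE = re.compile(r"/Table(?:\[(\d+)\])?/TR(?:\[(\d+)\])?/(TD|TH)(?:\[(\d+)\])?")


def group_table_cells(elements):
    """Return {table_no: {row_no: {(cell_tag, col_no): text}}} parsed from Paths."""
    tables = defaultdict(lambda: defaultdict(dict))
    for element in elements:
        match = CELL_RE.search(element.get("Path", ""))
        text = (element.get("Text") or "").strip()
        if not match or not text:
            continue  # not table text, or nothing to show
        table_no, row_no, cell_tag, col_no = match.groups()
        row = tables[int(table_no or 1)][int(row_no or 1)]
        key = (cell_tag, int(col_no or 1))
        # One cell can hold several paragraphs (.../TD/P, .../TD/P[2]); join them.
        row[key] = f"{row[key]} {text}" if key in row else text
    return tables


def render_tables(elements):
    """Render every grouped table as an HTML <table>, rows in numeric order."""
    tables = group_table_cells(elements)
    out = []
    for table_no in sorted(tables):
        rows = tables[table_no]
        out.append("<table>")
        for row_no in sorted(rows):
            # Cells keep the order in which they appeared in the elements array.
            cells = "".join(
                f"<{tag.lower()}>{html.escape(text)}</{tag.lower()}>"
                for (tag, _col), text in rows[row_no].items()
            )
            out.append(f"  <tr>{cells}</tr>")
        out.append("</table>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = [  # made-up elements for illustration
        {"Path": "//Document/Table/TR/TH/P", "Text": "Item"},
        {"Path": "//Document/Table/TR/TH[2]/P", "Text": "Qty"},
        {"Path": "//Document/Table/TR[2]/TD/P", "Text": "Pen"},
        {"Path": "//Document/Table/TR[2]/TD[2]/P", "Text": "3"},
        {"Path": "//Document/P", "Text": "Not part of a table"},
    ]
    print(render_tables(sample))
    # <table>
    #   <tr><th>Item</th><th>Qty</th></tr>
    #   <tr><td>Pen</td><td>3</td></tr>
    # </table>
```

Keying cells by (tag, index) rather than index alone keeps a TH and a TD that share the same index in one row from overwriting each other.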

Explorer, Sep 25, 2021

@Sreevatsava5FEF 

 

Here's a link that has working samples. Of course, @Joel_Geraci might have better samples.

 

Also, did you manage to accomplish the conversion? Are you converting to HTML for web viewing or for email?

 

Thanks!

New Here, Dec 01, 2021

I'm curious to see the business logic you used to render the HTML from the JSON output. Can you point me to the code samples, please? Thanks a ton.

Community Beginner, Feb 17, 2022

Did you find any framework or API that will create HTML from the JSON output?

New Here, Oct 30, 2022

Hi @Joel_Geraci,

Can you please share sample code for turning the JSON extracted from a PDF into an HTML page?

Thank you in advance.

New Here, Jun 26, 2023

Hi Yuvraj Kale,

Did you create the HTML page from the JSON extracted from the PDF?

Thank you

New Here, Jun 27, 2023

Hey,
I am still waiting for an answer on this. We want to convert PDF to HTML but have not been able to find a library that does this correctly.

It would be great if you could share any references on this.

Thanks,

New Here, Dec 05, 2022

Is anything known about whether PDF-to-HTML conversion will be added to the API in the future? I need the output to match what I get when converting in Adobe Acrobat, with layout, tables, fonts, etc. preserved.
