Participating Frequently

Question

HTML to PDF using Adobe PDF services

July 21, 2021

Hello everyone,

I am working on Python code for converting PDF to HTML and vice versa.

I have three questions:

1. Is there an Adobe API that converts PDF to HTML? If not, do you know of an alternative? Thank you.
2. I am trying to write Python code that creates a PDF from HTML, using the instructions in this link. I have seen that the Adobe PDF Services API requires the HTML input file to be zipped. How does this work? Should I save the HTML page together with its complementary content and zip everything, or save just the HTML of the input web page? Thank you.
3. Following the same instructions, I am struggling with how to specify or adapt the value of the json field within "cpf:inputs > params > cpf:inline" of the form parameters, which is described with this value: <script src='./json.js' type='text/javascript'></script>

The documentation says:

In case of dynamic HTML this API allows you to capture the user's unique data entries and then save it as PDF. Collected data is stored in a JSON file, and the source HTML file must include <script src='./json.js' type='text/javascript'></script>

The same page describes the json field as follows:

json(string, optional)

JavaScript variables to be placed in global scope to reference while rendering the HTML. This mechanism is intended to be used to supply data that might otherwise be retrieved using ajax requests. The actual mechanics of accessing this content varies depending if rendering from a zip file or from a url. When rendering from a zip file, the source collateral must include a script element such as:
<script src='./json.js' type='text/javascript'></script>
When rendering from a URL, the content of this json object is injected into the browser VM before the page is rendered.

default: {}

Could you please help with this? Specifically, what does this json field bring to the conversion, why is it useful, and at which step should it be used? Thanks.

Here is my current Python code:

import datetime
import json
import jwt
import os
import requests

# information to find in the Adobe user account: credentials, generated JWT

  # credentials
client_id= "ùù" # CLIENT ID (API key)
client_secret= "$"

  # Generated jwt
jwtPayloadRaw = """
                {"exp":,
                "iss":"@AdobeOrg",
                "sub":"@techacct.adobe.com",
                "https://ims-na1.adobelogin.com/s/ent_documentcloud_sdk":true,
                "aud":""}
                """

# set input file name
inputFileName = "blog_files"

# set output file name
outputFileName = "output"

url = "https://ims-na1.adobelogin.com/ims/exchange/jwt"

# convert jwt token into a Dictionary
jwtPayloadJson = json.loads(jwtPayloadRaw)
jwtPayloadJson["exp"] = datetime.datetime.utcnow() + datetime.timedelta(seconds=30) # Adobe requires adding a field in the json token with an expiration parameter

# read the private key; os.path.join avoids the invalid "\c" and "\p" escapes in the original path string
with open(os.path.join(os.getcwd(), "config", "private.key"), "r") as keyfile:
    private_key = keyfile.read()

# Encoding the jwt token using the private key
jwttoken = jwt.encode(jwtPayloadJson, private_key, algorithm="RS256")

# Requesting server authorization
accessTokenRequestPayload = {"client_id":client_id, "client_secret": client_secret}
accessTokenRequestPayload["jwt_token"] = jwttoken

result = requests.post(url, data = accessTokenRequestPayload)

# getting Bearer token from the server
resultjson = json.loads(result.text)

import time  # requests and json are already imported above

URL = "https://cpf-ue1.adobe.io/ops/:create?respondWith=%7B%22reltype%22%3A%20%22http%3A%2F%2Fns.adobe.com%2Frel%2Fprimary%22%7D"

# the bearer token written in this format : "Bearer generated_access_token"
Bearer_token = resultjson["token_type"]+" "+resultjson["access_token"]

# the headers
h = {
    "Authorization": Bearer_token,
    "Accept": "application/json, text/plain, */*",
    "x-api-key": client_id,
    "Prefer": "respond-async,wait=0"}

# the input file
myfile = {"InputFile":open(os.getcwd() + "\\" + inputFileName + ".zip", "rb")}

# open the JSON containing form parameters
with open("formatParams.json") as jsonFile:  # the with block closes the file automatically
    j = json.load(jsonFile)


body = {"contentAnalyzerRequests": json.dumps(j)}

resp = requests.post(url=URL, headers=h, data=body, files=myfile)

print("\nStatus of POST request: ",resp.status_code)
print(resp.text)
# print(resp.reason)

# poll the location URL until the PDF is ready; stop on an error response or after max_attempts
max_attempts = 60

for attempt in range(max_attempts):

    get_resp = requests.get(resp.headers["location"], headers=h)

    if get_resp.status_code == 200:  # the response contains the output file content only if the status is 200
        with open(os.path.join(os.getcwd(), outputFileName + ".pdf"), "wb") as outputFile:
            outputFile.write(get_resp.content)
        break
    elif get_resp.status_code >= 400:  # an error will not resolve by waiting, so stop instead of looping forever
        print(get_resp.text)
        break
    else:
        time.sleep(5)  # wait 5 s before polling again if the file content is not ready yet

print("\nFinal Status of GET request: ",get_resp.status_code)

Any help would be useful! Thank you!

This topic has been closed for replies.

Simran Inamdar

Participating Frequently

Hello,

did you resolve your HTML zip problem?

I am able to run the sample project. I created another zip archive containing an HTML file and its dependency files (CSS and JS) and passed it to CreatePDFFromDynamicHTML.java to create a PDF. After executing the command

mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.pdfservices.operation.samples.createpdf.CreatePDFFromDynamicHTML

it gives this error:

description ='The zip file provided has invalid content.; transactionId=kNWylwnmtNDbJf9ERkbjuE1Ou62467uv'; requestTrackingId='ZDdG1zgw5sKjMdnEGjvV8PAdUlJwwwNO'; statusCode=400; errorCode=INVALID_ZIP

If so, could you please guide me on how to resolve this?
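
One possible cause of INVALID_ZIP, given the documented requirement that index.html sit at the top level of the archive, is zipping the parent folder rather than its contents. The following is a minimal Python sketch for checking an archive before uploading it. The function name has_top_level_index is illustrative and not part of any Adobe SDK, and other causes of this error are possible.

```python
import zipfile


def has_top_level_index(zip_path, entry_name="index.html"):
    """Return True if the archive contains entry_name at its root.

    An entry such as "site/index.html" does not count, because it
    lives inside a subfolder of the archive.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return entry_name in archive.namelist()
```

Calling has_top_level_index on the archive you pass to the sample (the path is yours to supply) returns False when index.html is nested inside a folder, which would suggest re-creating the zip from inside that folder.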

Librarian SJW

Participating Frequently

Can you tell me how to take an HTML file and make it a PDF with proper pictures and things?

Kenneth Patterson

Community Manager

The important part to remember is: "Since HTML/web pages typically contain external assets, the input file must be a zip file containing an index.html at the top level of the archive as well as any dependencies such as images, css files, and so on."

Here is a link to the doc: https://opensource.adobe.com/pdftools-sdk-docs/release/latest/howtos.html#create-a-pdf-from-static-html
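
For readers working in Python, the packaging rule quoted above can be sketched with the standard zipfile module. This is only an illustration under stated assumptions: the function name zip_html_folder is hypothetical, and it assumes the folder you pass already contains index.html directly inside it, next to its images and CSS.

```python
import os
import zipfile


def zip_html_folder(folder, zip_path):
    """Zip the contents of `folder` so its files sit at the archive root.

    Write zip_path somewhere outside `folder`, otherwise the archive
    would try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                full_path = os.path.join(root, name)
                # Store each path relative to `folder`, so that
                # folder/index.html becomes "index.html" in the archive
                # and folder/css/a.css becomes "css/a.css".
                arcname = os.path.relpath(full_path, folder).replace(os.sep, "/")
                archive.write(full_path, arcname)
    return zip_path
```

The design choice here is to store paths relative to the folder itself rather than to its parent, which is what keeps index.html at the top level.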

I suggest starting with the sample code. I don't know what programming language you are using, so here are the links to all of the sample code:

Java: https://github.com/adobe/pdfservices-java-sdk-samples
.NET: https://www.adobe.com/go/pdftoolsapi_net_samples
Node.js: http://www.adobe.com/go/pdftoolsapi_node_sample

Kenneth Patterson

Community Manager

First, I should point out that I am not a Python coder, so I will not be able to assist at the coding level for that language, but I will try to answer as much as I am able.

One overall suggestion: If you are going to use the public REST APIs rather than the SDKs, I highly recommend that you start building your API calls using the provided Postman collection. Postman is an excellent tool for building API calls in a testing environment.

In answer to your numbered questions:

1. The Adobe PDF Services API does not currently have functionality to convert PDF to HTML, but it can convert from HTML to PDF. To do PDF to HTML, please see here.
2. The zip file should contain "the input HTML file and its resources, along with the input data" (from the documentation). The JSON file contains the user inputs on the HTML page, which allows you to include that dynamic content in the PDF you are creating.
3. I suggest taking a look at the index.html file contained in the resources/createPDFFromDynamicHtmlInput.zip file found in the SDK sample code. Here is the link to the Node.js samples, but the HTML should be the same: https://github.com/adobe/pdfservices-node-sdk-samples

A.5F87Author

Participating Frequently

Thank you!

In the part taken from the documentation, which says "the input HTML file and its resources, along with the input data", could you please clarify what the "input data" is? Thank you.

I am still unclear about the definition of the json field (quoted at the beginning of my initial question), which says:

the source collateral must include a script element such as:
<script src='./json.js' type='text/javascript'></script>

Where exactly do I have to write this line of code? I don't see where.

I am asking because the example given in the documentation is as follows:

{
  "cpf:engine": {
    "repo:assetId": "urn:aaid:cpf:Service-e2ee120a2b06427cb449592f5db967e7"
  },
  "cpf:inputs": {
    "params": {
      "cpf:inline": {
        "json": "[\"a\": \"b\"]",
        "print": {
          "includeHeaderFooter": true
        },
        "pageLayout": {
          "pageWidth": 11,
          "pageHeight": 8.5
        }
      }
    },
    "documentIn": {
      "dc:format": "application/zip",
      "cpf:location": "multipartLabel"
    }
  },
  "cpf:outputs": {
    "documentOut": {
      "dc:format": "application/pdf",
      "cpf:location": "multipartLabelOut"
    }
  }
}

with the json field as follows: "json": "[\"a\": \"b\"]"

It is not clear to me what should replace this value.
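
One way to read the field, sketched here in Python as an assumption rather than as confirmed API behavior: the documented default of {} and the sample page's use of window.json.title suggest that json carries a JSON object serialized into a string. The helper name build_inline_params and the example values are hypothetical; the keys title and sub_title are the ones read by the SDK sample's index.html.

```python
import json


def build_inline_params(data, page_width=11, page_height=8.5,
                        include_header_footer=True):
    """Build the "cpf:inline" parameters with `data` serialized as a string.

    Assumption: the service expects the json field to hold a JSON object
    encoded as a string, not a nested object.
    """
    return {
        "json": json.dumps(data),
        "print": {"includeHeaderFooter": include_header_footer},
        "pageLayout": {"pageWidth": page_width, "pageHeight": page_height},
    }


# Hypothetical values for the two fields the sample page displays.
page_data = {"title": "My report", "sub_title": "Generated from Python"}
inline_params = build_inline_params(page_data)
print(json.dumps(inline_params, indent=2))
```

The resulting dictionary would take the place of the "cpf:inline" object in the form parameters before the whole request body is serialized.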

Thank you in advance!

Kenneth Patterson

Community Manager

Below you will find the source code for the HTML file that I suggested you look at (question #3 above). In it you will see how the <script> element you are having difficulty with is being used. It appears right after the <body> tag in the HTML. Please download the sample code referenced in the answer I gave to question #3 and have a look.

<!DOCTYPE html>
<html lang="en">

<head>
  <meta charset="UTF-8">
  <title>Adobe IO</title>
  <link rel="stylesheet" href="css/spectrum/spectrum-core.css">
  <link rel="stylesheet" href="css/spectrum/spectrum-dark.css">
  <link rel="stylesheet" href="css/spectrum/spectrum-darkest.css">
  <link rel="stylesheet" href="css/spectrum/spectrum-light.css">
  <link rel="stylesheet" href="css/spectrum/spectrum-lightest.css">

  <link rel="stylesheet" href="css/navbar.min.css">

  <link rel="stylesheet" href="css/styles.css">
    <style>
@media print {
    .pagebreak { page-break-before: always; } 
}
</style>
</head>

<body class="spectrum spectrum--medium spectrum--light" onload = "initial()">
  <script src="./json.js"></script>
  <script type="text/javascript">
      function initial()
      {
          var document = window.document;
          document.getElementById("title").innerHTML = String(window.json.title);
          document.getElementById("sub-title").innerHTML = String(window.json.sub_title);
      }
    </script>
  <header>
    <nav class="header">
      <div class="spectrum">
        <a href="#">
          <p class="spectrum-Heading3">Adobe I/O</p>
        </a>
      </div>
      <ul class="nav" data-function="navbar">
        <li class="spectrum-Tabs-itemLabel"><a href="#">APIs </a></li>
        <li class="spectrum-Tabs-itemLabel"><a href="#">Authentication </a></li>
        <li class="spectrum-Tabs-itemLabel"><a href="#">Open Source </a></li>
        <li class="spectrum-Tabs-itemLabel"><a href="#">Blog </a></li>
        <li class="spectrum-Tabs-itemLabel">
          <button class="spectrum-Button spectrum-Button--cta navbar-cta-btn">
            <span class="spectrum-Button-label">Console</span>
          </button>
        </li>
      </ul>
    </nav>
  </header>
  <nav class="sub-header">
    <ul class="nav" data-function="navbar">
      <li class="spectrum-Tabs-itemLabel">
        <a href="#" class="with-icon">
         <img src="css/img/product-icon-dc.svg" class="spectrum-Icon spectrum-Icon--sizeS" alt="" style="margin-right: 10px">
          <span>ADOBE DOCUMENT CLOUD SDK</span>
          </a>
      </li>
    </ul>
  </nav>

  <!--Hero Image-->
  <div class="spectrum-IllustratedMessage bannerImage">
    <div class="spectrum">
      <h2 class="spectrum-Heading1" id="title"></h2>
      <p style="color: #d3d3d3; max-width: 600px; margin-top: 4px;" id="sub-title"></p>
      <a class="spectrum-Button spectrum-Button--overBackground spectrum--medium spectrum--darkest" href="form.html">
        <span class="spectrum-Button-label">Request Access</span>
      </a>
    </div>
  </div>

  <!-- End of Hero Image-->

  <!--Card Section-->
  <div class="spectrum-Article" style="padding: 40px 70px 0px 80px;max-width: 1000px; display: flex">
  </div>

  <section class="cards-container">

    <div class="spectrum-Card" tabindex="0" role="figure">
      <div class="spectrum-Card-coverPhoto">

        <div class="spectrum-Card-coverPhoto" style="background-image: url(css/img/illustrations/11.svg);background-size: contain; background-repeat: no-repeat;height: calc(100% - 20px);
    width: calc(100% - 20px);border-bottom-color: transparent; margin: auto; ">

        </div>
      </div>
      <div class="spectrum-Card-body">
        <div class="spectrum-Card-header">
          <div class="spectrum-Card-title">
            <div class="spectrum">
              <!--  <p class="spectrum-Heading4">Article Heading4 <em>Emphasis</em> <strong>Strong</strong>.</p>-->
              <p class="spectrum-Heading3">Embed Adobe gold standard PDF viewer</p>

            </div>
          </div>
          <div class="spectrum-Card-actionButton">
            <button aria-haspopup="true" class="spectrum-ActionButton spectrum-ActionButton--quiet">
              <svg class="spectrum-Icon spectrum-Icon--sizeS" focusable="false" aria-hidden="true">
                <use xlink:href="#spectrum-icon-18-More" />
              </svg>
            </button>
          </div>
        </div>
        <div class="spectrum-Card-content">
          <!--  <p class="spectrum-Body3">Body3 Text <em>Body3 Emphasis</em> <strong>Body3 Strong</strong>.</p>-->
          <p class="spectrum-Body4"> With only a few lines of code, developers can "wow" their customers with embedded PDF features as well as customize the online end-user experience. <br/> <a href="viewSDK.html" class="spectrum-Link">Learn More</a></p>
          <!--</div>-->
        </div>
      </div>
      <!--
    <div class="spectrum-Card-footer">
      <a href="viewSDK.html"  class="spectrum-Button spectrum-Button--primary">
        <span class="spectrum-Button-label">Read More</span>
     </a>
    </div>
-->
    </div>

    <div class="spectrum-Card" tabindex="0" role="figure">
      <div class="spectrum-Card-coverPhoto">

        <div class="spectrum-Card-coverPhoto" style="background-image: url(css/img/illustrations/12.svg);background-size: contain; background-repeat: no-repeat;height: calc(100% - 20px);
    width: calc(100% - 20px);border-bottom-color: transparent; margin: auto; "></div>
      </div>
      <div class="spectrum-Card-body">
        <div class="spectrum-Card-header">
          <div class="spectrum-Card-title">
            <div class="spectrum">
              <!--  <p class="spectrum-Heading4">Article Heading4 <em>Emphasis</em> <strong>Strong</strong>.</p>-->
              <p class="spectrum-Heading3">Easily automate PDF conversion </p>
            </div>
          </div>
          <div class="spectrum-Card-actionButton">
            <button aria-haspopup="true" class="spectrum-ActionButton spectrum-ActionButton--quiet">
              <svg class="spectrum-Icon spectrum-Icon--sizeS" focusable="false" aria-hidden="true">
                <use xlink:href="#spectrum-icon-18-More" />
              </svg>
            </button>
          </div>
        </div>
        <div class="spectrum-Card-content">
          <!--       <div class="spectrum">-->
          <!--  <p class="spectrum-Body3">Body3 Text <em>Body3 Emphasis</em> <strong>Body3 Strong</strong>.</p>-->
          <p class="spectrum-Body4">Turn Microsoft documents, images and HTML pages into PDFs. Convert PDFs to Microsoft Word, Excel or PowerPoint, and various image formats (JPG, TIFF or PNG) with high fidelity. </p>
          <!--</div>-->
        </div>
      </div>
      <!--
    <div class="spectrum-Card-footer">
     <a href="viewSDK.html"  class="spectrum-Button spectrum-Button--primary">
        <span class="spectrum-Button-label">Read More</span>
     </a>
    </div>
-->
    </div>

    <div class="spectrum-Card" tabindex="0" role="figure">
      <div class="spectrum-Card-coverPhoto">
        <div class="spectrum-Card-coverPhoto" style=" height: calc(100% - 20px);
    width: calc(100% - 20px); background-image: url(css/img/illustrations/13.svg); background-repeat: no-repeat; background-size: contain; border-bottom-color: transparent; margin: auto;"></div>
      </div>
      <div class="spectrum-Card-body">
        <div class="spectrum-Card-header">
          <div class="spectrum-Card-title">
            <div class="spectrum">
              <!--  <p class="spectrum-Heading4">Article Heading4 <em>Emphasis</em> <strong>Strong</strong>.</p>-->
              <p class="spectrum-Heading3">Gather analytics </p>
            </div>
          </div>
          <div class="spectrum-Card-actionButton">
            <button aria-haspopup="true" class="spectrum-ActionButton spectrum-ActionButton--quiet">
              <svg class="spectrum-Icon spectrum-Icon--sizeS" focusable="false" aria-hidden="true">
                <use xlink:href="#spectrum-icon-18-More" />
              </svg>
            </button>
          </div>
        </div>
        <div class="spectrum-Card-content">
          <p class="spectrum-Body4">Get out of the box integration with Adobe Analytics. Get insights into how your PDFs are viewed and consumed. </p>
        </div>
      </div>
      <!--
    <div class="spectrum-Card-footer">
     <a href="viewSDK.html"  class="spectrum-Button spectrum-Button--primary">
        <span class="spectrum-Button-label">Read More</span>
     </a>
    </div>
-->
    </div>




  </section>
  <!--End of card section-->

  <!--
<div class="spectrum-IllustratedMessage">
  <svg class="spectrum-IllustratedMessage-illustration" xmlns="http://www.w3.org/2000/svg" width="200" height="98" viewBox="0 0 199 97.7"><defs><style>.cls-1,.cls-2{fill:none;stroke-linecap:round;stroke-linejoin:round;}.cls-1{stroke-width:3px;}.cls-2{stroke-width:2px;}</style></defs><title>Asset 1</title><g id="Layer_2" data-name="Layer 2"><g id="illustrations"><path class="cls-1" d="M110.53,85.66,100.26,95.89a1.09,1.09,0,0,1-1.52,0L88.47,85.66"/><line class="cls-1" x1="99.5" y1="95.5" x2="99.5" y2="58.5"/><path class="cls-1" d="M105.5,73.5h19a2,2,0,0,0,2-2v-43"/><path class="cls-1" d="M126.5,22.5h-19a2,2,0,0,1-2-2V1.5h-31a2,2,0,0,0-2,2v68a2,2,0,0,0,2,2h19"/><line class="cls-1" x1="105.5" y1="1.5" x2="126.5" y2="22.5"/><path class="cls-2" d="M47.93,50.49a5,5,0,1,0-4.83-5A4.93,4.93,0,0,0,47.93,50.49Z"/><path class="cls-2" d="M36.6,65.93,42.05,60A2.06,2.06,0,0,1,45,60l12.68,13.2"/><path class="cls-2" d="M3.14,73.23,22.42,53.76a1.65,1.65,0,0,1,2.38,0l19.05,19.7"/><path class="cls-1" d="M139.5,36.5H196A1.49,1.49,0,0,1,197.5,38V72A1.49,1.49,0,0,1,196,73.5H141A1.49,1.49,0,0,1,139.5,72V32A1.49,1.49,0,0,1,141,30.5H154a2.43,2.43,0,0,1,1.67.66l6,5.66"/><rect class="cls-1" x="1.5" y="34.5" width="58" height="39" rx="2" ry="2"/></g></g></svg>
  <h2 class="spectrum-Heading spectrum-Heading--pageTitle spectrum-IllustratedMessage-heading">Error 404: Page Not Found</h2>
  <p class="spectrum-Body--secondary spectrum-IllustratedMessage-description">This page isn't available. Try checking the URL or visit a different page.</p>
</div>
-->

<div class="pagebreak"> </div>
    
  <!--Another Banner -->
  <div class="post post-container contentLeft darkTheme article JamieWelcome" style="background-color:#2C2C2C">
    <div class="post-image-container">
      <div class="welcome-images">
        <div class="post-background-image">
          <img src="css/img/swoosh-big.png" data-aos="custom-slide-right" data-aos-duration="700" data-aos-once="true" class="aos-init aos-animate">
        </div>
      </div>
    </div>
    <div class="post-content-container">
      <div class="content">

        <h2 class="aos-init aos-animate"> Document Cloud SDK</h2>
        <p>Document Cloud SDK enables developers to create compelling electronic documents experiences, including viewing, creating and exporting PDFs from popular file formats.  Using the Document Cloud SDK, developers can easily automate the generation, manipulation, and transformation of content via a set of modern cloud-based APIs. </p>
        <div style="background-color: #2C2C2C; color: #2C2C2C;">
          <a class="spectrum-Button spectrum-Button--overBackground" href="form.html">
            <span class="spectrum-Button-label">Request Access</span>
          </a>
        </div>
      </div>
    </div>
  </div>
  <!-- Another Banner -->


  <!--Add footer-->


  <div class="slide-jobs" style="background: #222">
    <a name="JobOpenings"></a>
    <div class="content">
      <div class="jobs-scroll-container">
        <div class="jobs-container">
          <h3 id="JobOpenings_ExperienceDesign">From the Adobe I/O Blog</h3>
          <ul class="jobs-list" style="background: #222">
            <li>
              <a href="jobs/jobpost-8897955/">
                <p class="spectrum-Heading5">APIs</p>
              </a>
              <div class="job-location spectrum-Body5" style="margin-top: 10px;">
                Document Cloud
              </div>
              <div class="job-location spectrum-Body5">
                Creative Cloud
              </div>
              <div class="job-location spectrum-Body5">
                Experience Cloud
              </div>
              <div class="job-location spectrum-Body5">
                Adobe Experience Platform
              </div>
            </li>
            <li>
              <a href="jobs/jobpost-8897955/">
                <p class="spectrum-Heading5">Blogs and Community </p>
              </a>
              <div class="job-location  spectrum-Body5" style="margin-top: 10px;">
                Adobe Tech Blog
              </div>
              <div class="job-location  spectrum-Body5">
                Adobe on GitHub
              </div>
            </li>
            <li>
              <a href="jobs/jobpost-8897955/">
                <p class="spectrum-Heading5">&nbsp;</p>
              </a>

              <div class="job-location  spectrum-Body5" style="margin-top: 10px;">
                Adobe I/O on Twitter
              </div>
              <div class="job-location  spectrum-Body5">

                Adobe I/O on YouTube
              </div>
            </li>

            <li>
              <a href="jobs/jobpost-8897955/">
                <p class="spectrum-Heading5">Support</p>
              </a>
              <div class="job-location  spectrum-Body5" style="margin-top: 10px;">
                Contact Us
              </div>
              <div class="job-location  spectrum-Body5">
                Adobe on StackOverflow
              </div>
              <div class="job-location  spectrum-Body5">
                Adobe Product Support
              </div>
              <div class="job-location  spectrum-Body5">

                Release Notes
              </div>
              <div class="job-location  spectrum-Body5">

                Forums
              </div>
            </li>
            <li>
              <a href="jobs/jobpost-8897955/">
                <p class="spectrum-Heading5">Adobe</p>
              </a>
              <div class="job-location  spectrum-Body5" style="margin-top: 10px;">
                Open Source Adobe
              </div>
              <div class="job-location  spectrum-Body5">
                Privacy Policy
              </div>
              <div class="job-location  spectrum-Body5">
                Terms of Use
              </div>
              <div class="job-location  spectrum-Body5">

                Cookies
              </div>
            </li>
          </ul>
        </div>
      </div>

    </div>
    <!--            <img class="jobs-top-graphic" src="/images/footer-graphic-top.png">-->

  </div>

  <script src="js/navbar.min.js"></script>
  <script>
    var myMenu = new Navbar('.nav');
  </script>
</body>

</html>
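
The page above reads window.json.title and window.json.sub_title inside initial(). To preview such a page in a local browser, one could generate a stand-in json.js from Python. This is a sketch under assumptions: the helper names are hypothetical, and the exact content of the json.js that the service supplies is not shown in this thread, so treat the file as a local preview aid only and check the documentation before including it in an uploaded archive.

```python
import json


def render_json_js(data):
    """Return JavaScript source that exposes `data` as the global `json`."""
    return "window.json = " + json.dumps(data) + ";\n"


def write_json_js(data, path="json.js"):
    """Write the stand-in script next to index.html for local preview."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_json_js(data))
```

For example, write_json_js({"title": "Hello", "sub_title": "Preview"}) placed beside index.html would let the page fill in its two headings when opened locally.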

A.5F87Author

Participating Frequently

please add a like if the content is simply understandable and well structured.
