Participating Frequently

Question

HTML to PDF using Adobe PDF services

July 21, 2021
3 replies
6648 views

Hello everyone,

I am writing Python code to convert PDF to HTML and vice versa.

I have the following questions:

1. Is there an Adobe API that converts PDF to HTML? If not, do you know of an alternative? Thank you.
2. I am trying to write Python code that creates a PDF from HTML, using the instructions in this link. I've seen that the Adobe PDF Services API requires the HTML input file to be zipped. How does this work? Should I save the HTML page together with its supporting content and zip it all, or save just the HTML of the input web page? Thank you.
3. Using the same instructions, I am struggling with how to specify or adapt the value of the json field within "cpf:inputs > params > cpf:inline" of the form parameters, whose description mentions this element: <script src='./json.js' type='text/javascript'></script>

I noticed that the documentation says:

In case of dynamic HTML this API allows you to capture the users unique data entries and then save it as PDF. Collected data is stored in a JSON file, and the source HTML file must include <script src='./json.js' type='text/javascript'></script>
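Read literally, that requirement means the script element goes inside the HTML file that is packed into the zip, rather than into the request parameters. A minimal sketch that adds the element to a page if it is missing; `add_json_script` is a hypothetical helper, and placing the tag just before `</head>` (so the globals from `json.js` presumably exist before the page's own scripts run) is an assumption, not something the documentation states.

```python
import re

# The element quoted from the documentation
JSON_SCRIPT = "<script src='./json.js' type='text/javascript'></script>"


def add_json_script(html: str) -> str:
    """Hypothetical helper: insert the json.js script element into <head>.

    Returns the HTML unchanged if it already references ./json.js.
    If there is no closing </head> tag, the element is prepended instead.
    """
    if "./json.js" in html:
        return html  # already included
    match = re.search(r"</head\s*>", html, flags=re.IGNORECASE)
    if match is None:
        return JSON_SCRIPT + "\n" + html
    # Insert right before </head>, keeping the rest of the page untouched
    return html[: match.start()] + JSON_SCRIPT + "\n" + html[match.start():]
```

For example, `add_json_script(open("index.html").read())` returns the page text with the element added, which can then be written back before zipping.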

The same page describes the json field as follows:

json(string, optional)

JavaScript variables to be placed in global scope to reference while rendering the HTML. This mechanism is intended to be used to supply data that might otherwise be retrieved using ajax requests. The actual mechanics of accessing this content varies depending if rendering from a zip file or from a url. When rendering from a zip file, the source collateral must include a script element such as:
<script src='./json.js' type='text/javascript'></script>
When rendering from a URL, the content of this json object is injected into the browser VM before the page is rendered.

default: {}

Could you please help with this? Specifically: what does this json field contribute to the conversion, why is it useful, and at which step should it be used? Thanks.

Here is my current Python code:

import datetime
import json
import os
import time

import jwt
import requests

# Information to find in the Adobe user account: credentials and generated JWT

# Credentials
client_id = "ùù"  # CLIENT ID (API key)
client_secret = "$"

# Generated JWT payload. The "exp" claim is set further down, so it is left out
# here: an empty value such as "exp":, is not valid JSON and json.loads would fail.
jwtPayloadRaw = """
                {"iss":"@AdobeOrg",
                "sub":"@techacct.adobe.com",
                "https://ims-na1.adobelogin.com/s/ent_documentcloud_sdk":true,
                "aud":""}
                """

# set input file name (without the .zip extension)
inputFileName = "blog_files"

# set output file name (without the .pdf extension)
outputFileName = "output"

url = "https://ims-na1.adobelogin.com/ims/exchange/jwt"

# convert the JWT payload into a dictionary
jwtPayloadJson = json.loads(jwtPayloadRaw)
# Adobe requires an expiration ("exp") claim; PyJWT converts the datetime to a Unix timestamp
jwtPayloadJson["exp"] = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(seconds=30)

# read the private key from config/private.key
# (os.path.join avoids backslash-escape issues and works on any OS)
with open(os.path.join(os.getcwd(), "config", "private.key"), "r") as keyfile:
    private_key = keyfile.read()

# sign the JWT with the private key
jwttoken = jwt.encode(jwtPayloadJson, private_key, algorithm="RS256")

# request server authorization
accessTokenRequestPayload = {"client_id": client_id, "client_secret": client_secret, "jwt_token": jwttoken}

result = requests.post(url, data=accessTokenRequestPayload)
result.raise_for_status()  # stop here if the token exchange failed

# get the Bearer token from the server response
resultjson = result.json()

URL = "https://cpf-ue1.adobe.io/ops/:create?respondWith=%7B%22reltype%22%3A%20%22http%3A%2F%2Fns.adobe.com%2Frel%2Fprimary%22%7D"

# the bearer token written in this format: "Bearer generated_access_token"
Bearer_token = resultjson["token_type"] + " " + resultjson["access_token"]

# the headers
h = {
    "Authorization": Bearer_token,
    "Accept": "application/json, text/plain, */*",
    "x-api-key": client_id,
    "Prefer": "respond-async,wait=0"}

# open the JSON containing the form parameters
with open("formatParams.json") as jsonFile:
    j = json.load(jsonFile)

body = {"contentAnalyzerRequests": json.dumps(j)}

# the input file; the multipart field name ("InputFile") should match the
# "cpf:location" of "documentIn" in formatParams.json
with open(os.path.join(os.getcwd(), inputFileName + ".zip"), "rb") as zipFile:
    myfile = {"InputFile": zipFile}
    resp = requests.post(url=URL, headers=h, data=body, files=myfile)

print("\nStatus of POST request: ", resp.status_code)
print(resp.text)
resp.raise_for_status()

# poll the returned location until the PDF is ready; the response holds the
# output file content only when the status is 200
while True:
    get_resp = requests.get(resp.headers["location"], headers=h)

    if get_resp.status_code == 200:
        with open(os.path.join(os.getcwd(), outputFileName + ".pdf"), "wb") as outFile:
            outFile.write(get_resp.content)
        break

    get_resp.raise_for_status()  # stop on 4xx/5xx errors instead of polling forever
    time.sleep(5)  # content not ready yet: wait 5 s before polling again

print("\nFinal Status of GET request: ", get_resp.status_code)
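The long `URL` in the code above carries a percent-encoded `respondWith` query parameter. Decoding it gives a small JSON object, so, as a sketch assuming that is all the parameter is, it can be built with the standard library instead of pasted by hand. `build_create_url` is a hypothetical helper name.

```python
import json
from urllib.parse import quote, unquote

# Endpoint copied from the code above
BASE = "https://cpf-ue1.adobe.io/ops/:create"


def build_create_url(reltype: str = "http://ns.adobe.com/rel/primary") -> str:
    """Hypothetical helper: percent-encode a JSON respondWith value.

    json.dumps writes ": " with a space, which quote() turns into %20,
    so the result matches the hand-written URL in the question.
    """
    respond_with = json.dumps({"reltype": reltype})
    return BASE + "?respondWith=" + quote(respond_with, safe="")


url = build_create_url()
print(url)
# Decoding the query string shows the JSON that is actually sent
print(unquote(url.split("respondWith=", 1)[1]))
```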

Any help would be useful! Thank you!

This topic has been closed for replies.


Simran Inamdar

Participating Frequently

Hello,

Did you resolve your HTML zip problem?

I am able to run the sample project. I created another zip archive containing an HTML file and its dependencies (CSS and JS files) and passed it to CreatePDFFromDynamicHTML.java to create a PDF. When I execute the command

mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.pdfservices.operation.samples.createpdf.CreatePDFFromDynamicHTML

it gives this error:

description ='The zip file provided has invalid content.; transactionId=kNWylwnmtNDbJf9ERkbjuE1Ou62467uv'; requestTrackingId='ZDdG1zgw5sKjMdnEGjvV8PAdUlJwwwNO'; statusCode=400; errorCode=INVALID_ZIP

If so, could you please explain how to resolve this?
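For an INVALID_ZIP error like the one above, one likely cause (given the requirement, quoted in this thread, that index.html sit at the top level of the archive) is a zip whose files are all nested inside a parent folder, which is what zipping the folder itself rather than its contents produces in many tools. A minimal Python check, assuming that requirement is the relevant one; `check_html_zip` is a hypothetical helper, not part of any Adobe SDK.

```python
import zipfile


def check_html_zip(path: str) -> list:
    """Hypothetical helper: report structural problems in an HTML-to-PDF zip.

    Only one requirement is checked: index.html must sit at the archive
    root, not inside a sub-folder such as "mysite/index.html".
    Returns an empty list when no problem is found.
    """
    if not zipfile.is_zipfile(path):
        return ["not a zip archive"]
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
    if "index.html" in names:
        return []
    nested = [n for n in names if n.endswith("/index.html")]
    if nested:
        return ["index.html is nested at " + nested[0] + "; zip the folder's contents, not the folder"]
    return ["no index.html in the archive"]
```

For example, `check_html_zip("createPDFFromDynamicHtmlInput.zip")` returning an empty list means the archive at least passes this one check.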


Librarian SJW

Participating Frequently

Can you tell me how to take an HTML file and make it a PDF with proper pictures and things?


Kenneth Patterson

Community Manager

The important part to remember is: "Since HTML/web pages typically contain external assets, the input file must be a zip file containing an index.html at the top level of the archive as well as any dependencies such as images, css files, and so on."

Here is a link to the doc: https://opensource.adobe.com/pdftools-sdk-docs/release/latest/howtos.html#create-a-pdf-from-static-html

I suggest starting with the sample code. I don't know what programming language you are using, so here are the links to all of the sample code:

Java: https://github.com/adobe/pdfservices-java-sdk-samples
.NET: https://www.adobe.com/go/pdftoolsapi_net_samples
Node.js: http://www.adobe.com/go/pdftoolsapi_node_sample
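The requirement quoted above (an index.html at the top level, plus images, CSS and other dependencies) can be met in Python with the standard-library zipfile module. A minimal sketch, assuming the page and its assets already sit together in one local folder with index.html directly inside it; `zip_html_folder` and the folder name are hypothetical.

```python
import os
import zipfile


def zip_html_folder(src_dir: str, zip_path: str) -> list:
    """Hypothetical helper: zip the *contents* of src_dir for HTML-to-PDF.

    Files are stored relative to src_dir, so src_dir/index.html becomes
    "index.html" at the top level and src_dir/css/a.css becomes "css/a.css".
    Returns the archive entry names.
    """
    if not os.path.isfile(os.path.join(src_dir, "index.html")):
        raise FileNotFoundError("index.html must be directly inside " + src_dir)
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                if os.path.abspath(full) == zip_abs:
                    continue  # don't add the archive to itself if it lives in src_dir
                # entry names use forward slashes on every platform
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                zf.write(full, arcname)
        return zf.namelist()
```

For example, `zip_html_folder("site", "blog_files.zip")` would produce an archive in the shape that the question's code opens as `inputFileName + ".zip"`.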


Kenneth Patterson

Community Manager

First, I should point out that I am not a Python coder, so I will not be able to assist at the coding level for that language, but I will try to answer as much as I am able.

One overall suggestion: If you are going to use the public REST APIs rather than the SDKs, I highly recommend that you start building your API calls using the provided Postman collection. Postman is an excellent tool for building API calls in a testing environment.

In answer to your numbered questions:

The Adobe PDF Services API does not currently have functionality to convert PDF to HTML, but it can convert from HTML to PDF. For PDF to HTML, please see here.
The zip file should contain "the input HTML file and its resources, along with the input data" (from the documentation). The JSON file contains the user inputs on the HTML page, which allows you to include that dynamic content in the PDF you are creating.
I suggest taking a look at the index.html file contained in the resources/createPDFFromDynamicHtmlInput.zip file found in the SDK sample code - here is the link to the Node.js samples, but the HTML should be the same: https://github.com/adobe/pdfservices-node-sdk-samples


A.5F87 (Author)

Participating Frequently

Thank you!

In the part taken from the documentation stating that "the input HTML file and its resources, along with the input data", could you please clarify what the "input data" is? Thank you.

I still find the definition of the json field (quoted at the beginning of my initial question) ambiguous. It says:

the source collateral must include a script element such as:
<script src='./json.js' type='text/javascript'></script>

Where exactly do I have to write this line of code? I don't understand where it goes.

I want to understand this because the example given in the documentation is as follows:

{
  "cpf:engine": {
    "repo:assetId": "urn:aaid:cpf:Service-e2ee120a2b06427cb449592f5db967e7"
  },
  "cpf:inputs": {
    "params": {
      "cpf:inline": {
        "json": "[\"a\": \"b\"]",
        "print": {
          "includeHeaderFooter": true
        },
        "pageLayout": {
          "pageWidth": 11,
          "pageHeight": 8.5
        }
      }
    },
    "documentIn": {
      "dc:format": "application/zip",
      "cpf:location": "multipartLabel"
    }
  },
  "cpf:outputs": {
    "documentOut": {
      "dc:format": "application/pdf",
      "cpf:location": "multipartLabelOut"
    }
  }
}

with the json field as follows: "json": "[\"a\": \"b\"]"

It is not clear to me what to replace this field with.
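In the documentation example, the value of "json" is itself a string of serialized data rather than a nested object (and `[\"a\": \"b\"]` is not valid JSON on its own). A sketch that builds the same request in Python, assuming the service expects a serialized JSON object in that field; `form_data` is a placeholder, and the other values are copied from the documentation example.

```python
import json

# Placeholder data; for dynamic HTML this would be the user's form entries
form_data = {"a": "b"}

request = {
    "cpf:engine": {
        "repo:assetId": "urn:aaid:cpf:Service-e2ee120a2b06427cb449592f5db967e7"
    },
    "cpf:inputs": {
        "params": {
            "cpf:inline": {
                # The "json" field is a *string*: serialize the dict first
                "json": json.dumps(form_data),
                "print": {"includeHeaderFooter": True},
                "pageLayout": {"pageWidth": 11, "pageHeight": 8.5},
            }
        },
        "documentIn": {"dc:format": "application/zip", "cpf:location": "multipartLabel"},
    },
    "cpf:outputs": {
        "documentOut": {"dc:format": "application/pdf", "cpf:location": "multipartLabelOut"}
    },
}

# The form field the question's code sends as "contentAnalyzerRequests"
body = {"contentAnalyzerRequests": json.dumps(request)}
print(body["contentAnalyzerRequests"])
```

For static HTML, the "json" entry can presumably be left out, since the documentation lists its default as {}.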

Thank you in advance!


A.5F87 (Author)

Participating Frequently

Thank you!

I downloaded the zipped file you mentioned, and it worked well with my Python code. I think the secret lies in the zip format of the input: when I simply download an HTML page, my code doesn't create a proper PDF output.

Here is an example of my downloaded version:

And within "index_files" I have this:

But when I use the format you suggested, my Python code works brilliantly.

The zipped file you suggested is structured differently from the one I downloaded.

Is it the HTML code you mentioned in your last answer that makes the difference? Should I therefore download the HTML file with a program (using the HTML code you suggested) instead of downloading it manually?

Thank you for your support!

I think I first have to master static HTML to PDF conversion, so I don't need to use this for now:

<script src='./json.js' type='text/javascript'></script>

But the problem I have is how to build the zip file. The documentation says: "refer the sdk documentation of create-pdf-operation.js to see instructions on the structure of the zip file". But I can't find this file; I only find this one.
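When a browser saves a page as "Web page, complete", it typically writes an HTML file plus a companion folder (such as "index_files") that the HTML references by relative path. A sketch assuming that layout: the saved HTML is stored as index.html at the archive root, and the companion folder keeps its original name so the relative links still resolve. `zip_saved_page` and the example file names are hypothetical.

```python
import os
import zipfile


def zip_saved_page(html_path: str, assets_dir: str, zip_path: str) -> list:
    """Hypothetical helper: package a browser-saved page for HTML-to-PDF.

    html_path  -- the saved HTML file (any name); stored as "index.html"
    assets_dir -- its companion folder (e.g. "page_files"); stored under its
                  own name so links like "./page_files/x.css" keep working
    Returns the archive entry names.
    """
    folder = os.path.basename(os.path.normpath(assets_dir))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(html_path, "index.html")  # top level of the archive
        for root, _dirs, files in os.walk(assets_dir):
            for name in files:
                full = os.path.join(root, name)
                rel = os.path.relpath(full, assets_dir).replace(os.sep, "/")
                zf.write(full, folder + "/" + rel)
        return zf.namelist()
```

Unlike zipping the download as-is, this keeps index.html at the top level of the archive, which is the requirement quoted earlier in the thread.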


A.5F87 (Author)

Participating Frequently

Please add a like if the content is understandable and well structured.
