Hello everyone,
I am working on Python code for converting PDF to HTML and vice versa.
I have two questions:
I noticed that the documentation says:
In case of dynamic HTML this API allows you to capture the users unique data entries and then save it as PDF. Collected data is stored in a JSON file, and the source HTML file must include <script src='./json.js' type='text/javascript'></script>
The same documentation page describes the json field as follows:
json(string, optional)
JavaScript variables to be placed in global scope to reference while rendering the HTML. This mechanism is intended to be used to supply data that might otherwise be retrieved using ajax requests. The actual mechanics of accessing this content varies depending if rendering from a zip file or from a url. When rendering from a zip file, the source collateral must include a script element such as:
<script src='./json.js' type='text/javascript'></script>
When rendering from a URL, the content of this json object is injected into the browser VM before the page is rendered.
default: {}
Could you please help with this? Specifically: what does this json field contribute to the conversion, why is it useful, and at which step should it be used? Thanks.
Here is my current Python code:
import datetime
import json
import os

import jwt
import requests

# Information to find in the Adobe user account: credentials and generated JWT
# Credentials
client_id = "ùù"  # CLIENT ID (API key)
client_secret = "$"
# Generated JWT payload (fill in the values from the Adobe account)
jwtPayloadRaw = """
{"exp":,
"iss":"@AdobeOrg",
"sub":"@techacct.adobe.com",
"https://ims-na1.adobelogin.com/s/ent_documentcloud_sdk":true,
"aud":""}
"""
# Set input file name
inputFileName = "blog_files"
# Set output file name
outputFileName = "output"
url = "https://ims-na1.adobelogin.com/ims/exchange/jwt"
# Convert the JWT payload into a dictionary
jwtPayloadJson = json.loads(jwtPayloadRaw)
# Adobe requires an expiration field ("exp") in the JWT payload
jwtPayloadJson["exp"] = datetime.datetime.utcnow() + datetime.timedelta(seconds=30)
# Read the private key; os.path.join avoids backslash escapes in the path
with open(os.path.join(os.getcwd(), "config", "private.key"), "r") as keyfile:
    private_key = keyfile.read()
# Encode the JWT using the private key
jwttoken = jwt.encode(jwtPayloadJson, private_key, algorithm="RS256")
# Request server authorization
accessTokenRequestPayload = {"client_id": client_id, "client_secret": client_secret}
accessTokenRequestPayload["jwt_token"] = jwttoken
result = requests.post(url, data=accessTokenRequestPayload)
# Get the bearer token from the server response
resultjson = json.loads(result.text)
import time

URL = "https://cpf-ue1.adobe.io/ops/:create?respondWith=%7B%22reltype%22%3A%20%22http%3A%2F%2Fns.adobe.com%2Frel%2Fprimary%22%7D"
# The bearer token, written in this format: "Bearer generated_access_token"
Bearer_token = resultjson["token_type"] + " " + resultjson["access_token"]
# The headers
h = {
    "Authorization": Bearer_token,
    "Accept": "application/json, text/plain, */*",
    "x-api-key": client_id,
    "Prefer": "respond-async,wait=0",
}
# The input file
myfile = {"InputFile": open(os.path.join(os.getcwd(), inputFileName + ".zip"), "rb")}
# Load the JSON containing the form parameters (the with block closes the file)
with open("formatParams.json") as jsonFile:
    j = json.load(jsonFile)
body = {"contentAnalyzerRequests": json.dumps(j)}
resp = requests.post(url=URL, headers=h, data=body, files=myfile)
print("\nStatus of POST request: ", resp.status_code)
print(resp.text)
# print(resp.reason)
poll = True
while poll:  # write the PDF document only once its content is returned in the GET response
    get_resp = requests.get(resp.headers["location"], headers=h)
    if get_resp.status_code == 200:  # the response contains the output file content only if status == 200
        with open(os.path.join(os.getcwd(), outputFileName + ".pdf"), "wb") as outputFile:
            outputFile.write(get_resp.content)
        poll = False
    else:
        time.sleep(5)  # wait 5 s before retrying if the file content is not ready yet
print("\nFinal Status of GET request: ", get_resp.status_code)
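The polling loop above waits forever if the job never returns status 200. A minimal sketch of a bounded variant follows. It assumes that any status of 400 or above means the job failed, which the thread does not confirm; `poll_until_ready` and its `fetch` argument are hypothetical names introduced only for illustration.

```python
import time


def poll_until_ready(fetch, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch() until it reports status 200 or the attempts run out.

    fetch is any zero-argument callable returning (status_code, content);
    it is a hypothetical stand-in for the requests.get call in the script.
    """
    for attempt in range(max_attempts):
        status, content = fetch()
        if status == 200:
            return content  # result is ready
        if status >= 400:
            # Assumption: client/server errors are final, so stop polling.
            raise RuntimeError(f"polling failed with status {status}")
        if attempt < max_attempts - 1:
            sleep(interval)  # not ready yet, wait before the next attempt
    raise TimeoutError(f"no result after {max_attempts} attempts")


# Possible use with the names from the script above (not executed here):
# def fetch():
#     r = requests.get(resp.headers["location"], headers=h)
#     return r.status_code, r.content
# pdf_bytes = poll_until_ready(fetch)
```

Passing the fetch step in as a callable keeps the retry logic independent of the HTTP library, so it can be exercised without network access.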
Any help would be useful! thank you !
Please add a like if the content is easy to understand and well structured.
First, I should point out that I am not a Python coder, so I will not be able to assist at the coding level for that language, but I will try to answer as much as I am able.
One overall suggestion: If you are going to use the public REST APIs rather than the SDKs, I highly recommend that you start building your API calls using the provided Postman collection. Postman is an excellent tool for building API calls in a testing environment.
In answer to your numbered questions:
Thank you!
The documentation mentions "the input HTML file and its resources, along with the input data". Could you please specify what the "input data" is? Thank you.
I am still unclear about the definition of the json field (quoted at the beginning of my initial question), which says that:
the source collateral must include a script element such as:
<script src='./json.js' type='text/javascript'></script>
Where exactly do I have to write this line of code? I don't see where it goes.
I seek to understand this because the example given in the documentation is as follows:
{
  "cpf:engine": {
    "repo:assetId": "urn:aaid:cpf:Service-e2ee120a2b06427cb449592f5db967e7"
  },
  "cpf:inputs": {
    "params": {
      "cpf:inline": {
        "json": "[\"a\": \"b\"]",
        "print": {
          "includeHeaderFooter": true
        },
        "pageLayout": {
          "pageWidth": 11,
          "pageHeight": 8.5
        }
      }
    },
    "documentIn": {
      "dc:format": "application/zip",
      "cpf:location": "multipartLabel"
    }
  },
  "cpf:outputs": {
    "documentOut": {
      "dc:format": "application/pdf",
      "cpf:location": "multipartLabelOut"
    }
  }
}
with the json field as follows: "json": "[\"a\": \"b\"]"
It is not clear to me what I should put in this field.
Thank you in advance!
Below you will find the source code for the HTML file that I suggested you look at (question #3 above). In it you will see how the <script> element you are having difficulty with is being used. It begins right after the <body> tag in the HTML. Please download the sample code referenced in the answer I gave to question #3 and have a look.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Adobe IO</title>
<link rel="stylesheet" href="css/spectrum/spectrum-core.css">
<link rel="stylesheet" href="css/spectrum/spectrum-dark.css">
<link rel="stylesheet" href="css/spectrum/spectrum-darkest.css">
<link rel="stylesheet" href="css/spectrum/spectrum-light.css">
<link rel="stylesheet" href="css/spectrum/spectrum-lightest.css">
<link rel="stylesheet" href="css/navbar.min.css">
<link rel="stylesheet" href="css/styles.css">
<style>
@media print {
.pagebreak { page-break-before: always; }
}
</style>
</head>
<body class="spectrum spectrum--medium spectrum--light" onload = "initial()">
<script src="./json.js"></script>
<script type="text/javascript">
function initial()
{
var document = window.document;
document.getElementById("title").innerHTML = String(window.json.title);
document.getElementById("sub-title").innerHTML = String(window.json.sub_title);
}
</script>
<header>
<nav class="header">
<div class="spectrum">
<a href="#">
<p class="spectrum-Heading3">Adobe I/O</p>
</a>
</div>
<ul class="nav" data-function="navbar">
<li class="spectrum-Tabs-itemLabel"><a href="#">APIs </a></li>
<li class="spectrum-Tabs-itemLabel"><a href="#">Authentication </a></li>
<li class="spectrum-Tabs-itemLabel"><a href="#">Open Source </a></li>
<li class="spectrum-Tabs-itemLabel"><a href="#">Blog </a></li>
<li class="spectrum-Tabs-itemLabel">
<button class="spectrum-Button spectrum-Button--cta navbar-cta-btn">
<span class="spectrum-Button-label">Console</span>
</button>
</li>
</ul>
</nav>
</header>
<nav class="sub-header">
<ul class="nav" data-function="navbar">
<li class="spectrum-Tabs-itemLabel">
<a href="#" class="with-icon">
<img src="css/img/product-icon-dc.svg" class="spectrum-Icon spectrum-Icon--sizeS" alt="" style="margin-right: 10px">
<span>ADOBE DOCUMENT CLOUD SDK</span>
</a>
</li>
</ul>
</nav>
<!--Hero Image-->
<div class="spectrum-IllustratedMessage bannerImage">
<div class="spectrum">
<h2 class="spectrum-Heading1" id="title"></h2>
<p style="color: #d3d3d3; max-width: 600px; margin-top: 4px;" id="sub-title"></p>
<a class="spectrum-Button spectrum-Button--overBackground spectrum--medium spectrum--darkest" href="form.html">
<span class="spectrum-Button-label">Request Access</span>
</a>
</div>
</div>
<!-- End of Hero Image-->
<!--Card Section-->
<div class="spectrum-Article" style="padding: 40px 70px 0px 80px;max-width: 1000px; display: flex">
</div>
<section class="cards-container">
<div class="spectrum-Card" tabindex="0" role="figure">
<div class="spectrum-Card-coverPhoto">
<div class="spectrum-Card-coverPhoto" style="background-image: url(css/img/illustrations/11.svg);background-size: contain; background-repeat: no-repeat;height: calc(100% - 20px);
width: calc(100% - 20px);border-bottom-color: transparent; margin: auto; ">
</div>
</div>
<div class="spectrum-Card-body">
<div class="spectrum-Card-header">
<div class="spectrum-Card-title">
<div class="spectrum">
<!-- <p class="spectrum-Heading4">Article Heading4 <em>Emphasis</em> <strong>Strong</strong>.</p>-->
<p class="spectrum-Heading3">Embed Adobe gold standard PDF viewer</p>
</div>
</div>
<div class="spectrum-Card-actionButton">
<button aria-haspopup="true" class="spectrum-ActionButton spectrum-ActionButton--quiet">
<svg class="spectrum-Icon spectrum-Icon--sizeS" focusable="false" aria-hidden="true">
<use xlink:href="#spectrum-icon-18-More" />
</svg>
</button>
</div>
</div>
<div class="spectrum-Card-content">
<!-- <p class="spectrum-Body3">Body3 Text <em>Body3 Emphasis</em> <strong>Body3 Strong</strong>.</p>-->
<p class="spectrum-Body4"> With only a few lines of code, developers can "wow" their customers with embedded PDF features as well as customize the online end-user experience. <br/> <a href="viewSDK.html" class="spectrum-Link">Learn More</a></p>
<!--</div>-->
</div>
</div>
<!--
<div class="spectrum-Card-footer">
<a href="viewSDK.html" class="spectrum-Button spectrum-Button--primary">
<span class="spectrum-Button-label">Read More</span>
</a>
</div>
-->
</div>
<div class="spectrum-Card" tabindex="0" role="figure">
<div class="spectrum-Card-coverPhoto">
<div class="spectrum-Card-coverPhoto" style="background-image: url(css/img/illustrations/12.svg);background-size: contain; background-repeat: no-repeat;height: calc(100% - 20px);
width: calc(100% - 20px);border-bottom-color: transparent; margin: auto; "></div>
</div>
<div class="spectrum-Card-body">
<div class="spectrum-Card-header">
<div class="spectrum-Card-title">
<div class="spectrum">
<!-- <p class="spectrum-Heading4">Article Heading4 <em>Emphasis</em> <strong>Strong</strong>.</p>-->
<p class="spectrum-Heading3">Easily automate PDF conversion </p>
</div>
</div>
<div class="spectrum-Card-actionButton">
<button aria-haspopup="true" class="spectrum-ActionButton spectrum-ActionButton--quiet">
<svg class="spectrum-Icon spectrum-Icon--sizeS" focusable="false" aria-hidden="true">
<use xlink:href="#spectrum-icon-18-More" />
</svg>
</button>
</div>
</div>
<div class="spectrum-Card-content">
<!-- <div class="spectrum">-->
<!-- <p class="spectrum-Body3">Body3 Text <em>Body3 Emphasis</em> <strong>Body3 Strong</strong>.</p>-->
<p class="spectrum-Body4">Turn Microsoft documents, images and HTML pages into PDFs. Convert PDFs to Microsoft Word, Excel or PowerPoint, and various image formats (JPG, TIFF or PNG) with high fidelity. </p>
<!--</div>-->
</div>
</div>
<!--
<div class="spectrum-Card-footer">
<a href="viewSDK.html" class="spectrum-Button spectrum-Button--primary">
<span class="spectrum-Button-label">Read More</span>
</a>
</div>
-->
</div>
<div class="spectrum-Card" tabindex="0" role="figure">
<div class="spectrum-Card-coverPhoto">
<div class="spectrum-Card-coverPhoto" style=" height: calc(100% - 20px);
width: calc(100% - 20px); background-image: url(css/img/illustrations/13.svg); background-repeat: no-repeat; background-size: contain; border-bottom-color: transparent; margin: auto;"></div>
</div>
<div class="spectrum-Card-body">
<div class="spectrum-Card-header">
<div class="spectrum-Card-title">
<div class="spectrum">
<!-- <p class="spectrum-Heading4">Article Heading4 <em>Emphasis</em> <strong>Strong</strong>.</p>-->
<p class="spectrum-Heading3">Gather analytics </p>
</div>
</div>
<div class="spectrum-Card-actionButton">
<button aria-haspopup="true" class="spectrum-ActionButton spectrum-ActionButton--quiet">
<svg class="spectrum-Icon spectrum-Icon--sizeS" focusable="false" aria-hidden="true">
<use xlink:href="#spectrum-icon-18-More" />
</svg>
</button>
</div>
</div>
<div class="spectrum-Card-content">
<p class="spectrum-Body4">Get out of the box integration with Adobe Analytics. Get insights into how your PDFs are viewed and consumed. </p>
</div>
</div>
<!--
<div class="spectrum-Card-footer">
<a href="viewSDK.html" class="spectrum-Button spectrum-Button--primary">
<span class="spectrum-Button-label">Read More</span>
</a>
</div>
-->
</div>
</section>
<!--End of card section-->
<!--
<div class="spectrum-IllustratedMessage">
<svg class="spectrum-IllustratedMessage-illustration" xmlns="http://www.w3.org/2000/svg" width="200" height="98" viewBox="0 0 199 97.7"><defs><style>.cls-1,.cls-2{fill:none;stroke-linecap:round;stroke-linejoin:round;}.cls-1{stroke-width:3px;}.cls-2{stroke-width:2px;}</style></defs><title>Asset 1</title><g id="Layer_2" data-name="Layer 2"><g id="illustrations"><path class="cls-1" d="M110.53,85.66,100.26,95.89a1.09,1.09,0,0,1-1.52,0L88.47,85.66"/><line class="cls-1" x1="99.5" y1="95.5" x2="99.5" y2="58.5"/><path class="cls-1" d="M105.5,73.5h19a2,2,0,0,0,2-2v-43"/><path class="cls-1" d="M126.5,22.5h-19a2,2,0,0,1-2-2V1.5h-31a2,2,0,0,0-2,2v68a2,2,0,0,0,2,2h19"/><line class="cls-1" x1="105.5" y1="1.5" x2="126.5" y2="22.5"/><path class="cls-2" d="M47.93,50.49a5,5,0,1,0-4.83-5A4.93,4.93,0,0,0,47.93,50.49Z"/><path class="cls-2" d="M36.6,65.93,42.05,60A2.06,2.06,0,0,1,45,60l12.68,13.2"/><path class="cls-2" d="M3.14,73.23,22.42,53.76a1.65,1.65,0,0,1,2.38,0l19.05,19.7"/><path class="cls-1" d="M139.5,36.5H196A1.49,1.49,0,0,1,197.5,38V72A1.49,1.49,0,0,1,196,73.5H141A1.49,1.49,0,0,1,139.5,72V32A1.49,1.49,0,0,1,141,30.5H154a2.43,2.43,0,0,1,1.67.66l6,5.66"/><rect class="cls-1" x="1.5" y="34.5" width="58" height="39" rx="2" ry="2"/></g></g></svg>
<h2 class="spectrum-Heading spectrum-Heading--pageTitle spectrum-IllustratedMessage-heading">Error 404: Page Not Found</h2>
<p class="spectrum-Body--secondary spectrum-IllustratedMessage-description">This page isn't available. Try checking the URL or visit a different page.</p>
</div>
-->
<div class="pagebreak"> </div>
<!--Another Banner -->
<div class="post post-container contentLeft darkTheme article JamieWelcome" style="background-color:#2C2C2C">
<div class="post-image-container">
<div class="welcome-images">
<div class="post-background-image">
<img src="css/img/swoosh-big.png" data-aos="custom-slide-right" data-aos-duration="700" data-aos-once="true" class="aos-init aos-animate">
</div>
</div>
</div>
<div class="post-content-container">
<div class="content">
<h2 class="aos-init aos-animate"> Document Cloud SDK</h2>
<p>Document Cloud SDK enables developers to create compelling electronic documents experiences, including viewing, creating and exporting PDFs from popular file formats. Using the Document Cloud SDK, developers can easily automate the generation, manipulation, and transformation of content via a set of modern cloud-based APIs. </p>
<div style="background-color: #2C2C2C; color: #2C2C2C;">
<a class="spectrum-Button spectrum-Button--overBackground" href="form.html">
<span class="spectrum-Button-label">Request Access</span>
</a>
</div>
</div>
</div>
</div>
<!-- Another Banner -->
<!--Add footer-->
<div class="slide-jobs" style="background: #222">
<a name="JobOpenings"></a>
<div class="content">
<div class="jobs-scroll-container">
<div class="jobs-container">
<h3 id="JobOpenings_ExperienceDesign">From the Adobe I/O Blog</h3>
<ul class="jobs-list" style="background: #222">
<li>
<a href="jobs/jobpost-8897955/">
<p class="spectrum-Heading5">APIs</p>
</a>
<div class="job-location spectrum-Body5" style="margin-top: 10px;">
Document Cloud
</div>
<div class="job-location spectrum-Body5">
Creative Cloud
</div>
<div class="job-location spectrum-Body5">
Experience Cloud
</div>
<div class="job-location spectrum-Body5">
Adobe Experience Platform
</div>
</li>
<li>
<a href="jobs/jobpost-8897955/">
<p class="spectrum-Heading5">Blogs and Community </p>
</a>
<div class="job-location spectrum-Body5" style="margin-top: 10px;">
Adobe Tech Blog
</div>
<div class="job-location spectrum-Body5">
Adobe on GitHub
</div>
</li>
<li>
<a href="jobs/jobpost-8897955/">
<p class="spectrum-Heading5"> </p>
</a>
<div class="job-location spectrum-Body5" style="margin-top: 10px;">
Adobe I/O on Twitter
</div>
<div class="job-location spectrum-Body5">
Adobe I/O on YouTube
</div>
</li>
<li>
<a href="jobs/jobpost-8897955/">
<p class="spectrum-Heading5">Support</p>
</a>
<div class="job-location spectrum-Body5" style="margin-top: 10px;">
Contact Us
</div>
<div class="job-location spectrum-Body5">
Adobe on StackOverflow
</div>
<div class="job-location spectrum-Body5">
Adobe Product Support
</div>
<div class="job-location spectrum-Body5">
Release Notes
</div>
<div class="job-location spectrum-Body5">
Forums
</div>
</li>
<li>
<a href="jobs/jobpost-8897955/">
<p class="spectrum-Heading5">Adobe</p>
</a>
<div class="job-location spectrum-Body5" style="margin-top: 10px;">
Open Source Adobe
</div>
<div class="job-location spectrum-Body5">
Privacy Policy
</div>
<div class="job-location spectrum-Body5">
Terms of Use
</div>
<div class="job-location spectrum-Body5">
Cookies
</div>
</li>
</ul>
</div>
</div>
</div>
<!-- <img class="jobs-top-graphic" src="/images/footer-graphic-top.png">-->
</div>
<script src="js/navbar.min.js"></script>
<script>
var myMenu = new Navbar('.nav');
</script>
</body>
</html>
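The HTML above reads `window.json.title` and `window.json.sub_title`, so the data supplied in the request's json field would presumably need those two keys. A minimal sketch of building the request parameters in Python follows. The structure and values are copied from the documentation example quoted earlier in the thread; `build_request_params` is a hypothetical helper name, and encoding the data with `json.dumps` is an assumption based on the field being documented as a string.

```python
import json


def build_request_params(page_data):
    """Return request parameters shaped like the documentation example.

    page_data is a dict of values the HTML page reads from window.json.
    It is serialized to a string because the json field is documented
    as json(string, optional).
    """
    return {
        "cpf:engine": {
            "repo:assetId": "urn:aaid:cpf:Service-e2ee120a2b06427cb449592f5db967e7"
        },
        "cpf:inputs": {
            "params": {
                "cpf:inline": {
                    "json": json.dumps(page_data),
                    "print": {"includeHeaderFooter": True},
                    "pageLayout": {"pageWidth": 11, "pageHeight": 8.5},
                }
            },
            "documentIn": {
                "dc:format": "application/zip",
                "cpf:location": "multipartLabel",
            },
        },
        "cpf:outputs": {
            "documentOut": {
                "dc:format": "application/pdf",
                "cpf:location": "multipartLabelOut",
            }
        },
    }


# Example values are made up; the keys match what the sample HTML reads.
params = build_request_params({"title": "My title", "sub_title": "My subtitle"})
```

In the Python script from the first post, a dict like this could take the place of the content loaded from formatParams.json before it is passed to `json.dumps`.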
Thank you!
I downloaded the zipped file you mentioned, and it worked well with my Python code. I think the secret is the zip format of the input. When I simply download an HTML page myself, my code doesn't create a proper PDF output.
Here is an example of my downloaded version:
and within "index_files" I have this:
But when I use the format you suggested, my Python code works brilliantly, and the zipped file you suggested is different from the one I first downloaded.
Is it the HTML code you mentioned in your last answer that makes this difference? Should I therefore download the HTML file with a program (using the HTML code you suggested) instead of downloading it manually?
Thank you for your support!
I think I first have to master the static HTML file => PDF conversion, so I don't think I need this for now:
<script src='./json.js' type='text/javascript'></script>
But the problem I have is how to build the zip file. The documentation says: "refer the sdk documentation of create-pdf-operation.js to see instructions on the structure of the zip file". But I can't find this file; I can only find this one.
I see where some of the confusion is coming from. The PDF Services API does not have the capability to programmatically build a zip file; you will need to do this in your preferred coding language or use a third-party tool. I suggest doing a Google search for something like: 'python create a zip file programmatically' (without the quotes).
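For Python, the standard-library `zipfile` module is one way to do this. The sketch below zips a folder so that paths inside the archive are relative to that folder, which puts an index.html sitting directly in the folder at the top level of the archive, as the documentation quoted later in this thread requires. `zip_html_folder` is a hypothetical helper name, and the folder layout in the usage comment is an assumption.

```python
import os
import zipfile


def zip_html_folder(source_dir, zip_path):
    """Zip every file under source_dir, storing paths relative to source_dir.

    zip_path should point outside source_dir, otherwise the archive
    being written would be picked up by the directory walk.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # e.g. <source_dir>/css/styles.css is stored as css/styles.css
                arcname = os.path.relpath(full_path, source_dir)
                archive.write(full_path, arcname)
    return zip_path


# Possible use (folder name is made up): a folder "site" holding
# index.html plus css/ and js/ subfolders becomes site.zip.
# zip_html_folder("site", "site.zip")
```

The inputs are therefore just the folder holding index.html and its css, js, and image files; the subfolders appear in the zip because they exist on disk, not because the program creates them.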
Yes, that's what I thought as well. But I have one question: after creating the zip file programmatically, how do I get the same layout as the one I downloaded, with .js and .css folders? Could you please tell me about the inputs of the program that will create the zip? (I have no problem with the program or with Python, but I have no idea which inputs will end up producing the .js and .css files inside the zip.)
Thank you!
Given the variety of possibilities, I suggest reading the documentation for whatever tool you choose to use.
Hello,
Did you resolve your HTML zip problem?
I am able to run the sample project. I created another zip archive containing an HTML file and its dependencies (CSS and JS files) and passed it to CreatePDFFromDynamicHTML.java to create a PDF. After executing the command
mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.pdfservices.operation.samples.createpdf.CreatePDFFromDynamicHTML
it gives this error:
description ='The zip file provided has invalid content.; transactionId=kNWylwnmtNDbJf9ERkbjuE1Ou62467uv'; requestTrackingId='ZDdG1zgw5sKjMdnEGjvV8PAdUlJwwwNO'; statusCode=400; errorCode=INVALID_ZIP
If so, can you please guide me on how to resolve this?
Can you tell me how to take an HTML file and turn it into a PDF with the pictures and other assets rendered properly?
The important part to remember is: "Since HTML/web pages typically contain external assets, the input file must be a zip file containing an index.html at the top level of the archive as well as any dependencies such as images, css files, and so on."
Here is a link to the doc: https://opensource.adobe.com/pdftools-sdk-docs/release/latest/howtos.html#create-a-pdf-from-static-h...
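Before uploading, it may be worth checking that an archive meets the requirement quoted above. The sketch below only tests for an index.html entry at the top level of the zip. `has_top_level_index` is a hypothetical helper name, and passing this check does not guarantee the service will accept the file, since other causes of an INVALID_ZIP error are possible.

```python
import zipfile


def has_top_level_index(zip_path):
    """Return True if the archive has an index.html entry at its root.

    An entry such as "site/index.html" does not count, because it sits
    inside a folder rather than at the top level of the archive.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return "index.html" in archive.namelist()
```

A zip made by compressing the parent folder, rather than the folder's contents, typically stores the page as something like site/index.html, which this check would flag.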
I suggest starting with the sample code. I don't know what programming language you are using, so here are the links to all of the sample code: