How to convert PDF to HTML using C#?
I want to convert a PDF to HTML using C#. Adobe Acrobat Pro is installed on my computer. I have written the following code, which runs without any errors, but output.html does not appear in the folder. What could be the reason? Is there an alternative way of achieving this?
using Acrobat;
using System;
namespace PDFToHTMLWithAdobeAcrobatPro
{
    internal class Program
    {
        static void Main(string[] args)
        {
            string inputFilePath = @"C:\tmp\Everest.pdf";
            string outputFilePath = @"C:\tmp\output.html";
            // Acrobat COM objects
            AcroApp acroApp = null;
            CAcroPDDoc pdfDoc = null;
            try
            {
                // Initialize Acrobat application
                acroApp = Activator.CreateInstance(Type.GetTypeFromProgID("AcroExch.App")) as AcroApp;
                pdfDoc = Activator.CreateInstance(Type.GetTypeFromProgID("AcroExch.PDDoc")) as CAcroPDDoc;
                if (acroApp == null || pdfDoc == null)
                {
                    Console.WriteLine("Failed to initialize Acrobat COM objects.");
                    return;
                }
                // Open the PDF
                if (!pdfDoc.Open(inputFilePath))
                {
                    Console.WriteLine("Failed to open the PDF file.");
                    return;
                }
                // Execute JavaScript for HTML conversion
                object jsObj = pdfDoc.GetJSObject();
                if (jsObj != null)
                {
                    Type jsType = jsObj.GetType();
                    jsType.InvokeMember(
                        "saveAs",
                        System.Reflection.BindingFlags.InvokeMethod,
                        null,
                        jsObj,
                        new object[] { outputFilePath, "com.adobe.acrobat.html" }
                    );
                    Console.WriteLine($"PDF successfully converted to HTML at {outputFilePath}");
                }
                else
                {
                    Console.WriteLine("Failed to retrieve the JavaScript object.");
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine($"An error occurred: {ex.Message}");
            }
            finally
            {
                // Close the PDF document
                if (pdfDoc != null)
                {
                    pdfDoc.Close();
                }
                // Exit the Acrobat application
                if (acroApp != null)
                {
                    acroApp.Exit();
                }
            }
        }
    }
}
Rewrite the code so that Acrobat stays open, then look in the Acrobat JavaScript console window after the code has run.
Are any errors reported? I suspect a security exception is being thrown, because the saveAs function requires privilege to run. What you need to do is place the JS code in a folder-level trusted function and then call that function from your code.
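As a rough sketch of that approach (untested, and resting on several assumptions): the function name trustedSaveAsHtml, the class name, and the helper ToAcrobatPath are all hypothetical. The folder-level script is shown in the leading comment and would have to be saved as a .js file in Acrobat's JavaScripts folder before Acrobat starts. The sketch also assumes that a folder-level function can be invoked through the object returned by GetJSObject, and it passes device-independent paths (/C/tmp/...) because that is the form the saveAs cPath parameter is documented to take. To avoid handing a COM object back to JavaScript, the trusted function opens the PDF itself.

```csharp
using System;
using System.Reflection;

// Hypothetical folder-level script (assumption: saved as a .js file in
// Acrobat's JavaScripts folder, so it is loaded when Acrobat starts):
//
//   var trustedSaveAsHtml = app.trustedFunction(function (inPath, outPath) {
//       app.beginPriv();
//       var doc = app.openDoc({ cPath: inPath, bHidden: true });
//       doc.saveAs({ cPath: outPath, cConvID: "com.adobe.acrobat.html" });
//       doc.closeDoc(true);
//       app.endPriv();
//   });

internal static class TrustedSaveAsSketch
{
    // Converts a Windows path such as C:\tmp\a.pdf into the
    // device-independent form /C/tmp/a.pdf.
    private static string ToAcrobatPath(string windowsPath)
    {
        return "/" + windowsPath.Replace(":", "").Replace('\\', '/');
    }

    private static void Main()
    {
        string inputPath = @"C:\tmp\Everest.pdf";
        string outputPath = @"C:\tmp\output.html";

        // Late binding, so no reference to the Acrobat interop assembly is needed.
        Type docType = Type.GetTypeFromProgID("AcroExch.PDDoc");
        if (docType == null)
        {
            Console.WriteLine("The Acrobat COM server is not registered.");
            return;
        }

        object pdDoc = Activator.CreateInstance(docType);
        try
        {
            // A document must be open before a JSObject can be retrieved.
            object opened = docType.InvokeMember(
                "Open", BindingFlags.InvokeMethod, null, pdDoc,
                new object[] { inputPath });
            if (!Convert.ToBoolean(opened))
            {
                Console.WriteLine("Failed to open the PDF file.");
                return;
            }

            object jsObj = docType.InvokeMember(
                "GetJSObject", BindingFlags.InvokeMethod, null, pdDoc, null);
            if (jsObj == null)
            {
                Console.WriteLine("Failed to retrieve the JavaScript object.");
                return;
            }

            // Assumption: the folder-level function is reachable via the JSObject.
            jsObj.GetType().InvokeMember(
                "trustedSaveAsHtml", BindingFlags.InvokeMethod, null, jsObj,
                new object[] { ToAcrobatPath(inputPath), ToAcrobatPath(outputPath) });

            Console.WriteLine("Conversion requested; check the output folder and the JavaScript console.");
        }
        catch (Exception ex)
        {
            // Late-bound COM failures are often wrapped in an outer exception.
            Console.WriteLine("An error occurred: " + (ex.InnerException ?? ex).Message);
        }
        finally
        {
            docType.InvokeMember("Close", BindingFlags.InvokeMethod, null, pdDoc, null);
        }
    }
}
```

The sketch deliberately does not exit Acrobat, so the JavaScript console can still be inspected if output.html is missing.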
Here are some articles that might help:
https://www.pdfscripting.com/members/Trust-and-Privilege-in-Acrobat-Scripts.cfm
https://www.pdfscripting.com/public/How-to-Save-a-PDF-2.cfm
Use the Acrobat JavaScript Reference early and often

