
How to batch-convert PDF files to HTML using AppleScript (or any other means) on Mac OS

Enthusiast, Nov 09, 2019


I'm looking for ways to batch-convert PDFs to HTML using Acrobat Pro DC.
Basically, I want to automate the following: Acrobat > Menu: Export To > HTML Web Page > {Settings: Single HTML Page, Include Images, Recognize text if needed, Set Language}

 

This forums page from 2017 shows a promising AppleScript approach, but so far only the JPG exports are working.

 

I've been chasing down AppleScript (osascript), JXA (JavaScript for Automation, Apple's JavaScript counterpart to AppleScript), Acrobat JS, and the command line, but haven't cracked it yet.

TOPICS
Acrobat SDK and JavaScript

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Community Expert, Nov 09, 2019


None of those are necessary. You can simply use the Action Wizard that's built into Acrobat Pro.

Create a new Action, add the Save command to it (from the Save & Export sub-panel), then click Specify Settings underneath it and select the following options:

 

[Screenshot: Snap3.png]

 

Then save your Action and run it on your PDF files to convert them to HTML files. All done!

 

Enthusiast, Nov 09, 2019


Thanks try67

 

If necessary, I'll go that route. The reason I was hoping to work out an externally coded approach: we've got thousands of files scattered across multiple volumes. I'd like the external code to read a file list and in turn generate the HTML files right next to the originals, ideally without ever having to open Acrobat directly.
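
To make that concrete, here is a rough Python sketch of just the "read a file list, write the HTML next to each original" part. It does not talk to Acrobat at all: the `convert` callable is a placeholder for whatever actually performs the export, and the names `html_target`, `plan_conversions`, and `run_batch` are made up for illustration.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def html_target(pdf_path: Path) -> Path:
    """Return the .html path that sits in the same folder as the PDF."""
    return pdf_path.with_suffix(".html")


def plan_conversions(file_list: Iterable[str]) -> List[Tuple[Path, Path]]:
    """Turn lines of a file list into (pdf, html) pairs.

    Blank lines and entries that do not end in .pdf (any case) are skipped.
    """
    pairs = []
    for line in file_list:
        entry = line.strip()
        if not entry or not entry.lower().endswith(".pdf"):
            continue
        pdf = Path(entry)
        pairs.append((pdf, html_target(pdf)))
    return pairs


def run_batch(
    pairs: Iterable[Tuple[Path, Path]],
    convert: Callable[[Path, Path], None],
) -> List[Tuple[Path, str]]:
    """Call convert(pdf, html) for each pair and collect failures.

    `convert` is a hypothetical hook: it stands in for the real export step.
    A failure on one file is recorded and the batch carries on.
    """
    failures = []
    for pdf, html in pairs:
        if html.exists():
            continue  # already converted on an earlier run
        try:
            convert(pdf, html)
        except Exception as exc:
            failures.append((pdf, str(exc)))
    return failures
```

The file list could come from something like `open("pdf_list.txt")` (a hypothetical file with one path per line). Skipping existing HTML files and collecting failures rather than stopping means a run over thousands of files can be interrupted and restarted.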

Community Expert, Nov 09, 2019


If you're looking for a solution that works independently of Acrobat then a forum about Acrobat is not really the right place for your question...

Enthusiast, Nov 09, 2019


I may have stated it incorrectly -- Acrobat would be open, and the code would interact with it, but the code would run externally, without anyone having to manually tend to Acrobat.

 

Check out the forums page from 2017. It's a clean, clear solution that would work perfectly, except for one minor issue: it fails in 2019.

 

I wonder if -- given the additional specs for HTML conversion (e.g. "recognize text if needed") -- osascript needs more specs in order to successfully process.

Community Expert, Nov 09, 2019


Acrobat is not built (nor licensed) for that kind of operation, I'm afraid.

Enthusiast, Nov 09, 2019


Hmm -- works fine for JPG automation, albeit only for JPG automation, using precisely that osascript approach.  That suggests it's built for it -- or at one point was built for it.

Enthusiast, Nov 09, 2019


Actually, I may have jumped the gun on all of this. I had test-run a bunch of documents using the manual PDF-to-HTML function and got nearly perfect results every time: great OCR, excellent fidelity to the original scans, wrapped in HTML that lends itself to automation. It was a lucky bunch of documents.

 

Using your Action approach to test another thousand documents, the results were mixed -- some good, some OK, many terrible to useless. Which is to say, OCR here in 2019, in Acrobat, Tesseract, ABBYY, Neat -- and maybe OCR in general -- is still, at best, haltingly reliable.

 

Thanks try67 for the Actions hint.  Will be useful in many other ways, but alas not for this particular effort.

 
