I'm creating ebooks, and the text often comes from HTML files. I want to import the text with its styles intact (italics, bold, superscripts, etc.). Does anyone know a procedure for this?
Thanks in advance.
There may be some scripts floating around but honestly, sacrificing a goat is probably your best bet to get this to work properly. You'd be dependent on honoring CSS styles and mapping them to InDesign styles.
That barely works with Word.
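To give a sense of what "mapping them to InDesign styles" involves, here is a minimal sketch using Python's standard `html.parser`. It only handles a few inline tags (no CSS classes, no paragraph-level tags), and the character style names in `TAG_TO_STYLE` are hypothetical placeholders for whatever your InDesign template defines. It splits the HTML into runs of text, each tagged with the styles that should apply to it.

```python
from html.parser import HTMLParser

# Hypothetical mapping from HTML inline tags to InDesign character style names.
# Replace the names with the character styles your own template uses.
TAG_TO_STYLE = {
    "em": "Italic",
    "i": "Italic",
    "strong": "Bold",
    "b": "Bold",
    "sup": "Superscript",
    "sub": "Subscript",
}


class StyledRunParser(HTMLParser):
    """Collect (text, styles) runs; styles are the mapped tags currently open."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tags that have a mapping
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag in TAG_TO_STYLE:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Remove the most recently opened matching tag (tolerates sloppy nesting).
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break

    def handle_data(self, data):
        if data:
            styles = sorted({TAG_TO_STYLE[t] for t in self.open_tags})
            self.runs.append((data, styles))


def styled_runs(html):
    """Return a list of (text, [character style names]) for an HTML string."""
    parser = StyledRunParser()
    parser.feed(html)
    parser.close()
    return parser.runs
```

For example, `styled_runs("<p>E = mc<sup>2</sup> is <em>famous</em>.</p>")` gives `[("E = mc", []), ("2", ["Superscript"]), (" is ", []), ("famous", ["Italic"]), (".", [])]`. Real pages that style text through CSS classes rather than tags would need a much larger mapping, which is where the goat comes in.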
Try this: https://www.id-extras.com/html-import-script
It's free. But you will need to upload the files somewhere -- it only works with HTML files on the web for now.
That may not be a limitation. A lot of e-books are created by web scraping. 😐
In this case I am employed by the original content creators to repurpose the content. I was afraid someone would think I was swiping someone else's work. Not the case here. 🙂
I should have qualified that post so it didn't look like an accusation. I just see a lot of flat-out crapware "books" slashed together from scraping. A tool optimized for that is... dismaying.
Hey TaW, thanks for the tip, but it did not work for me. It says it's "fetching URL", but there is no result when it's done running the script. Has it worked for you?
I haven't used it for a while. The id-extras.com server was migrated a while ago; I wonder if it's not working because of that. I'll check it out and post back...
Bob L, goats are so last millennium. We sacrifice wombats these days. 🙂
That aside, about the only path for this process I can think of is to open the HTML files in Word, tidy up and reformat as necessary, then save as .docx or .rtf for import into InDesign.
Then, most usefully, the original CSS styles should be used as the starting point for EPUB export.
Thanks Bob (and everyone). I used to do it through Word (open the HTML, save as .docx), but my 3 free MS 365 account doesn't want to let me open .html files. Am I doing something wrong? I wouldn't mind paying for the one-time-purchase Word software if I knew it had this function.
So, this is content on the web or do you have the HTML files? If it's on the web, is it WordPress?
I seem to remember some plugin or script that could handle this, even if it was clunky.
I am getting the content from the client's website. It looks like HTML5.
I'm going to try the script TaW suggested now. Will report back.
Cut and paste from browser to Word is an alternative, if you make judicious use of Word's paste options and macros for cleanup.
Buy a license for Office 2016 and never update further. Office 365 is... crippleware for office drones.
Duly noted. Now, where to find a wombat?
My favorite multitool for these circumstances is pandoc. It has a wide variety of input and output formats, but the relevant output format for you would be .icml - that is, an InCopy file. Pandoc will keep your text styles intact... and that's it. So it may not be the right tool for you, and I honestly don't know if it will handle your superscripts or not. But it will turn epub into something you can place into InDesign with text styling intact.
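For anyone who would rather run pandoc locally than use the online demo, here is a minimal sketch, assuming pandoc is installed and on your PATH and the HTML pages have already been saved to a local folder. Each file is converted with the equivalent of `pandoc -f html -t icml -s -o chapter.icml chapter.html`; `-s` (standalone) asks for a complete document rather than a fragment. The helper names `pandoc_command` and `convert_folder` are hypothetical, not part of pandoc.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(src, dst, to_format="icml"):
    """Build a pandoc command line that converts one HTML file to `to_format`."""
    return [
        "pandoc",
        "-f", "html",      # input format
        "-t", to_format,   # output format, e.g. "icml" (InCopy) or "docx"
        "-s",              # standalone: a complete document, not a fragment
        "-o", str(dst),    # output file
        str(src),          # input file
    ]


def convert_folder(folder, to_format="icml"):
    """Convert every *.html file in `folder`, writing outputs next to the sources."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    outputs = []
    for src in sorted(Path(folder).glob("*.html")):
        dst = src.with_suffix("." + to_format)
        # check=True raises CalledProcessError if pandoc reports a failure.
        subprocess.run(pandoc_command(src, dst, to_format), check=True)
        outputs.append(dst)
    return outputs
```

Calling `convert_folder("chapters")` would produce `.icml` files you can place in InDesign; `convert_folder("chapters", "docx")` gives Word files instead, similar to what the online demo produces.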
This is great. It did not keep italics, but it does handle headings, subscripts, and superscripts. I used the online demo and downloaded .docx files. Thank you!
Correction, it did retain italics and I was able, with a minimum of fussing, to redefine the imported style sheets to my specs. Thanks very much for pointing me towards this useful tool.
I just tried saving a relatively simple informational page from Chrome, and then opening it in Word 365 (all I have available at this location). It worked pretty well and even preserved most styles.
I assume you're pulling content from simple pages (i.e., not complex interactive ones), so this might be worth exploring, again with some macros and style importing.
Weird, I wonder why it did not work for me. It was a long, complex html file.
Complex is relative. A page of text and images, regardless of length or number of styles, is never very complex in these terms. It's active pages using PHP and scripting and calls to e-commerce modules that get "complex." Again, my assumption is that you're working with book-like material in the first place. If you're really recasting active web elements... some different approaches will be needed.