I have a project in RoboHelp 2022 into which I have imported about five Word documents. I use the Microsoft HTML Help output preset. When I create the CHM file, I notice that some topics are missing from it.
* The topics are referenced from the TOC, but with a '?' prefix in the link.
* When I click one of the '?' links, the reading pane shows a message that the HTML file is missing.
* The missing HTML files are not really missing; they are located in the correct folders.
* However, in the title tag of each "missing" HTML file, an "LTR" mark (U+200E) Unicode character has been introduced.
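To check which imported topics are affected, something like the following could be run over the project folder. It is only a minimal sketch: the function name `titles_with_lrm` is made up for illustration, and the assumption that the character may also appear as an HTML entity (`&lrm;`, `&#8206;`, `&#x200E;`) is mine, not something confirmed for RoboHelp's output.

```python
import re
from pathlib import Path

# Grab the contents of the first <title> element in a file.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
# U+200E (LEFT-TO-RIGHT MARK), either literal or as an entity (entity forms are an assumption).
LRM_RE = re.compile("\u200e|&lrm;|&#8206;|&#x200e;", re.IGNORECASE)


def titles_with_lrm(root):
    """Return the sorted .htm/.html files under root whose <title> contains U+200E."""
    hits = []
    for path in Path(root).rglob("*.htm*"):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        match = TITLE_RE.search(text)
        if match and LRM_RE.search(match.group(1)):
            hits.append(path)
    return sorted(hits)
```

Calling `titles_with_lrm("path/to/project")` returns the suspect files, which can then be compared with the '?' entries in the TOC.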
After some debugging, it turned out that manual page breaks were the culprit. Page breaks had been inserted before some Heading 2 paragraphs to make the document look good when PDF versions are created. The "HTML topic" following the page break is the one not appearing in the CHM file.
So what to do? When I remove the page breaks, the CHM is created correctly and all topics are included as expected.
Why does this happen? Is there a workaround so that I can keep the page breaks where needed?
Thanks
/Henrik
Personally, I'd be copy/pasting the content from the Word doc into a text editor like Notepad first to remove all the cruft that Word sticks in & then copy that plain text to RH and apply my styles there.
I can relate to that strategy. Unfortunately, the amount of information in the production batch of documents does not allow for non-automated solutions.
I see that there is an option, when importing Word documents, to use post-import scripts. Could that be an alternative?
Some sort of grand find-and-replace script? Probably.
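As a rough idea of what such a find-and-replace could do, here is a sketch that strips the U+200E character from the title tag of an imported topic. It is an assumption that removing the character is enough to make the topic show up in the CHM; the thread only confirms that removing the page breaks in Word works. The function name is hypothetical, and how this would be hooked into RoboHelp's post-import scripting is not shown here.

```python
import re

# Split the first <title> element into opening tag, contents, and closing tag.
TITLE_RE = re.compile(r"(<title[^>]*>)(.*?)(</title>)", re.IGNORECASE | re.DOTALL)
# U+200E, literal or as an entity (the entity forms are an assumption).
LRM_RE = re.compile("\u200e|&lrm;|&#8206;|&#x200e;", re.IGNORECASE)


def strip_lrm_from_title(html):
    """Return html with U+200E removed from the first <title>; the body is left untouched."""
    def clean(match):
        return match.group(1) + LRM_RE.sub("", match.group(2)) + match.group(3)

    return TITLE_RE.sub(clean, html, count=1)
```

Only the title is touched, so any left-to-right marks that legitimately occur in the topic body are kept.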
So, these are Word documents that are sourced in Word (with PDFs created from Word) that you need to import into RoboHelp in order to create CHM files from them?
If there's no way to tell in the Word source which page breaks have been added to make the PDF look nice and which page breaks need to be carried through to the help, I don't see how scripting is going to work either. There are ways in Word that you might be able to indicate that a page break is necessary vs. cosmetic, and then strip out the cosmetic ones before importing to RoboHelp. But, in my mind, that's an unnecessarily complicated solution to what is really a "multiple source" problem. Using the same source for PDF and CHM outputs would be the correct solution.
The PDF I produce from RoboHelp looks quite good, and Word is such a beast, that my advice would be to import into RoboHelp one time and abandon the old Word source. Then you can do whatever you want with page breaks from within RoboHelp.
Sorry if I'm missing something here...
The scripting part is not a problem for us. We have a large number (approx. 170) of Word documents and we do not update all of them for each release, so we have a script that copies all files that have been updated. These copies are the ones that are imported, so removing any page breaks has no effect on the PDF or on the continued updating of the documents. That is the good part. As a workaround, I have created a VBA script that removes the page breaks, so now the topics are included in the CHM file. I wish of course that we didn't need that workaround, but there we are.
To work directly in RoboHelp would of course be a dream, but as all developers create documentation, Word is more cost effective, alas.
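For anyone who wants to do the same outside Word, here is a minimal sketch of the idea in Python rather than VBA (this is not the actual script used here). It relies on the fact that a .docx file is a ZIP archive and that manual page breaks are stored in `word/document.xml` as `<w:br w:type="page"/>`. The function name is made up, and the sketch deliberately ignores "Page break before" paragraph formatting and section breaks.

```python
import re
import zipfile

# A manual page break in WordprocessingML: <w:br w:type="page"/>.
# Plain line breaks (<w:br/>) and column breaks do not match.
PAGE_BREAK_RE = re.compile(rb'<w:br\b[^>]*\bw:type="page"[^>]*/>')


def strip_page_breaks(src_docx, dst_docx):
    """Copy src_docx to dst_docx without manual page breaks; return the number removed.

    src_docx and dst_docx must be different files.
    """
    removed = 0
    with zipfile.ZipFile(src_docx) as src, \
            zipfile.ZipFile(dst_docx, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                data, removed = PAGE_BREAK_RE.subn(b"", data)
            # Every other part of the archive is copied unchanged.
            dst.writestr(item, data)
    return removed
```

Because it writes to a separate file, the original document that is used for the PDF stays as it is.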
If the source content is maintained in Word, but you're not actually using Word to create output (you're using RH to create CHMs for that), why not "de-style" the Word docs so that they follow a strict Word template style that allows simpler import into RH? Then there's no reason for authors to "pretty it up" in Word in the first place.
It sounds like they are creating PDFs from Word, which is why there are page breaks: cosmetic page breaks for the PDFs that are produced out of Word. That's why I called it "dual source," or at least it is "dual tool."
Most developers I know hate Word! 🙂 I wonder if markdown might not be a better option?
But alas, the fundamental problem is that page breaks for PDF output are usually a "post-processing" activity, added after the fact in random locations to make things look pretty, and they don't really fit well with a multi-output source model. Stripping them out programmatically makes the best sense, really, especially since there's no way to strictly enforce a template in Word.
I agree that Word is not an optimal solution and in a perfect world any post-processing should be done automatically. We are actually doing a proof of concept with Markdown/DITA-OT to test how far we can automate the process. I would prefer the XML, but I am not so sure our developers would approve 🙂
Would it work to use Markdown source files with RoboHelp?
I have not tried importing markdown into RoboHelp. I've done it with other XML authoring tools and it was (more or less) easy. I'm not sure if it would solve all your problems, but developers do like it better than most other tools (mostly because they are familiar with it).
I believe there is now a markdown import feature, so if you try it, do report back with what you find. 🙂
This started with Word import but now seems to be PDF import. However, there is no PDF import in 2022. Please clarify.
________________________________________________________
My site www.grainge.org includes many free Authoring and RoboHelp resources that may be of help.
No, it's a Word import. Henrik is doing two things:
1. Word source to PDF
2. Word source to RoboHelp to CHM