September 15, 2009
Question

HTML Help worked in RH7 but not in RH8

I have a sizeable HTML Help project that worked fine in RH7. The raw HTML files are installed into an application that uses the SAX parser to parse the HTML. This all worked correctly in RH7. After upgrading to RH8, the same HTML files installed into the application now fail with the following error message: "org.xml.sax.SAXParseException: Content is not allowed in prolog."

Upon examination of the same HTML file that works in RH7 but not in RH8, we note the following differences:

In RH7, the HTML preceding the first <head> token is:

<!doctype HTML public "-//W3C//DTD HTML 4.0 Frameset//EN">

In RH8, the HTML preceding the first <head> token is:

<?xml version="1.0" encoding="utf-8" ?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

We have verified that if we manually modify the RH8 file to look like the RH7 file as above, it works in the application.

Is there a setting somewhere in RH8 that I need to change?

Any suggestions will be greatly appreciated, but please don't tell me to manually modify the HTML in 300+ help topics.

Bob Boller

    Participant
    September 20, 2009

    I have a different, but possibly related problem.

    I developed several WebHelp systems with MadCap Flare, only to discover that when served by our product's homegrown HTTP 1.1 web server, they worked fine in Firefox but not in IE 7 or IE 8. The problem is that our web server, seeing that the MadCap output files have a .htm extension, sends a MIME type of text/html with the response. Internet Explorer, apparently, inspects the file itself to determine the MIME type, and sees that the files are actually XML files, starting with this:

    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

    Now, we could adjust our web server--apparently Apache and IIS have done so--but instead I've been told to migrate my help systems to a tool that works with our web server.

    So I downloaded a trial version of RoboHelp 8 and created a test system to be sure RoboHelp works. Unfortunately, it has the same problem, and this is not surprising since the RoboHelp 8 output files also begin with:

    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

    My first question is (and from what I've read I think I know the answer), does RoboHelp 7 have this <?xml...> and XHTML prolog?

    My second question, assuming the answer to the first is No, is how easy is it to obtain RoboHelp 7 at this point? Does Adobe insist that new licensees get RoboHelp 8?

    Incidentally, with Madcap, I tried removing the "<?xml version="1.0" encoding="utf-8"?>" line from the output .htm files, but this did not solve the problem for Internet Explorer (so maybe Internet Explorer is looking somewhere else to conclude that these are XML files--maybe it's the DOCTYPE line, which I didn't remove). I haven't tried this for my RoboHelp 8 test system, but hesitate to do so because the developer that would test it doesn't have time to test things likely to fail.

    Has anyone else run into this problem with non-Apache and non-IIS web servers? I can supply further particulars about it (for example, it works on IE if we turn compression off; but doing so would slow our system down too much, apparently).

    Thanks,

    - Willie

    Peter Grainge
    Community Expert
    September 20, 2009

    Your problem is sort of related.

    The server has to be set up to recognise UTF8 and it sounds like the problem is that yours is not.

    I had a problem where the output was OK in IE but only the BOM characters showed in Firefox. This is what I was advised by the company hosting my site.

    "I would therefore conclude that the solution to this problem (on Linux systems running Apache) is to add the AddDefaultCharset utf-8 directive to either the Apache config or the site .htaccess file. The advantage of the latter is that it only affects individual sites. The default Apache character set is taken from the locale file on Linux and defaults to iso-8859-1. It is the conflict between the Apache header with iso-8859-1 and the page character set of utf-8 that obviously causes Firefox a problem."

    In a forum post, Chrissy_Tissy added:

    My machine is Windows, but this fix still worked. Some notes about making the fix visible:

    1. Do the fix itself (httpd.conf: AddDefaultCharset utf-8).

    2. Restart the box to apply the fix.

    3. Once the box is restarted, clear your cache in Firefox to make sure you don't continue to see the cached file.

    Once all this is done you will see the output content as expected.

    I am wondering if your server can be amended in a similar way? If not, in RH8 look in Tools > Options and tick the options I have highlighted. See if that produces an output that will be agreeable to your server.

    Finally, if not, Adobe does have a tool that works on the output and changes the encoding to whatever you want. Trouble is it works on one folder at a time so it can be painful if you have many folders.

    I would appreciate you posting back the solution you finally go for. It all helps us when people have similar problems.


    See www.grainge.org for RoboHelp and Authoring tips

    RoboColum(n)
    Legend
    September 16, 2009

    Hi Bob.

    I think you may have to change all 300+ files, but you could easily do this with a find-and-replace tool like BKReplaceEm or FAR. Both are excellent for this type of thing.
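
    If you would rather script the change than use a GUI tool, a rough Python sketch of the same find-and-replace is below. It assumes the output topics are UTF-8 .htm files, and it only drops the XML declaration and swaps the XHTML doctype for the RH7 one quoted in the question. The function names are made up for illustration, and the xmlns attribute on the <html> tag is left alone, so whether the application needs that removed as well is untested. Run it on a copy of the output folder, not on the RH source files.

```python
import re
from pathlib import Path

# Doctype quoted in the original question as the RH7 output.
RH7_DOCTYPE = '<!doctype HTML public "-//W3C//DTD HTML 4.0 Frameset//EN">'

# Optional BOM, optional XML declaration, then the first DOCTYPE declaration
# (which may span several lines).
PROLOG_RE = re.compile(
    r'\A\ufeff?\s*(?:<\?xml[^>]*\?>\s*)?<!DOCTYPE[^>]*>',
    re.IGNORECASE,
)


def replace_prolog(text: str) -> str:
    """Swap the XML declaration and XHTML doctype for the RH7 doctype."""
    return PROLOG_RE.sub(RH7_DOCTYPE, text, count=1)


def convert_folder(folder: str) -> int:
    """Rewrite every .htm file under folder in place; return how many changed."""
    changed = 0
    for path in Path(folder).rglob("*.htm"):
        # utf-8-sig drops a leading byte order mark if one is present.
        original = path.read_text(encoding="utf-8-sig")
        updated = replace_prolog(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

    Calling convert_folder on the folder of generated topics would process all of them in one pass. Note that this changes only the first lines of each file. Any XHTML syntax in the body, such as self-closing tags, stays as RH8 wrote it.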

    The problem here is that you are using the raw topic files and effectively generating the output outside of RH. If you are doing this you can't expect Adobe to support the SAX parser. RH8 uses XHTML as opposed to RH7's HTML and upgrades each topic when the project is first opened. I can't see any way back unless there is something on the SAX parser side to allow for XHTML.


    Read the RoboColum(n).

    Willam van Weelden
    Inspiring
    September 16, 2009

    Hi,

    Changing the doctype of your output files from XHTML to HTML might not be such a good idea. XHTML has a (very slightly) different syntax than HTML, and changing the DTD may have unforeseen consequences, although it will probably work for most browsers. In any case, your output will no longer be 'valid' as you will be using some incorrect syntax for HTML. See http://www.w3schools.com/XHTML/xhtml_html.asp for an overview of the difference between HTML and XHTML.

    I don't know anything about the SAX parser, but I agree with Colum that the only (and probably the best) way is to get the parser to work with XHTML.
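
    One thing that may be worth ruling out first: "Content is not allowed in prolog" is the message Java XML parsers tend to give when they meet unexpected characters before the document's first markup, and a UTF-8 byte order mark that was decoded with the wrong charset is a frequent cause. Whether that is what is happening here is only a guess. A small Python sketch to check what the RH8 files actually start with (the function names are hypothetical, and it only inspects the bytes, it does not change anything):

```python
import codecs


def bytes_before_markup(data: bytes) -> bytes:
    """Return whatever precedes the first '<' in the file's raw bytes."""
    index = data.find(b"<")
    return data if index == -1 else data[:index]


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the three-byte UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)
```

    Reading one RH7 topic and one RH8 topic in binary mode and passing the bytes to both functions would show whether the two versions differ in anything that comes before the first tag.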

    Greet,

    Willam

    Community Expert
    September 22, 2009

    Peter - That's what I'm doing. It's just that only the files compiled into the chm are in HTML. The RH-edited files remain in XHTML.

    Bob


    I did a quick test, and generating webhelp with the setting ticked, my topics get the following doc type:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

    Just to clarify, the setting only applies to the output files (i.e. the conversion takes place as part of the Generation process), not the source files that you edit in RH.

    So if you output to WebHelp, you won't need to do the decompile step in HTML Help Workshop - all your topics will be in HTML.