RH 2017 latest version.
Hi Neighborhood Friendly RoboHelpers,
In RoboHelp, you can type your text in the Design view and it looks great:
But in the code view, RoboHelp inserts seemingly random carriage return and line feed (CR LF) characters into the same line of text, like this (I highlighted the line ends to show what I mean):
Why? I don't know. I guess maybe to make it more readable in the editor? That's all well and good, but is there any way to turn off this behavior? I didn't see an option in RH's settings for that. Is there perhaps an undocumented RH registry entry that someone knows about?
We're trying to work around a localization problem in some topics, and these extra CR LF characters that RH inserts all over the place complicate things. Here's how they look in Notepad++ (which is how our translation software sees them):
Thanks for any help you can offer.
Jared
I don't think there is, and I don't remember anyone mentioning it before.
I know lots of people have gotten RH content translated successfully. The only thing I can think of is getting the translation company to tune their rules to ignore these line breaks. Maybe ignore all line breaks that don't immediately follow an HTML tag? I know they have to make rules for different formats of "things", so I assume this would be possible.
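A minimal Python sketch of the kind of rule suggested above, as a pre-processing step. The function name and regular expression are hypothetical and are not the rule syntax of any particular CAT tool. A line break is replaced with a space unless the character right before it is the ">" that closes a tag:

```python
import re

# Hypothetical rule: a line break (CR LF or bare LF) that does NOT directly
# follow a ">" is treated as editor wrapping inside running text.
# "\r" is in the lookbehind class so the LF of a CR LF pair that follows
# a tag is not matched on its own.
WRAP_BREAK = re.compile(r"(?<![>\r])\r?\n")


def unwrap_text_breaks(html: str) -> str:
    """Replace line breaks that do not immediately follow a tag with a space."""
    return WRAP_BREAK.sub(" ", html)


source = "<p>Some content that the editor\r\nwrapped at column 80.</p>\r\n<p>Next.</p>"
print(repr(unwrap_text_breaks(source)))
```

A rule this simple would also rewrite line breaks inside code samples, so it only illustrates the idea rather than being a drop-in filter.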
The code wrap has never been in sync with Design view, and that has sometimes caused issues with find and replace at the code level. As Amber says, to the best of my knowledge this is the first time your particular problem has been reported here.
Whilst it doesn't solve this issue, have you seen the translation feature in RoboHelp 2020? You can generate an XLIFF file and hand it over to your translation agency. When you get the translation back, it can be integrated into the appropriate language copy of your project.
https://www.grainge.org/pages/authoring/rh_tour/rh2020/authoring/translations.htm
Of course, upgrading your projects might not be the easiest task in the world.
Old RoboHelp versions before the next-generation RoboHelp (starting with the 2019 release) did this line wrapping with CR/LF in code at a certain column (somewhere around column 80). Most editors did this at that time (and many still do today). And yes, it's for the readability of code. Next-generation editors like the new RoboHelp don't need that anymore, as they support “virtual line wrapping” for content in code ("PCDATA").
Any professional editor or rendering engine (like a web browser) has no problem with that. It's called “whitespace handling.” The W3C clearly outlines rules for whitespace handling (e.g., here: XHTML Family User Agent Conformance). And as you can see in RoboHelp Author View, or when you open the topic in any web browser, both handle it in a compliant way. In a nutshell: whitespace characters like space (U+0020), horizontal tab (U+0009), carriage return (U+000D), and line feed (U+000A) need to get “normalized”, which, simply put, means they get merged, removed, or ignored.
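A minimal Python sketch of the normalization described above, assuming only the four listed characters count as whitespace and simulating an editor that hard-wraps at column 80 with CR LF. This is a simplified model (collapse runs, trim the ends), not any specific parser's implementation. After normalization, the wrapped and unwrapped text compare equal:

```python
import re
import textwrap

# Space, tab, CR, LF only (U+0020, U+0009, U+000D, U+000A).
# U+00A0 (no-break space) is deliberately NOT included.
XML_WS = " \t\r\n"
XML_WS_RUN = re.compile(r"[ \t\r\n]+")


def normalize_ws(text: str) -> str:
    """Collapse each run of XML whitespace to one space and trim the ends."""
    # strip() is given XML_WS explicitly; a bare strip() would also remove U+00A0.
    return XML_WS_RUN.sub(" ", text).strip(XML_WS)


sentence = ("Some content typed in Design view as one long line, which an older "
            "editor splits with hard line breaks in Code view.")

# Simulate hard wrapping at column 80: textwrap replaces a space with CR LF.
hard_wrapped = "\r\n".join(textwrap.wrap(sentence, width=80))

assert "\r\n" in hard_wrapped
assert normalize_ws(hard_wrapped) == sentence
```

The same normalized string is what a browser or RoboHelp Author View effectively displays, which is why the inserted CR LF characters are invisible there.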
Now, it looks like your localization/translation tool does not handle it or is not configured correctly. However, most CAT tools I know have a configurable option for that.
E.g., in SDL Trados Studio, you can find this under Project Settings > File Types > XHTML 1.1 > Whitespace. There you can configure whitespace handling both for whitespace in content ("Always preserve" / "Normalize unless xml:space="preserve"" / "Always normalize") and for whitespace in tags ("Always preserve" on or off). For full compliance, the recommended option is "Normalize unless xml:space="preserve"". Make sure this option is selected; the parser of SDL Trados Studio, or of a similar tool, will then merge the whitespace in PCDATA according to the rules, and you should no longer get incorrect segmentation in these scenarios.
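The three content options can be modeled roughly as follows. This is a hypothetical Python sketch that only mirrors the option names above; it is not an SDL Trados Studio API, and a real parser does much more (segmentation, tag handling):

```python
import re
from enum import Enum


class WhitespaceMode(Enum):
    # Hypothetical names mirroring the three UI options; not a Trados API.
    ALWAYS_PRESERVE = "always_preserve"
    NORMALIZE_UNLESS_XML_SPACE_PRESERVE = "normalize_unless_preserve"
    ALWAYS_NORMALIZE = "always_normalize"


_WS_RUN = re.compile(r"[ \t\r\n]+")


def handle_text(text: str, mode: WhitespaceMode, xml_space: str = "default") -> str:
    """Apply one whitespace mode to a text node.

    xml_space is the xml:space value in effect for the node
    ("default" or "preserve").
    """
    if mode is WhitespaceMode.ALWAYS_PRESERVE:
        return text
    if (mode is WhitespaceMode.NORMALIZE_UNLESS_XML_SPACE_PRESERVE
            and xml_space == "preserve"):
        return text
    # Collapse each run of space/tab/CR/LF into a single space.
    return _WS_RUN.sub(" ", text)
```

Only the middle option looks at xml:space, which is why it is the one that honors authors who mark a block as preformatted.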
Thanks Amebr, Peter, and Stefan,
We have been localizing our RoboHelp projects directly since 2004, rather than exporting XLIFF files.
Our CAT tool is Across Systems. They do have an option that normalizes whitespace. We have it turned on, and in most of our thousands of topics things are fine. But in about a hundred topics we use nonbreaking space characters to format code samples so that things line up, and because of the normalization these get stripped in our localized output.
It seemed to us that perhaps we could turn off this normalization option. But that would only be useful for us if RoboHelp had a way of not inserting these CR LF characters. It sounds like that's not possible, though, so we'll have to come up with some other way of dealing with these hundred or so topics and their code samples.
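One plausible way no-break spaces get lost, sketched in Python. This is an assumption about the mechanism, not a description of how Across implements normalization. The XML/XHTML rules only collapse space, tab, CR, and LF, but a normalizer with a broader definition of whitespace also removes U+00A0, which is what an &nbsp; entity becomes once the HTML is parsed:

```python
import re

# A code sample aligned with no-break spaces (U+00A0).
text = "x\u00a0=\u00a0\u00a01;"

# Over-broad normalization: str.split() with no argument splits on all
# Unicode whitespace, and Python treats U+00A0 as whitespace.
broad = " ".join(text.split())

# XML/XHTML-style normalization: only space, tab, CR, LF are collapsed,
# so the no-break spaces survive.
xml_style = re.sub(r"[ \t\r\n]+", " ", text)

print(repr(broad))      # 'x = 1;'
print(repr(xml_style))  # 'x\xa0=\xa0\xa01;'
```

If a tool behaves like the first variant, turning normalization off for just the code blocks (rather than globally) is the kind of targeted fix that avoids losing the alignment.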
In old RoboHelp, you can't switch that off. In the new RoboHelp (Summer 2020 release) it's off by default. Aside from many other reasons for upgrading to the new generation of RoboHelp, that will also solve this problem.
Regarding the code: I guess they are wrapped in an element like <code> or <pre>?
In Across Language Server there are Document Settings Templates (DSTs). Check the one you are using for your HTML files (probably "Tagged HTML" or "Tagged XML (v2)"). There you should be able to exclude such code blocks from normalization.
Then the strings in <p> will get properly normalized and can be properly segmented, while the PCDATA inside the <code> element will not be normalized if you turn off normalization for the code element:
<body>
<p>Some content with CR/LF is here. The HTML import filter of Across should be
able to normalize the whitespace on this PCDATA just fine, so that you get
proper segments.</p>
<code>
if (hour > 18) {
    greeting = "Good evening";
} else {
    greeting = "Good day";
}
</code>
</body>
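Switching normalization off for specific elements can be sketched in Python with xml.etree.ElementTree. The NO_NORMALIZE set is a hypothetical stand-in for a DST setting, not Across configuration syntax, and the input is a shortened version of the markup above:

```python
import re
import xml.etree.ElementTree as ET

_WS_RUN = re.compile(r"[ \t\r\n]+")

# Hypothetical stand-in for a DST rule that disables normalization for
# these elements (an assumption, not Across configuration syntax).
NO_NORMALIZE = {"code", "pre"}


def normalize_tree(elem: ET.Element, skip: bool = False) -> None:
    """Collapse whitespace runs in text nodes, except inside NO_NORMALIZE elements."""
    skip = skip or elem.tag in NO_NORMALIZE
    if not skip and elem.text:
        elem.text = _WS_RUN.sub(" ", elem.text)
    for child in elem:
        normalize_tree(child, skip)
        # A child's tail is text inside *this* element, so it follows this element's rule.
        if not skip and child.tail:
            child.tail = _WS_RUN.sub(" ", child.tail)


doc = ET.fromstring(
    "<body><p>Some content with CR/LF is here.\r\n"
    "The import filter should normalize it.</p>"
    "<code>\nif (hour > 18) {\n    greeting = \"Good evening\";\n}\n</code></body>"
)
normalize_tree(doc)
print(doc.find("p").text)  # Some content with CR/LF is here. The import filter should normalize it.
print(repr(doc.find("code").text))
```

The paragraph becomes a single clean segment, while the code keeps its line breaks and indentation.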
Alternatively, you could explicitly force it from the source code side with xml:space="preserve". You can apply it on the <pre> element (but not on the <code> element):
<body>
<p>Some content with CR/LF is here. The HTML import filter of Across should
be able to normalize the whitespace on this PCDATA just fine, so that you
get proper segments.</p>
<pre xml:space="preserve">
<code>
if (hour > 18) {
    greeting = "Good evening";
} else {
    greeting = "Good day";
}
</code>
</pre>
</body>
The xml:space="preserve" attribute forces the ALS parser to respect the whitespace within the pre block, while the PCDATA in other elements like p, li, etc. still gets normalized.
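A similar Python sketch for the xml:space approach. The attribute is inherited by descendants, so a code element inside a pre element marked xml:space="preserve" keeps its line breaks and indentation while the paragraph text is collapsed. ElementTree exposes xml:space under the XML namespace URI; the function name is hypothetical and this is not how ALS is implemented internally:

```python
import re
import xml.etree.ElementTree as ET

# ElementTree reports the xml:space attribute under the XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
_WS_RUN = re.compile(r"[ \t\r\n]+")


def normalize_unless_preserve(elem: ET.Element, inherited: str = "default") -> None:
    """Collapse whitespace in text nodes unless xml:space="preserve" is in effect.

    xml:space is inherited by descendants until an element overrides it.
    """
    mode = elem.get(XML_SPACE, inherited)
    if mode != "preserve" and elem.text:
        elem.text = _WS_RUN.sub(" ", elem.text)
    for child in elem:
        normalize_unless_preserve(child, mode)
        # Tail text sits inside elem, so it follows elem's mode.
        if mode != "preserve" and child.tail:
            child.tail = _WS_RUN.sub(" ", child.tail)


doc = ET.fromstring(
    '<body><p>Wrapped\n   text.</p>'
    '<pre xml:space="preserve"><code>\nif (hour &gt; 18) {\n'
    '    greeting = "Good evening";\n}\n</code></pre></body>'
)
normalize_unless_preserve(doc)
print(doc.find("p").text)  # Wrapped text.
```

Compared with an element-name rule in the tool, marking the source this way keeps the intent in the content itself, so any parser that honors xml:space treats the block the same way.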
Hi Stefan,
No, we aren't using <pre> or <code> tags for the content. Those blocks are typically just styled with something like a <div class="code"> tag. We weren't aware of the <pre> and <code> tags. We'll look into using <pre> or <code> and modifying the DST to turn off normalization for those elements.
Thanks for responding and giving us a possible way forward.