Can we make the RH HTML editor not insert carriage return/line feed (CR LF) characters?

Enthusiast ,
Sep 24, 2020

RH 2017 latest version.

 

Hi Neighborhood Friendly RoboHelpers,

Inside of RoboHelp you can type your text in the Design view and it looks great:

2020-09-24_16-20-09.png

 

But in the code view, RoboHelp inserts seemingly random carriage return and line feed (CR LF) characters into the same line of text, like this (I highlighted the ends of lines to show what I mean):

2020-09-24_16-17-59.png

Why? I don't know. I guess maybe to make it more readable in the editor? That's all well and good, but is there any way to turn off this behavior? I didn't see any option in RH's settings to do that. Is there perhaps an undocumented RH registry entry that someone knows about that does this?

 

We're trying to work around a localization problem in some topics, and these extra CR LF characters that RH inserts all over the place complicate things. Here's how they look in Notepad++ (and it's how our translation software sees them):

2020-09-24_16-22-36.png

Thanks for any help you can offer.

 

Jared


TOPICS
Classic


Adobe Community Professional ,
Sep 24, 2020

I don't think there is and I don't remember anyone mentioning it before.

 

I know lots of people have gotten RH content translated successfully. The only thing I can think of is getting the translation company to tune their rules to ignore these line breaks. Maybe ignore all that don't immediately follow an HTML tag? I know they have to make rules for different formats of "things", so I assume this would be possible.

Adobe Community Professional ,
Sep 25, 2020

The code wrap has never been in sync with design view and that has sometimes caused issues with find and replace at code level. Your particular problem is, as Amber says, the first time reported here to the best of my knowledge.

 

Whilst it doesn't solve this issue, have you seen the translation feature in RoboHelp 2020? You can generate an XLIFF file and hand that over to your translation agency. When you get the translation back, it can be integrated into the appropriate language copy of your project.

 

https://www.grainge.org/pages/authoring/rh_tour/rh2020/authoring/translations.htm

 

Of course, upgrading your projects might not be the easiest task in the world.

 

Adobe Employee ,
Sep 25, 2020

Old RoboHelp versions, before the next-generation RoboHelp (starting with the 2019 release), wrapped lines in code view by inserting CR/LF around a certain column (something like column 80 or so). Most editors did this at the time (and many still do today). And yes, it's for the readability of code. Next-generation editors like the new RoboHelp do not need that anymore, as they support “virtual line wrapping” for content in code ("PCDATA").
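
For illustration only, here is a minimal Python sketch of what such hard wrapping amounts to. It is not RoboHelp's actual implementation; the function name `hard_wrap` and the use of `textwrap` are assumptions made just to show how one logical line ends up containing CR LF characters.

```python
import textwrap


def hard_wrap(paragraph: str, width: int = 80) -> str:
    """Break one logical line into physical lines joined by CR LF."""
    return "\r\n".join(textwrap.wrap(paragraph, width=width))


text = (
    "Inside of RoboHelp you can type your text in the Design view and it "
    "looks great, but the code view shows extra line breaks."
)
# A narrow width is used here only to make the effect easy to see.
wrapped = hard_wrap(text, width=40)
print(repr(wrapped))
```

A "virtual" wrap, by contrast, only changes how the editor draws the line and leaves the stored text untouched.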

 

Any professional editor or rendering engine (like a web browser) has no problem with that. It's called “whitespace handling.” The W3C clearly outlines rules for whitespace handling (e.g., here: XHTML Family User Agent Conformance). And as you can see in RoboHelp Author View or when you open the topic in any web browser, both agents handle it in a compliant way. In a nutshell: whitespace characters like space (&#x0020;), horizontal tabulation (&#x0009;), carriage return (&#x000D;), and line feed (&#x000A;) need to get “normalized,” which, simply put, means they need to be merged, removed, or ignored.
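
As a rough sketch of that normalization (an illustration, not the code of any particular browser or CAT tool), the following Python snippet collapses runs of the four characters listed above into a single space. The function name `normalize_whitespace` is hypothetical.

```python
import re

# The four characters named above: space, tab, carriage return, line feed.
_WS_RUN = re.compile(r"[\x20\x09\x0D\x0A]+")


def normalize_whitespace(text: str) -> str:
    """Collapse each run of whitespace into one space and trim the ends."""
    return _WS_RUN.sub(" ", text).strip(" ")


source = (
    "Some content with CR/LF is here. The HTML import\r\n"
    "\t\tfilter should see one sentence."
)
print(normalize_whitespace(source))
```

Note that the non-breaking space (U+00A0) is deliberately not in this character set, so the sketch leaves it alone. Whether a given translation tool treats it the same way depends on that tool's settings.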

 

Now, it looks like your localization/translation tool does not handle it or is not configured correctly. However, most CAT tools I know have a configurable option for that.

E.g., in SDL Trados Studio, you can find this in the Project Settings > File Types > XHTML 1.1 > Whitespace. There you can configure whitespace handling for both whitespace in content (“Always preserve” / “Normalize unless xml:space="preserve"” / “Always normalize”) and whitespace in tags (“Always preserve” on or off). The recommended option is “Normalize unless xml:space="preserve"” to be fully compliant. Make sure this option is selected, and then the parser of SDL Trados Studio or a similar tool will merge the whitespace in PCDATA according to the rules. You should then no longer get wrong segmentation in these scenarios.

Enthusiast ,
Sep 28, 2020

Thanks Amber, Peter, and Stefan,

 

We have been localizing our RoboHelp projects directly since 2004, without exporting XLIFF files.

 

Our CAT tool is Across Systems. They do have an option that normalizes whitespace. We have that turned on, and in most of our thousands of topics things are fine. But we have about a hundred or so topics where we use non-breaking space characters to format some code samples so that things line up, and in our localized output these get stripped because of the normalization.

 

It seemed to us that perhaps we could turn off this normalization option. But that would only be useful for us if RoboHelp had a way of not inserting these CR LF characters. It sounds like that's not possible though, so we'll have to come up with some other way of dealing with these hundred or so topics and their code samples.

Adobe Employee ,
Sep 28, 2020

In old RoboHelp, you can't switch that off. In the new RoboHelp (Summer 2020 release) it's off by default. Aside from many other reasons for upgrading to the new generation of RoboHelp, that will also solve this problem.

 

Regarding the code: I guess they are wrapped in an element like <code> or <pre>?

In Across Language Server there are the Document Settings Templates (DST). Check the one you are using for your HTML files (probably "Tagged HTML" or "Tagged XML (v2)"). There you should be able to exclude such code blocks from normalization.

Then the strings in <p> will get properly normalized and can be properly segmented, while the PCDATA inside the <code> element will not be normalized if you turn off normalization for the code element:

 

<body>
	<p>Some content with CR/LF is here. The HTML import filter of Across should be
		able to normalize the whitespace on this PCDATA just fine, so that you get
		proper segments.</p>
	<code>
		if (hour > 18) {
			greeting = "Good evening";
		} else {
			greeting = "Good day";
		}
	</code>
</body>
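
To make the idea of per-element exclusion concrete, here is a small Python sketch. It does not model the Across DST or its import filter; the class name `SelectiveNormalizer` and the default list of excluded elements are assumptions for illustration. It normalizes whitespace in all text except text inside the excluded elements.

```python
import re
from html.parser import HTMLParser


class SelectiveNormalizer(HTMLParser):
    """Collect text, normalizing whitespace except inside excluded elements."""

    def __init__(self, excluded=("code", "pre")):
        super().__init__(convert_charrefs=True)
        self.excluded = set(excluded)
        self.depth = 0    # greater than 0 while inside an excluded element
        self.chunks = []  # list of (preserved, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in self.excluded:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.excluded and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            # Excluded element: keep the text exactly as written.
            self.chunks.append((True, data))
            return
        text = re.sub(r"[ \t\r\n]+", " ", data).strip(" ")
        if text:
            self.chunks.append((False, text))


html = (
    "<body>\r\n"
    "\t<p>Some content with CR/LF is here. The filter should\r\n"
    "\t\tnormalize this.</p>\r\n"
    "\t<code>\r\n"
    "\tif (hour > 18) {\r\n"
    '\t\tgreeting = "Good evening";\r\n'
    "\t}\r\n"
    "\t</code>\r\n"
    "</body>"
)
parser = SelectiveNormalizer()
parser.feed(html)
parser.close()
for preserved, text in parser.chunks:
    print(preserved, repr(text))
```

The paragraph text comes out as one clean sentence, while the text inside the code element keeps its line breaks and tabs.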

 

Alternatively, you could explicitly force it from the source code side with xml:space="preserve". You can apply it on the <pre> element (but not on the <code> element):

 

<body>
	<p>Some content with CR/LF is here. The HTML import filter of Across should
		be able to normalize the whitespace on this PCDATA just fine, so that you
		get proper segments.</p>
	<pre xml:space="preserve">
		<code>
			if (hour > 18) {
				greeting = "Good evening";
			} else {
				greeting = "Good day";
			}
		</code>
	</pre>
</body>

 

xml:space="preserve" will force ALS' parser to respect the whitespace within the pre block, while the PCDATA in other elements like p, li, etc. will get normalized.
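
The behavior described here can be sketched in Python with the standard `xml.etree.ElementTree` module. This is only an illustration of the "normalize unless xml:space="preserve"" rule, not the ALS parser; the helper names `normalize` and `_collapse` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:space attribute under the XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def _collapse(text):
    return re.sub(r"[ \t\r\n]+", " ", text) if text else text


def normalize(elem, preserve=False):
    """Collapse whitespace in place unless xml:space="preserve" is in effect."""
    # xml:space applies to the element and its descendants until overridden.
    mode = elem.get(XML_SPACE)
    if mode == "preserve":
        preserve = True
    elif mode == "default":
        preserve = False
    if not preserve:
        elem.text = _collapse(elem.text)
    for child in elem:
        normalize(child, preserve)
        # A child's tail text is part of this element's content.
        if not preserve:
            child.tail = _collapse(child.tail)


doc = ET.fromstring(
    "<body>\r\n"
    "\t<p>Some content with CR/LF is here. The parser should\r\n"
    "\t\tnormalize this.</p>\r\n"
    '\t<pre xml:space="preserve"><code>if (hour > 18) {\r\n'
    '\tgreeting = "Good evening";\r\n'
    "}</code></pre>\r\n"
    "</body>"
)
normalize(doc)
p_text = doc.find("p").text
code_text = doc.find("pre/code").text
print(repr(p_text))
print(repr(code_text))
```

The code element inherits the preserve setting from its pre parent, which is why the attribute only needs to be set once on the outer element.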

Enthusiast ,
Sep 28, 2020

Hi Stefan,

 

No, we aren't using <pre> or <code> tags for the content. The code samples are typically just styled/formatted with something like a <div class="code"> tag in our content. We weren't aware of the <pre> or <code> tags. We'll look into using <pre> or <code> and modifying the DST so that normalization is not applied to those.

 

Thanks for responding and giving us a possible way forward.
