Search not working as expected due to MSO html code

Participant, Feb 26, 2018

I'm using RH2015, Build 12.0.4.460, to generate WebHelp. Users have reported being unable to find content that I know exists in the help. I have determined that the cause is what appears to be randomly inserted Microsoft Office (MSO) code. In the most recent example, a user searched for "Processing payments before scheduled date". The topic's HTML shows stray markup separating the "P" from the rest of the word "Processing".

mso1.png

This code is causing the search to fail. A search for "rocessing payments before scheduled date" works. We are not using the Tahoma font family, and we are using a Master page that, among other things, applies the CSS file we want to use for styling the help output.

I have also noticed many copies of the following code block in the HTML:

mso2.png

If the above image is too small to read, the blocks are "CustomDocumentProperties" blocks.  I did not intentionally insert any of them.  If I delete them, RH seems to re-insert them.

Questions:

1) Is anyone else experiencing this?

2) Is there a way to remove all this MSO code?

3) Is there a way to keep it from coming back?

Note: If I do any block copying of text from MS Word into RH, I copy the MS Word text into Notepad, which strips out everything but the text, and then paste the text into RH, reformatting it in RH as necessary.

4) Is there a way to make RH ignore the html code and just look at the displayed text when performing a search?

Thanks!

TOPICS
WebHelp

Adobe Community Professional, Feb 26, 2018

Someone, somewhere along the line, has imported a Word document. I have seen that cause problems, although I don't think it was with the search. I have documented some of the issues in Print Issues on my site. I know you are not talking about printed documents, but it's the same problem.

How the code got between the characters of a string the user should see is a mystery. I don't recall having seen that happen.

The answers to your questions are:

1) Is anyone else experiencing this?

Not exactly what you are experiencing, but issues with MSO code, yes.

2) Is there a way to remove all this MSO code?

Oh dear. How to break this gently? It's a Find and Replace job, and minor differences in the find string could make that time-consuming. If you or one of your developers is good with regular expressions, that will help.

3) Is there a way to keep it from coming back?

Maybe see my page on Importing to see if that improves things. Always check the code after an import.

4) Is there a way to make RH ignore the html code and just look at the displayed text when performing a search?

In theory that is what it is doing but clearly it is getting screwed up by this HTML. That should be reported as a bug. Please follow this link to report bugs and request features. The more people who do so, the higher it gets prioritised.

https://tracker.adobe.com

Post the link to that bug/feature in this thread and others can vote for it.
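
To make the Find and Replace idea in answer 2 more concrete, here is a minimal Python sketch of a regular-expression cleanup. It is not a RoboHelp feature. The patterns are assumptions about typical Word-generated markup (conditional comments such as `<!--[if gte mso 9]>`, which usually wrap CustomDocumentProperties blocks, and `mso-*` declarations in `style` attributes), since the screenshots in this thread are not readable here. The names `strip_mso`, `MSO_CONDITIONAL`, `STYLE_ATTR`, `MSO_DECL` and `BARE_SPAN` are made up for this example. Try it on a copy of the project and check the result before trusting it.

```python
import re

# Word's downlevel-hidden conditional comments, e.g.
# <!--[if gte mso 9]><xml>...</xml><![endif]-->
MSO_CONDITIONAL = re.compile(
    r"<!--\[if\s[^\]]*\bmso\b[^\]]*\]>.*?<!\[endif\]-->",
    re.DOTALL | re.IGNORECASE,
)

# Any style="..." or style='...' attribute.
STYLE_ATTR = re.compile(r"""\sstyle=(["'])(.*?)\1""", re.DOTALL | re.IGNORECASE)

# A single mso-* declaration inside a style attribute.
MSO_DECL = re.compile(r"(?<![\w-])mso-[\w-]+\s*:[^;]*;?\s*", re.IGNORECASE)

# A span with no attributes and no nested tags; it has no visual effect.
BARE_SPAN = re.compile(r"<span>([^<]*)</span>", re.IGNORECASE)


def _clean_style(match):
    """Drop mso-* declarations; drop the attribute if nothing is left."""
    quote, body = match.group(1), match.group(2)
    cleaned = MSO_DECL.sub("", body).strip()
    if not cleaned:
        return ""
    return f" style={quote}{cleaned}{quote}"


def strip_mso(html: str) -> str:
    """Return html with the assumed MSO patterns removed."""
    html = MSO_CONDITIONAL.sub("", html)
    html = STYLE_ATTR.sub(_clean_style, html)
    # Unwrap attribute-less spans repeatedly so nested ones collapse too.
    previous = None
    while previous != html:
        previous = html
        html = BARE_SPAN.sub(r"\1", html)
    return html


sample = '<p><span style="mso-bidi-font-family: Tahoma">P</span>rocessing payments</p>'
print(strip_mso(sample))  # prints <p>Processing payments</p>
```

The sketch only touches `mso-*` declarations. A span that carries an ordinary declaration such as `font-family: Tahoma` is left in place, so whether to remove those is a separate decision. Regular expressions are also a blunt tool for HTML, which is one more reason to compare a few topics before and after.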

Before you do any cleaning up, create a backup as this type of cleanup can easily wreck things. I always recommend creating the backup as a zip file. That prevents you opening it accidentally and working on it before you realise it was the backup. This way you always have a clean backup that can be used to create a new clean copy time after time.
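
For anyone who prefers to script the zip backup, a minimal Python sketch follows. The function name `backup_project` and the timestamped file name are assumptions made for this example, not anything RoboHelp provides. It zips the whole project folder into a separate backup folder.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_project(project_dir: str, backup_dir: str) -> Path:
    """Zip the whole project folder and return the path of the archive."""
    project = Path(project_dir).resolve()
    if not project.is_dir():
        raise NotADirectoryError(project)

    target_dir = Path(backup_dir).resolve()
    # Keep backups outside the project so the archive never contains itself.
    if target_dir == project or project in target_dir.parents:
        raise ValueError("backup_dir must be outside the project folder")
    target_dir.mkdir(parents=True, exist_ok=True)

    # A timestamp in the name keeps earlier backups from being overwritten.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base_name = target_dir / f"{project.name}-backup-{stamp}"

    archive = shutil.make_archive(
        str(base_name), "zip", root_dir=project.parent, base_dir=project.name
    )
    return Path(archive)
```

A call such as `backup_project(r"C:\Help\MyProject", r"C:\Help\Backups")` (both paths are placeholders) would produce a file like `MyProject-backup-<timestamp>.zip` that can be unzipped into a fresh working copy whenever one is needed.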

How many topics have this sort of code in them?

Sorry it's not an easier answer.


See www.grainge.org for free RoboHelp and Authoring information.

@petergrainge

www.grainge.org for free RoboHelp & Authoring info. Use the blue Reply button at the top to help me help you.
The black Reply link nests replies and they sort out of order.

Likes

Translate

Translate

Report

Report
Community Guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Reply
Loading...
Feb 26, 2018 0
Adobe Community Professional, Feb 26, 2018

I did a quick test by highlighting a single character (I tried both the first letter and a middle letter) and applying bold in RoboHelp. The text is then not found in search. So it looks like any HTML tag that breaks up a word keeps that word out of the search term database.
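
Based on that observation, a small Python sketch can list the words in a topic that are likely to be affected. It assumes, and this is only a hypothesis consistent with the test above, that the indexer treats every tag as a word boundary. The names `find_split_words`, `INLINE_TAGS` and `_TextExtractor` are invented for this example. It compares the words a reader sees (inline tags removed without a gap) with the words you get when every tag becomes a space.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Tags that normally sit inside a line of text without breaking words.
INLINE_TAGS = {"span", "b", "i", "u", "em", "strong", "font", "a", "sub", "sup"}


class _TextExtractor(HTMLParser):
    """Collect text, replacing inline tags with a chosen separator."""

    def __init__(self, inline_separator):
        super().__init__(convert_charrefs=True)
        self._inline_separator = inline_separator
        self._parts = []

    def _on_tag(self, tag):
        # Block-level and unknown tags always separate words.
        self._parts.append(self._inline_separator if tag in INLINE_TAGS else " ")

    def handle_starttag(self, tag, attrs):
        self._on_tag(tag)

    def handle_endtag(self, tag):
        self._on_tag(tag)

    def handle_data(self, data):
        self._parts.append(data)

    def text(self):
        return "".join(self._parts)


def _words(html, inline_separator):
    parser = _TextExtractor(inline_separator)
    parser.feed(html)
    parser.close()
    return re.findall(r"\w+", parser.text())


def find_split_words(html):
    """Return displayed words that vanish when tags count as word breaks."""
    displayed = Counter(_words(html, ""))
    tag_as_break = Counter(_words(html, " "))
    return sorted((displayed - tag_as_break).keys())


topic = '<p><span style="font-family: Tahoma">P</span>rocessing payments before scheduled date</p>'
print(find_split_words(topic))  # prints ['Processing']
```

Running this over the topic files gives a list of candidate words to check by hand, which may help to gauge how many topics need cleaning.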

Adobe Community Professional, Feb 27, 2018

I may have some better news on how to clean the code. I later recalled the clear formatting feature.

Open any topic that has this code and click the icon shown on the Edit ribbon. It removes all inline formatting.

rh_clear_formatting.png


See www.grainge.org for free RoboHelp and Authoring information.

@petergrainge