RoboFan

Inspiring

Question

RH 2020 Responsive HTML5: "Enable substring search" setting ignored

November 17, 2020

Running RH 2020.1, generating Responsive HTML5 with Azure_Blue.

In my output settings, Enable substring search is disabled (i.e., NOT selected).

My project has topics that contain text instances of tc and tcs. These are meaningful text strings for our audience.
If I search on tc (no quotes, Include all words in search NOT selected), my search results include topics that have instances of tcs - indeed, it's even displayed and bolded (all 3 chars.) in the results preview.
If I search on "tc" (quotes, Include all words in search NOT selected), my search results still include topics that have instances of tcs - however, only the first 2 chars are bold; interestingly, preview inserts a space between the tc and s.
If I search on "tc" (quotes, Include all words in search IS selected), my search results are IDENTICAL to the search performed immediately above this one: still include topics that have instances of tcs - however, only the first 2 chars are bold; interestingly, preview inserts a space between the tc and s.

Please advise.

Peter Grainge

Community Expert

5 January 2025. An update for anyone finding this thread.

In 2022.5 I created a new project with just two topics. I entered pos in one and positive in the other.

I found that with Enable Substring Search selected, a search for pos (no quotes) found both topics as it should. With the option deselected, the search only found one topic.

Somewhere along the line, the issue seems to have been fixed.
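For anyone who wants to picture the difference the option is supposed to make, here is a minimal sketch of substring versus whole-word matching, using the same pos/positive test. The `search` function and the topic names are hypothetical illustrations and are not RoboHelp's actual search code.

```python
def search(topics, query, substring_search):
    """Return titles of topics matching the query.

    Hypothetical sketch: with substring_search on, the query may match
    inside a longer word; with it off, only whole words match.
    """
    q = query.lower()
    hits = []
    for title, text in topics.items():
        words = text.lower().split()
        if substring_search:
            matched = any(q in w for w in words)
        else:
            matched = q in words
        if matched:
            hits.append(title)
    return hits


topics = {"Topic 1": "pos", "Topic 2": "positive"}
print(search(topics, "pos", substring_search=True))   # ['Topic 1', 'Topic 2']
print(search(topics, "pos", substring_search=False))  # ['Topic 1']
```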

________________________________________________________

My site www.grainge.org includes many free Authoring and RoboHelp resources that may be of help.

Use the menu (bottom right) to mark the Best Answer or Highlight particularly useful replies. Found the answer elsewhere? Share it here.

christele15

Inspiring

@vikchand I am glad that a fix is available in this new update, which resolves one of my concerns. However, I am encountering additional issues, such as search randomly returning no results, which clients have escalated because our online documentation is based on specific codes and descriptions. I am using RH 2022.47 and HTML5 output. I have isolated it as a content-based issue. For example, 28088 returns results but 14040 does not, even though both are in the same topic?! I had to disable "Enable substring search", which was not functioning as expected and was also causing the search to freeze.
Thank you

Christele

Jeff_Coatsworth

Community Expert

I'd echo @RoboFan - I get that stemming makes sense, but when you've explicitly said to ignore any string match that's a sub-string of what you're searching for, that just looks like a logic bug. In your example of Development, Developing and Develop, I would expect it to index just Develop, but when that flag is off, the "matching" logic should check what the search term is against the match that contains Develop to ensure that it doesn't show Develop when asked for Development. IIRC, this used to work in prior versions of RH.
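A rough sketch of the check described above, assuming a deliberately naive stemmer: the stemmed lookup finds candidate topics, and when the substring flag is off, the original search term is verified against the actual words in each candidate. `toy_stem`, `search`, and the sample topics are hypothetical and do not reflect how RoboHelp is implemented.

```python
def toy_stem(word):
    """Hypothetical naive stemmer: lowercase, then strip one common suffix."""
    word = word.lower()
    for suffix in ("ment", "ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def search(topics, query, substring_search=False):
    """Find candidates by stem, then verify the literal term if the flag is off."""
    stem = toy_stem(query)
    hits = []
    for title, text in topics.items():
        words = [w.lower() for w in text.split()]
        if not any(toy_stem(w) == stem for w in words):
            continue  # a stemmed index would not return this topic at all
        # Flag off: keep the topic only if the literal search term occurs in it.
        if substring_search or query.lower() in words:
            hits.append(title)
    return hits


topics = {"A": "Develop the plan", "B": "Development plan"}
print(search(topics, "Development"))                         # ['B']
print(search(topics, "Development", substring_search=True))  # ['A', 'B']
```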

RoboFan (Author)

Inspiring

Thank you for the explanation. I understand. It is a little confusing considering that output presets include a substring search option.

I would expect/hope - as my users have reported that they do - that when "Enable substring search" is disabled, a search would only return identical matches. So for example, if I search on "developing", "development" and "develop" are ignored.

"tc" and "tcs" are two very different things in our technical documentation. Returning results for both is creating a high level of frustration for our end users (not to mention, "tcs2" is similar but different from "tcs" for our knowledge base). More specifically, in this particular instance, the issue is that users are searching on "tcs" and getting back results with "tc" - and what makes it worse, is that the topics with "tc" are weighted/ranked higher than the "tcs" topics.

I tried update 3 - no difference in result.

Maybe this could be handled in the code so that when an output end user encloses a search term in quotes, the stemming process for that term is disabled. Then only identical matches for that term would be returned. Ideally, this could be supported for various search permutations: e.g., multi-term searches (including a mix of stemmed and non-stemmed terms), when "Include all search terms" is selected, etc.
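To make the suggestion concrete, here is a minimal sketch of a query parser that skips stemming for quoted terms and still stems bare terms, with an optional "all terms" mode. Everything here (`toy_stem`, `parse_query`, `search`, the `match_all` flag, the sample topics) is a hypothetical illustration, and it only handles single-word quoted terms.

```python
import re


def toy_stem(word):
    """Hypothetical naive stemmer: lowercase, then strip one common suffix."""
    word = word.lower()
    for suffix in ("ment", "ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def parse_query(query):
    """Return (term, exact) pairs; exact is True for double-quoted terms."""
    terms = []
    for quoted, bare in re.findall(r'"([^"]+)"|(\S+)', query):
        if quoted:
            terms.append((quoted.lower(), True))
        else:
            terms.append((bare.lower(), False))
    return terms


def search(topics, query, match_all=False):
    """Quoted terms must match a word exactly; bare terms match by stem."""
    terms = parse_query(query)
    if not terms:
        return []
    hits = []
    for title, text in topics.items():
        words = [w.lower() for w in text.split()]
        stems = {toy_stem(w) for w in words}
        results = [
            (term in words) if exact else (toy_stem(term) in stems)
            for term, exact in terms
        ]
        if all(results) if match_all else any(results):
            hits.append(title)
    return hits


topics = {"A": "tc settings", "B": "tcs settings", "C": "tcs2 developing"}
print(search(topics, "tcs"))                                  # ['A', 'B']
print(search(topics, '"tcs"'))                                # ['B']
print(search(topics, '"tcs2" development', match_all=True))   # ['C']
```

With this approach, unquoted searches keep today's stemmed behavior, so only users who deliberately add quotes get identical-match results.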

vikchand

Adobe Employee

Hi,

This is happening due to a preprocessing step (stemming) that is applied to topics. In this step, we convert every word to its root word before indexing it. For example, Development, Developing and Develop all have the same root word, Develop, so instead of indexing all three words separately, we index only one. So when you search using any of those words, you get results for all three.

The same thing is happening with tc and tcs: both are converted to tc. So when you search using either of these two words, you always get results for both. Currently there is no workaround for this issue.
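As a rough illustration of why stemming makes tc and tcs indistinguishable, here is a minimal sketch of a stemmed inverted index. The stemmer is a deliberately naive stand-in, and `toy_stem`, `build_index`, `search`, and the sample topics are hypothetical; the actual stemming rules used by the product are not described in this thread.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical naive stemmer: lowercase, then strip one common suffix."""
    word = word.lower()
    for suffix in ("ment", "ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def build_index(topics):
    """Map each stem to the set of topic titles that contain it."""
    index = defaultdict(set)
    for title, text in topics.items():
        for word in text.split():
            index[toy_stem(word)].add(title)
    return index


def search(index, query):
    """Stem the query the same way, so tc and tcs hit the same index entry."""
    return index.get(toy_stem(query), set())


topics = {"Topic A": "configure tc here", "Topic B": "configure tcs here"}
index = build_index(topics)
print(sorted(search(index, "tcs")))  # ['Topic A', 'Topic B']
print(sorted(search(index, "tc")))   # ['Topic A', 'Topic B']
```

Because only the stem is stored, the index no longer knows which topic contained the original word, which is why no search-time option can separate the two.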

In the future we will evaluate whether certain words can be excluded from this preprocessing step, which would help in scenarios like this.

As for the other issue you mentioned, tcs being broken into tc and s, please try this with update 3 and let us know if you still see it.

Peter Grainge

Community Expert

I have flagged this thread with Adobe and they will be responding.

________________________________________________________
See www.grainge.org for free Authoring and RoboHelp Information

RoboFan (Author)

Inspiring

Circling back on this. I've since upgraded to 2020.3 and it's still an issue - even with "Auto correct search query" and "Enable substring search" both disabled. Here's an example - note how "tc" is bold in the search preview/context of the result (btw, this topic doesn't have an instance of "tcs" in it - anywhere), even though "tcs" was the search term.