Running RH 2020.1, generating Responsive HTML5 with Azure_Blue.
In my output settings, Enable substring search is disabled (i.e., NOT selected).
I have flagged this thread with Adobe so hopefully someone will come along with some answers. These are not questions that forum users can answer.
They will be looking at another thread so you might want to add a link there to this thread.
Circling back on this. I've since upgraded to 2020.3 and it's still an issue - even with "Auto correct search query" and "Enable substring search" both disabled. Here's an example - note how "tc" is bold in the search preview/context of the result (btw, this topic doesn't have an instance of "tcs" in it - anywhere), even though "tcs" was the search term.
I have flagged this thread with Adobe and they will be responding.
This is happening because of a preprocessing step (stemming) that is applied to topics. In this step we convert every word to its root word before indexing. For example, Development, Developing and Develop all have the same root word, Develop, so instead of indexing all three words separately, we index only one. As a result, when you search using any of these words, you get results for all three.
The same thing is happening with tc and tcs: both are converted to tc, so when you search using either word, you always get results for both. Currently there is no workaround for this issue.
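To make the behaviour above concrete, here is a minimal sketch of a stemmed index. It is not RoboHelp's code: `toy_stem`, `build_index`, `search` and the sample topics are all hypothetical, and the suffix list is a toy stand-in for whatever stemmer the product actually uses.

```python
from collections import defaultdict

# Toy suffix stripper, purely illustrative; this is NOT RoboHelp's stemmer.
SUFFIXES = ("ment", "ing", "s")


def toy_stem(word: str) -> str:
    """Reduce a word to a crude root by stripping one known suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least two characters so very short words are not emptied.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def build_index(topics: dict) -> dict:
    """Map each stem (not each original word) to the topics containing it."""
    index = defaultdict(set)
    for topic_id, text in topics.items():
        for word in text.split():
            index[toy_stem(word)].add(topic_id)
    return index


def search(index: dict, query: str) -> set:
    """Stem the query the same way, then look the stem up."""
    return index.get(toy_stem(query), set())


# Hypothetical sample topics.
topics = {
    "topic_a": "configure the tc module",
    "topic_b": "configure the tcs module",
    "topic_c": "development notes",
}
index = build_index(topics)
print(sorted(search(index, "tcs")))  # ['topic_a', 'topic_b']
```

Because "tcs" and "tc" collapse to the same key at indexing time, the index alone can no longer tell the two apart, which would explain why a topic containing only "tc" comes back for a "tcs" search.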
In the future we will evaluate whether certain words can be excluded from this preprocessing step, which would help in scenarios like this.
For the other issue you mentioned, tcs being broken into tc and s, please try this with Update 3 and let us know if you still see it.
Thank you for the explanation. I understand. It is a little confusing considering that output presets include a substring search option.
I would expect/hope - as my users have reported that they do - that when "Enable substring search" is disabled, a search would only return identical matches. So for example, if I search on "developing", "development" and "develop" are ignored.
"tc" and "tcs" are two very different things in our technical documentation. Returning results for both is creating a high level of frustration for our end users (not to mention, "tcs2" is similar but different from "tcs" for our knowledge base). More specifically, in this particular instance, the issue is that users are searching on "tcs" and getting back results with "tc" - and what makes it worse, is that the topics with "tc" are weighted/ranked higher than the "tcs" topics.
I tried update 3 - no difference in result.
Maybe this could be handled in the code so that when an output end user encloses a search term in quotes, the stemming process for that term is disabled. Then only identical matches for that term would be returned. Ideally, this could be supported for various search permutations: e.g., multi-term searches (including a mix of stemmed and non-stemmed terms), when "Include all search terms" is selected, etc.
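As a rough illustration of the quoted-term idea, here is a minimal sketch that keeps two indexes, one on the original words and one on stems, and picks between them per term. Everything in it is an assumption for illustration (the names `toy_stem`, `build_indexes`, `search`, the one-rule stemmer, the sample topics); it only handles single-word quoted terms and always requires all terms to match, similar in spirit to "Include all search terms".

```python
import re
from collections import defaultdict


def toy_stem(word):
    # Hypothetical stand-in for the real stemmer: strips a trailing "s" only.
    return word[:-1] if word.endswith("s") and len(word) > 2 else word


def build_indexes(topics):
    """Build an exact-word index and a stemmed index side by side."""
    exact, stemmed = defaultdict(set), defaultdict(set)
    for topic_id, text in topics.items():
        for word in text.lower().split():
            exact[word].add(topic_id)
            stemmed[toy_stem(word)].add(topic_id)
    return exact, stemmed


def search(query, exact, stemmed):
    """Quoted terms use the exact index; unquoted terms use the stemmed one."""
    results = None
    # Quoted terms land in group 1, unquoted terms in group 2.
    for quoted, bare in re.findall(r'"([^"]+)"|(\S+)', query.lower()):
        if quoted:
            hits = exact.get(quoted, set())
        else:
            hits = stemmed.get(toy_stem(bare), set())
        # Intersect so that every term in the query must match.
        results = set(hits) if results is None else results & hits
    return results or set()


# Hypothetical sample topics.
topics = {
    "topic_a": "Configure the tc module",
    "topic_b": "Configure the tcs modules",
}
exact, stemmed = build_indexes(topics)

print(sorted(search("tcs", exact, stemmed)))            # ['topic_a', 'topic_b']
print(sorted(search('"tcs"', exact, stemmed)))          # ['topic_b']
print(sorted(search('"tcs" modules', exact, stemmed)))  # ['topic_b']
```

The cost of this approach is a second index in the output, but it lets stemmed and non-stemmed terms be mixed in one query.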
I'd echo @RoboFan - I get that stemming makes sense, but when you've explicitly said to ignore any string match that's a sub-string of what you're searching for, that just looks like a logic bug. In your example of Development, Developing and Develop, I would expect it to index just Develop, but when that flag is off, the "matching" logic should check what the search term is against the match that contains Develop to ensure that it doesn't show Develop when asked for Development. IIRC, this used to work in prior versions of RH.
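Something like the following is what I have in mind, as a sketch only. It assumes the index keeps the original word forms alongside each stem; I don't know whether RoboHelp's index stores that. The names (`toy_stem`, `build_index`, `search`, `exact_only`) and the sample topics are made up for illustration.

```python
from collections import defaultdict


def toy_stem(word):
    # Hypothetical stemmer that only knows two suffixes.
    for suffix in ("ment", "ing"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


def build_index(topics):
    """Index by stem, but remember which original words each topic used."""
    # stem -> {topic_id: set of original words that produced this stem}
    index = defaultdict(lambda: defaultdict(set))
    for topic_id, text in topics.items():
        for word in text.lower().split():
            index[toy_stem(word)][topic_id].add(word)
    return index


def search(index, term, exact_only):
    """Look up by stem; when exact_only is set, keep identical matches only."""
    term = term.lower()
    postings = index.get(toy_stem(term), {})
    if not exact_only:
        return set(postings)
    # Post-filter: the topic must contain the term exactly as typed.
    return {topic_id for topic_id, forms in postings.items() if term in forms}


# Hypothetical sample topics.
topics = {
    "t1": "Develop the plugin",
    "t2": "Development schedule",
    "t3": "Developing locally",
}
index = build_index(topics)

print(sorted(search(index, "Development", exact_only=False)))  # ['t1', 't2', 't3']
print(sorted(search(index, "Development", exact_only=True)))   # ['t2']
```

Here `exact_only` plays the role of the flag being off: the index still holds just Develop, but the match is checked against the word that was actually in the topic.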