RH 2020 Responsive HTML5: "Enable substring search" setting ignored

Engaged,
Nov 17, 2020

I'm running RH 2020.1 and generating Responsive HTML5 output with Azure_Blue.

In my output settings, "Enable substring search" is disabled (i.e., NOT selected).

  • My project has topics that contain the text strings tc and tcs. These are meaningful text strings for our audience.
  • If I search on tc (no quotes, "Include all words in search" NOT selected), my search results include topics that contain tcs; indeed, all three characters of tcs are displayed in bold in the results preview.
  • If I search on "tc" (quotes, "Include all words in search" NOT selected), my search results still include topics that contain tcs; however, only the first two characters are bold. Interestingly, the preview inserts a space between the tc and the s.
  • If I search on "tc" (quotes, "Include all words in search" IS selected), my search results are IDENTICAL to the previous search: they still include topics that contain tcs, only the first two characters are bold, and the preview again inserts a space between the tc and the s.

Please advise.

TOPICS
New UI, Output presets

Community Guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Adobe Community Professional,
Nov 17, 2020

I have flagged this thread with Adobe, so hopefully someone will come along with some answers. These are not questions that forum users can answer.

Please use the blue Reply button at the top to help me help you. The black Reply link nests replies and they sort out of order.

Adobe Community Professional,
Nov 17, 2020

Adobe will be looking at another thread, so you might want to post a link to this thread there.

Engaged,
Nov 17, 2020

Thank you!

Engaged,
Jan 12, 2021

Circling back on this. I've since upgraded to 2020.3 and it's still an issue, even with "Auto correct search query" and "Enable substring search" both disabled. Here's an example: note how "tc" is bold in the search preview/context of the result (by the way, this topic doesn't contain "tcs" anywhere), even though "tcs" was the search term.

[Screenshot: 2021-01-12_22-25-26.jpg]

Adobe Community Professional,
Jan 13, 2021

I have flagged this thread with Adobe and they will be responding.

________________________________________________________
See www.grainge.org for free Authoring and RoboHelp Information

Adobe Employee,
Jan 13, 2021

Hi,

This is happening due to a preprocessing step (stemming) that is applied to topics. In this step, we convert all words to their root word before indexing them. For example, Development, Developing and Develop all have the same root word, Develop, so instead of indexing all three words separately, we index only one word. So when you search using any of these words, you get results for all three.

The same thing is happening with tc and tcs: both are converted to tc. So when you search using either of these words, you always get results for both. Currently there is no workaround for this issue.
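A toy model of the stemming step described here may help show why "tc" and "tcs" can't be told apart once indexed. It is a sketch under stated assumptions, not RoboHelp's code: the suffix-stripping rules and the names `stem`, `build_index`, and `search` are hypothetical. Because both the topic text and the query are reduced to root words, a query for either word lands on the same index entry.

```python
# Toy model of stemming-based indexing, for illustration only.
# The suffix rules are a hypothetical simplification, NOT RoboHelp's stemmer.
from collections import defaultdict

SUFFIXES = ("ment", "ing", "s")  # hypothetical rules, checked in this order


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a root of at least two characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 2:
            return w[: -len(suffix)]
    return w


def build_index(topics: dict[str, str]) -> dict[str, set[str]]:
    """Map each root word to the IDs of the topics containing a word with that root."""
    index: dict[str, set[str]] = defaultdict(set)
    for topic_id, text in topics.items():
        for word in text.split():
            index[stem(word)].add(topic_id)
    return dict(index)


def search(index: dict[str, set[str]], term: str) -> set[str]:
    """The query term is stemmed too, so 'tc' and 'tcs' hit the same index entry."""
    return index.get(stem(term), set())


topics = {
    "a": "Configure the tc module",
    "b": "Install tcs on the server",
    "c": "Upgrade tcs2 firmware",
}
index = build_index(topics)
print(stem("development"), stem("developing"), stem("develop"))  # develop develop develop
print(sorted(search(index, "tc")), sorted(search(index, "tcs")))  # ['a', 'b'] ['a', 'b']
```

In this toy, "tcs2" stays distinct only because none of the made-up rules strip digits; a real stemmer's output for such tokens could differ.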

In the future, we will evaluate whether certain words can be excluded from this preprocessing step, which could be used in scenarios like this.

For the other issue you mentioned, tcs being broken into tc and s, please try this with Update 3 and let us know if you still see the issue.

Engaged,
Jan 14, 2021

Thank you for the explanation. I understand. It is a little confusing, though, considering that the output presets include a substring search option.

I would expect (and hope, as my users have reported that they do) that when "Enable substring search" is disabled, a search would return only identical matches. For example, if I search on "developing", "development" and "develop" would be ignored.

"tc" and "tcs" are two very different things in our technical documentation. Returning results for both is creating a high level of frustration for our end users (not to mention that "tcs2" is similar to, but different from, "tcs" in our knowledge base). More specifically, in this particular instance, users are searching on "tcs" and getting back results with "tc"; what makes it worse is that the topics with "tc" are weighted/ranked higher than the "tcs" topics.

I tried Update 3: no difference in the results.

Maybe this could be handled in the code so that when an end user of the output encloses a search term in quotes, stemming is disabled for that term. Then only identical matches for that term would be returned. Ideally, this would be supported for various search permutations: e.g., multi-term searches (including a mix of stemmed and non-stemmed terms), searches with "Include all search terms" selected, etc.
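A minimal sketch of the quoted-term idea, assuming a made-up suffix-stripping stemmer and a simple two-map index; `SearchIndex`, `by_stem`, `by_word`, and the `all_terms` flag are hypothetical names, not RoboHelp's. Quoted terms are looked up in an exact-word map and bare terms in the stemmed map, and `all_terms=True` stands in for "Include all search terms" by intersecting the per-term results instead of taking their union.

```python
# Hypothetical sketch of the quoted-term proposal: a term in double quotes skips
# stemming and must match a word exactly; unquoted terms are stemmed as before.
# The stemmer, index layout, and query syntax are illustrative assumptions.
import re
from collections import defaultdict

SUFFIXES = ("ment", "ing", "s")  # toy suffix rules, not RoboHelp's stemmer


def stem(word: str) -> str:
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 2:
            return w[: -len(suffix)]
    return w


class SearchIndex:
    def __init__(self, topics: dict[str, str]) -> None:
        # Two postings maps: one keyed by root word, one keyed by the exact word.
        self.by_stem: dict[str, set[str]] = defaultdict(set)
        self.by_word: dict[str, set[str]] = defaultdict(set)
        for topic_id, text in topics.items():
            for word in re.findall(r"\w+", text.lower()):
                self.by_stem[stem(word)].add(topic_id)
                self.by_word[word].add(topic_id)

    def lookup(self, term: str, exact: bool) -> set[str]:
        if exact:
            return set(self.by_word.get(term.lower(), ()))
        return set(self.by_stem.get(stem(term), ()))

    def search(self, query: str, all_terms: bool = False) -> set[str]:
        # Each match is a (quoted, bare) tuple; exactly one of the two is non-empty.
        tokens = re.findall(r'"(\w+)"|(\w+)', query)
        hits = [self.lookup(quoted or bare, exact=bool(quoted)) for quoted, bare in tokens]
        if not hits:
            return set()
        # all_terms=True mimics "Include all search terms" (AND); otherwise OR.
        return set.intersection(*hits) if all_terms else set.union(*hits)


index = SearchIndex({"a": "Configure the tc module", "b": "Install tcs on the server"})
print(sorted(index.search("tcs")))                            # ['a', 'b']
print(sorted(index.search('"tcs"')))                          # ['b']
print(sorted(index.search('"tcs" install', all_terms=True)))  # ['b']
```

The trade-off in this design is that the index keeps an exact-word map alongside the stemmed one, so it grows; in exchange, mixed queries such as `"tcs" install` can combine exact and stemmed terms.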

Adobe Community Professional,
Jan 14, 2021

I'd echo @RoboFan. I get that stemming makes sense, but when you've explicitly said to ignore any string match that's a substring of what you're searching for, this just looks like a logic bug. In your example of Development, Developing and Develop, I would expect it to index just Develop, but when that flag is off, the matching logic should check the search term against the actual word that matched Develop, so that it doesn't return Develop when asked for Development. IIRC, this used to work in prior versions of RH.
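The check-after-stemming approach could look roughly like this. It is a sketch under stated assumptions: a toy stemmer, hypothetical `build`/`search` helpers, and an `exact_only` flag standing in for "Enable substring search" being off, which reflects how the posters expect that option to behave rather than documented RoboHelp behavior. The stemmed index still finds candidates; a post-filter then drops any topic that doesn't contain the literal search word, so "tcs" no longer returns "tc"-only topics.

```python
# Hypothetical sketch of the suggested fix: keep indexing by root word, but when
# exact matching is required, discard candidates that don't contain the literal
# search word. The stemmer and data structures are illustrative assumptions.
import re
from collections import defaultdict

SUFFIXES = ("ment", "ing", "s")  # toy suffix rules, not RoboHelp's stemmer


def stem(word: str) -> str:
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 2:
            return w[: -len(suffix)]
    return w


def build(topics: dict[str, str]) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Return (root word -> topic IDs, topic ID -> set of exact words in that topic)."""
    by_stem: dict[str, set[str]] = defaultdict(set)
    words_in: dict[str, set[str]] = {}
    for topic_id, text in topics.items():
        words = set(re.findall(r"\w+", text.lower()))
        words_in[topic_id] = words
        for word in words:
            by_stem[stem(word)].add(topic_id)
    return dict(by_stem), words_in


def search(by_stem: dict[str, set[str]], words_in: dict[str, set[str]],
           term: str, exact_only: bool) -> set[str]:
    candidates = by_stem.get(stem(term), set())  # stemmed lookup finds candidates
    if not exact_only:
        return set(candidates)
    wanted = term.lower()
    # Post-filter: keep only topics that contain the search word verbatim.
    return {t for t in candidates if wanted in words_in[t]}


by_stem, words_in = build({
    "a": "Configure the tc module",
    "b": "Install tcs on the server",
    "c": "The development guide",
    "d": "Developing plugins",
})
print(sorted(search(by_stem, words_in, "tcs", exact_only=False)))         # ['a', 'b']
print(sorted(search(by_stem, words_in, "tcs", exact_only=True)))          # ['b']
print(sorted(search(by_stem, words_in, "development", exact_only=True)))  # ['c']
```

If result ranking runs after a filter like this, the "tc"-only topics could no longer outrank the "tcs" topics, because they would never reach the ranking step.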
