RoboHelp 2020, Responsive HTML5 output.
I understand these aren't specific (e.g., Chrome, Edge, Firefox, let alone versions), so I'm looking for more general answers or insight on this. Thoughts?
I think you'll have to go to Adobe for the answers to most of those questions.
I feel like the algorithm itself has changed, but there's no public documentation on it that I'm aware of. Also I'm not sure if the ranking you list still applies, and again that's going to require Adobe to comment.
We publish a combined/merged help system (1 parent, 11 children, 6.4k total topics). Since we upgraded it from RoboHelp 2017 to 2020, staff have been complaining that search is slower and returns different results. I have noticed that search preview text is initially slow to load, but it does seem to improve the longer I use the system.
I know that sometimes after we update a project, it helps to clear your browser history to get the new version of the topic.
I also know that RoboHelp's auto-text function uses/stores the history.
So based on that, I'm wondering whether browser history or cookies create and store any kind of local (per-user) search index, similar to database indexing, to access data more quickly and efficiently.
I'm still trying to collect more specific info from staff, but one particular user casually mentioned he was using Incognito mode (not even sure at this point which browser, but only Chrome and Chromium Edge are allowed). I know this mode has caused other issues with our output in the past.
Another problem is that as I've addressed various issues, I've had to compile and push out the entire merged output to our staff (completely replacing what's out there with the new output). So, if there's some kind of search index database that's created and managed, perhaps that's getting reset each time and why we're still seeing performance issues.
The files that intrigue me are in .../mergedProjects/<project root>/whxdata/
Oh, one thing that isn't new but may affect it is server caching. Often js and xml files are set to cache for quite a long time on the server, so you could check with your web server admin what the settings are and try a shorter period.
"It" being "incorrect search results" after publishing a new version of the help.
One other note, seemingly related to browser history: the first time I run a specific search, the results list quickly populates with topic titles and breadcrumbs/URLs, but it then often takes 10 seconds or more to refresh the list with the topic previews (currently set to 150 characters). If I run a different search and then come back to the original one, the previews load relatively quickly. Another sign that browser history has some impact: open the output in both Chrome and Edge (Chromium), then run in Edge a search I already ran in Chrome. The preview takes a long time to load. Run it again in Edge and it loads much more quickly.
Previews used to just "grab" the first X number of characters at the beginning of a topic. Now, the method is indeterminate - it appears to be some random instance of text that includes a/the search term (certainly not the first instance). I can imagine it would be quicker to grab and display the first X number of characters. Still, performance on the new method is so slow, it's almost unusable.
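To make the difference between the two preview styles concrete, here is a minimal Python sketch. It is only an illustration of the idea, not RoboHelp's code: the function names are made up, and centring the window on the first match is an assumption (as noted above, the real output does not appear to use the first instance).

```python
def leading_preview(text: str, limit: int = 150) -> str:
    """Old-style preview: just the first `limit` characters of the topic."""
    return text[:limit]


def contextual_preview(text: str, term: str, limit: int = 150) -> str:
    """Preview built around an occurrence of the search term.

    Assumption: we centre on the FIRST match. The real selection rule
    is not documented, so treat this as a stand-in.
    """
    pos = text.lower().find(term.lower())
    if pos == -1:
        # Term not found in the text: fall back to the leading preview.
        return text[:limit]
    # Put roughly half of the remaining window before the match.
    start = max(0, pos - (limit - len(term)) // 2)
    # Do not run past the end of the topic with a short window.
    start = min(start, max(0, len(text) - limit))
    return text[start:start + limit]


# Hypothetical topic text, for illustration only.
topic = "Intro. " * 50 + "Configure the widget here." + " Outro." * 50
print(leading_preview(topic))
print(contextual_preview(topic, "widget"))
```

The leading preview needs nothing but the start of the topic, while the contextual one has to locate the term in the topic text first, which fits the impression that the newer style has more work to do.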
What impact, if any, does incognito mode have on search for an end user (of Responsive HTML5 output)?
In incognito mode, browsers don't use any files cached in non-incognito mode, so the first search might be a little slower. Subsequent searches will be faster because the browser starts using the files it has cached during the session. You will see the same behaviour if you delete all browsing history and cached data from your browser.
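The behaviour described here can be modelled with a toy cache. This Python sketch is purely illustrative: the class, the file name, and the server dictionary are hypothetical, and a real browser cache is far more involved. It only shows why the first search in a fresh profile (incognito, another browser, or after clearing data) pays the download cost and repeat searches do not.

```python
class BrowserProfile:
    """Toy model of one browser profile's HTTP cache (hypothetical)."""

    def __init__(self):
        self.cache = {}      # url -> file contents
        self.downloads = 0   # how many times we went to the network

    def fetch(self, url: str, server: dict) -> str:
        if url not in self.cache:
            # Slow path: the file is not cached yet, so download it.
            self.downloads += 1
            self.cache[url] = server[url]
        # Fast path on repeat requests: served from the local cache.
        return self.cache[url]

    def clear_browsing_data(self):
        self.cache.clear()


# Hypothetical search data file; the real file names are not assumed here.
server = {"search-data-1.js": "...index data..."}

chrome = BrowserProfile()
chrome.fetch("search-data-1.js", server)   # first search: download
chrome.fetch("search-data-1.js", server)   # repeat search: cache hit
print(chrome.downloads)                    # 1

edge = BrowserProfile()                    # separate browser, separate cache
edge.fetch("search-data-1.js", server)
print(edge.downloads)                      # 1: Edge downloads it again
```

Each profile keeps its own cache, which would explain a search that was fast in Chrome being slow the first time in Edge.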
What impact, if any, does an end user's browser history (retention, accumulation, erasure) have on search? What happens if a user deletes their browsing history?
The answer to the first question covers this too.
What impact, if any, does a user's cookie storage have on search? What happens if a user deletes their cookies?
Cookies have no effect on search results, as they are not used in the search algorithm.
What impact, if any, does updating output (changes to existing topics, adding new topics, etc.) have on an end user's search result experience?
Updating existing topics or adding new ones might slow down search at first, since the updated files invalidate the browser's cached data and the browser has to download fresh files. Also, if your server is not configured correctly to invalidate updated files, users might see old or incorrect results.
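If you want to check how long the server lets browsers keep the js and xml files, you can copy the Cache-Control response header from the browser's developer tools (Network tab) and read off its lifetime. The following Python sketch is just a small helper for that; the function names and the sample header values are hypothetical, and it only covers the common directives.

```python
import re


def max_age_seconds(cache_control: str):
    """Return the max-age value in seconds, or None if the header has none."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control,
                      flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


def describe(cache_control: str) -> str:
    """Summarise what a Cache-Control value means for a cached file."""
    directives = {d.strip().lower() for d in cache_control.split(",")}
    if "no-store" in directives or "no-cache" in directives:
        return "revalidated or refetched on every load"
    age = max_age_seconds(cache_control)
    if age is None:
        return "no explicit lifetime; browser heuristics apply"
    return f"may be served from cache for up to {age / 86400:.1f} days"


# Sample header values, for illustration only.
print(describe("public, max-age=2592000"))
print(describe("no-cache"))
```

A long max-age on the search data files would mean users can keep getting stale results after a publish until the cached copies expire or are cleared.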
Also, how does the search function differ (if at all) for Responsive HTML5 output moving from RoboHelp 2017 to 2020? I believe results used to be (RH 2017) returned, top to bottom, based on how the search term is used in each topic and the frequency with which it appears according to the following priority/hierarchy:
The search algorithm has changed since 2017, so you will see different results for the same project in the two versions. Now, apart from taking term location (e.g. title, heading, keywords, etc.) into account, we also consider term frequency, the length of the topic, the number of topics, etc. When the search query contains more than one term, we also take into account how close together the terms appear, so a topic in which the search terms appear very close to each other is ranked higher than other topics. For more technical and in-depth details, please refer to https://lunrjs.com/ .
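As a rough illustration of how those factors can combine, here is a toy ranking function in Python. To be clear, this is not RoboHelp's or lunr's actual formula: the field weights, the idf expression, and the proximity bonus are all assumptions chosen to keep the sketch short.

```python
import math

# Assumed field weights: a hit in the title counts more than one in the body.
FIELD_BOOST = {"title": 3.0, "body": 1.0}


def tokenize(text):
    return text.lower().split()


def idf(term, docs):
    """Rarer terms across the whole set of topics get a higher weight."""
    df = sum(1 for d in docs if any(term in tokenize(d[f]) for f in FIELD_BOOST))
    return math.log(1 + len(docs) / (1 + df))


def proximity_bonus(terms, tokens):
    """Bonus when the first two query terms sit close together in the body."""
    if len(terms) < 2:
        return 0.0
    positions = [[i for i, t in enumerate(tokens) if t == term] for term in terms[:2]]
    if any(not p for p in positions):
        return 0.0
    gap = min(abs(a - b) for a in positions[0] for b in positions[1])
    return 1.0 / max(gap, 1)


def score(query, doc, docs):
    terms = tokenize(query)
    total = 0.0
    for field, boost in FIELD_BOOST.items():
        tokens = tokenize(doc[field])
        if not tokens:
            continue
        for term in terms:
            tf = tokens.count(term) / len(tokens)  # frequency normalised by length
            total += boost * tf * idf(term, docs)
    return total + proximity_bonus(terms, tokenize(doc["body"]))


# Hypothetical topics, for illustration only.
docs = [
    {"title": "Overview", "body": "reset the device then later change your password"},
    {"title": "Glossary", "body": "terms and definitions"},
    {"title": "Install guide", "body": "reset password steps for the admin account"},
]
ranked = sorted(docs, key=lambda d: score("reset password", d, docs), reverse=True)
print([d["title"] for d in ranked])
```

Both "Overview" and "Install guide" contain the two query terms, but "Install guide" ranks first because the terms are adjacent and the topic is shorter. That is the kind of reordering you would expect to see compared with a purely location-based ranking.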
I hope this clears up your doubts. Do let us know if you have any more questions.