Understanding SOLR behavior Pt2

Sep 17, 2019

This follows on from a thread I started in the old forums, https://forums.adobe.com/message/11250048#11250048, which I can no longer reply or add to. It looks like the forums have been replaced with Community - is that correct?

 

Anyway, on to the subject at hand, following on from the forum thread. I have done some more testing on this issue and have found some differences in the way SOLR behaves on CF 2018 Standard versus Enterprise. What I'm looking for from anyone who can help is some clarity over this issue as I currently understand it.

 

I think Charlie Arehart might have been on the money when he pointed to possible differences between SOLR on Standard and Enterprise as the key to understanding this issue, although I would dearly love to find some documentation to clarify all of this, of course.

 

Like the hosting provider, I ran the query directly in SOLR, in my case on my development machine. Unlike the hosting provider’s machine, it returned an additional column in the search results called “contents” (aha).

 

This first image is the query run on dev.

 

solr_query_dev.png

 

This next image is the query run on prod (note the absence of a "contents" column).

 

solr_query_prod.png
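
For anyone who wants to repeat this comparison without screenshots, here is a minimal Python sketch of the same check: run a query against SOLR's standard select handler and list which fields actually come back. The base URL (port 8985 is what I believe CF's bundled SOLR uses by default), the collection name and the field names in the sample are assumptions for illustration, not values taken from either server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, collection, query, rows=5):
    """Build a URL for Solr's /select handler, asking for JSON and all stored fields."""
    params = urlencode({"q": query, "rows": rows, "wt": "json", "fl": "*"})
    return f"{base_url.rstrip('/')}/{collection}/select?{params}"


def returned_fields(response):
    """Collect every field name that appears in any returned document."""
    fields = set()
    for doc in response.get("response", {}).get("docs", []):
        fields.update(doc.keys())
    return fields


def fetch(url):
    """Run the query; this only works against a reachable Solr instance."""
    with urlopen(url) as resp:
        return json.load(resp)


# Hypothetical usage (assumed host, port and collection name):
# url = build_select_url("http://localhost:8985/solr", "mycollection", "test")
# print("contents" in returned_fields(fetch(url)))
```

Running it against both servers should show "contents" in the field set only where the field is stored, since a search response can only return stored fields.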

 

Despite its absence from the search results on prod, the "contents" column does exist in the schema there. What I'm seeing is that on prod it is not indexed, tokenized or stored, whereas on dev it is (reading the forum post linked above will clarify this).

 

The next image is the config for "contents" on my dev machine.

 

solr_contents_column_config_dev.png

 

And the next one is the config for "contents" on prod. Note the difference in the state of "Index".

 

solr_contents_column_config_prod.png
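
The difference between the two configs can also be written down as a small Python sketch. The flag values below are typed in by hand from what the two screens show (indexed, tokenized and stored on dev; none of them on prod); the helper name is my own and nothing here queries SOLR.

```python
def diff_field_flags(dev, prod, flags=("indexed", "tokenized", "stored")):
    """Return {flag: (dev_value, prod_value)} for every flag that differs."""
    return {
        flag: (dev.get(flag), prod.get(flag))
        for flag in flags
        if dev.get(flag) != prod.get(flag)
    }


# Values transcribed from the SOLR admin screens described in this post.
contents_dev = {"indexed": True, "tokenized": True, "stored": True}
contents_prod = {"indexed": False, "tokenized": False, "stored": False}

for flag, (dev_value, prod_value) in diff_field_flags(contents_dev, contents_prod).items():
    print(f"contents.{flag}: dev={dev_value} prod={prod_value}")
```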

 

I think it’s safe to assume that when CFSEARCH retrieves context it's doing so from the stored "contents". That makes sense, and it also explains why you can vary the number of context passages in a search. It would also explain why the system (prod) was unresponsive to programmatic requests for a change in context length: with no stored "contents", there is nothing to draw the context from.

 

A bit academic, but I also tried doing a search against a SOLR collection on a legacy site with the provider (CF 10 Standard). No surprise - same outcome.

 

Any effort to change the length of the stored "contents" (and therefore context) on prod was moot because there's nothing to change the length of. I do know from my research that increasing that length increases the size of the index, and I suspect that this is one way Adobe throttles Standard versus Enterprise. That makes sense, but again it would be nice to find some documentation on this.

 

The other constraint is this: somewhere, something is making "summary = context", so I'm not sure that making other changes in SOLR Admin will get around this. I suspect it’s also a way of throttling the server.

 

The question now is: can or should SOLR on CF 2018 Standard be tweaked so that "contents" is indexed, tokenized and stored? I suppose that, given Adobe appears to be doing this to throttle CF Standard, we might expect some risk in changing it, if that is even possible.

 

Your comments would be greatly appreciated. Thank you.
