Understanding SOLR behavior Pt2

New Here ,
Sep 17, 2019

This follows on from a thread I started in the old forums (https://forums.adobe.com/message/11250048#11250048), which I can no longer reply or add to. It looks like the forums have been replaced by the community. Is that correct?

 

Anyway, on to the subject at hand, following on from the forum thread. I have done some more testing on this issue and have found some differences in the way SOLR behaves on CF 2018 Standard v Enterprise. What I'm looking for from anyone who can help is some clarity on this issue as I currently understand it.

 

I think Charlie Arehart might have been on the money when he suggested that possible differences between SOLR on Standard and Enterprise are the key to understanding this issue, although I would dearly love to see some documentation that clarifies all of this.

 

Like the hosting provider, I ran the query directly in SOLR, in my case on my development machine. Unlike the hosting provider’s machine, it returned an additional column in the search results called “contents” (aha).

 

This first image is the query run on dev.

 

solr_query_dev.png

 

This next image is the query run on prod (note the absence of a "contents" column).

 

solr_query_prod.png

 

Despite its absence from the search results on prod, the column does exist in the schema there. What I'm seeing is that the contents column on prod is not indexed, tokenized or stored, whereas on dev it is (reading the forum post linked above will clarify this).

 

The next image is the config for "contents" on my dev machine.

 

solr_contents_column_config_dev.png

 

And the next one is the config for "contents" on prod. Note the difference in the state of "Index".

 

solr_contents_column_config_prod.png
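
For anyone who wants to compare the two machines without screenshots, here is a minimal sketch that diffs the `indexed` and `stored` attributes of two `<field>` definitions as they would appear in a collection's schema.xml. The two XML strings are hypothetical and only mirror what the screenshots describe; they are not copied from a real CF schema. "Tokenized" is left out because it comes from the field type rather than from an attribute on the field itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical field definitions that mirror the two Solr Admin screenshots.
# They are illustrative only and were not copied from a real schema.xml.
DEV_FIELD = '<field name="contents" type="text" indexed="true" stored="true"/>'
PROD_FIELD = '<field name="contents" type="text" indexed="false" stored="false"/>'

FLAGS = ("indexed", "stored")


def parse_flag(value):
    """Turn a schema attribute into a boolean, or None when it is absent."""
    if value is None:
        # Absent attributes are not guessed: the effective value would
        # then be inherited from the field type.
        return None
    return value.strip().lower() == "true"


def field_flags(field_xml):
    """Return the indexed/stored flags of one <field> element."""
    elem = ET.fromstring(field_xml)
    return {flag: parse_flag(elem.get(flag)) for flag in FLAGS}


def diff_flags(dev_xml, prod_xml):
    """Return only the flags whose values differ, as (dev, prod) pairs."""
    dev, prod = field_flags(dev_xml), field_flags(prod_xml)
    return {f: (dev[f], prod[f]) for f in FLAGS if dev[f] != prod[f]}


if __name__ == "__main__":
    for flag, (dev, prod) in diff_flags(DEV_FIELD, PROD_FIELD).items():
        print(f"{flag}: dev={dev} prod={prod}")
```

To use it against real data, paste the `contents` field line from each server's schema.xml in place of the two sample strings.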

 

I think it’s safe to assume that when CFSEARCH retrieves context it's doing so from the stored "contents". That makes sense and also explains why you can vary the number of context passages in a search. It also explains why the system (prod) was unresponsive to programmatic requests for a change in context length.
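
One way to test that assumption outside ColdFusion is to ask Solr for highlighted snippets directly. The sketch below only builds the query URL. It uses Solr's standard highlighting parameters (`hl`, `hl.fl`, `hl.snippets`, `hl.fragsize`), but the idea that CFSEARCH's context maps onto them is an assumption, not something confirmed by Adobe documentation. The host, port and collection name are placeholders. Solr's standard highlighter needs the field to be stored, so if the same URL returns snippets on dev and nothing on prod, that would fit the picture above.

```python
from urllib.parse import urlencode


def highlight_query(base_url, criteria, field="contents", snippets=3, fragsize=300):
    """Build a Solr select URL that requests highlighted snippets from one field.

    base_url is the collection (core) URL. The default field name comes from
    the post; the snippet count and size are arbitrary example values.
    """
    params = {
        "q": criteria,
        "hl": "true",             # turn highlighting on
        "hl.fl": field,           # field to take snippets from
        "hl.snippets": snippets,  # maximum number of snippets per document
        "hl.fragsize": fragsize,  # approximate snippet size in characters
        "wt": "json",             # ask for a JSON response
    }
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host, port and collection name: replace with your own.
    print(highlight_query("http://localhost:8983/solr/mycollection", "coldfusion"))
```

Paste the printed URL into a browser on each machine and compare the `highlighting` section of the two responses.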

 

A bit academic, but I also tried a search against a SOLR collection on a legacy site with the provider (CF 10 Standard). No surprise: same outcome.

 

Any effort to change the length of the stored "contents" (and therefore context) was moot because there's nothing to change the length of. I do know from my research that increasing that length does increase the size of the index, and I suspect that this is one way Adobe throttles Standard v Enterprise. That makes sense, but again it would be nice to find some documentation on this.

 

The other constraint is this: somewhere, something is making "summary = context", so I'm not sure that making other changes in SOLR Admin will get around it. I suspect it’s also a way of throttling the server.

 

The question now is: can, or should, SOLR on CF 2018 Standard be tweaked to include indexed/tokenized/stored contents? Given that Adobe appears to be doing this to throttle CF Standard, I suppose we might expect some risk in changing it, if that is even possible.

 

Your comments would be greatly appreciated. Thank you.
