AlexCraig
Inspiring
February 27, 2024
Question

Solr Admin Results Are Always Correct And CFSEARCH Results Only Work On Single Keyword Searches


I'm starting a new thread on this because I discovered my first approach would not work for all possible Solr syntaxes, some of which we definitely need in our application.

 

Basically, we are now taking a variable value from a form field, as shown below:

<INPUT TYPE="hidden" NAME="keywords" VALUE="#Form.keywords#">

 

Then we are populating CFSEARCH with the syntax for a given query by using encodeForURL & decodeFromURL where appropriate in the application. 

So our CFSEARCH code is as follows. Note: We have included a <cfoutput><cfabort> line of code for testing.

******************************************

<CFSEARCH NAME="applicants_#var#"
COLLECTION="Resumes"
TYPE="standard"
CRITERIA="#encodeForURL( Form.keywords)#">

<cfoutput>#applicants_test.recordcount#</cfoutput><cfabort>

******************************************

The output shows we are using the solr syntax needed for a given search.

However, the only time we get the correct number of hits is when we use a single keyword in the search. More complex searches involving multiple keywords and/or special characters give us ridiculously high results.

 

Obviously, I am missing something that is key to producing the accurate results that the Solr Admin query tool always provides.

 

To be more precise, I have discovered that any quotes beyond those required at the beginning and end of a given syntax, any spaces, or any special characters such as ? * "~ + or - will give us inflated results or cause an error for the search.

 

For example, a search on "unix" AND "linux" returns 3685 hits using the Solr Admin query tool and on our CF 4.51 production server running Verity searches. So we know the results are accurate.

However, using the CF 2023 CFSEARCH code described above we get 43586 hits!  About 80% of our collection!

 

Frankly, at this point I am dead in the water. Can someone please tell me where my problem is and how to overcome it? Thank you very much in advance.


1 reply

AlexCraig
Inspiring
February 29, 2024

Edit 2: Found some Q data that may be helpful.

 

OK. The Q criteria value of the http link generated by the Solr Admin Query tool is:

q=%22tribology%22%20AND%20%22friction%22

Makes sense, as that is the URL-encoded (percent-encoded) ASCII equivalent: %22 is a double quote and %20 is a space.

The Q encodeForURL value of the CFSearch criteria output is:

%2522tribology%2522%2BAND%2B%2522friction%2522

 

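Doing some research, I discovered that %2522 is a double quote that has been encoded twice (%25 is the percent sign, so %2522 decodes once to %22 and a second time to ") and that %2B is a + sign.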

Well, now why is the same alphanumeric string "tribology" AND "friction" producing different outputs? And more importantly, how do I get CFSearch to generate the ASCII equivalent?
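For readers following along, the difference between these strings can be reproduced outside ColdFusion. The sketch below uses Python's standard urllib.parse purely as an illustration (it is not CFML, and it assumes that encodeForURL behaves like form-style URL encoding, which is consistent with the outputs posted in this thread). Encoding the raw query once gives the Solr Admin form; encoding an already-encoded value a second time gives the %2522 form.

```python
from urllib.parse import quote, quote_plus

# The raw query as a person would type it.
raw = '"tribology" AND "friction"'

# Percent-encoding once, with spaces as %20 (the form seen in the Solr Admin link).
once_percent = quote(raw)

# Form-style encoding once, with spaces as + (what an HTML form submission uses).
once_form = quote_plus(raw)

# Encoding the already-encoded value again: % becomes %25 and + becomes %2B.
twice = quote_plus(once_form)

print(once_percent)  # %22tribology%22%20AND%20%22friction%22
print(once_form)     # %22tribology%22+AND+%22friction%22
print(twice)         # %2522tribology%2522%2BAND%2B%2522friction%2522
```

The last line matches the CFSearch criteria output above, which suggests the value was already URL-encoded before encodeForURL was applied to it.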

Alex Craig, General Manager "Avid Saltwater Fly Fisherman"
Charlie Arehart
Community Expert
February 29, 2024

I don't think the problem is in cfsearch, but rather your input to it. As I've said before, if you output (to screen, for debugging) what that encodeForURL would produce (and use htmleditformat to "see" the values without browser rendering), you may see that's where the problem starts.

 

I've been wondering since you posted this earlier today why you use the encodeForURL in the first place? That wasn't there in your cf 4.5 code (as it's new since then). If it's "for security", note that there are other means to validate the input for xss, etc., without using this function, which can CHANGE the input.

/Charlie (troubleshooter, carehart.org)
AlexCraig
Inspiring
February 29, 2024

>I've been wondering since you posted this earlier today why you use the encodeForURL in the first place?

I thought it would give me a readout of the actual query string that the Solr Admin Query tool used.

Using htmleditformat to display the output in the following 3 cases, as you suggested, shows me I was incorrect.

 

For the criteria using the encodeForURL variable:  %2522tribology%2522%2BAND%2B%2522friction%2522

For criteria using the original variable code #Form.keywords#:  %22tribology%22+AND+%22friction%22

Solr Admin Query rendered output for the query: %22tribology%22%20AND%20%22friction%22

 

So for whatever reason, the variable produced by the form entry in the original code is inserting plus signs instead of the spaces that Solr requires for a valid return result.

 

If my analysis holds water, of course the next question is how do I get the form entry variable to produce the "correct" string for all valid Solr query types? I could use replace( to convert the plus signs to spaces for this query. But other Solr queries need the plus signs. And there are other special characters that come into play that would make using replace( code a nightmare that would probably be unworkable for all query types.
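One way to look at this: the + signs are the form-style encoding of spaces, so a single URL decode, rather than a character-by-character replace, recovers the raw query. The Python sketch below illustrates the idea with urllib.parse.unquote_plus. It is an illustration only, not CFML; in ColdFusion the analogous call would presumably be URLDecode or decodeFromURL, which is an assumption to verify on the server. The second query is a hypothetical example, not one from our application.

```python
from urllib.parse import quote_plus, unquote_plus

# The value observed in Form.keywords in this thread (already URL-encoded once).
form_value = '%22tribology%22+AND+%22friction%22'

# A single decode turns + into a space and %22 into a double quote.
criteria = unquote_plus(form_value)
print(criteria)  # "tribology" AND "friction"

# Hypothetical query that needs literal plus signs (the "required term" operator).
required_terms = '+unix +linux'

# When encoded, a literal + travels as %2B, while the space travels as +.
encoded = quote_plus(required_terms)
print(encoded)  # %2Bunix+%2Blinux

# Decoding once restores the literal plus signs and the space.
print(unquote_plus(encoded) == required_terms)  # True
```

Because a literal + is carried as %2B, decoding once does not destroy the plus signs that other query types depend on, as long as the value is encoded exactly once on the way in.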

Edit: Now that I think about it, encodeForURL was useful in that it does show that there are double quotes around each word. I'm betting the form field automatically inserts them, along with the + characters for this type of query configuration. One thing I'm sure of, they are not created out of thin air!

Alex Craig, General Manager "Avid Saltwater Fly Fisherman"