AlexCraig
Inspiring
February 27, 2024
Question

Solr Admin Results Are Always Correct And CFSEARCH Results Only Work On Single Keyword Searches


I'm starting a new thread on this because I discovered that my first approach would not work for all possible Solr syntaxes, some of which we definitely need in our application.

 

Basically, we are now taking a variable value from a form, as shown below:

<INPUT TYPE="hidden" NAME="keywords" VALUE="#Form.keywords#">

 

Then we populate CFSEARCH with the syntax for a given query, using encodeForURL and decodeFromURL where appropriate in the application.

So our CFSEARCH code is as follows. Note: we have included a <cfoutput><cfabort> line of code for testing.

******************************************

<CFSEARCH NAME="applicants_#var#"
COLLECTION="Resumes"
TYPE="standard"
CRITERIA="#encodeForURL( Form.keywords)#">

<cfoutput>#applicants_test.recordcount#</cfoutput><cfabort>

******************************************

The output shows we are using the Solr syntax needed for a given search.

However, the only time we get the correct number of hits is when we use a single keyword in the search.  More complex searches involving multiple keywords and/or special characters give us ridiculously high results.

 

Obviously, I am missing something that is key to producing the accurate results that the Solr Admin query tool always provides.

 

To be more precise, I have discovered that any quotes beyond those required at the beginning and end of a given syntax, or any spaces or special characters such as ? * " ~ + or -, will give us inflated results or cause an error for the search.

 

For example, a search on "unix" AND "linux" returns 3685 hits using both the Solr Admin query tool and our CF 4.51 production server running Verity searches.  So we know the results are accurate.

However, using the CF 2023 CFSEARCH code described above we get 43586 hits!  About 80% of our collection!

 

Frankly, at this point I am dead in the water.  Can someone please tell me where my problem is and how to overcome it?  Thank you very much in advance.

This topic has been closed for replies.

1 reply

AlexCraig (Author)
Inspiring
February 29, 2024

Edit 2: Found some Q data that may be helpful.

 

OK. The Q criteria value of the http link generated by the Solr Admin Query tool is:

q=%22tribology%22%20AND%20%22friction%22

Makes sense as that is the ASCII equivalent.

The Q encodeForURL value of the CFSearch criteria output is:

%2522tribology%2522%2BAND%2B%2522friction%2522

 

Doing some research, I discovered that %2522 indicates double quotes and %2B is a + sign.

Well, now why is the same alphanumeric string "tribology" AND "friction" producing different outputs? And more importantly, how do I get CFSearch to generate the ASCII equivalent?
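
One possible explanation for the differing strings is double encoding: applying encodeForURL to a value that is already URL-encoded re-encodes the % and + characters themselves. The sketch below is only an illustration of that idea. It assumes the raw query is the plain string typed into the Solr Admin query tool, and the variable names (raw, once, twice) are hypothetical.

```cfml
<cfscript>
// Assumption: the plain query string, as typed into the Solr Admin query tool.
raw = '"tribology" AND "friction"';

// First pass: quotes and spaces are replaced by their URL-encoded forms.
once = encodeForURL(raw);

// Second pass: the % and + characters produced by the first pass
// are themselves encoded again.
twice = encodeForURL(once);

// encodeForHTML keeps the browser from interpreting the values on screen.
writeOutput(encodeForHTML(once) & "<br>" & encodeForHTML(twice));
</cfscript>
```

If the second line of output resembles the %2522...%2B... value shown above, that would suggest the criteria is being encoded twice before it reaches CFSEARCH.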

Alex Craig, General Manager "Avid Saltwater Fly Fisherman"
Charlie Arehart
Community Expert
February 29, 2024

I don't think the problem is in cfsearch, but rather your input to it. As I've said before, if you output (to screen, for debugging) what that encodeForURL would produce (and use htmleditformat to "see" the values without browser rendering), you may see that's where the problem starts.

 

I've been wondering since you posted this earlier today why you use encodeForURL in the first place. That wasn't there in your CF 4.5 code (as it's new since then). If it's "for security", note that there are other means to validate the input for XSS, etc., without this one, which can CHANGE the input.

/Charlie (troubleshooter, carehart. org)
Charlie Arehart
Community Expert
February 29, 2024

>I've been wondering since you posted this earlier today why you use the encodeForURL in the first place?

I thought it would give me a readout of the actual query string that the Solr Admin query tool used.

Using htmleditformat to display the output in the following 3 cases, as you suggested, shows me I was incorrect.

 

For the criteria using the encodeForURL variable:  %2522tribology%2522%2BAND%2B%2522friction%2522

For criteria using the original variable code #Form.keywords#:  %22tribology%22+AND+%22friction%22

Solr Admin Query rendered output for the query: %22tribology%22%20AND%20%22friction%22

 

So for whatever reason the variable produced by the form entry in the original code is inserting plus signs instead of the spaces that Solr requires for a valid return result.

 

If my analysis holds water, of course the next question is how do I get the form entry variable to produce the "correct" string for all valid Solr query types?  I could use replace() to convert the plus signs to spaces for this query.  But other Solr queries need the plus signs.  And there are other special characters that come into play that would make using replace() a nightmare that would probably be unworkable for all query types.

Edit:  Now that I think about it, encodeForURL was useful in that it does show that there are double quotes around each word.  I'm betting the form field automatically inserts them, along with the + characters, for this type of query configuration.  One thing I'm sure of: they are not created out of thin air!
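
Rather than replacing individual characters, one option worth testing is to reverse a single level of URL encoding before the value goes into CFSEARCH. This is a minimal sketch, not a confirmed fix. It assumes Form.keywords arrives already URL-encoded in the form shown above, and the names encoded and criteria and the query name applicants_demo are hypothetical.

```cfml
<cfscript>
// Assumption: the value arrives already URL-encoded, as displayed above.
encoded = '%22tribology%22+AND+%22friction%22';

// urlDecode reverses one level of URL encoding,
// so the encoded quotes and the + separators are turned back into plain text.
criteria = urlDecode(encoded);

// Display the decoded value safely to confirm it before searching.
writeOutput(encodeForHTML(criteria));
</cfscript>

<!--- Pass the decoded text, not the encoded text, to CFSEARCH. --->
<cfsearch name="applicants_demo"
          collection="Resumes"
          type="standard"
          criteria="#criteria#">
```

Because decoding treats a bare + as an encoded space, a query that needs a literal plus sign would have to be checked separately to see how it comes through on your version.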


You say you thought encodeforurl would "give you a readout", but again you're using it to affect what's going INTO the cf search, not creating any "readout". That's where I think part of your problem is happening here. 

 

As for the plus signs in the incoming form fields, usually that's your browser turning spaces into them--which should not be happening to form fields. 

 

So first, might it be that your form tag is using method="get" (or no method) instead of "post"? And what enctype are you using on the form tag, if any?  Finally, are you doing any JavaScript manipulation of the form submission, which might be causing the plus signs?

 

Or are you doing ANY other manipulation of them before the point where you're using them (or before you're displaying them for debug purposes with htmleditformat)?

 

I'll say again (as I have in other recent threads with you on your cfsearch woes), you would really help yourself and us by creating a simple few-line demo of your form and cfml processing--whether those are in one file or two. The benefit is both that we'd not be left guessing (or presuming) things you've done (or not), and also we could more readily try to replicate what you're seeing.
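
A demo along those lines might look something like the sketch below. It is only an assumption about what such a page could contain: the file name search.cfm and the query name results are hypothetical, and the collection name Resumes is taken from the code posted earlier in the thread.

```cfml
<!--- search.cfm: hypothetical single-file demo (form and processing together) --->
<cfparam name="form.keywords" default="">

<!--- method="post" keeps the criteria out of the URL query string. --->
<form method="post" action="search.cfm">
  <cfoutput>
    <!--- Encode for the HTML attribute only; the submitted value stays unencoded. --->
    <input type="text" name="keywords" value="#encodeForHTMLAttribute(form.keywords)#">
  </cfoutput>
  <input type="submit" value="Search">
</form>

<cfif len(trim(form.keywords))>
  <!--- Pass the submitted text straight to CFSEARCH, with no URL encoding. --->
  <cfsearch name="results"
            collection="Resumes"
            type="standard"
            criteria="#form.keywords#">

  <cfoutput>
    Criteria as received: #encodeForHTML(form.keywords)#<br>
    Hits: #results.recordCount#
  </cfoutput>
</cfif>
```

Showing the criteria exactly as received, next to the hit count, makes it easier to compare against what the Solr Admin query tool reports for the same text.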

 

But I realize it could be challenging, and even better would be a real RUNNING demo--which would require a cfcollection and cfindex operation, as well as content to index. But it doesn't have to be your REAL data.

 

If somehow what I've offered here might get you over this hump, I'll understand if you hold off on such demo code...but then it may help if you raise yet another concern. 🙂 Just keep it in mind. 

/Charlie (troubleshooter, carehart. org)