Participant
August 6, 2011
Question

Searching for single words in Solr


I have a Win2k8 Standard 64-bit install of CF9.0.1. I have a simple PDF document containing two words, "Seattle" and "Seahawks". If I search for "Seattle", I get 0 results. If I search for "Seattle Seahawks", I get the one result I expected.

What can I do to add better support for single word searches?

NOTE: This also occurs with .doc and .txt files.

Thanks,
Merritt Chapman


    1 reply

    Known Participant
    August 9, 2011

    Seattle should give you a hit.

    The default query operator in the Solr distributed with ColdFusion is OR (it can be changed in solrconfig).

    I suspect the actual search query is Seattle OR Seahawks.

    Do you still get one hit when searching for Seattle AND Seahawks?
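To take the default operator out of the picture, each query can be sent to Solr with the operator spelled out. Below is a minimal Python sketch that only builds the request URLs. It assumes Solr listens on localhost:8983 and that the collection is called `mycollection` (a placeholder name); `q.op` is Solr's request parameter for overriding the default operator.

```python
from urllib.parse import urlencode

# Assumed values: adjust host, port and collection name to your install.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "mycollection"  # hypothetical collection name


def build_select_url(query, default_op=None):
    """Return a /select URL for the query, optionally forcing q.op."""
    params = {"q": query}
    if default_op is not None:
        params["q.op"] = default_op  # "AND" or "OR"
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(params)}"


if __name__ == "__main__":
    # The three explicit queries, plus a bare two-word query forced to AND.
    for q in ("Seattle", "Seattle AND Seahawks", "Seattle OR Seahawks"):
        print(build_select_url(q))
    print(build_select_url("Seattle Seahawks", default_op="AND"))
```

Pasting the printed URLs into a browser shows the hit count for each variant without going through ColdFusion.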

    So for some reason Seattle has not been put into the index.

    That can happen if the word is in the stop word list for the collection (it should not be) or if the synonyms file is badly configured.
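A quick way to rule out the stop word list is to check the file programmatically. The sketch below is only an illustration: it assumes the usual stopwords.txt layout (one word per line, lines starting with `#` are comments) and compares case-insensitively, which matches a stop filter configured with `ignoreCase="true"`. The location of the file depends on your install, so the code takes the file content as a string.

```python
def load_word_list(text):
    """Parse stopwords.txt-style content into a set of lowercase words."""
    words = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        words.add(line.lower())
    return words


def is_stop_word(word, stopwords_text):
    """Return True if the word appears in the stop word content."""
    return word.lower() in load_word_list(stopwords_text)


if __name__ == "__main__":
    # Hypothetical file content, for illustration only.
    sample = "# sample stop words\na\nan\nthe\n"
    for word in ("Seattle", "Seahawks", "the"):
        print(word, is_stop_word(word, sample))
```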

    I would analyze how Solr indexes these words at http://localhost:8983/solr/[your collection]/admin/analysis.jsp

    Select the field name where you store the data [summary?].

    Check "verbose output".

    Type Seattle Seahawks in "Field value".

    Then check how Solr applies its filters to each word.

    Participant
    August 9, 2011

    Thanks for responding and the tip on analyzing indexes.  To answer your questions:

    1. Searching for "Seattle" = 0 hits.

    2. Searching for "Seattle AND Seahawks" = 1 hit.

    3. Searching for "Seattle OR Seahawks" = 1 hit.

    Stop words and synonyms are set to defaults.

    I'm going to look into that and see what's actually going on. I just got a couple of books on Solr yesterday, so I've got some reading to do first.

    Known Participant
    August 10, 2011

    This is most peculiar. I indexed the same string and it works perfectly, although I have extensively modified the Solr schemas etc.

    I would try indexing a plain text file with these words to see whether the problem is within Solr or whether CFINDEX is messing this up.

    (or maybe an error in the PDF)

    Using Apache Tika or a commercial filter might be an option if ColdFusion messes with things.

    Tika is well documented compared to CFINDEX.

    I found CFINDEX very limiting (regarding access to my own fields defined in Solr), so I wrote my own function that posts XML data directly to Solr. This only works with text content though.
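For readers who have not posted to Solr directly before, here is a rough sketch of the idea in Python. It is not the function described above, just an illustration of Solr's XML update message format (`<add><doc><field name="...">`). The field names `uid` and `contents` are hypothetical and must match fields defined in your own schema; the update URL is whatever your collection exposes under `/update`.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(fields):
    """Build a Solr <add><doc>...</doc></add> update message from a dict."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


def post_update(update_url, xml_body):
    """POST an XML update message to a Solr /update handler."""
    request = urllib.request.Request(
        update_url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Field names are hypothetical; use the ones defined in your schema.
    print(build_add_xml({"uid": "doc-1", "contents": "Seattle Seahawks"}))
```

After posting the document, a second message containing `<commit/>` has to be posted to the same handler before the document shows up in searches.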

    If you are using the Solr GUI on localhost, you can also query the collection directly and look at what Solr returns as raw XML.
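The raw XML can also be read by a script. The sketch below pulls the hit count out of a response; it assumes the standard Solr XML response layout, where a `<result>` element carries a `numFound` attribute. The sample document and its `key` field are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body, shaped like Solr's XML response format.
SAMPLE = """<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <result name="response" numFound="1" start="0">
    <doc><str name="key">seahawks.pdf</str></doc>
  </result>
</response>"""


def count_hits(raw_xml):
    """Return numFound from a Solr XML response, or None if there is no result element."""
    root = ET.fromstring(raw_xml)
    result = root.find("result")
    if result is None:
        return None
    return int(result.get("numFound"))


if __name__ == "__main__":
    print(count_hits(SAMPLE))
```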

    Check whether Acrobat added any special characters trailing the word Seattle.
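Hidden characters are hard to spot by eye, so it can help to list them. The following Python sketch flags every character in a string that is not an ASCII letter, digit or plain space, and reports its position and Unicode name. Where the text comes from (a stored field copied out of the raw XML, for example) is up to you; the sample strings are invented.

```python
import unicodedata


def suspicious_chars(text):
    """List characters that are neither ASCII letters/digits nor plain spaces."""
    found = []
    for index, ch in enumerate(text):
        if ch.isascii() and (ch.isalnum() or ch == " "):
            continue
        # Control characters have no Unicode name, so fall back to a marker.
        name = unicodedata.name(ch, "UNNAMED")
        found.append((index, f"U+{ord(ch):04X}", name))
    return found


if __name__ == "__main__":
    # Invented samples: a clean string and one with a no-break space.
    for sample in ("Seattle Seahawks", "Seattle\u00a0Seahawks"):
        print(repr(sample), suspicious_chars(sample))
```

If the word is followed by something like a no-break space or a soft hyphen, the tokenizer may not split it into the bare term "Seattle", which would explain the missing single-word hit.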