
Solr Collections Architecture Question

Participant, Feb 17, 2024

At this point, I may be the only person on the planet who would need this advice, but here goes.

As stated in previous threads we're upgrading from CF 4.51 to CF 2023.

In CF 4.51 we had 3 collections for applicant data. Each of them contained data from both a database field and a file for each candidate. Note: We had 3 to shorten the search processing time. Collection1 had candidates with last names starting with a-g, collection2 h-o, and the third p-z. That way, after a quick code check of the 1st letter of the last name, we only needed to search approx. 20,000 records vs. 60,000 in any given search.

We did this by converting all .doc & .docx resumes to .txt with a custom tag we had developed (doc2txt). The converted .txt file was then indexed. (HTML, HTM & txt resume files were simply loaded into the collection as .txt without any need of conversion.)

Then, for each candidate, we would also append the text from a database field in SQL Server to that candidate's data in the collection.

Thus, in the end we had a collection record containing data from both a file document & a db field.

 

Well, the custom tag, which was created from a C++ .dll file, will not run in a Win 2016 environment.

In addition, we also want to start including a 6th type of resume document, namely .pdf files (which is one of the reasons we're doing the upgrade).

 

So I am thinking that we'll create a 2nd collection called "Notes" for the database field & then configure the code so that it will search both the "Resumes" & "Notes" collections for a given CFSEARCH query. If it finds the data in either collection, we'll add that candidate to the results/hit list.

 

We may need to do this with three separate sets of collections once again, depending upon the speed of the searches in a Solr environment. But that is a question for another day.

 

As I understand it, you cannot create a single collection that would contain data from both a file in a directory and a datafield from sql/server.

 

Perhaps there is some "magic" in CF 2023 of which I am not aware which would be a better solution than the one I have in mind. If so, I would certainly entertain it. Thanks folks!

Alex Craig, General Manager
"Avid Saltwater Fly Fisherman"
1 Correct answer

Community Expert, Feb 18, 2024

Lots to consider there, Alex. So first, let's clarify that you don't need that custom tag to have cfindex pull in Word or PDF docs. Both are supported and their text indexed. See any of many examples online of using the type="path" option of cfindex, with extensions=".pdf,.doc,.docx", for example.
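
As a minimal sketch of that type="path" usage (the collection name "Resumes" comes from the question, the directory is a hypothetical placeholder, and the collection is assumed to already exist), indexing a whole folder of mixed resume formats might look something like this:

```cfml
<!--- Sketch only: D:\resumes is a made-up path; point key at the real resume directory. --->
<!--- Assumes a Solr collection named "Resumes" has already been created. --->
<cfindex
    action="update"
    collection="Resumes"
    type="path"
    key="D:\resumes"
    extensions=".pdf,.doc,.docx,.txt,.htm,.html"
    recurse="yes">
```

With type="path", key is the directory to index and extensions limits which files in it are picked up.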

 

Then, as for storing db data with file content, well, it is possible. Solr (and cfindex and cfsearch) support adding custom fields to a collection (while indexing files).

 

And while that would be hard with cfindex type="path" (since that indexes all files at once), you could use type="file", assuming you can look up each person in the db because you have stored their corresponding filename there. Then you'd have the file name to index and the db cols to store with that Solr document.
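
A rough sketch of that per-file approach follows. The datasource, table, and column names are all hypothetical, and using custom1 to carry the notes text is just one way to do it. It would be worth checking the cfindex docs for how custom fields are stored and searched in your version before relying on this:

```cfml
<!--- Hypothetical datasource, table, and column names throughout. --->
<cfquery name="qCandidates" datasource="applicants">
    SELECT candidate_id, resume_path, notes
    FROM candidates
    WHERE resume_path IS NOT NULL
</cfquery>

<!--- Index one file per candidate, storing the db notes alongside it. --->
<cfloop query="qCandidates">
    <cfindex
        action="update"
        collection="Resumes"
        type="file"
        key="#qCandidates.resume_path#"
        title="#qCandidates.candidate_id#"
        custom1="#qCandidates.notes#">
</cfloop>
```

Looping and indexing one file at a time is slower than type="path", but it is the step that lets each Solr document pick up its own row of db data.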

 

Then again, you could also leave the db data in the db, and instead do a query against that for the db data, then do a cfsearch for the file content data, and then join those two results with cf's query of queries capability, which might seem magic to some. 🙂 And especially for you coming from cf 4.5. I can't recall when qofq came out.
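
Here is a sketch of that idea, written as a UNION rather than a join since the goal in the question was a hit in either source. The datasource, table, column names, and search term are hypothetical, and it assumes the db stores the resume path in the same form that cfsearch returns in its key column (query of queries string comparisons are case-sensitive, so that is worth verifying):

```cfml
<!--- Hypothetical search term, datasource, table, and column names. --->
<cfset searchTerm = "fly fishing">

<!--- 1. File-content hits from the Solr collection. --->
<cfsearch name="qFileHits" collection="Resumes" criteria="#searchTerm#">

<!--- 2. Notes hits straight from the database. --->
<cfquery name="qNoteHits" datasource="applicants">
    SELECT resume_path
    FROM candidates
    WHERE notes LIKE <cfqueryparam value="%#searchTerm#%" cfsqltype="cf_sql_varchar">
</cfquery>

<!--- 3. Combine both result sets in a query of queries; UNION drops duplicates. --->
<!--- key is bracketed in case it is treated as a reserved word. --->
<cfquery name="qAllHits" dbtype="query">
    SELECT [key] AS resume_path FROM qFileHits
    UNION
    SELECT resume_path FROM qNoteHits
</cfquery>
```

The dbtype="query" attribute is what makes the last cfquery run against the in-memory results instead of a datasource.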

 

But someone with more solr experience may have other thoughts. BTW, it would be in your interest to learn more about solr using resources outside of cf. It's incredibly rich, and as I've said in other threads here, cf exposes only a subset of it. 


/Charlie (troubleshooter, carehart.org)

Participant, Feb 19, 2024

Double wow!! Apparently, an abundance of CF "Magic" has been added since CF 4.51.

 

Out of the chute, I'm liking your qofq option. Seems like it might be an ideal solution to the problem.

As you suggested, I'll do some more research.

 

I genuinely appreciate you taking the time to relay your thoughts on this topic.  Thank you very, very much!

 

PS  As you may have seen in another thread, it now appears we can easily live without breaking up the "Resumes" collection. Response times on the searches (even with the collection being triple the size) are now better than we have in CF 4.51.

Alex Craig, General Manager
"Avid Saltwater Fly Fisherman"
