
What Do You Use in Place of VSpider When Using Solr?

New Here
Oct 06, 2010

Since Verity is deprecated according to the CF documentation, what crawler do you use if you want to index dynamic pages (as vSpider would)? Can you use Solr with vSpider, or is there something better out there or bundled with CF9?

Community Expert
Oct 06, 2010

You were probably the only person using Vspider! Most Verity users just used CFINDEX.

Solr doesn't come with a crawler. There are plenty of crawlers that can work with Solr, though - just Google "solr crawler".
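
Whatever crawler is used, it ultimately has to hand each page to Solr. As a minimal sketch (in Python, purely for illustration), a crawler can build Solr's XML update message, an `<add>` element with one `<doc>` per page and `<field name="...">` children, and POST it to the Solr update handler, followed by a `<commit/>`. The helper name `build_solr_add_xml` and the field names `id`, `title` and `content` are assumptions: the fields must match those defined in the target collection's schema, and the update URL depends on how Solr is deployed.

```python
import xml.etree.ElementTree as ET


def build_solr_add_xml(pages):
    """Build a Solr XML <add> update message for a list of crawled pages.

    Each page is a dict of field name -> string value. The field names
    (e.g. id, title, content) are assumptions and must exist in the Solr schema.
    """
    add = ET.Element("add")
    for page in pages:
        doc = ET.SubElement(add, "doc")
        for name, value in page.items():
            if value is None:
                continue  # skip fields the crawler could not extract
            field = ET.SubElement(doc, "field", name=name)
            field.text = value  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


# Hypothetical crawled page; the URL doubles as the document id.
message = build_solr_add_xml([
    {"id": "http://www.example.com/about", "title": "About us", "content": "Fish & chips"},
])
print(message)
```

Assuming `id` is the schema's unique key, using the page URL as the id means a nightly re-crawl replaces existing documents instead of duplicating them.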

Dave Watts, CTO, Fig Leaf Software

http://www.figleaf.com/

http://training.figleaf.com/

Fig Leaf Software is a Veteran-Owned Small Business (VOSB) on GSA Schedule, and provides the highest caliber vendor-authorized instruction at our training centers, online, or onsite.

Dave Watts, Eidolon LLC
LEGEND
Oct 07, 2010

> You were probably the only person using Vspider! Most Verity users just used CFINDEX.

The problem with using <cfindex> is that the location of a document in the site's filesystem hierarchy, and infrastructure elements such as the nav (and even the header and footer), are significant in evaluating the document's context for the purposes of weighting it.

Using a spider is a better approach to getting an accurate, weighted index of the website, rather than just its component data.

--

Adam

Community Expert
Oct 07, 2010

Oh, I agree! That's one reason why I use Google Search Appliance instead of Verity or Solr.

Guest
Feb 22, 2011

To continue this thread, I'm looking for a replacement for vspider too. It's funny that I'm one of only two people who used vspider!!!

I'm migrating my collections to Solr and I need to use a crawler to index my sites. vspider worked really well because it was simple to set up a recurring job to update a Verity index each night.

I built a CFC that crawls a site, which I might use to build Solr indexes, but the problem is that it is subject to the server timing out, because a site might take a while to crawl, let alone index. I get around it by using <cfsetting requesttimeout="some ungodly number">, but it's still possible that the timeout value is not long enough and the request to index a site will time out before the index is finished.

Crawling a site and building an index seems like a lot of work for a single request and I wonder what this will do to the JVM. I'm guessing it will spike and CF will be very slow.

It seems like crawling and indexing a site should be done outside of CF, and since Solr is built on Java, maybe the indexing should be done in Java?

Any ideas?
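
One way to keep a long crawl from hitting a request timeout is to do it in bounded batches: each run crawls at most N pages and returns the URLs still queued, so a scheduled task (or a standalone process outside CF, as suggested above) can resume where the last run stopped. This is a sketch in Python purely for illustration, not JP's CFC; `crawl_batch`, `LinkExtractor` and the injected `fetch` function are hypothetical names. `fetch` is assumed to return a page's HTML or `None`; in practice it might wrap `urllib.request.urlopen`, and `seen` and the returned frontier would need to be persisted between runs.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_batch(fetch, frontier, seen, max_pages):
    """Breadth-first crawl of at most max_pages pages, staying on the same host.

    fetch(url) returns the page HTML, or None if the page can't be retrieved.
    seen is a set of URLs already queued or crawled; it is updated in place.
    Returns (pages, remaining): {url: html} for this batch, plus the URLs still
    queued so a later run can resume the crawl.
    """
    queue = deque(frontier)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it, don't count it toward the batch
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        host = urlparse(url).netloc
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is queued once.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages, list(queue)


# Usage with an in-memory stand-in for a real site (hypothetical URLs).
site = {
    "http://www.example.com/": '<a href="/a">A</a> <a href="/b#top">B</a> <a href="http://other.example.org/">X</a>',
    "http://www.example.com/a": '<a href="/">Home</a> <a href="c">C</a>',
    "http://www.example.com/b": "<p>No links here</p>",
    "http://www.example.com/c": "<p>Leaf page</p>",
}
start = ["http://www.example.com/"]
seen = set(start)
pages, remaining = crawl_batch(site.get, start, seen, max_pages=2)
print(list(pages), remaining)
```

Restricting links to the starting host keeps the crawl on your own site, and stripping `#fragment` parts avoids fetching the same page twice. The HTML from each batch would then be handed to whatever indexer is in use.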

LEGEND
Feb 23, 2011

I googled "solr spider website" and very quickly found this: http://nutch.apache.org/about.html.

Have you looked @ it?  I'd heard of it, but have never used it.

--

Adam

Guest
Feb 23, 2011

Yeah, I looked into Nutch, but it seems like a lot of work to figure out. The documentation is horrible and I haven't found any information on how to make it work with ColdFusion.

What I did find, however, is SearchBlox, which looks promising. I very quickly installed it (on Windows) and got it to crawl and index one of my sites. I can then do searches from directly within the web admin console (everything I've done so far works from within the admin console). Next, I'll figure out how to make calls into it from ColdFusion to conduct a search (which I think will be trivial). You can get XML back from a search, which will make it easy. If this works, I will have found a great solution completely independent from ColdFusion's Solr integration.

It's interesting to me that there's so little information out there on what ColdFusion developers are using to crawl/index their sites. Isn't anyone doing this???

-JP

LEGEND
Feb 23, 2011

> It's interesting to me that there's such little information out there on what ColdFusion developers are using to crawl/index their sites. Isn't anyone doing this???

I reckon most people won't've been using Verity from the outset because it's a bit rubbish (or CF's implementation of it is), so they'll already have something else implemented for doing search, so that CF9 now offers something else instead of Verity is neither here nor there.

Or others will be using Verity, and it works for them, so that CF9 now offers something else instead of Verity is neither here nor there.

Or that they'll be using a proper hardware search appliance, so that CF9 now offers something else instead of Verity is neither here nor there.

Or they don't bother to actually spider their site; instead they just use CFINDEX with the docs or data straight from the DB, in which case they're either still using Verity or they've decided to try Solr, and Solr does that side of things OK.

--

Adam
