July 28, 2011
Question

Narrowing a search collection of HTML files


I'm looking at setting up a website search in Solr (or Verity) and want to point it at a collection of HTML files. Is there a way to set up a collection so that it includes only the content within specific div(s)? I don't want it to include the header, sidebars, footer, etc. Thanks.


1 reply

August 13, 2011

With the tools that ColdFusion (CF) provides, I think you're going to have to extract this content yourself, put it in a query, and index that query with a CUSTOM index job (cfindex with type="custom").
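Here is a minimal sketch of the "extract it yourself" step, written in Python rather than CFML. It assumes the wanted text lives in divs with known ids ("content" is a hypothetical id) and uses only the standard-library html.parser. Each page becomes a row with key/title/body columns, which mirrors the key, title and body attributes that cfindex type="custom" reads from a query. Turning the rows into an actual CF query, or posting them to Solr, is not shown.

```python
from html.parser import HTMLParser
from pathlib import Path

# Tags that separate blocks of text; a space is emitted at each one so that
# adjacent blocks such as "<h1>Title</h1><p>Text</p>" don't run together.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "td", "th",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class DivTextExtractor(HTMLParser):
    """Collects the page <title> plus the text inside <div>s whose id is in
    target_ids. Everything outside those divs (header, sidebars, footer) is
    ignored, as is <script>/<style> content inside them."""

    def __init__(self, target_ids):
        super().__init__(convert_charrefs=True)
        self.target_ids = set(target_ids)
        self.title = ""
        self._chunks = []
        # Only <div> nesting is counted (0 = outside any target): end tags for
        # <p>, <li> etc. are optional in HTML, so counting them would drift.
        self._depth = 0
        self._skip = 0          # > 0 while inside <script> or <style>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1
        elif tag == "div":
            if self._depth:
                self._depth += 1            # nested div inside a target
            elif dict(attrs).get("id") in self.target_ids:
                self._depth = 1             # entering a target div
        if self._depth and tag in BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if self._depth and tag in BLOCK_TAGS:
            self._chunks.append(" ")
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "div" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._depth and not self._skip:
            self._chunks.append(data)

    def text(self):
        # Collapse every run of whitespace into a single space.
        return " ".join("".join(self._chunks).split())


def extract_page(html, target_ids=("content",)):
    """Return (title, body_text) for one HTML document."""
    parser = DivTextExtractor(target_ids)
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.text()


def build_records(pages, target_ids=("content",)):
    """pages maps a key (e.g. a file path or URL) to its HTML. Returns one row
    per page with key/title/body columns: the shape of a query you could hand
    to a custom index job."""
    rows = []
    for key, html in pages.items():
        title, body = extract_page(html, target_ids)
        rows.append({"key": key, "title": title, "body": body})
    return rows


def load_pages(root):
    """Read every *.html file under root (UTF-8 encoding assumed)."""
    return {str(p): p.read_text(encoding="utf-8")
            for p in sorted(Path(root).rglob("*.html"))}
```

For example, build_records(load_pages("site/"), target_ids=["content", "main"]) returns one row per page containing only the text of those divs; "site/", "content" and "main" are placeholders. Matching on id keeps the sketch short. If your templates mark the main area with a class instead, the id check in handle_starttag is the place to change.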

You could probably do it directly with Lucene, but "how" is a question best asked on a Lucene forum (having first read through the docs ;-).

That said, all the rest of the bumpf on the page does add context to the document, so you might want to question whether it's necessary or even desirable to omit it from the indexing process.

--

Adam