February 17, 2010

SOLR and PDF Question

Does anybody have any advice about using SOLR to index PDF documents? The documents have metadata added to them, and we'd like to map that metadata to the Custom1 through Custom4 fields and then return those fields in a results page. We're able to index the documents, but can't seem to get SOLR (or Verity, for that matter) to recognize and return any of the metadata.

Thanks...

   Dan

1 reply

February 21, 2010

I suspect you're going to have to extract whichever metadata you want to use yourself (using <cfpdf> to read the PDF files) and pass it to the index in the Custom1 through Custom4 fields.
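
Something along these lines might work as a starting point. It's an untested sketch: the "pdfDocs" collection, the /docs folder, the search criteria, and the choice of which field goes into which custom slot are all placeholders. It also assumes that with type="file" the custom1 through custom4 attributes of <cfindex> are stored as literal strings, so check that against the docs for your version.

```cfml
<!--- Hypothetical names: the "pdfDocs" collection and the /docs folder are placeholders. --->
<cfset pdfDir = expandPath("/docs")>

<!--- List the PDF files to index. --->
<cfdirectory action="list" directory="#pdfDir#" filter="*.pdf" name="pdfFiles">

<cfloop query="pdfFiles">
    <cfset pdfPath = pdfFiles.directory & "/" & pdfFiles.name>

    <!--- Read the document information (Title, Author, Subject, Keywords, ...) into a struct. --->
    <cfpdf action="getInfo" source="#pdfPath#" name="pdfInfo">

    <!--- Start from empty defaults so a PDF with a missing field doesn't throw an error. --->
    <cfset meta = {Title = "", Author = "", Subject = "", Keywords = ""}>
    <cfset structAppend(meta, pdfInfo, true)>

    <!--- Index the file and attach the extracted values as custom fields
          (assumption: they are stored as literal strings for type="file"). --->
    <cfindex action="update"
             collection="pdfDocs"
             type="file"
             key="#pdfPath#"
             custom1="#meta.Author#"
             custom2="#meta.Subject#"
             custom3="#meta.Keywords#"
             custom4="#meta.Title#">
</cfloop>

<!--- The custom fields come back as columns of the search result query. --->
<cfsearch collection="pdfDocs" criteria="report" name="results">

<cfoutput query="results">
    #results.key#: #results.custom1# / #results.custom2# / #results.custom3# / #results.custom4#<br>
</cfoutput>
```

Indexing one file at a time is slower than pointing <cfindex> at a whole directory, but it's the only way I can see to give each document its own custom values.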

It might be worth checking out what work has already been done for getting Lucene to extract & parse info from PDFs.  Even if the stuff out there isn't an exact fit for you, it'll almost certainly be open source so you'll be able to modify it to suit your needs.

--

Adam