ColdFusion Solr PDF index custom metadata

Report · Oct 20, 2010

Hi, I've been trying to index some custom metadata fields using Solr... I've managed to do so, but the so-called solution only came with more issues...

1) Since I managed to index the PDFs using an HTTP POST via cURL in PHP, ColdFusion won't read the indexed files, even though Solr does... is there a way to use a standalone version of Solr and integrate it with ColdFusion?

2) Using the Solr server CF already has, is there a way to index the PDF files the same way I did using PHP and still be able to call them from ColdFusion?

3) Can someone point me to some advanced techniques or tutorials on using Solr with ColdFusion... on how to customize the indexing of files, queries and results...

I'd appreciate all the help I can get...

Report · Oct 20, 2010

1) Not sure what you mean by 1.

2) Solr already supports indexing PDF files.

Report · Oct 20, 2010

Indeed, but for some reason when you add custom metadata to the PDF, Solr won't index that data. I already created the field in schema.xml. The only way to do it is by using the cURL script, which allows you to add a custom field to the soon-to-be-indexed PDF, but by doing so ColdFusion won't read that index...
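
For readers following along, the cURL approach described above can be sketched roughly like this. It is only a sketch, written in Python rather than PHP, and it assumes Solr's extracting handler is mounted at /update/extract and that `month` and `year` fields already exist in schema.xml. The base URL, the collection name, and the function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(base_url, doc_id, month, year, commit=True):
    """Build an /update/extract URL that attaches literal.* fields to the PDF."""
    params = [
        ("literal.id", doc_id),        # unique key for the document
        ("literal.month", str(month)), # assumed custom field in schema.xml
        ("literal.year", str(year)),   # assumed custom field in schema.xml
    ]
    if commit:
        params.append(("commit", "true"))
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


def post_pdf(base_url, pdf_path, doc_id, month, year):
    """POST the raw PDF bytes to Solr; needs a running Solr instance."""
    with open(pdf_path, "rb") as handle:
        body = handle.read()
    request = Request(
        build_extract_url(base_url, doc_id, month, year),
        data=body,
        headers={"Content-Type": "application/pdf"},
    )
    with urlopen(request) as response:
        return response.status
```

Something like `post_pdf("http://localhost:8983/solr/mycollection", "doc1.pdf", "doc1.pdf", 10, 2010)` would then send one file. Whether ColdFusion can read documents added this way depends on the collection containing the fields CF expects, which this sketch does not address.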

Report · Oct 20, 2010

Ahhh - custom PDF stuff.

So - I'd do it manually. You can read custom PDF metadata using my pdfUtils CFC from RIAForge. Use that along with the cfpdf tag to get what you want from the PDFs and build your collection by hand.

Report · Oct 20, 2010

Thanks for the tip about the pdfUtils... but the thing is... I have over 10,000 PDF files, and I have to add two custom metadata fields, month and year... the fields have different values, and the purpose of this is so that when someone searches for a document, they can do so using filters, in this case a filter by month and/or year... That's why I've been trying to index those fields with Solr...

So... how can I add custom fields that are linked to the already indexed PDFs in order to create search filters?... Should I use the DIH? For that I'd have to get all the content of the PDFs and put it in a DB...
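
To make the goal concrete, the month/year filters described above would map onto Solr filter queries (`fq`) once those fields are indexed. A minimal sketch in Python, assuming the fields are named `month` and `year` and that the collection is reachable over HTTP; the base URL and function name are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(base_url, text, month=None, year=None):
    """Build a /select URL with optional fq filters on month and year."""
    params = [("q", text), ("wt", "json")]
    if month is not None:
        params.append(("fq", f"month:{month}"))  # narrow results to one month
    if year is not None:
        params.append(("fq", f"year:{year}"))    # narrow results to one year
    return base_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("http://localhost:8983/solr/mycollection",
                           "budget report", month=10, year=2010))
```

Each `fq` narrows the result set without changing the main query, so the month and year filters can be applied separately or together.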

Report · Oct 20, 2010

And you still can. Creating your initial collection would be a slow, one-time process, but when done it's... well, done.

Report · Oct 25, 2010

Again, thanks for the pdfUtils... I just found out that it's very useful, but I'm still having the same issue, in which Solr won't index the custom metadata...