Help with getting solr working

Aug 17, 2020

I am trying to get Solr working on a new install of ColdFusion 2018. After spending hours searching the internet and the forums, I still don't know how to do what I need to do, and I'm ready to give up. Doing this in Verity was so easy.

I have several directories of XML files that point to specific pages in multiple huge PDF files. All I want to do is to search these XML files and pull back the page number in the PDF document. So, for example, I have a simple XML file like this:

<records file="S_VN50701A.pdf">
<record><apn>002-0-080-215</apn><page>3</page></record>
<record><apn>008-0-110-225</apn><page>5</page></record>
<record><apn>008-0-140-050</apn><page>7</page></record>
<record><apn>008-0-140-105</apn><page>9</page></record>
<record><apn>008-0-140-140</apn><page>11</page></record>
<record><apn>008-0-150-025</apn><page>13</page></record>
</records>

If I search for 008-0-150-025, I would like Solr to return "13" for page 13. Even the whole line would do, since I can parse it out. I can't get Solr to do anything even remotely close to this.

I have many of these XML files indexing hundreds of thousands of PDF pages. It was all working in Verity and performed very quickly. In solr, I am at a loss.

Can anyone steer me in the right direction?

Thanks.

Aug 18, 2020

I'm surprised this worked for you in Verity. As far as I understand, it shouldn't have. (Of course, there's tons of stuff I don't know about Verity!)

As I see it, you have an example of one XML file. That file will be indexed as a separate entity, a document all its own. Neither Solr nor Verity, by itself, should let you find a specific part of that document by searching for a term that happens to be near it. If you search for 008-0-150-025, you should find ... this document. So I don't know what you were doing in Verity, but I don't think it's the default way that Verity operates.

OK, so that's bad! But you can make these XML files searchable in either Solr or Verity by having ColdFusion "preprocess" them: turn them into query objects, then index those query objects. Because these XML files contain structured data, this should be pretty easy. You'll need to build a relatively simple program that iterates through the files, builds one or more query objects out of them using CF's query functions, and then indexes those. The URL you'll want to associate with each row is presumably a link to the PDF in question.
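
As a rough illustration of that preprocessing step, here is a minimal sketch in Python rather than CFML. It only shows the shape of the idea: each <record> becomes its own row carrying the APN, the page number, and the PDF name from the file attribute, so a search for an APN can return a page instead of a whole XML file. The function names flatten_records and build_lookup are made up for this example, and the dictionary lookup is just a stand-in for the real search index. In ColdFusion the same rows would go into a query object that gets indexed.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the original post.
SAMPLE_XML = """<records file="S_VN50701A.pdf">
<record><apn>002-0-080-215</apn><page>3</page></record>
<record><apn>008-0-110-225</apn><page>5</page></record>
<record><apn>008-0-140-050</apn><page>7</page></record>
<record><apn>008-0-140-105</apn><page>9</page></record>
<record><apn>008-0-140-140</apn><page>11</page></record>
<record><apn>008-0-150-025</apn><page>13</page></record>
</records>"""


def flatten_records(xml_text):
    """Turn one XML file into a list of rows, one row per <record>.

    Hypothetical helper: the row layout (apn, page, file) is an assumption
    about what would later be indexed, not something the thread specifies.
    """
    root = ET.fromstring(xml_text)
    pdf_name = root.get("file")  # the PDF that every record in this file points into
    rows = []
    for record in root.findall("record"):
        rows.append({
            "apn": record.findtext("apn"),
            "page": int(record.findtext("page")),
            "file": pdf_name,
        })
    return rows


def build_lookup(rows):
    """Group rows by APN; a plain dict standing in for the search index."""
    lookup = {}
    for row in rows:
        lookup.setdefault(row["apn"], []).append(row)
    return lookup


rows = flatten_records(SAMPLE_XML)
lookup = build_lookup(rows)

# Searching for one APN now yields the PDF and page, not the whole XML file.
for hit in lookup.get("008-0-150-025", []):
    print(hit["file"], hit["page"])  # prints: S_VN50701A.pdf 13
```

The point of the design is that the unit being indexed is the record, not the file. With several directories of XML files, the same flattening would run once per file and all the rows would be appended to one collection before indexing.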

Dave Watts, Eidolon LLC

Adobe Community
