December 22, 2011

Solr error when indexing recordset of 500k records


Hello,

I am using CF 9.0.1 and am having terrible trouble indexing large recordsets. In this case I am trying to index a collection of nearly 500,000 records, but at around the 400,000 mark Solr returns the following error:

org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
Dec 21, 2011 12:56:16 AM org.apache.solr.core.SolrCore execute
INFO: [candsearch_e14] webapp=/solr path=/update params={waitSearcher=false&commit=true&wt=javabin&waitFlush=false&version=1} status=500 QTime=65187
Dec 21, 2011 12:56:16 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: [was class org.mortbay.jetty.EofException] null

This happens every time I try to index this particular collection. It never happens with smaller collections of a few hundred or a few thousand records.

To try to improve performance, I have changed the JVM arguments in Solr.lax to the following:

lax.nl.java.option.additional=-server -Xms1024m -Xmx1024m -XX:MaxNewSize=256m -XX:MaxPermSize=256m -XX:+ScavengeBeforeFullGC -XX:-UseParallelGC  -DSTOP.PORT=8079 -DSTOP.KEY=cfstop -Dsolr.solr.home=multicore

I have also changed the mergeFactor in solrconfig.xml to 25 to speed up indexing. However, changing the mergeFactor and JVM values makes no difference to the error above.

Has anyone seen this error before, or does anyone have an idea what it means? I am out of ideas and would appreciate any help.


Sean Coyne
December 22, 2011

I'm not sure why you are seeing that error. My best guess is that it's simply too much for the CF-to-Solr connection to handle in one go.

That said, if smaller batches index successfully, just split the 500k records into batches: two of 250k, five of 100k, and so on. That should at least get you moving forward.
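The batching idea can be sketched roughly like this. It is a language-neutral illustration in Python, not ColdFusion code. `batch_ranges`, `index_in_batches` and the `index_batch` callback are hypothetical names. In CF9 the callback would presumably run a query for one range of rows (for example, by primary key or row offset) and pass that query to `cfindex action="update"`, so that no single request to Solr has to carry the whole 500k-row recordset.

```python
from typing import Callable, Iterator, Tuple


def batch_ranges(total: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) row offsets that cover 0..total in chunks."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, total, batch_size):
        # The last chunk may be smaller than batch_size.
        yield start, min(start + batch_size, total)


def index_in_batches(total: int, batch_size: int,
                     index_batch: Callable[[int, int], None]) -> int:
    """Call the (hypothetical) index_batch callback once per chunk.

    Returns the number of batches sent.
    """
    count = 0
    for start, end in batch_ranges(total, batch_size):
        index_batch(start, end)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical stand-in for "query rows start..end-1, then cfindex them".
    def fake_index_batch(start: int, end: int) -> None:
        print(f"indexing rows {start} to {end - 1}")

    batches = index_in_batches(500_000, 100_000, fake_index_batch)
    print(batches, "batches")
```

With 500,000 records and a batch size of 100,000 this sends five batches. If a batch still fails, you can lower the batch size without changing anything else.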