Solr is VERY slow

Feb 02, 2015

Configuration

CF 9.0.0.251028

Oracle 11G

Windows Server 2008 R2 SP1, hosted remotely. Virtual server with 100 users; normally only 5 or 6 are on at a time.

ColdFusion serving from c:\inetpub\wwwroot\application_name

Documents stored in S:\docs (same server, virtual drive)

34,000 docs in 3,300 folders. Total size, including docs that are not indexed, is about 45 GB (PDF, HTM, TXT, all variants of MS Office docs, RTF).

Collection indexing is taking days instead of hours, and it does not seem to matter whether it is Verity or Solr. Resource Monitor shows Solr creating the cache, and it flat out blazes through that, but the only indication I have of it ACTUALLY doing anything afterward is 50 to 70% CPU usage.

I increased the buffer to 80 but I am at a loss on speeding this process up.

Any help will be greatly appreciated

Thanks,

Wil Hale

Feb 02, 2015

It's got to be the sheer volume of files that you're trying to index. Solr is (normally) much faster than Verity.

Are you indexing via the CFAdmin panel, or by CFINDEX tag?

V/r,

^_^

Feb 03, 2015

Using the Scheduler to fire off a CFM page.

Feb 06, 2015

I am at 56 hours and at my wits' end. The Verity collection only seems to take about 30 hours, tops. Is there any way to speed this process up?

On this latest run I upped the Min and Max Memory to 4 gigs (from 256).

It is just an index > Refresh of one set of docs, then an update from another folder. Heck, I can't even tell where in the process it is, and the Solr console is about useless.

Feb 13, 2015

UPDATE:

Solr is hanging on certain MS Excel docs, but not all of them. One of the docs is 14 MB; another is 126 MB. Smaller ones seem to make it. There is nothing unusual about the Excel files. Some do have drop-down sorting elements, but not all of them.

Solr blazes if I remove xls and xlsx from the file types.
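For anyone wanting to see which Excel files are the likely culprits before deciding what to exclude, here is a minimal sketch in Python, run outside ColdFusion. It walks a document tree and lists the xls/xlsx files at or above a size threshold, largest first. The function name `find_large_excel_files` and the 10 MB default are illustrative assumptions, not anything from ColdFusion or Solr.

```python
import os

# Extensions to look for; the size threshold below is an illustrative assumption.
EXCEL_EXTENSIONS = {".xls", ".xlsx"}


def find_large_excel_files(root, min_bytes=10 * 1024 * 1024):
    """Return (size_in_bytes, path) pairs for Excel files >= min_bytes, largest first."""
    hits = []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            extension = os.path.splitext(name)[1].lower()
            if extension not in EXCEL_EXTENSIONS:
                continue
            path = os.path.join(folder, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size >= min_bytes:
                hits.append((size, path))
    return sorted(hits, reverse=True)
```

Calling `find_large_excel_files(r"S:\docs")` would give a list that could be moved aside or indexed separately, so the rest of the collection is not held up by a handful of large spreadsheets.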

So now, if I am doing an index on a folder, is there a way to tell it to move on if it runs "too" long?

Feb 13, 2015
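One general way to get "move on if it runs too long" behavior is to run each unit of work as its own process with a time limit. The Python sketch below shows only that pattern. It assumes the indexing of a single file or folder can be launched as a separate command, which nothing in this thread establishes, and `run_with_time_limit` is a hypothetical helper name.

```python
import subprocess


def run_with_time_limit(command, limit_seconds):
    """Run a command; return True if it exited within the limit, False if it was killed.

    The command itself is a placeholder: it stands in for whatever would
    trigger indexing of one file or folder (an assumption, not a CF feature).
    """
    try:
        subprocess.run(command, timeout=limit_seconds, check=False)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return False
```

A caller could loop over folders, log the ones that return False, and carry on with the rest. Killing the client process may not stop work that ColdFusion or Solr has already started on the server, so treat this as a way to find the slow items rather than a guaranteed abort.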

Double check those xls/xlsx files. Depending upon how they were created, there might be extraneous data that is causing the collection to choke when trying to index them.

I know (for a fact) that if the Excel files were created by a ColdFusion template AND if debugging is turned on (and the IP address of the client system is within the authorized list of addresses allowed to see debugging information), then the debugging information is appended in a very loose way to the data for the Excel sheet, and can cause a lot of problems.

This happened to me on another project, and it took me almost four days to troubleshoot the issue. Excel sheets were being created by the "SpreadsheetNew()" function via a .cfm file that (in the development environment only) had debugging information appended to every page. I had to finally view the source of the Excel sheet, saw the CF debugging information at the bottom of the source, and turned off debugging for that page. Once I did that, there were no more issues with the Excel sheets created by that .cfm page.

So, check the source of the Excel file (I forget how, but there IS a way) to make sure that there isn't a lot of "corrupted" data causing the collection to choke when indexing those files.
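One rough way to do that check without opening each file by hand is sketched below in Python. It looks at the file signature (legacy .xls files are OLE compound files, .xlsx files are ZIP containers) and scans the last few KB for HTML-looking text. The marker list, including `cfdebug`, is an assumption about what appended debugging output might contain, and `inspect_excel_bytes` is a hypothetical helper name. This is a heuristic, not a validator.

```python
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # signature of legacy .xls (OLE compound file)
ZIP_MAGIC = b"PK\x03\x04"                      # signature of .xlsx (ZIP container)

# Assumed markers of HTML or debug output appended to a spreadsheet.
SUSPECT_MARKERS = (b"<table", b"</html>", b"<script", b"cfdebug")


def inspect_excel_bytes(data, tail_bytes=4096):
    """Return (kind, has_trailing_markup) for the raw bytes of a spreadsheet file."""
    if data.startswith(OLE_MAGIC):
        kind = "xls"
    elif data.startswith(ZIP_MAGIC):
        kind = "xlsx"
    else:
        kind = "unknown"  # may be HTML or text saved with an Excel extension
    tail = data[-tail_bytes:].lower()
    has_trailing_markup = any(marker in tail for marker in SUSPECT_MARKERS)
    return kind, has_trailing_markup
```

Reading a file with `open(path, "rb").read()` and passing the bytes in gives a quick triage: a kind of "unknown", or markup in the tail, marks a file worth viewing in a text editor.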

HTH,

^_^

Feb 20, 2015

Good point. I know that these Excel files were all created in Office 97 and above. I do have some corrupt files (mainly PDF). I can get them to index on a short haul and just return a blank PDF when the link is clicked.

In testing I can manage to get it to index about 4000 files in 80 directories before hitting the latest error: "Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers4_try_again_later"

I have made adjustments to the Solr config to hold off on a commit until the end, but I do not think that is working.

Feb 20, 2015

Have you tried recreating the actual collection? Could be a corrupt collection doing this.

Feb 20, 2015

Thanks for the reply. Yes, I have. I even went as far as removing it from the XML and removing the directories. There are 34,000 physical documents. I let it run for 6 days and it finally returned 8,000 docs. There is nothing in the logs as to why it did not index so many.

I took a different approach. Now I am indexing one folder at a time, and that works for a while, but I am running into the error "Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers4_try_again_later". I am attempting to tell it not to autocommit by commenting that out, and I changed all autowarming settings to 0.

Not sure what else I can do

Feb 20, 2015

Only other thing I could think of is adding a sleep after each update to slow down the searchers.
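The sleep idea can be sketched as follows in Python, purely to show the control flow: index one folder at a time, pause between updates so searchers have time to finish warming, and commit once at the end. `index_folder` and `commit` are hypothetical callables standing in for whatever actually triggers the update and the commit. Whether CF 9 lets the commit be deferred like this is not something the sketch settles.

```python
import time


def index_in_batches(folders, index_folder, commit, pause_seconds=2.0, sleep=time.sleep):
    """Index folders one by one with a pause between updates, then commit once.

    index_folder and commit are caller-supplied (hypothetical) callables.
    Returns a list of (folder, error_message) for folders that failed.
    """
    folders = list(folders)
    failed = []
    for position, folder in enumerate(folders):
        try:
            index_folder(folder)
        except Exception as exc:
            # Record the failure and keep going instead of aborting the run.
            failed.append((folder, str(exc)))
        if position < len(folders) - 1:
            sleep(pause_seconds)  # give Solr time before the next update
    commit()  # single commit at the end, to avoid stacking up warming searchers
    return failed
```

The returned list of failures also gives some visibility into which folders did not make it, which the logs in this thread did not provide.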

There are a few tips here you can try if you haven't already - Tips for software engineer: Solr in Coldfusion 9
