
Solr is VERY slow

New Here, Feb 02, 2015

Configuration

CF 9.0.0.251028

Oracle 11G

Windows Server 2008 R2 SP1, hosted remotely. Virtual server with 100 users; normally only 5 or 6 are on at a time.

ColdFusion serving from c:\inetpub\wwwroot\application_name

Documents stored in S:\docs (a virtual drive on the same server)

34,000 docs in 3,300 folders. Total size, including docs that are not indexed, is about 45 GB (PDF, HTM, TXT, all variants of MS Office docs, RTF).

Collection indexing is taking days instead of hours, and it does not seem to matter whether it is Verity or Solr. Resource Monitor shows Solr creating the cache, and it flat out blazes through that, but the only indication I have of it ACTUALLY doing anything afterwards is 50 to 70% CPU usage.

I increased the buffer to 80, but I am at a loss on how to speed this process up.

Any help will be greatly appreciated.

Thanks,

Wil Hale


Correct answer

Advocate, Feb 20, 2015

The only other thing I can think of is adding a sleep after each update to slow down the searchers.

There are a few tips here you can try if you haven't already - Tips for software engineer: Solr in Coldfusion 9
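
A minimal sketch of the sleep-after-each-update idea, written in Python only to show the pattern; in ColdFusion the equivalent would be a pause between CFINDEX calls. The `index_folder` callable and the 5-second default are hypothetical placeholders, not values taken from this thread.

```python
import time
from typing import Callable, Iterable, List


def index_with_pauses(
    folders: Iterable[str],
    index_folder: Callable[[str], None],
    pause_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Index folders one at a time, pausing after each update.

    index_folder is a hypothetical stand-in for whatever triggers the
    update of one folder (for example, requesting a CFM page).
    """
    done = []
    for folder in folders:
        index_folder(folder)
        done.append(folder)
        # Give Solr time to finish warming the searcher opened by this
        # update before the next update triggers another one.
        sleep(pause_seconds)
    return done
```

The right pause length depends on how long a searcher takes to warm on the server in question, so it would need to be found by experiment.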

LEGEND, Feb 02, 2015

It's got to be the sheer volume of files that you're trying to index.  Solr is (normally) much faster than Verity.

Are you indexing via the CFAdmin panel, or by CFINDEX tag?

V/r,

^_^

New Here, Feb 03, 2015

Using the Scheduler to fire off a CFM page.

New Here, Feb 06, 2015

I am at 56 hours and at my wit's end. The Verity collection only seems to take about 30 hours, tops. Is there any way to speed this process up?

On this latest run I upped the min and max memory to 4 GB (from 256).

It is just an index refresh of one set of docs, then an update from another folder. Heck, I can't even tell where in the process it is, and the Solr console is about useless.

New Here, Feb 13, 2015

UPDATE:

Solr is hanging on certain MS Excel docs, but not all of them. One of the docs is 14 MB; another is 126 MB. Smaller ones seem to make it. There is nothing unusual about the Excel files. Some do have drop-down sorting elements, but not all of them.

Solr blazes if I remove xls and xlsx from the file types.

So now, if I am indexing a folder, is there a way to tell it to move on if a file takes "too" long?
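
One way to get "move on if it takes too long" behaviour is to put a time limit around each file. The following is a rough Python sketch of that pattern, not a ColdFusion feature: it assumes each file can be indexed by launching a separate command, which is not how the setup in this thread works out of the box. `build_command` and `limit_seconds` are hypothetical names.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple


def run_with_time_limit(
    paths: Sequence[str],
    build_command: Callable[[str], List[str]],
    limit_seconds: float,
) -> Tuple[List[str], List[str]]:
    """Run one external indexing command per file, skipping slow ones.

    build_command is hypothetical: it maps a file path to the command
    that indexes that single file. Returns (finished, skipped).
    """
    finished, skipped = [], []
    for path in paths:
        try:
            # subprocess.run kills the child process if the timeout expires.
            subprocess.run(build_command(path), timeout=limit_seconds, check=True)
            finished.append(path)
        except subprocess.TimeoutExpired:
            skipped.append(path)  # took too long: note it and move on
        except subprocess.CalledProcessError:
            skipped.append(path)  # the command itself failed
    return finished, skipped
```

The skipped list is the useful by-product here: it names the files (such as the large spreadsheets) that can then be inspected or indexed separately.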

LEGEND, Feb 13, 2015

Double check those xls/xlsx files.  Depending upon how they were created, there might be extraneous data that is causing the collection to choke when trying to index them.

I know (for a fact) that if the Excel files were created by a ColdFusion template AND if debugging is turned on (and the IP address of the client system is within the authorized list of addresses allowed to see debugging information), then the debugging information is appended in a very loose way to the data for the Excel sheet, and can cause a lot of problems.

This happened to me on another project, and it took me almost four days to troubleshoot the issue.  Excel sheets were being created by the "SpreadsheetNew()" function via a .cfm file that (in the development environment only) had debugging information appended to every page.  I had to finally view the source of the Excel sheet, saw the CF debugging information at the bottom of the source, and turned off debugging for that page.  Once I did that, there were no more issues with the Excel sheets created by that .cfm page.

So, check the source of the Excel file (I forget how, but there IS a way) to make sure that there isn't a lot of "corrupted" data causing the collection to choke when indexing those files.

HTH,

^_^

New Here, Feb 20, 2015

Good point. I know that these Excel files were all created in Office 97 or later. I do have some corrupt files (mainly PDF). I can get them to index on a short haul, and they just return a blank PDF when the link is clicked.

In testing I can manage to get it to index about 4000 files in 80 directories before the latest error: "Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers4_try_again_later"

I have made adjustments to the Solr config to hold off on a commit until the end, but I do not think that is working.

Advocate, Feb 20, 2015

Have you tried recreating the actual collection? Could be a corrupt collection doing this.

New Here, Feb 20, 2015

Thanks for the reply. Yes, I have. I even went as far as to remove it from the XML and remove the directories. There are 34,000 physical documents. I let it run for 6 days, and it finally returned 8,000 docs. There is nothing in the logs as to why it did not index so many.

I took a different approach. Now I am indexing one folder at a time, and that works for a while, but I am running into the error "Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers4_try_again_later". I am attempting to tell it not to autocommit by commenting that out, and I changed all autowarming settings to 0.

Not sure what else I can do.
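
Since the error text itself ends in "try_again_later", one option is to wait and retry the folder that failed. Below is a hedged Python sketch of that retry pattern. `index_folder` is a hypothetical callable, and it is assumed to raise an exception whose message contains the Solr error text; the attempt count and wait time are placeholders.

```python
import time
from typing import Callable


def index_with_retry(
    folder: str,
    index_folder: Callable[[str], None],
    max_attempts: int = 5,
    wait_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Index one folder, retrying when Solr reports too many warming searchers.

    Returns the number of attempts used. index_folder is hypothetical and is
    assumed to raise an exception whose message contains the Solr error text.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            index_folder(folder)
            return attempt
        except Exception as error:
            # Retry only on the warming-searcher error; re-raise anything
            # else, and give up once the attempts are used up.
            if "maxWarmingSearchers" not in str(error) or attempt == max_attempts:
                raise
            sleep(wait_seconds * attempt)  # wait longer after each failure
    raise ValueError("max_attempts must be at least 1")
```

This only works around the symptom. If commits keep arriving faster than searchers can warm, the retries will eventually run out as well.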
