Server crashes

New Here, Nov 27, 2008
Hi,

We are running CF 7 on a Linux server. The issue is that our server keeps crashing, and the techs at Rackspace can't help as they have no CF techs available.

Here is the info they provided me:

"Thank you for your patience. As per my response I simply restarted ColdFusion to get the site responsive again. It seems that JRun was consuming most of the resources, not allowing the site to resolve, and after restarting ColdFusion your site began to respond. If you have any further questions or concerns, please feel free to update this ticket or contact us here directly. Thank you again."

Can anyone shed any light on this? It seems to happen every few weeks and has done so for months.

Thanks again,

Wladimir

Community Expert, Nov 27, 2008
Ask for and study the log files.

New Here, Nov 28, 2008
Hi BKBK,

Thank you very much for getting back to me. I've asked Rackspace for my logs, but the only issue is that there are 26MB worth of them! :)

I've posted a screenshot below which shows how many files there are:

http://www.dpivision.com/screenshot.jpg

Any help would be greatly appreciated.

Thanks again,

Wladimir

Community Expert, Nov 29, 2008
26MB isn't a big deal. They should be able to copy it for you.

Application, exception and server logs all matter. However, you can narrow the search down to just a few lines of text. The key is to look for clues on the date and time of the server crash, and in the minutes preceding the crash. Bring the errors to the forum.



New Here, Dec 01, 2008
Thank you, BKBK. I've asked Rackspace just that:

-----------------

Rackspace said:
2008-12-01 11:16:05 (UTC+0)

Hi Wladimir,

I understand now, thanks. It's going to be really tricky for us to help here - we just don't have the understanding of ColdFusion to be able to distinguish problems from background noise in the logfiles the way we can with supported systems like Apache and PHP.

You are going to have to identify the time of one of the crashes and then go through the logfiles to pull out any alerts that occurred in the minutes preceding the crash, then try to identify anything that shouldn't be happening there. It's a highly time-consuming and manual task and really best performed by someone familiar with the application and ColdFusion.

The kind of command I would use to extract the information looks like this:

grep "11/22/08" application.log | grep Error | grep -v "File not found"

Which pulls application errors such as this:

"Error","jrpp-50036","11/22/08","23:55:02","vhsdirect","Invalid list index 3.In function ListGetAt(list, index [, delimiters]), the value of index, 3, is not a valid as the first argument (this list has 2 elements). Valid indexes are in the range 1 through the number of elements in the list. The specific sequence of files included or processed is: /var/www/vhosts/vhsdirect.co.uk/httpdocs/site/product.cfm, line: 169 "

Then cross-reference this with any errors from the server logfile, like this:

grep "11/22" cfserver.log | grep "23:55"
11/22 23:55:18 Error [jrpp-84358] - Could not find the included template ../layouts/.Note: If you wish to use an absolute template path (e.g. TEMPLATE=""/mypath/index.cfm"") with CFINCLUDE then you must create a mapping for the path using the ColdFusion Administrator. Using relative paths (e.g. TEMPLATE=""index.cfm"" or TEMPLATE=""../index.cfm"") does not require the creation of any special mappings. It is therefore recommended that you use relative paths with CFINCLUDE whenever possible. The specific sequence of files included or processed is: /var/www/vhosts/ktduk.com/httpdocs/site/product.cfm, line: 92
11/22 23:55:02 Error [jrpp-50036] - Invalid list index 3.In function ListGetAt(list, index [, delimiters]), the value of index, 3, is not a valid as the first argument (this list has 2 elements). Valid indexes are in the range 1 through the number of elements in the list. The specific sequence of files included or processed is: /var/www/vhosts/vhsdirect.co.uk/httpdocs/site/product.cfm, line: 169

The process is something that you or your developers would need to complete though.

Kind Regards,

Andrew

--------------------
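
The grep approach Andrew describes can also be scripted. Below is a rough Python sketch, not anything Rackspace or Adobe supplied. It assumes application.log lines are quoted, comma-separated fields in the order shown in the sample above (severity, thread, date, time, application, message). The function name `errors_before_crash`, the ten-minute window, and the crash time are made up for illustration.

```python
import csv
from datetime import datetime, timedelta


def errors_before_crash(log_lines, crash_time, minutes=10):
    """Return (timestamp, thread, application, message) for each "Error"
    entry logged in the `minutes` leading up to `crash_time`."""
    window_start = crash_time - timedelta(minutes=minutes)
    hits = []
    for row in csv.reader(log_lines):
        # Skip header rows, blank lines, and anything that is not an error.
        if len(row) < 6 or row[0] != "Error":
            continue
        try:
            stamp = datetime.strptime(row[2] + " " + row[3], "%m/%d/%y %H:%M:%S")
        except ValueError:
            continue  # unparseable date/time, ignore the line
        # Same filter as `grep -v "File not found"`.
        if "File not found" in row[5]:
            continue
        if window_start <= stamp <= crash_time:
            hits.append((stamp, row[1], row[4], row[5]))
    return hits


# The first line is shortened from the sample in the thread; the rest are hypothetical.
sample = [
    '"Error","jrpp-50036","11/22/08","23:55:02","vhsdirect","Invalid list index 3."',
    '"Error","jrpp-50040","11/22/08","23:56:10","vhsdirect","File not found: /missing.cfm"',
    '"Error","jrpp-50001","11/22/08","21:00:00","vhsdirect","Some earlier error"',
    '"Information","jrpp-50002","11/22/08","23:57:00","vhsdirect","Not an error"',
]
crash = datetime(2008, 11, 23, 0, 0, 0)  # hypothetical crash time
hits = errors_before_crash(sample, crash)
for stamp, thread, app, message in hits:
    print(stamp, thread, app, message)
```

The thread name in each hit (jrpp-50036 here) is what you would then look for in cfserver.log around the same minute, as in the second grep.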

Guest, Nov 29, 2008
Same problem here. The issue is JRun running out of virtual memory space immediately on loading. If I get it resolved I'll post again.

Community Expert, Dec 01, 2008
That's a start. I would verify these first.

The application requests index 3 of a list on line 169 of /httpdocs/site/product.cfm (under vhsdirect.co.uk), whereas the list has just 2 elements. ColdFusion also couldn't find a template included on line 92 of /httpdocs/site/product.cfm (under ktduk.com).
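
The first of those errors is the kind that a bounds check prevents. As a rough illustration in Python rather than CFML (the helper name `list_get_at_or_default` is hypothetical, and it only loosely mimics ListGetAt's 1-based indexing over a delimited string), the idea is to compare the index with the element count before reading:

```python
import re


def list_get_at_or_default(text, index, delimiters=",", default=""):
    """Return the 1-based `index`-th element of a delimited string,
    or `default` when the index is out of range."""
    # Each character in `delimiters` counts as a separator.
    # Assumption for this sketch: empty elements are skipped.
    pattern = "[" + re.escape(delimiters) + "]"
    elements = [item for item in re.split(pattern, text) if item]
    if 1 <= index <= len(elements):
        return elements[index - 1]
    return default


two_items = "red,green"  # hypothetical list with 2 elements
print(list_get_at_or_default(two_items, 2))
print(repr(list_get_at_or_default(two_items, 3)))
```

Asking for element 3 of a 2-element list then yields the default instead of raising an error on every request.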


New Here, Dec 02, 2008
Hi Guys,

Firstly, a big thank you to everyone for their help. It's all really appreciated.

@ BKBK

"That's a start. I would verify these for a start.

The application calls a list with index 3 on line 169 in /httpdocs/site/product.cfm, whereas the list has just 2 elements. ColdFusion couldn't find a template included on line 92 in /httpdocs/site/product.cfm."

Would this cause the server to crash over and over again, though? Surely a 404 can't do that much damage?

Thanks again,

Wladimir

Community Expert, Dec 01, 2008
I have a few thoughts that may help you, Wladimir. It's a long-ish reply, but I hope it has some value for you or others.

There's been some discussion of looking at logs, and that may help, but I think you'll need to look at far more than what's been mentioned. For instance, BKBK said that the "application, exception and server logs all matter", and that may be true as far as the [cf]/logs/ directory is concerned.

But you want to be sure to check out the [cf]/runtime/logs/ as well (if in multiserver/multi-instance mode, see [jrun4]/logs/). These other logs (including ones named [server]-out.log and [server]-event.log) are often far more helpful in understanding the cause of errors.

Even then, though, they're often still not enough. There may also be hs*.log files in [cf]/runtime/bin/ (or [jrun4]/bin/) that offer additional info on JVM crashes, if that's what's happening.

Sometimes, it's not that CF crashes but that it's simply hung up as all request queues are busy. In that case, you need to know what's going on in the CF engine when things go bad. One thing that helps is if you enable JRun metrics, which log status info at a chosen interval (such as every few seconds). The CFSTAT command (built into CF, in the [cf]/bin directory) can help as well, as can perfmon stats (though not on Linux).

I discussed these and other sorts of resources for troubleshooting in a talk I gave at Max (at the CF Unconference) called, "CF911: Tools and techniques for Troubleshooting", which you can find online at http://www.carehart.org/presentations/#cf911. Hope that may be helpful.

I'll just add in conclusion that there's always an explanation to CF hanging up. It's not "just broken", so it's a shame when hosts (and others) just "restart CF" to make the "problem go away". There's always a root cause, and as in your case, it repeats, so the problem will come back.

The challenge is to find that root cause, when it "goes rogue". The issue may be due to CF config, JVM config, or JVM version (there's a known issue with the built-in JVM in CF8, but you're on 7). It may be due to load (perhaps unexpected). You may be running out of memory or CPU. Your Rackspace techs don't clarify.

Since you're on 7, there's also a known issue of file uploads being a potential killer in that they use up memory (equal to the size of the file uploaded) that's never released. There's a hotfix for that. See http://www.adobe.com/go/kb401239 (and in my experience, it has nothing to do with CFCs, as suggested in the title and description). If you're running out of memory in CF, I'd highly recommend this (and it's not included even if you've applied the latest cumulative hotfix for CF7).

And speaking of hotfixes, I find many shops still running on the original release of whatever they have (such as 7.0). You should at least move up to 7.0.1 or 7.0.2. And even then, shops have often not applied cumulative hotfixes (or individual ones). Many times there are problems that are solved with these.

Going back to the discussion of memory, are you (or they) tracking memory use within CF, whether by watching the memory used by the jrun process (less effective) or watching memory use within CF (more effective)? The JRun metrics can show you, or there are Java methods you can call (for instance, see http://www.petefreitag.com/item/115.cfm).

Finally, there are also useful commercial tools like FusionReactor and SeeFusion which can help, and they're more than "just monitors", in that they track information that you can review after a crash (and especially more in FusionReactor, which does tremendous yet lightweight logging of lots of details about running requests, queries, and more).

I use all these tools and logs (and more) when I help people solve these kinds of problems. Half the battle is knowing the tools and how to connect the dots in the diagnostic info they provide. I hope the info above may help, and of course this forum is a great resource so ask away.

Note as well that there are various companies that can help also, whether on-site or over-the-web. See http://www.cf411.com/#cfconsult for a list of several. Some require days at a minimum, while some (like myself) have no minimum. Sorry if that sounds like a sales pitch to some. It's really not, and I've tried to offer a lot of info for free above and on my site (carehart.org). But sometimes people just want to make the pain go away as fast as possible, and I just want them to know they don't need to suffer if they'd rather pull in some help.

/Charlie (troubleshooter, carehart.org)

Community Expert, Dec 02, 2008
I meant to comment on that, Wladimir. I can't see how that would cause a crash, no. He was focused on your cf/logs. I recommended you look at your cf/runtime/logs. Any news from that? Or any of the other info I offered? Have you considered enabling JRun metrics? Have you looked at CFSTAT? The answers are often there among the various diagnostics, whether provided by default or available to be enabled.

/Charlie (troubleshooter, carehart.org)

New Here, Dec 03, 2008
Hi carehart,

Our server just died again and this is the info from Rackspace:

"Per your request I have pulled the logs from your server preceding the issue and can see that there is a file missing from your directory which is triggering this alert.


Usage: file [-bciknsvzL] [-f namefile] [-m magicfiles] file...
Usage: file -C [-m magic]
Try `file --help' for more information.
Usage: file [-bciknsvzL] [-f namefile] [-m magicfiles] file...
Usage: file -C [-m magic]
Try `file --help' for more information.
[Wed Dec 03 12:43:01 2008] [error] [client 193.108.87.5] File does not exist: /var/www/vhosts/default/htdocs/department
[Wed Dec 03 12:44:57 2008] [error] [client 84.9.112.110] File does not exist: /var/www/vhosts/default/htdocs/favicon.ico
[Wed Dec 03 12:45:00 2008] [error] [client 84.9.112.110] File does not exist: /var/www/vhosts/default/htdocs/favicon.ico"

-------

You also mention I should take a look at the "cf/runtime/logs". Which one is that?

http://www.dpivision.com/screenshot.jpg

Thanks again,

Wladimir



Community Expert, Dec 04, 2008
Wladimir, your screenshot shows that you're still looking in the cf/logs directory. Please reread my first note above. I said you need to look instead in the runtime logs. You ask where those are. Again, I referred to them in my first note above:

"check out the [cf]/runtime/logs/ as well (if in multiserver/multi-instance mode, see [jrun4]/logs/). These other logs (including ones named [server]-out.log and [server]-event.log) are often far more helpful in understanding the cause of errors."

I used [cf] since the location varies by version and OS. So if on CF7 (on a Server install), they're in cfusionmx7/runtime/logs. On a multiserver/multiinstance deployment, it's jrun4/logs (or wherever those equivalents are stored on your Linux server).

You say the host says, "there is a file missing from your directory which is triggering this alert". What alert is he talking about? You said the server is crashing. I honestly have never heard of a server crashing because of a file missing. Messages like that are very common in the cf/logs, but the runtime/logs may tell a far different story. Even then, though, the answer may still not be obvious from those. As I said in my first note, though, the information to solve the problem is there, or can be added to make it be there.

/Charlie (troubleshooter, carehart.org)

Community Expert, Dec 03, 2008
Carehart wrote:
I can't see how that would cause a crash, no. He was focused on your cf/logs.

The hierarchy is page-request => application => server. An error in a request can escalate into a server crash. For example, an endless loop or a growing factory of objects will eventually bring down the server.

The moral is clear. You cannot begin to debug a server crash when there are errors or bugs in your application.

Community Expert, Dec 04, 2008
BKBK, if you've seen a server crash because of this, then thanks for sharing your perspective. I don't disagree with your depiction of the processing hierarchy. I'm just saying that in the hundreds of instances per year where I've helped people in positions like Wladimir's, a missing file has not caused the sort of escalation you propose. Rather, it's always been something else that's crashed the server. And it's not taken ruling out every CFML error in the application.log to find and resolve the problem. Still, there's nothing wrong with trying that as an alternative when lacking other information or techniques.

Further, and I should have said this to Wladimir, in my experience, it's almost never that the server is really "crashing". Instead, it's usually that something is tying up all the request threads, so no more requests get in. That appears to the users (and admins) as a "hung" server. With tools like FusionReactor and SeeFusion (for CF 6, 7, and 8), or the CF8 Monitor, you can actually see what requests are running in a given moment. Usually it's something causing the requests to hang.

It may be that they're making a call to the DB and it's locked, or they're making a call to a web service, or a CFHTTP call, or something else like that, which is hanging up, and therefore preventing any new requests. Think of it like a cashier line: if the credit card processing system goes down, everyone's going to start piling up.

And sadly, in situations like this, you may not have any messages in the logs saying anything's "wrong". Even without those tools above, at least if you use the JRun metrics, CFSTAT, or Perfmon (as I mentioned in the first note), you can see *if and how many requests* are running or queued. They start queuing when they can't run, just like people lining up behind the cashiers, though it's more like a bank teller line in that all are in one line and would go to the next available window. Indeed, sometimes it's really that all but one window is locked up, so some few people are indeed getting through, but again it looks and feels like the system's hung.

Even if this is not W's case, perhaps this description may help others.
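
The teller-line picture can be reproduced in a few lines. This is only a toy Python model, not how JRun is implemented: a fixed pool of worker threads stands in for the request threads, and `hung_request` (a made-up name) blocks the way a request waiting on a locked database would. The pool size and request counts are arbitrary.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def simulate(workers, hung, quick):
    """Tie up `hung` requests, submit `quick` more, and report what the pool looks like."""
    release = threading.Event()        # stands in for the locked DB or slow web service
    started = threading.Semaphore(0)   # lets us know a hung request is really running

    def hung_request():
        started.release()
        release.wait()                 # blocks until the "backend" recovers
        return "finished late"

    def quick_request():
        return "ok"

    with ThreadPoolExecutor(max_workers=workers) as pool:
        hung_futures = [pool.submit(hung_request) for _ in range(hung)]
        for _ in range(min(hung, workers)):
            started.acquire()          # wait until those requests occupy a thread
        quick_futures = [pool.submit(quick_request) for _ in range(quick)]
        if hung < workers:
            # At least one "window" is still open, so quick requests get through.
            wait(quick_futures, timeout=5)
        everything = hung_futures + quick_futures
        snapshot = {
            "running": sum(f.running() for f in everything),
            "queued": sum(not f.running() and not f.done() for f in everything),
            "completed": sum(f.done() for f in quick_futures),
        }
        release.set()                  # let the hung requests finish so the pool can shut down
    return snapshot


print(simulate(workers=4, hung=4, quick=3))  # every thread tied up
print(simulate(workers=4, hung=3, quick=3))  # one thread still free
```

In the first call the quick requests should all show up as queued with nothing completed, and no error is raised anywhere, which is the "hung but nothing in the logs" situation. In the second, one free thread is enough for the quick requests to finish while the others stay stuck.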

/Charlie (troubleshooter, carehart.org)

New Here, Dec 05, 2008
Hi Guys,

Once again, a big thank you to everyone. I'm busy speaking to Rackspace and I will get to the bottom of this! :)

They've just updated me again:

"I now attached the Cold Fusion event log file to this ticket to review.

I also included a SAR report of the server’s resource details that include CPU utilization, memory and swap space utilization, queue length and load averages.
More details as per the sar linux command line tool can be found in url (linux.die.net/man/1/sar).

I also included the last 600 entries from the Cold Fusion log file (/opt/coldfusionmx7/logs/cfserver.log) that logged a number of fatal errors (IE Fatal: Stack size too small. Use 'java -Xss' to increase default stack size)

If you have any more questions, please update the following ticket or contact the Rackspace Managed Hosting helpdesk."

Still waiting for some more info and will post again.

Thanks again,

Wladimir

LEGEND, Dec 09, 2008
wwbr wrote:
> I also included the last 600 entries from the Cold Fusion log file
> (/opt/coldfusionmx7/logs/cfserver.log) that logged a number of fatal errors (IE
> Fatal: Stack size too small. Use 'java -Xss' to increase default stack size)

That stack size too small error is a smoking gun. You most likely have a
runaway recursive process somewhere. You can do what the error suggests
(java -Xss to increase the stack), or you can go hunting for the
recursive process itself. Are you up to date on all your ColdFusion and
JVM updates?

Jochem

--
Jochem van Dieten
Adobe Community Expert for ColdFusion
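
To see why raising the stack size alone may not be enough, here is a small Python analogy (Python's recursion limit standing in for the JVM's -Xss setting; the function names are invented for this sketch). A recursive function with no terminating condition exhausts whatever limit it is given, while a version with a base case never comes close:

```python
import sys


def runaway(n=0):
    # No base case: every call makes another call.
    return runaway(n + 1)


def count_down(n):
    # Base case at n <= 0, so the recursion depth is bounded by n.
    return 0 if n <= 0 else 1 + count_down(n - 1)


def overflows_at(limit):
    """Run the runaway function under the given recursion limit and
    report whether it still ran out of stack."""
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        runaway()
        return False
    except RecursionError:
        return True
    finally:
        sys.setrecursionlimit(previous)


print(overflows_at(1000))  # runaway recursion under a modest limit
print(overflows_at(3000))  # same code with three times the room
print(count_down(50))      # bounded recursion for comparison
```

A bigger limit only delays the failure for the runaway case, which is why finding the recursive code itself is usually the more durable fix.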

New Here, Dec 09, 2008
Thanks, Jochem. Yes, we have increased the stack size, but to no avail.

Upgrades, what are they! (hehe) No, I've not applied any, so I guess this is a good starting point!

Thanks again,

Wladimir

New Here, Dec 09, 2008
PS:

Are these logs any good?

90977_cfserverlog.txt
90977_coldfusion-eventlog.txt
90977_SAR_report.txt

Community Expert, Dec 03, 2008
You also mention I should take a look at the "cf/runtime/logs" which one is that?

If you cannot find anything in the CF logs, then looking at the runtime logs is indeed the next step. Application, exception and server runtime logs are all relevant. The key is to extract the entries at and just before the server crash.

Community Expert, Dec 05, 2008
@Carehart
I don't argue the point that a crash happens, by definition, at runtime. Neither did I say or suggest that a missing file is the cause of this crash.

My point is that you should first eliminate program errors before looking for the cause of a crash. I advised Wladimir to study the logs and to rule things out. In any case, one can easily create a hypothetical use-case that involves a missing file and a server crash.

Suppose page1.cfm errors because a file is missing. If you don't attend to that, it might lead to a crash if page2.cfm contains code that enters an infinite loop as a result of the missing file. That's all academic, I agree. Just an illustration.

@Wwbr,
While you wait, here are some stabs in the dark that hit the mark in the past. Prime suspect: your Application file. Examine it for loops, includes, cflocation, scheduled jobs, gateway calls, read/write processes and object-creation processes.





New Here, Dec 07, 2008
Thanks for the useful info.

New Here, Dec 09, 2008
Hi Guys,

Are these logs any good?

90977_cfserverlog.txt
90977_coldfusion-eventlog.txt
90977_SAR_report.txt

Or do I need to ask Rackspace for something else?

Thanks again,

Wladimir