I have a few thoughts that may help you, Wladimir. It's a
long-ish reply, but I hope it has some value for you or others.
There's been some discussion of looking at logs, and that may
help, but I think you'll need to look at far more than what's been
mentioned. For instance, BKBK said that the "application,
exception and server logs all matter", and that may be true, as far
as the logs in [cf]/logs/ are concerned.
But be sure to check the logs in [cf]/runtime/logs/
as well (or in [jrun4]/logs/ if you're in multiserver/multi-instance mode).
These other logs (including ones named [server]-out.log and
[server]-event.log) are often far more helpful in understanding the
cause of errors.
Even then, though, they're often still not enough. There may
also be hs*.log files in [cf]/runtime/bin/ (or [jrun4]/bin/) that offer
additional info on JVM crashes, if that's what's happening.
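If you'd like to sweep all those directories in one go, here's a minimal Java sketch (my own illustration, not a CF or JRun tool) that lists the .log files in whatever directories you pass it, newest first, and flags files matching the hs*.log pattern as likely JVM crash logs. It assumes you substitute your real [cf] or [jrun4] paths on the command line; the class RecentLogs and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.util.Locale;
import java.util.stream.Stream;

// Hypothetical helper: list log files in the given directories, newest first.
public class RecentLogs {

    // Any file ending in .log (case-insensitive) counts as a log here.
    static boolean isLogFile(String name) {
        return name.toLowerCase(Locale.ROOT).endsWith(".log");
    }

    // JVM crash logs use an "hs" prefix (e.g. hs_err_pid1234.log); this simple
    // prefix check could also match other hs*.log files.
    static boolean isJvmCrashLog(String name) {
        return name.startsWith("hs") && isLogFile(name);
    }

    // Last-modified time in millis, or 0 if it can't be read.
    static long lastModified(Path file) {
        try {
            return Files.getLastModifiedTime(file).toMillis();
        } catch (IOException e) {
            return 0L;
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the directories to check, e.g. your [cf]/logs, [cf]/runtime/logs and [cf]/runtime/bin
        for (String dir : args) {
            Path path = Paths.get(dir);
            if (!Files.isDirectory(path)) {
                System.out.println("Skipping (not a directory): " + dir);
                continue;
            }
            System.out.println("== " + path);
            try (Stream<Path> files = Files.list(path)) {
                files.filter(Files::isRegularFile)
                     .filter(f -> isLogFile(f.getFileName().toString()))
                     // newest first, so logs from the most recent trouble come to the top
                     .sorted((a, b) -> Long.compare(lastModified(b), lastModified(a)))
                     .forEach(f -> {
                         String name = f.getFileName().toString();
                         String tag = isJvmCrashLog(name) ? "  [JVM crash log?]" : "";
                         System.out.println(Instant.ofEpochMilli(lastModified(f)) + "  " + name + tag);
                     });
            }
        }
    }
}
```

You'd run it as `java RecentLogs <dir1> <dir2> ...`, then open whichever logs changed around the time of the problem.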
Sometimes it's not that CF crashes but that it simply hangs
because all its request queues are busy. In that case, you need to know
what's going on in the CF engine when things go bad. One thing that
helps is enabling JRun metrics, which log status info at a
chosen interval (such as every few seconds). The CFSTAT command
(built into CF, in the [cf]/bin directory) can help as well, as can
perfmon stats (though not on Linux).
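To show the general idea of sampling status at a fixed interval, here's a hedged Java sketch that, every few seconds, counts the JVM's threads by state (RUNNABLE, BLOCKED, WAITING and so on). It is a stand-in technique, not JRun's metrics feature (which is configured within JRun itself), and it can only see the JVM it runs in, so run standalone it reports on itself rather than on CF. The class name and the 5-second interval are assumptions for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.time.Instant;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of interval sampling; NOT JRun's metrics feature.
public class IntervalStatus {

    // Tally thread states. Many BLOCKED or WAITING threads alongside few
    // RUNNABLE ones, sample after sample, is one possible sign of a hang.
    static Map<Thread.State, Integer> countByState(Thread.State[] states) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread.State s : states) {
            counts.merge(s, 1, Integer::sum);
        }
        return counts;
    }

    // Take one snapshot of this JVM's threads and print a one-line summary.
    static void logSample() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Thread.State[] states = Arrays.stream(mx.getThreadInfo(mx.getAllThreadIds()))
                .filter(Objects::nonNull) // threads that ended since getAllThreadIds() come back null
                .map(ThreadInfo::getThreadState)
                .toArray(Thread.State[]::new);
        System.out.println(Instant.now() + " " + countByState(states));
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalSeconds = 5; // assumed interval; choose what suits your load
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(IntervalStatus::logSample, 0, intervalSeconds, TimeUnit.SECONDS);
        // Collect a few samples for demonstration, then stop.
        Thread.sleep(TimeUnit.SECONDS.toMillis(intervalSeconds * 3));
        timer.shutdownNow();
    }
}
```

The point of sampling on a timer rather than once is that the log is already there, covering the minutes before things went bad, when you come to look.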
I discussed these tools and other troubleshooting resources
in a talk I gave at MAX (at the CF Unconference)
called "CF911: Tools and techniques for Troubleshooting", which
you can find online at
http://www.carehart.org/presentations/#cf911.
Hope that may be helpful.
I'll just add that there's always an
explanation for CF hanging up. It's not "just broken", so it's a
shame when hosts (and others) just "restart CF" to make the
"problem go away". There's always a root cause, and when, as in your
case, the problem repeats, it will keep coming back.
The challenge is to find that root cause when CF "goes
rogue". The issue may be due to CF config, JVM config, or JVM version
(there's a known issue with the built-in JVM in CF8, but you're on
7). It may be due to load (perhaps unexpected). You may be running
out of memory or CPU. Your Rackspace techs haven't said which.
Since you're on 7, there's also a known issue with file uploads:
they can be a killer because they use up memory (equal to the
size of the file uploaded) that's never released. There's a hotfix
for that. See
http://www.adobe.com/go/kb401239
(and in my experience, it has nothing to do with CFCs, despite what
the title and description suggest). If you're running out of memory in
CF, I'd highly recommend this hotfix (and note that it's not included
even in the latest cumulative hotfix for CF7, so you need to apply it separately).
And speaking of hotfixes, I find many shops still running on
the original release of whatever they have (such as 7.0). You
should at least move up to 7.0.1 or 7.0.2. And even then, shops have
often not applied the cumulative hotfixes (or individual ones), yet many
problems are solved by them.
Going back to the discussion of memory, are you (or they)
tracking memory use within CF, whether by watching the memory used
by the jrun process (less effective) or by watching memory use within
CF (more effective)? The JRun metrics can show you, or there are
Java methods you can call (for instance, see
http://www.petefreitag.com/item/115.cfm).
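For the Java-methods route, the standard java.lang.Runtime class offers totalMemory(), freeMemory() and maxMemory(). Here's a minimal Java sketch of turning those into a "heap in use" figure; I'm assuming these are the sort of calls the linked article has in mind. Run standalone, it reports its own JVM, not CF's, so you'd make the same calls from within CF (for instance via createObject("java", "java.lang.Runtime")) to see CF's heap. The class and method names are hypothetical.

```java
// Hypothetical helper showing how a "heap in use" figure is derived
// from the standard java.lang.Runtime methods.
public class HeapStats {

    // Heap in use = heap the JVM has currently reserved minus the free part of it.
    static long usedBytes(long totalBytes, long freeBytes) {
        return totalBytes - freeBytes;
    }

    // Share of the maximum heap (the -Xmx limit) currently in use, as a percentage.
    static double percentOfMax(long usedBytes, long maxBytes) {
        return maxBytes <= 0 ? 0.0 : 100.0 * usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long total = rt.totalMemory(); // heap currently reserved by this JVM
        long free = rt.freeMemory();   // unused portion of that reserved heap
        long max = rt.maxMemory();     // most the heap may grow to (Long.MAX_VALUE if no limit)
        long used = usedBytes(total, free);
        long mb = 1024L * 1024L;
        System.out.printf("used=%d MB, total=%d MB, max=%d MB (%.1f%% of max)%n",
                used / mb, total / mb, max / mb, percentOfMax(used, max));
    }
}
```

Because the free figure rises and falls as garbage collection runs, a single reading says little; it's the trend over time (say, used memory climbing after each upload and never dropping back) that points toward a leak like the upload issue described above.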
Finally, there are also useful commercial tools like
FusionReactor and SeeFusion, which can help and are more than
"just monitors", in that they track information that you can review
after a crash (especially FusionReactor, which does
extensive yet lightweight logging of lots of details about running
requests, queries, and more).
I use all these tools and logs (and more) when I help people
solve these kinds of problems. Half the battle is knowing the tools
and how to connect the dots in the diagnostic info they provide. I
hope the info above may help, and of course this forum is a great
resource so ask away.
Note as well that there are various companies that can help,
whether on-site or over the web. See
http://www.cf411.com/#cfconsult
for a list of several. Some have a minimum engagement measured in days, while some
(like myself) have no minimum. Sorry if that sounds like a sales
pitch to some. It's really not, and I've tried to offer a lot of
info for free above and on my site (carehart.org). But sometimes
people just want to make the pain go away as fast as possible, and I
just want them to know they don't need to suffer if they'd rather
pull in some help.
/Charlie (troubleshooter, carehart.org)