Is it possible to start the CF8 server monitor programmatically?

Report · May 04, 2009

We are using the alert feature of the CF8 server monitor to automatically restart the CF8 service if it becomes unresponsive.

Since restarting CF8 will also kill the server monitor, is there any way to start the server monitor automatically right after CF8 restarts?

Thank you in advance for your ideas!

Report · May 04, 2009

@kit001, the answer would be no. The alert feature runs from within CF itself: if CF goes down, the alert process goes down with it. It's natural, of course, to assume the Server Monitor is somehow external to CF, but it really isn't. The Flex app itself is, sure, but it just exposes information and settings that are managed from within CF itself (turning on alerts, turning on/off the "start" buttons, etc.).

If CF goes down, the alert mechanism can't detect that the server's gone down. That's just not its job: indeed, you'll notice there's no such alert available. It can only monitor things about the server while it stays up.

Some may think, "well, might this be the job of the Multiserver monitor?" Again, no, it's not. The Multiserver monitor is just another Flex app that can watch what's going on inside any of the CF servers it's told to watch. It doesn't, itself, have any power to trigger events, restarts, etc.

Bottom line: to have CF restart, something outside of it must restart it. (Note that on Windows, the Services feature has an option to cause a service to restart if it goes down.)

Finally, you had asked, "is there any way to start the server monitor automatically right after CF8 restarts". Again, no. First, as stated above, the server monitor is not itself something separate from CF (except in being a Flex interface). More to the point, the alert feature is part of the monitor built into CF.

But someone may read this as asking if there's a way to do the equivalent of the "start monitoring" button (offered in the Server Monitor) so that it's enabled "automatically right after CF8 restarts." Well, the Start buttons are toggles which, once enabled, always cause CF to in fact do the selected monitoring/profiling/memory tracking immediately upon startup. You don't need to "turn them on", if you've already done so while the server was up.

But again: they don't have anything to do with restarting CF. You really have to rely on other tools for that.
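For readers wondering what "something outside of CF" might look like, here is a minimal Python sketch of an external watchdog. It is only an illustration, not a recommended tool: the probe URL, the service name, and the thresholds are all assumptions that would need to match your own setup.

```python
import subprocess
import time
import urllib.request

# Assumptions (not from this thread): a lightweight test page and the
# Windows service name. Adjust both to match the actual installation.
PROBE_URL = "http://localhost:8500/probe.cfm"
SERVICE_NAME = "ColdFusion 8 Application Server"


def http_probe(url=PROBE_URL, timeout=10):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        # Refused connection, timeout, HTTP error: all count as "not responding".
        return False


def restart_service(name=SERVICE_NAME):
    """Stop and start a Windows service with the built-in 'net' command."""
    subprocess.run(["net", "stop", name], check=False)  # may already be stopped
    subprocess.run(["net", "start", name], check=True)


def watch(probe, restart, max_failures=3, interval=30,
          sleep=time.sleep, rounds=None):
    """Call restart() after max_failures consecutive failed probes.

    Runs forever when rounds is None; returns the number of restarts otherwise.
    """
    failures = 0
    restarts = 0
    done = 0
    while rounds is None or done < rounds:
        done += 1
        if probe():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                restart()
                restarts += 1
                failures = 0
        sleep(interval)
    return restarts
```

Running `watch(http_probe, restart_service)` from a separate process (a scheduled task, say) keeps the check independent of CF. Requiring several consecutive failures is a deliberate choice, so that one slow response does not bounce the server.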

For any readers interested, I have a 4-part series on the CF 8 Server Monitor in the Adobe Dev Center. More at:

http://www.carehart.org/blog/client/index.cfm/2008/7/30/45page_server_monitor_guide

Hope that's helpful.

/charlie

PS I can also offer consulting or training assistance, for as little time as may be needed, in working with the Server Monitor or other monitoring and troubleshooting tools for CF.

/Charlie (troubleshooter, carehart.org)

Report · May 04, 2009

Hi Charlie,

It is my honor to receive the first response from you! I must thank you for the outstanding articles and posts about the CF8 Server Monitor. We were not even aware of the CF8 Server Monitor before finding your articles through Google!

There were several instances on our production server where the "Unresponsive Alert" was not able to recover CF8 by killing threads alone. The only fix was to restart CF8. Here is how we set up the CF8 server monitor to restart CF8 using the “unresponsive” alert feature (i.e. using cfexecute to run a restart batch file):

1. Create a file named cfrestart.bat at ColdFusion8\bin:

@echo off
setlocal
cd ..\runtime\bin
jrun -stop coldfusion
jrun -start coldfusion
endlocal

2. Create a file named selfRestart.cfc at ColdFusion8\runtime\bin:

</cffunction>

</cfcomponent>

3. Under the Alerts tab in the CF8 Server Monitor, select the Unresponsive Server tab. Check Enable and enter selfRestart.cfc under Processing CFC. Click Apply.

I tested this by triggering selfRestart.cfc via the "Slow Server" alert, and CF8 was restarted successfully. I assume it will work the same way for the "Unresponsive Server" alert.

If there is a way to programmatically start the CF8 server monitor after CF8 restarts, our server can theoretically run 24x7 without any manual restarts.

Thanks again!

Kit

Report · May 04, 2009

First, thanks for your kind regards, Kit.

Second, I see I misread your note. I thought you were asking how to restart CF from within the monitor. And while I said there's no built-in provision for that, I see that you used the feature to have Alerts trigger a CFC to call a restart script. OK.

Third, well, you still are asking "If there is a way to programmatically start the CF8 server monitor after CF8 restarts". I did explain that the server monitor does start immediately. The "start" buttons remain enabled over CF restarts (something that does indeed surprise many, especially if they enable the "start memory tracking" and it crashes their machine, and then a restart keeps it enabled so it crashes again.)

That should be the answer to your question, as I sense somehow you missed it. Let us know.

All that said, I'd like to offer a different thought on your just killing the server whenever there are hung requests. Would you like instead to find out and resolve the root cause for the unresponsive server?

Sure, some may say "we don't have time for diagnostics. We need to get the server back up and running immediately", but there are plenty of apps where there's a lot of pain in just crashing the server.

Now, one may say, "but if it's become unresponsive, what choice do we have?". But you do have choices, really, and there are features in the monitor that can help you find and resolve the root cause of the problem instead.

First, you note that you couldn't kill requests. That can certainly happen if a request is running a tag like CFQUERY or CFHTTP (anything talking to something outside of CF). These can't be interrupted, so the request can't be killed. So you need to know what the hung request is doing, to get to the root cause of the problem.

Bringing down the server only masks the root cause of the problem. It's like firing a cashier each time there's money missing from the register, when a recording might catch that it's the manager skimming the till.

So you need to be able to "zoom the camera" in on the request to find out what it's doing. Fortunately, the Server monitor helps you in either of two ways: automatically or manually.

First, the unresponsive server alert fires when requests are not responding, which is exactly the moment you have a problem. If you enable the option for the alert to trigger a "snapshot", it shows you details about all requests that are running. And in the "thread dump" at the bottom of the snapshot you can find the CF (JRPP) threads that are associated with CF requests.

Second, you can see a stack trace for an individual request if you enable start profiling, and then double-click on the request while it's running.

Either way, the stack trace will show what line of CFML code is being executed at the time the stack trace is requested. And if you view a couple of them within a short period of time, that can tell you if the request is indeed hung (waiting on some one tag to run) or may really just be working really hard and it needs time to finish.

Hope that helps.

/charlie

/Charlie (troubleshooter, carehart.org)

Report · May 05, 2009

Hi Charlie,

Thanks again for the prompt and insightful advice!
Our production server was unresponsive again this morning and CF8 was able to restart automatically as expected!
You are absolutely right, the server monitor remains up and running after the CF8 restart.
The snapshot captured by the "Unresponsive Server" alert doesn’t give us much to troubleshoot with. Here is part of the Java stack trace:

"scheduler-21" prio=5 tid=57 TIMED_WAITING
     at java.lang.Object.wait(Native Method)
     at jrunx.scheduler.SchedulerService.createRunnable(SchedulerService.java:188)
     at jrunx.scheduler.ThreadPool$DownstreamMetrics.createRunnable(ThreadPool.java:287)
     at jrunx.scheduler.ThreadPool$ThreadThrottle.createRunnable(ThreadPool.java:349)
     at jrunx.scheduler.ThreadPool$UpstreamMetrics.createRunnable(ThreadPool.java:241)
     at jrunx.scheduler.WorkerThread.run(WorkerThread.java:62)

"obj-skimmer" prio=5 tid=71 TIMED_WAITING
     at java.lang.Object.wait(Native Method)
     at coldfusion.server.j2ee.pool.PoolSkimmerThread.run(PoolSkimmerThread.java:47)
     at java.lang.Thread.run(Thread.java:619)

"cfthread-0" prio=5 tid=106 TIMED_WAITING
     at java.lang.Object.wait(Native Method)
     at coldfusion.util.GenericThreadPool$ThreadPoolRunnableFactory.createRunnable(GenericThreadPool.java:177)
     at coldfusion.scheduling.ThreadPool.createRunnable(ThreadPool.java:128)
     at coldfusion.scheduling.WorkerThread.run(WorkerThread.java:68)

"worker #3" prio=5 tid=96 WAITING
     at java.lang.Object.wait(Native Method)
     at java.lang.Object.wait(Object.java:485)
     at com.jnbridge.jnbcore.server.b.c.run(Unknown Source)
     at java.lang.Thread.run(Thread.java:619)

We suspect CF8 might have been brought down by other CPU-intensive processes such as Bpbkar32.exe (Netbackup) or Dgent.exe (Patchlink).
We will discuss and check the daily schedule of those processes with the Windows server team. Now that I know the CF8 server monitor is up 24x7, I can sleep much better at night ...

Thanks again!

Kit

Report · Sep 05, 2009

Hey Kit, going back to this thread from May, have things continued to work using the approach you had described above? All still well?

I will add that, as for the stack trace you showed and your wondering what it told you, I'm sorry I never responded. Unfortunately, I had set the forum to send me all emails from this CF Server Admin forum, so I couldn't really have it notify me specifically of any one that I'd participated in. I've changed it to stop that, so I will from now on only get emails from threads in which I participate. If I somehow still miss any reply you make here, feel free to drop me a note at charlie at carehart.org.

As for that stack trace, well, it really didn't have enough info. You would need to look at all the running threads and, more importantly, compare them over multiple snapshots to see whether any thread shows a request stuck doing the same thing across the snapshots. That would be a smoking gun to investigate, and it would indeed identify the specific line of CFML code that the request was stuck on.
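If you'd rather not compare snapshots by eye, the comparison can be roughed out in a few lines of Python. This is only a sketch under some assumptions: that each dump uses the same layout as the excerpt Kit posted (a quoted thread name, then indented "at ..." frames), that request threads have names starting with "jrpp", and that a thread running CFML shows a frame mentioning a .cfm or .cfc file. The function names are made up for illustration.

```python
import re

# A thread header starts with the quoted thread name, e.g. "jrpp-12" prio=5 ...
HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def parse_dump(text):
    """Map each thread name to its list of stack frames (top frame first)."""
    threads = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        match = HEADER.match(line)
        if match:
            current = threads.setdefault(match.group("name"), [])
        elif line.startswith("at ") and current is not None:
            current.append(line[3:])
    return threads


def runs_cfml(frames):
    """Assumption: a thread executing a template has a .cfm/.cfc frame."""
    return any(".cfm" in frame or ".cfc" in frame for frame in frames)


def stuck_threads(dumps, prefix="jrpp"):
    """Return request threads whose full stack is identical in every dump."""
    if len(dumps) < 2:
        raise ValueError("need at least two dumps to compare")
    parsed = [parse_dump(dump) for dump in dumps]
    stuck = {}
    for name, frames in parsed[0].items():
        if not name.startswith(prefix) or not runs_cfml(frames):
            continue  # skip non-request threads and idle request threads
        if all(other.get(name) == frames for other in parsed[1:]):
            stuck[name] = frames
    return stuck
```

Pass it the thread-dump text from two or more snapshots taken a short time apart. Anything it returns is a candidate to investigate, not proof: a long-running query would look the same as a hung one until it finishes.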

I've approached the Adobe DevNet folks about doing a revision (or continuation) of the 4-part series from 2007. The Monitor isn't changing much in CF 9, but there are things I and others have learned that could be communicated, and solutions like yours could be shared. And while there are some old Adobe technotes on how to create and interpret stack traces and thread dumps, they're really quite old and refer to old manual approaches, with no reference to the server monitor, the snapshots, etc. We shall see if that updated article/series comes to pass.

In the meantime, you may want to check out such technotes as http://kb2.adobe.com/cps/183/tn_18339.html.

Hope that's helpful. Or ask away or update us on how things went for you.

/Charlie (troubleshooter, carehart.org)
