Inspiring
November 25, 2013
Question

Request count drops to nothing


I've been having this problem recently with our server and I've never seen anything like it before.  I'm hoping someone in the community might be able to give me some guidance...

We run a ColdFusion 10 Standard server (recently patched with update 12), which handles about 30 requests per second on average.

Lately, we've been experiencing some very odd performance issues, so I've been leaving FusionReactor open on my screen pretty much all day long.  And what I've noticed is really bizarre.

Periodically, the metrics graphs will show a drop-off in requests from the usual 20-40 to 0.  This lasts for about 5 seconds, and when the requests come back, they shoot right up to the max (our server's maximum is set to 100).  It's almost as if something is blocking the requests from getting through for those few seconds, and then we get flooded all at once.  Either that, or FusionReactor isn't properly recording the traffic.  At one point this morning this was happening every minute or so, to the point that our server was choking on the requests and we were forced to reboot the whole machine.

Has anyone experienced this before?  Or might be able to point me in a good direction?


    2 replies

    Legend
    November 26, 2013

    Hi Smurf,

    The web server connector log in ColdFusion10\config\wsconfig\N\ (isapi_redirect.log in the case of IIS, mod_jk in the case of Apache) might have some interesting detail. Are there any errors or warnings for the time interval when the requests appear to disappear and come flooding back?

    What is coldfusion.exe doing CPU- and RAM-wise in Task Manager when that occurs?

    HTH, Carl.

    Charlie Arehart
    Community Expert
    November 26, 2013

    Smurf, first, as to your wondering whether you can trust what FR is reporting, note that you can “check on it” using 2 new features in CF10: you can look at its metrics.log (enabled in the CF admin Debug Output Settings page), which tracks every 5 seconds things like how many requests are running, or you can look at its request log (in \[instance]\runtime\logs), which logs every request that runs in CF.
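As a rough aid for checking metrics.log against FR, here is a minimal Python sketch that splits one metrics line into labelled values. It assumes the `Label: value` layout of the sample line quoted later in this thread, where a value is either an integer or the literal `null`. The names `parse_metrics` and `missing_fields` are hypothetical helpers, not part of CF or FR, and any prefix the real log puts in front of these fields is simply skipped.

```python
import re

# Assumed layout: "Label: value" pairs separated by spaces, where each
# value is an integer or the literal "null" (as in the sample in this thread).
FIELD_RE = re.compile(r"([A-Za-z][A-Za-z ]*?):\s*(null|\d+)")


def parse_metrics(line):
    """Return {label: int or None} for one metrics line."""
    return {
        label.strip(): (None if value == "null" else int(value))
        for label, value in FIELD_RE.findall(line)
    }


def missing_fields(metrics):
    """List the labels whose value was reported as null."""
    return [label for label, value in metrics.items() if value is None]
```

Calling `missing_fields(parse_metrics(line))` on each line makes it easy to see whether the request counters are being reported at all before comparing them with the FR graphs.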

    Second, let me get ahead of someone who may be tempted to say “it sounds like you’re experiencing GC pauses”. If that were the issue, it would not explain no requests running (at a point in time). Instead, the GC pause would just make any running requests take a little longer to complete.

    Third, as for your conclusion, I would agree that it seems that “something is blocking the requests from getting through for those few seconds”, and my first suspicion would be the web server connector. Are you fronting this CF10 with IIS or Apache? In either case, have you updated the web server connector since applying the various CF10 updates? The technote for the update tells you to, but many miss it. If you don’t update the connector, then you could well have seeming performance/reliability problems “in CF” that are really “in the connector”. I discuss the issue and the process of updating the connector in this blog entry:

    http://www.carehart.org/blog/client/index.cfm/2013/9/13/why_you_must_update_cf10_webserver_connector

    Finally, I’ll note that since you had some doubts about whether FR might be at issue here, I’ll propose that it would be in your interest to raise such questions in the FR mailing list groups.google.com/group/fusionreactor/, where Intergral engineers as well as active users like myself enjoy helping solve problems, even in this case where it may not really be FR itself that’s the issue.

    But one never knows, so I’m not saying these above ARE the answer, just that they may be. Let us know what you find.

    /charlie

    /Charlie (troubleshooter, carehart.org)
    Inspiring
    November 26, 2013

    Charlie and Carl, thanks for the helpful responses.  To answer...

    1. Yes, we recently updated to CF10 update 12.  At that time, yes, I did update the web connectors.

    2. I am starting to wonder if perhaps there is a connection between running the update (and resetting the connectors) and our issue.  The timing is very suspicious, since it pretty much started then.  Is it possible that the updater (or the new connector file) actually introduced this issue?

    3. I agree that it's not a GC pause issue.  The timing just doesn't match up.  And we're not having memory problems.  Just too many requests at once all queueing up.

    4. I took a look at the metrics log in CFAdmin, as suggested, but all of the data is coming through as NULL.  It's really odd.  It basically looks like:

    Max threads: null Current thread count: null Current thread busy: null Max processing time: null Request count: null Error count: null Bytes received: null Bytes sent: null Free memory: 4579026992 Total memory: 6100090880 Active Sessions: 7

    Any ideas on what might be causing this?

    5. As far as what coldfusion.exe is doing... is that what FusionReactor tracks in that CPU graph in the metrics section?  The one on the bottom right?  If that's the case, then that pretty much never gets above 20% and most of the time it's under 10%.  Even when I'm having this issue.

    6. I took a look at the web connector logs like Carl mentioned, and I found a ton of errors that look like:

    [Mon Nov 25 09:09:47.274 2013] [4748:4064] [error] start_response::jk_isapi_plugin.c (1158): HSE_REQ_SEND_RESPONSE_HEADER failed with error=87 (0x00000057)

    [Mon Nov 25 09:09:47.275 2013] [4748:4064] [error] isapi_write_client::jk_isapi_plugin.c (1283): WriteClient failed with 1229 (0x000004cd)

    [Mon Nov 25 09:09:47.275 2013] [4748:4064] [info] ajp_process_callback::jk_ajp_common.c (1992): Writing to client aborted or client network problems

    [Mon Nov 25 09:09:47.275 2013] [4748:4064] [info] ajp_service::jk_ajp_common.c (2692): (cfusion) sending request to tomcat failed (unrecoverable), because of client write error (attempt=1)

    [Mon Nov 25 09:09:47.276 2013] [4748:4064] [info] HttpExtensionProc::jk_isapi_plugin.c (2305): service() failed because client aborted connection

    I see this over and over again.  Is this the smoking gun?  Does it have anything to do with my issues?  What exactly does this even mean?
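To see whether these errors line up with the drop-offs, one option is to count connector log entries per minute and compare the busy minutes with the FR graphs. The following is a minimal Python sketch under stated assumptions: every log line follows the `[timestamp] [pid:tid] [level] message` layout of the entries quoted above, and the day and month names are in English (the `%a` and `%b` parsing depends on the locale). The names `parse_line`, `errors_per_minute`, and `report` are hypothetical helpers, not part of the connector.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout, taken from the entries quoted above:
# [Mon Nov 25 09:09:47.274 2013] [4748:4064] [error] message...
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+\[(?P<pid>\d+):(?P<tid>\d+)\]\s+"
    r"\[(?P<level>\w+)\]\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, level, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    try:
        ts = datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S.%f %Y")
    except ValueError:
        return None
    return ts, match.group("level"), match.group("msg")


def errors_per_minute(lines, level="error"):
    """Count entries of the given level, bucketed by minute."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip blank or unrecognised lines
        ts, lvl, _msg = parsed
        if lvl == level:
            counts[ts.replace(second=0, microsecond=0)] += 1
    return counts


def report(path):
    """Print one line per minute that contains at least one error."""
    with open(path, errors="replace") as handle:
        for minute, count in sorted(errors_per_minute(handle).items()):
            print(minute.strftime("%Y-%m-%d %H:%M"), count)
```

Running `report` against a copy of isapi_redirect.log gives a per-minute error count. The earliest minute in that output also shows how far back the errors go in whatever log files are still available.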

    Legend
    November 26, 2013

    RE Connector logs. Not seen those errors before. Agree likely a clue as to why there is a response problem.

    Would be good to know when the errors started occurring. Trouble is, after update 12 and the connector change, the connector log would have been re-created. Any chance you can look at an older log from backup media? It would be good to know whether the error messages are new or have been present for a long time, since before update 12 or the connector change.

    Do either ColdFusion10\cfusion\logs\ coldfusion-error.log or coldfusion-out.log also have errors or warnings at the same time as connector log?

    HTH, Carl.