Skip to main content
Sam_BCH
Inspiring
February 17, 2016
Answered

CF10 Apache crashing with 503

  • February 17, 2016
  • 1 reply
  • 2615 views

Hello Experts,


Server Product : ColdFusion

Version              : 10,0,18,296093(PreRelease)

OS                       : Unix

OS Version        : 2.6.32-573.12.1.el6.x86_64

Adobe Driver    : 4.1 (Build 0001)


So, we are having our CF server crash every now and then with 503 error. There is no particular pattern as to why this happens.


There are two servers behind a load balancer..... We have a health check methodology that checks for a cfm page every 10 seconds. If it receives multiple 503's, users are switched over to another server.

After a CF restart, things come backup normally. Following is the log file at the crash point for one instance.....


Feb 17, 2016 14:17:52 PM Information [ajp-bio-8012-exec-12] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx

Feb 17, 2016 14:17:52 PM Information [ajp-bio-8012-exec-6] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx.

Feb 17, 2016 14:17:53 PM Information [ajp-bio-8012-exec-13] - Starting HTTP request {URL=xxxxxxxxxxxxxxxxxxxxxxx, method='get'}

Feb 17, 2016 14:17:53 PM Information [ajp-bio-8012-exec-13] - HTTP request completed  {Status Code=200 ,Time taken=48 ms}

Feb 17, 2016 14:17:55 PM Information [ajp-bio-8012-exec-5] - Starting HTTP request {URL=xxxxxxxxxxxxxxxxxxxxxxx, method='get'}

Feb 17, 2016 14:17:55 PM Information [ajp-bio-8012-exec-5] - HTTP request completed  {Status Code=200 ,Time taken=64 ms}

Feb 17, 2016 14:18:10 PM Information [ajp-bio-8012-exec-14] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx

Feb 17, 2016 14:18:11 PM Information [ajp-bio-8012-exec-10] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx.

Feb 17, 2016 14:18:11 PM Information [ajp-bio-8012-exec-12] - Starting HTTP request {URL=xxxxxxxxxxxxxxxxxxxxxxx4903204.json', method='get'}

Feb 17, 2016 14:18:11 PM Information [ajp-bio-8012-exec-12] - HTTP request completed  {Status Code=200 ,Time taken=32 ms}

Feb 17, 2016 14:19:14 PM Information [ajp-bio-8012-exec-7] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx

Feb 17, 2016 14:20:03 PM Error [ajp-bio-8012-exec-7] - POST parameters exceeds the maximum limit 100 specified in the server. You can modify the setting in Administrator Server Settings.

Feb 17, 2016 14:20:16 PM Information [ajp-bio-8012-exec-9] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx

Feb 17, 2016 14:20:20 PM Information [ajp-bio-8012-exec-14] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx

Feb 17, 2016 14:20:23 PM Information [ajp-bio-8012-exec-4] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx

Feb 17, 2016 14:20:56 PM Information [ajp-bio-8012-exec-7] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx

Feb 17, 2016 14:20:56 PM Information [ajp-bio-8012-exec-6] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx.

Feb 17, 2016 14:20:57 PM Information [ajp-bio-8012-exec-2] - Starting HTTP request {URL=xxxxxxxxxxxxxxxxxxxxxxx4927102.json', method='get'}

Feb 17, 2016 14:20:57 PM Information [ajp-bio-8012-exec-2] - HTTP request completed  {Status Code=200 ,Time taken=38 ms}

Feb 17, 2016 14:21:05 PM Information [ajp-bio-8012-exec-12] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx

Feb 17, 2016 14:21:05 PM Information [ajp-bio-8012-exec-1] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx.

Feb 17, 2016 14:21:05 PM Information [ajp-bio-8012-exec-4] - Starting HTTP request {URL=xxxxxxxxxxxxxxxxxxxxxxx4930797.json', method='get'}

Feb 17, 2016 14:21:06 PM Information [ajp-bio-8012-exec-4] - HTTP request completed  {Status Code=200 ,Time taken=35 ms}

Feb 17, 2016 14:22:34 PM Information [ajp-bio-8012-exec-2] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx

Feb 17, 2016 14:23:25 PM Information [ajp-bio-8012-exec-11] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx

Feb 17, 2016 14:27:46 PM Information [scheduler-2] - Alert: Unresponsive server state detected. 10 or more threads have been busy for over 180000 milliseconds

Feb 17, 2016 14:27:46 PM Information [scheduler-2] - Alert: unresponsiveserveralert: Email notification sent.

We see a few HTTP requests succeed and followed by an unresponsive server alert. This didn't make any sense to us what so ever


What we already tried doing ---------

  • Updated Tomcat connector settings on Server.xml and worker.properties. We have different timeout settings on these two files for our custom environment.
  • Disabled global client variables
  • Updated connection pool settings
  • Modified CFINVOKE calls to use CFHTTP


We are running out of options and are hitting a brick wall with no new ideas. I know we are missing a piece in the puzzle, but not sure where it is.... (Apache settings, Tomcat settings ???)


Any help on this issue would be greatly appreciated. Please let me know if you need any additional information.



Thank you in advance,

Sam.

    This topic has been closed for replies.
    Correct answer Sam_BCH

    Hi Sam,

    I am unsure, is OS, Apache and CF10  64 or 32 bit?

    How much RAM is available?

    Do you have good reasons to be running Java 1.6?

    More questions than answers for now. Lets hope to turn that around soon.

    Regards, Carl.


    Hi Carl and folks who helped me with this.

    I think we cornered and resolved the issue here. All we had to do was to reduce the interval to 2 minutes instead of 7 mins (which is default) to check for connections sitting idle for 5 mins (again, instead of 20 mins which was default) on datasource advanced settings. It wasn't clear on the documentation but we tried all the settings before posting to the forums except for this one.

    We hope the issue is resolved, we haven't had  a crash since 4 days now....still fingers crossed, though

    Thanks again everyone.

    Best,
    Sam.

    1 reply

    BKBK
    Community Expert
    Community Expert
    February 27, 2016

    You say there is no particular pattern but what about thie following pattern, with groups per thread ID:

    Feb 17, 2016 14:17:52 PM Information [ajp-bio-8012-exec-12] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx

    Feb 17, 2016 14:21:05 PM Information [ajp-bio-8012-exec-12] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx

    Feb 17, 2016 14:17:52 PM Information [ajp-bio-8012-exec-6] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx.

    Feb 17, 2016 14:20:56 PM Information [ajp-bio-8012-exec-6] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx.

    Feb 17, 2016 14:18:10 PM Information [ajp-bio-8012-exec-14] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx

    Feb 17, 2016 14:20:20 PM Information [ajp-bio-8012-exec-14] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx

    Feb 17, 2016 14:19:14 PM Information [ajp-bio-8012-exec-7] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx

    Feb 17, 2016 14:20:03 PM Error [ajp-bio-8012-exec-7] - POST parameters exceeds the maximum limit 100 specified in the server. You can modify the setting in Administrator Server Settings.

    Feb 17, 2016 14:20:56 PM Information [ajp-bio-8012-exec-7] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx

    Feb 17, 2016 14:20:23 PM Information [ajp-bio-8012-exec-4] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx

    Feb 17, 2016 14:21:05 PM Information [ajp-bio-8012-exec-4] - Starting HTTP request {URL=xxxxxxxxxxxxxxxxxxxxxxx4930797.json', method='get'}

    Feb 17, 2016 14:21:06 PM Information [ajp-bio-8012-exec-4] - HTTP request completed  {Status Code=200 ,Time taken=35 ms}

    Feb 17, 2016 14:20:57 PM Information [ajp-bio-8012-exec-2] - Starting HTTP request {URL=xxxxxxxxxxxxxxxxxxxxxxx4927102.json', method='get'}

    Feb 17, 2016 14:20:57 PM Information [ajp-bio-8012-exec-2] - HTTP request completed  {Status Code=200 ,Time taken=38 ms}

    Feb 17, 2016 14:22:34 PM Information [ajp-bio-8012-exec-2] - Using Axis 1 for consuming the service xxxxxxxxxxxxxxxxxxxxxxx

    Feb 17, 2016 14:27:46 PM Information [scheduler-2] - Alert: Unresponsive server state detected. 10 or more threads have been busy for over 180000 milliseconds

    The consumption of the services seems to take too long. Find out why.

    It might help to make the following changes in the ColdFusion Administrator.

    On the page Server Settings > Settings:

    Timeout Requests after seconds: 600

    Maximum number of POST request parameters: 500

    On the page Server Settings > Request Tuning:

    Timeout requests waiting in queue after 600 seconds

    Sam_BCH
    Sam_BCHAuthor
    Inspiring
    February 29, 2016

    Hi BKBK, Thank you for the reply.

    The scenario that i mentioned was one of the cases from many. We also started to see a strange 'BLOCKED' error/issue on JDBC connection pool checkOut. Following is the snippet with "cfthread" but we are also seeing the same with ajp threads as well.

    • "cfthread-15" prio=5 tid=1268 BLOCKED

         at coldfusion.server.j2ee.sql.pool.JDBCPool.checkOut(JDBCPool.java:371)

         at coldfusion.server.j2ee.sql.pool.JDBCPool.requestConnection(JDBCPool.java:825)

         at coldfusion.server.j2ee.sql.pool.JDBCManager.requestConnection(JDBCManager.java:125)

         at coldfusion.server.j2ee.sql.JRunDataSource.getConnection(JRunDataSource.java:135)

         at coldfusion.sql.CFDataSource.getConnection(CFDataSource.java:42)

         at coldfusion.sql.DataSrcImpl.getCachedConnection(DataSrcImpl.java:156)

         at coldfusion.sql.DataSrcImpl.getConnection(DataSrcImpl.java:110)

         at coldfusion.sql.SqlImpl.execute(SqlImpl.java:363)

         at coldfusion.tagext.sql.QueryTag.executeQuery(QueryTag.java:1059)

         at coldfusion.tagext.sql.QueryTag.doEndTag(QueryTag.java:688)

    • "ajp-bio-8012-exec-1" prio=5 tid=191 BLOCKED

         at coldfusion.server.j2ee.pool.ObjectPool.checkIn(ObjectPool.java:263)

    We spoke to our DBA team to see if there is any blocked connection on their end at the time of crash, but they couldn't find anything in their logs. Above snippet was taken from the snapshot right after CF crashed. We were able to login to CF admin, though. We tried to make changes on JVM arguments but that didn't help.

    -server -XX:NewRatio=4 -XX:SurvivorRatio=8 -XX:+UseCompressedOops -Xss256k -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:CMSInitiatingOccupancyFraction=68 -XX:MaxPermSize=768m -Xbatch -Dcoldfusion.home={application.home} -Djava.security.egd=/dev/urandom -Dcoldfusion.rootDir={application.home} -Dcoldfusion.libPath={application.home}/lib -Dorg.apache.coyote.USE_CUSTOM_STATUS_MSG_IN_HEADER=true -Dcoldfusion.jsafe.defaultalgo=FIPS186Random -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -verbose:gc -Xloggc:cfwhispererGC.log

    I am willing to make the changes you suggested, but wanted to investigate what is causing the blocked connection.

    Thanks again for your effort in helping me solve the puzzle.

    -Sam.

    Legend
    February 29, 2016

    Hi Sam,

    Not my environment being CF on Windows IIS however case it helps some thoughts.

    What I notice:

    -CF10 so Java 7

    -low pause CMS JVM settings curtesy Mike Brunt (thanks to Mike as usual to his helpful blog posts)

    -errors warning in logs occur on lines with ajp-bio-8012

    Questions:

    -what is JVM settings for minimum and maximum heap size or Xms Xmx of jvm.config-

    -you applied tomcat tuning for timeout have you defined increases for tomcat pools and threads

    (connection_pool_size in properties, maxThreads in server.xml)

    -do you have a copy of cfwhispererGC.log when system was failing

    (note restarting CF will start the cfwhispererGC.log contents over again)

    HTH, Carl.