ColdFusion 9 Fails after SQL 2008 Cluster Failover

New Here, Apr 20, 2010

All,

I have a ColdFusion 9 app server that seems to fail when my SQL 2008 cluster fails over to the passive node. By failing, I mean that my IIS server stops responding to any .cfm requests about 4-5 minutes after the failover. It still serves static and .asp pages just fine. Neither restarting the ColdFusion server service nor rebooting the server itself fixes the problem. The only fix is to fail the SQL cluster back to the original node.

Specs are as follows:

SQL - SQL 2008 SP1 64-bit active/passive cluster VM

CF - CF9 running on Windows 2008 SP2 64-bit VM

Erick

Advocate, Apr 20, 2010

Can you show the settings summary for your datasource in the CF Administrator?

On which layer does your cluster fail over? Does the passive node just take over the IP address of the active node or does it take over the MAC address as well?

For faster failover you should always shorten the TcpTimedWaitDelay in your Windows registry.

New Here, Apr 20, 2010

I don't have access right now to get a screenshot of one of our datasources. I do know that we have the "Maintain connection" option enabled for all of our SQL datasources (we have more than one defined). I believe everything else may be the default setting.

As for the failover, the passive node takes over the IP and MAC address.

Erick

Advocate, Apr 20, 2010

I expect failover to improve if you reduce the TcpTimedWaitDelay; the default value of 4 minutes is suspiciously close to the initial delay you are seeing. The next step would be to make sure you have a validation query defined on the datasource, so that CF actually takes old connections out of the connection pool.

You have checked that this is actually a CF problem, and that other tools on the same machine CF is running on can still connect to the datasource after a failover, right?
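
To illustrate what a validation query buys you, here is a minimal Python sketch of a pool that tests each idle connection before handing it out and silently drops the ones that fail. It is not ColdFusion's implementation; `ValidatingPool`, `connect`, and `validate` are hypothetical names, and in CF the equivalent is simply the validation query setting on the datasource.

```python
from collections import deque
from typing import Any, Callable


class ValidatingPool:
    """Toy connection pool that validates idle connections on checkout."""

    def __init__(self, connect: Callable[[], Any], validate: Callable[[Any], bool]):
        self._connect = connect      # opens a brand-new connection
        self._validate = validate    # e.g. runs a cheap query such as "SELECT 1"
        self._idle = deque()

    def acquire(self) -> Any:
        # Try idle connections first, discarding any that fail validation.
        while self._idle:
            conn = self._idle.popleft()
            try:
                if self._validate(conn):
                    return conn
            except Exception:
                pass  # a validation error means the connection is stale
        # Nothing usable in the pool: open a fresh connection.
        return self._connect()

    def release(self, conn: Any) -> None:
        # Return a connection to the pool for later reuse.
        self._idle.append(conn)
```

Without the validation step, a connection opened against the old node would be handed out again after the failover and fail only when the application tries to use it.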

Guest, Apr 27, 2010

Since it is a Windows cluster, the VIP address of the SQL cluster is going to move from one machine to the other. One thing that could be delaying the restart is that the network isn't picking up the moved address fast enough.

I'd fire up some pings and fault the cluster and see how fast the network is responding to the IP address moving.

Is your CF server on the same subnet as your SQL cluster?
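
If you want timestamps rather than eyeballing ping output, something like the following Python sketch could be left running on the CF server while you fault the cluster. Note that it swaps ICMP ping for a TCP connect to the SQL port, since that needs no special privileges and also tells you whether SQL itself is listening. The function names are made up for illustration, and the host and port in the usage comment are assumptions (1433 is only the default port for a default SQL Server instance).

```python
import socket
import time
from typing import Iterable, Iterator, Tuple


def tcp_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def poll(host: str, port: int, duration: float,
         interval: float = 0.5) -> Iterator[Tuple[float, bool]]:
    """Yield (elapsed_seconds, reachable) samples until duration has passed."""
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        if elapsed >= duration:
            return
        yield elapsed, tcp_reachable(host, port)
        time.sleep(interval)


def transitions(samples: Iterable[Tuple[float, bool]]) -> Iterator[Tuple[float, bool]]:
    """Pass through only the samples where reachability changed."""
    last = None
    for elapsed, up in samples:
        if up != last:
            yield elapsed, up
            last = up


# Example use (hypothetical cluster name; adjust host, port and duration):
#   for elapsed, up in transitions(poll("sqlcluster.example.local", 1433, 600)):
#       print(f"{elapsed:7.1f}s  {'UP' if up else 'DOWN'}")
```

The gap between the DOWN line and the next UP line is roughly how long the address took to become reachable again from the CF server's point of view.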

New Here, Apr 28, 2010

The issue isn't that ColdFusion has a problem with the SQL cluster failover process itself. It works fine immediately after the failover, meaning ColdFusion templates respond normally and ping time from the web application server to SQL is <1ms. It's ColdFusion and/or IIS7 that stops serving .cfm templates ~4-6 minutes after the failover. I can fetch all non-cfm files (like .txt, .asp, .htm) just fine when the issue crops up. So this tells me IIS is probably handing over .cfm requests to the jrun process ok. Unfortunately, I can't view or verify ColdFusion datasources in CFAdmin because no .cfm pages work... a lovely catch-22 scenario.

Yes, both servers are on the same subnet.

New Here, Sep 09, 2010

Problem solved. We had a linked server defined in SQL that pointed to an Oracle instance. We could successfully run the "Test Connection" in SQL Management Studio for the linked server when our SQL 2008 cluster was on node 1. However, we got an "OraOLEDBpus10.dll:  The specified module could not be found" error message when trying to test the connection when the cluster was on node 2. The fix was to add the Oracle folder path to the Windows PATH environment variable on node 2. The Oracle installer is supposed to do this automatically, so we don't know why it was missing.
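
For anyone who wants to compare the two nodes quickly, a small Python sketch like this lists which PATH directories actually contain a given file. `find_on_path` is a hypothetical helper, and it only inspects PATH entries; Windows also searches other locations (such as the application directory and the system directories) when loading a DLL, so treat an empty result as a hint rather than proof.

```python
import os
from typing import List, Optional


def find_on_path(filename: str, path_value: Optional[str] = None) -> List[str]:
    """Return every directory in a PATH-style string that contains filename."""
    if path_value is None:
        # Default to the PATH of the current process.
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        # Tolerate stray whitespace and quoted entries.
        directory = directory.strip().strip('"')
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


# Example use on each cluster node:
#   print(find_on_path("OraOLEDBpus10.dll"))
```

Run it on both nodes under the same account; the node where the list comes back empty is the one missing the Oracle folder in PATH.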

The reason CF would become unresponsive was that we had a template with a query that used the linked server. The purpose of this template was to monitor the status of the linked server. Using CF Server Monitor, we were able to see active requests to that template which would keep running and not terminate even though we set a timeout value in CF Admin.
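
The general lesson is that a monitoring check that can hang should not be allowed to pile up. As a rough illustration in Python (not CFML, and not how ColdFusion handles requests internally), the sketch below runs a check in a worker thread, stops waiting after a timeout, and refuses to start another run while an earlier one is still stuck. `GuardedCheck` and its result strings are hypothetical; note that the stuck thread is not killed, it is only prevented from multiplying.

```python
import threading
from typing import Callable, Optional


class GuardedCheck:
    """Run a possibly-hanging check in a worker thread, one run at a time."""

    def __init__(self, check: Callable[[], bool], timeout: float):
        self._check = check
        self._timeout = timeout
        self._lock = threading.Lock()
        self._worker: Optional[threading.Thread] = None

    def _call(self, outcome: dict) -> None:
        # Any exception from the check counts as a failed check.
        try:
            outcome["ok"] = bool(self._check())
        except Exception:
            outcome["ok"] = False

    def run(self) -> str:
        """Return 'ok', 'failed', 'timeout', or 'busy'."""
        with self._lock:
            # An earlier run is still in flight (or hung): do not stack another.
            if self._worker is not None and self._worker.is_alive():
                return "busy"
            outcome: dict = {}
            worker = threading.Thread(target=self._call, args=(outcome,), daemon=True)
            self._worker = worker
            worker.start()
        # Wait outside the lock so other callers get 'busy' instead of blocking.
        worker.join(self._timeout)
        if worker.is_alive():
            return "timeout"
        return "ok" if outcome.get("ok") else "failed"
```

With a guard like this, a linked-server query that never returns ties up one worker instead of every request that hits the monitoring page.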

Erick
