Folks,
I've been dealing with a problem over the past month or so with one of my developers where his application is crashing my IIS server's w3wp.exe.
Essentially, the application pool keeps crashing and restarting until it hits the limit of crashes within the timeframe allowed, and then stays down - resulting in a 503 error - website unavailable.
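The "limit of crashes within the timeframe allowed" is IIS rapid-fail protection. As a rough illustration only, here is a simplified sliding-window model of that behaviour. The 5-crashes-in-5-minutes values are the documented IIS defaults as far as I know, not values read from this server, and the function name and timestamps are made up for the example.

```python
from collections import deque
from datetime import datetime, timedelta

# Assumed values: IIS defaults are believed to be 5 failures within 5 minutes;
# the app pool in this thread may be configured differently.
MAX_CRASHES = 5
WINDOW = timedelta(minutes=5)


def pool_stays_down(crash_times, max_crashes=MAX_CRASHES, window=WINDOW):
    """Return the crash time at which the pool would be disabled, or None."""
    recent = deque()
    for t in sorted(crash_times):
        recent.append(t)
        # Drop crashes that have aged out of the sliding window.
        while t - recent[0] > window:
            recent.popleft()
        if len(recent) >= max_crashes:
            return t
    return None


if __name__ == "__main__":
    # Hypothetical crash times, 40 seconds apart.
    start = datetime(2020, 9, 25, 13, 30)
    crashes = [start + timedelta(seconds=40 * i) for i in range(6)]
    print(pool_stays_down(crashes))
```

Once the threshold is reached the pool is stopped, and every request gets a 503 until someone starts it again.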
According to the crash dumps that I've been able to collect - the faulting module in this case is MSVCR110.dll - and it's crashing because it's trying to read from memory that it doesn't have access to.
Fault bucket 1896345861034215315, type 4
Event Name: APPCRASH
Response: Not available
Cab Id: 0
Problem signature:
P1: w3wp.exe
P2: 10.0.14393.0
P3: 57899b8a
P4: MSVCR110.dll
P5: 11.0.51106.1
P6: 5098826e
P7: c0000005
P8: 0000000000051d70
P9:
P10:
Faulting application name: w3wp.exe, version: 10.0.14393.0, time stamp: 0x57899b8a
Faulting module name: MSVCR110.dll, version: 11.0.51106.1, time stamp: 0x5098826e
Exception code: 0xc0000005
Fault offset: 0x0000000000051d70
Faulting process id: 0x331c
Faulting application start time: 0x01d6933ff834baae
Faulting application path: c:\windows\system32\inetsrv\w3wp.exe
Faulting module path: C:\Windows\SYSTEM32\MSVCR110.dll
Report Id: e41b3d3a-9f47-4a72-94e0-5696029da3a3
Faulting package full name:
Faulting package-relative application ID:
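For anyone reading the report above, the hex fields can be decoded with a few lines of Python. This is a minimal sketch: the status table is a small illustrative subset, the helper names are my own, and it assumes the "start time" field is a Windows FILETIME (100 ns ticks since 1601-01-01 UTC).

```python
from datetime import datetime, timedelta, timezone

# A few well-known NTSTATUS values; illustrative, not a complete table.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def describe_exception(hex_code):
    """Map an exception code such as '0xc0000005' to its NTSTATUS name."""
    return NTSTATUS_NAMES.get(int(hex_code, 16), "unknown")


def filetime_to_utc(hex_value):
    """Convert a hex FILETIME value to a timezone-aware UTC datetime."""
    ticks = int(hex_value, 16)
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


if __name__ == "__main__":
    print(describe_exception("0xc0000005"))
    print(filetime_to_utc("0x01d6933ff834baae").isoformat())
    print(int("0x331c", 16))  # faulting process id in decimal
```

The decimal process id is handy when matching the report against process-creation audit events, which log PIDs in a different form.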
I don't think we're looking at a problem with the isapi_connector.dll being out of date, as it has the date on it that I applied the Update 16 fixes, and I definitely upgraded the connector in the wsconfig tool.
The developer in this case is using (I believe) Wheels for his application framework (I don't know if this really matters or not).
I don't think that this is related to the AJP connector changes in Tomcat, as the website loads normally when starting things up and our other instances running on the same server have not experienced this problem, but looking in the isapi_redirect log I do see a LOT of calls to jk_ajp_common.c around the times when the AppPool crashes - so, admittedly, I could be wrong there. I have gone ahead and, just in case, added allowedRequestAttributesPattern to the server.xml for that instance and set it to ".*", again - just in case.
I'm pretty stumped at this point. ANY ideas?
Thanks!
On the possibility that the issue was with a bad Visual C++ Redistributable, I did, also, uninstall and reinstall the 2012 x86 and x64 packages for it - so that we were at the most up to date version - no dice.
What is your Windows server version? Have you applied the latest updates to it? I ask because I think this is a Windows/IIS issue.
Hence my suggestion is along the lines of your remark about Visual C++ Redistributable.
1) Stop the ColdFusion service temporarily.
2) Apply the latest updates for your Windows server.
3) MSVCR110 represents Visual C++ 2010 SP1 Redistributable, not 2012, the version you reinstalled. So reinstall Visual C++ 2010 SP1 Redistributable, just in case your version has been corrupted.
Reinstall both the x86 and x64 versions.
In fact, while you're at it, you might as well also install or reinstall
Microsoft Visual C++ 2012 Update 4 Redistributable (x86 and x64),
Microsoft Visual C++ 2013 Redistributable (x86 and x64)
and
Microsoft Visual C++ 2015 Update 3 Redistributable (x86 and x64)
For more information and download links, see the post by "Andre for Directly" at
4) Restart ColdFusion.
5) Configure the web server connectors.
I update my Windows Servers every month the week after Patch Tuesday.
The server in question is running Windows Server 2016.
I have reinstalled them, and there was no effect. I did go ahead and install the 2010 redistributable as well - that put the msvcr100.dll into the Windows system folders.
I don't mean to question your knowledge - but when I right click and look at the details on msvcr110.dll it says that it's part of Visual C++ 2012.
I didn't even have Visual C++ Redistributable 2010 installed on the server, and every time I search on that DLL - it comes back to the 2012 redistributable.
Deleting and recreating the wsconfig folder for the affected site has not had an effect on things either.
mguenther1272, while you are replying here today to BKBK's suggestions, your last paragraph seems to be responding to my point 3 below from yesterday. What about my point 4?
No problem. I stand corrected: msvcr110.dll is part of Visual C++ 2012, not 2010.
When you call it my knowledge, you give me more credit than I deserve. I simply googled to verify the version-year. I found two: 2010 and 2012. I gambled on 2010 as it is older.
Anyway, what's more important is that you've installed all the Visual C++ Redistributables. This serves a purpose. We can now rule out Visual C++ Redistributable, and look further. Which I am now doing.
I continue to think this is a connection problem. On that basis, here's another set of suggestions:
1) Remove the attribute allowedRequestAttributesPattern from the server.xml file. After all, it hasn't helped. Leaving it there might have an adverse effect elsewhere.
2) You have perhaps stumbled on an issue that the Adobe ColdFusion Team is looking into. Request the latest CF2016 isapi_redirect.dll from them (cfinstal[at]adobe.com). Attach the instance's isapi_redirect.log in your request, and refer the team to this forum discussion.
Yet another suggestion. Go to the page https://helpx.adobe.com/coldfusion/kb/coldfusion-2016-update-14.html , scroll down to the "Troubleshooting" section and apply the suggestions for 503 errors.
That's an interesting (and from my experience, unusual) problem.
Given the log entries, it does seem right to still suspect the isapi_connector.dll. I know you said you don't think it's "out of date, as it has the date on it that I applied the Update 16 fixes", but to be clear, that's not the date it should have.
It should be a date relative to the update itself--when Adobe created it. So for update 16, it should be from July 2020. Is it? If not, do upgrade it, and check the date, then try things out.
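If it helps, the date check can be scripted so every connector folder is covered at once. This is only a sketch: the wsconfig path is a guess at a typical install location, and I'm assuming the file on disk is named isapi_redirect.dll (matching the log name), so adjust both as needed.

```python
from datetime import datetime
from pathlib import Path


def connector_dll_dates(wsconfig_root, dll_name="isapi_redirect.dll"):
    """Yield (path, last-modified datetime) for each connector DLL found."""
    for dll in sorted(Path(wsconfig_root).rglob(dll_name)):
        yield dll, datetime.fromtimestamp(dll.stat().st_mtime)


if __name__ == "__main__":
    # Hypothetical location; adjust to the actual ColdFusion install path.
    root = r"C:\ColdFusion2016\config\wsconfig"
    for path, modified in connector_dll_dates(root):
        print(f"{path}  modified {modified:%Y-%m-%d}")
```

A folder whose DLL shows a date other than the update's build month would be the first one to upgrade.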
If that's not it, it may do to remove and recreate the connector (for that site), or even to create a new site (and app pool) in iis and create a new connector for that, just in case some combination of unexpected config factors are at work. I'd hold that as a last resort.
Do keep us posted.
My bad - I mistyped. The isapi_connector.dll is dated July 2020. I probably should've also said that I did tear down and rebuild both the IIS site and the ColdFusion instance - with no improvements.
Although, I don't remember if I went through and deleted everything - does deleting the CF instance from the CFADMIN page delete the corresponding wsconfig directory? I know it deleted the instance from my CF2016 directory - I just don't remember if I cleaned up the wsconfig.
I could always play with the memory settings, maybe double up the available memory for the erroring instance?
Well, now that you seem to be saying you have multiple instances of cf, that is a new wrinkle.
About removing one (in the CF instance mgr), it has zero effect on the connector/wsconfig. In fact, that could be a possible source of trouble, if troubleshooting attempts/tweaks may be making the connector config convoluted.
So yes, I'd propose you remove the connector related to the site with the error. Do this in 4 steps :
Any remnants on either side could lead to trouble.
THEN add the site back with the wsconfig tool, and test again.
If you continue to struggle, you need not go it alone. I could help remotely in perhaps as little as 15 mins. See carehart.org/consulting.
Finally, fwiw, I can't imagine any connection at all between cf heap size and such dll/w3wp.exe crashes. They each run in their own processes.
I took a day to work on other projects, and came back to this this morning.
So - things I've tried on my recent attempt to clean up and recreate things.
Completely deleted the CF Instance
Completely deleted the IIS Site
Recreated the IIS Site
Recreated the CF Instance
Re-ran the Web Server Config tool.
Still 503's. I maybe should clarify that it doesn't 503 immediately. The web page does load initially and some interaction is possible. I can't seem to pin down exactly when the 503 occurs.
Another attempt:
I turned off the IIS Site.
I created a new IIS Site
I ran the Web Server Config Tool and removed the connector for the original site.
I then used it to create a new connector for the new site (resulting in a new /wsconfig/<magic.number> folder)
I then deployed a very simple test.cfm page that is designed to return a simple error message.
The new site almost immediately 503'd before displaying the expected CF error message. Regular non-cfm resources load with no problem (in this case I just use a .jpg to test basic site functionality).
I've got process creation auditing turned on and the process that errored out was created with the following:
c:\windows\system32\inetsrv\w3wp.exe -ap "IISSITENAME" -a \\.\pipe\iisipm97ab0c04-38d8-43e4-9402-0f0ecd6dd3ef -h "C:\inetpub\temp\apppools\AppPoolName\AppPoolName.config" -w "" -m 0 -t 20 -ta 0
Windows Error Reporting ran almost instantly with the following:
C:\Windows\system32\WerFault.exe -u -p 2452 -s 1352
All the debugging reports show similar information - memory access violation with the faulting module as msvcr110.dll
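When there are many audit events to go through, the two command lines above can be picked apart with a couple of regular expressions. A minimal sketch with made-up function names; it assumes `-ap` carries the app pool name, `-h` the pool config path, and that WerFault's `-p` is the id of the crashing process.

```python
import re


def parse_w3wp_command(cmdline):
    """Extract the app pool name and config path from a w3wp.exe command line."""
    pool = re.search(r'-ap\s+"([^"]*)"', cmdline)
    config = re.search(r'-h\s+"([^"]*)"', cmdline)
    return {
        "app_pool": pool.group(1) if pool else None,
        "config": config.group(1) if config else None,
    }


def parse_werfault_pid(cmdline):
    """Return the process id WerFault.exe was asked to report on (-p)."""
    match = re.search(r"-p\s+(\d+)", cmdline)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(parse_werfault_pid(r"C:\Windows\system32\WerFault.exe -u -p 2452 -s 1352"))
```

Pairing the WerFault process id with the matching w3wp creation event shows which app pool each crash belongs to.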
I guess the next thing to try is to do what BKBK suggests and see if we can get a new isapi.dll from Adobe.
Grrr... I also forgot to mention that none of the other instances on the server seem to be behaving in this manner. It's just this instance. Do I have too many running on the server?
Bummer indeed, and to your previous comment first, Adobe may well have some new DLL that somehow addresses this problem, but in case they do not, you still need to solve this.
And good on you for some of the steps you did. They're still not exactly what I proposed, but I won't flog that horse. Moving on...
When you say you get a 503, are you saying that's ALWAYS about the error with the w3wp that you reported initially? (One can get 503s for other reasons, including from CF code, so I just want to be clear.)
And when you say that all errors show "the faulting module as msvcr110.dll", I'm assuming you still mean the one you mentioned originally, being in C:\Windows\SYSTEM32\MSVCR110.dll (as opposed to any you may find in CF, because while I find an msvcr120.dll in CF, and a msvcr100.dll in the slserver54 folders of CF, I don't find any 110--so I hope you do not either.)
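To check quickly which msvcr DLL copies exist under a given folder (the CF install folder, for example), something like the following would do. It is a sketch only; the function name is invented and the root folder is whatever you pass in.

```python
import os
import re

# Matches msvcr100.dll, MSVCR110.dll, msvcr120.dll and so on.
MSVCR_PATTERN = re.compile(r"^msvcr\d+\.dll$", re.IGNORECASE)


def find_msvcr_dlls(root):
    """Return sorted paths of msvcr<NNN>.dll files found anywhere under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if MSVCR_PATTERN.match(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical install folder; replace with the real one.
    for path in find_msvcr_dlls(r"C:\ColdFusion2016"):
        print(path)
```

Any msvcr110.dll that turns up outside C:\Windows\System32 would be worth a closer look, since it could be loaded in place of the system copy.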
So what's left? Well, it's certainly compelling that the problem happens only with one instance and not others. The natural question is what is different between them. You may reasonably presume "nothing", or you may say "they were all created from the same cfusion instance", but clearly something IS different.
Out of curiosity, if you create yet another instance, and another site, and use the wsconfig tool to connect those, what happens?
BTW, when you create the new IIS site, are you being careful not to point the new site to the webroot of the old one? If you do that, you will inherit whatever web.config file entries are in the webroot that they share. Far better (for the sake of diagnosing this problem) to create a new site with its own root. Heck, you may even find that if you do that and connect it to the current "failing" instance, things may suddenly work.
But if not, do try a new instance. I propose that because if the NEW instance also fails, and if the failing one was the last instance you created before that, then we can more reasonably presume there's some difference between them and the others. (And since new CF instances take on the settings of the cfusion instance, perhaps someone changed something THERE before these new instances were created.)
Bottom line, there HAS to be an explanation. And since the problem happens immediately, it would not seem to be about load. And since it happens with a newly created connector, using the same DLL, I would think it's not about the connector.
I'll say again that if you don't want to wait for (and wade through) lots of back and forth, we may find the problem and solve it more readily in a screenshare session together. I have helped many people solve such knotty problems, often quite quickly. There's just no substitute for following different lines of evidence interactively, going from one to another, depending on what we see. (If there was a simple sequential checklist of things, it would exist.) Finally, Adobe may even offer to do a screenshare with you for free.
Either way, I hope you'll let us know how things turn out.
c:\windows\system32\inetsrv\w3wp.exe -ap "IISSITENAME" -a \\.\pipe\iisipm97ab0c04-38d8-43e4-9402-0f0ecd6dd3ef -h "C:\inetpub\temp\apppools\AppPoolName\AppPoolName.config" -w "" -m 0 -t 20 -ta 0
This error-message contains what to me is a bright red flag. I expected to see the name of a custom site - something like mywebdomain.com - in place of "AppPoolName".
I obfuscated the actual name of the Application Pool - something I got used to doing in order to avoid releasing sensitive information. Probably wasn't necessary in this case - but a habit I like to keep up.
I'll follow up a bit later today with more information. Haven't had a lot of time to put into this over the past week.
Ah, OK.
I have one new (and potentially very interesting?) datapoint to add.
I deployed this application to our production environment (which we HAVE NOT updated to Update 16 - it's still running Update 15). And everything works hunky dory so far. The end users are going to hit it later today to see if they can generate some 503's, but we hit it and never generated one.
As far as what address Tomcat is binding to - according to the logs it's going to the IPv4 localhost - as I would expect - it's not trying the IPv6 route - so I don't think I have to add the address tag to the server.xml on the test environment side where all the 503s are occurring.
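Besides reading the logs, the IPv4 versus IPv6 question can be tested directly by trying a TCP connection to the AJP port on each loopback address. A minimal sketch: the port number below is an assumption (take the real one from the instance's server.xml), and the helper name is my own.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unsupported address families.
        return False


if __name__ == "__main__":
    ajp_port = 8016  # assumed; read the real port from the instance's server.xml
    for host in ("127.0.0.1", "::1"):
        print(host, can_connect(host, ajp_port))
```

If only one of the two addresses accepts the connection, that tells you which one the connector on the IIS side needs to target.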
At this point - I'm leaning towards uninstalling Update 16 and reverting back to Update 15.
Fair enough. But why don't we give this one last try before you revert?
Could you share with the forum - for the faulty instance - the contents of the 2 files:
/wsconfig/<magic.number>/workers.properties
/cfusion/runtime/conf/server.xml
together with all the warnings and errors in
/wsconfig/<magic.number>/isapi_redirect.log
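Pulling just the warnings and errors out of a large isapi_redirect.log is easy to script. A sketch, assuming the usual bracketed level tag on each line (for example "[error]" or "[warn]"); the exact log layout is an assumption, so widen the pattern if your log differs.

```python
import re

# Assumed log layout: "[timestamp] [pid:tid] [level] source (line): message".
LEVEL_PATTERN = re.compile(r"\[(error|warn|warning|emerg)\]", re.IGNORECASE)


def warnings_and_errors(lines):
    """Return only the log lines tagged with a warning-or-worse level."""
    return [line.rstrip("\n") for line in lines if LEVEL_PATTERN.search(line)]


if __name__ == "__main__":
    # Hypothetical path; point this at the failing instance's connector folder.
    with open("isapi_redirect.log", encoding="utf-8", errors="replace") as log:
        for entry in warnings_and_errors(log):
            print(entry)
```

Check the output for anything sensitive before posting it.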
Just a heads up, before asking folks to share those files (or before sharing them when asked), beware that since the March updates to CF (Tomcat under the covers) and the wsconfig, there is now a "secret" recorded in those two files (workers.properties and server.xml), so one should not share that secret publicly.
This "secret" is just a string that is passed between the two (configured by the CF/Tomcat update and wsconfig upgrade done after that) as a way to verify that they are "allowed to talk to each other". If one could know that secret from the outside (and assuming the AJP port in CF/Tomcat were somehow "open"), then someone could "talk to that AJP port from the outside".
I will agree 100% that for most folks, this AJP port (8018, by default, in CF2018) will NOT be "open to the world", because most firewalls would block it, being a non-standard port. But the whole reason for that "fix" to Tomcat earlier this year was a fear over that port being "open" and potentially abused (the Tomcat "ghostcat" vulnerability, which fix Adobe incorporated in that March update to CF and the wsconfig). So again, I'm just saying we should not casually "share that secret". 🙂
And if someone DID share it and then wished to change it, they could literally edit both files (and any workers.properties for any other connectors they may not have shared), changing the key string by a few chars, and then restart CF and the web server. All should be fine. As always, best to save a copy of the files before changing them.
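After hand-editing the secret, it may be worth confirming that the two files still agree, without printing the value anywhere. This is a sketch under assumptions: that the AJP `<Connector>` in server.xml carries a `secret` attribute and that workers.properties has a `worker.<name>.secret=` line. Check your own files, as attribute names have varied between Tomcat versions.

```python
import re
import xml.etree.ElementTree as ET


def ajp_secrets_from_server_xml(xml_text):
    """Collect the 'secret' attribute of every AJP <Connector> element."""
    root = ET.fromstring(xml_text)
    return {
        conn.get("secret")
        for conn in root.iter("Connector")
        if "AJP" in conn.get("protocol", "").upper() and conn.get("secret")
    }


def secrets_from_workers_properties(text):
    """Collect values of worker.<name>.secret lines, ignoring comment lines."""
    secrets = set()
    for line in text.splitlines():
        match = re.match(r"\s*worker\.[^.=\s]+\.secret\s*=\s*(\S+)", line)
        if match:
            secrets.add(match.group(1))
    return secrets


def secrets_match(xml_text, properties_text):
    """True when both files define the same non-empty set of secrets."""
    server = ajp_secrets_from_server_xml(xml_text)
    workers = secrets_from_workers_properties(properties_text)
    return bool(server) and server == workers
```

Reporting only True or False keeps the secret itself out of consoles and forum posts.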
All that said, and as for mguenther1272's ongoing issue, it's a curious one. I don't expect much to come from looking at those files, but BKBK is game so I'm not discouraging it. Only warning about the sharing of that secret. 🙂
You have a point there about the secret, Charlie_Arehart.
mguenther1272, you can share the information by means of a private message.
> All the debugging reports show similar information - memory access
> violation with the faulting module as msvcr110.dll
I would then experiment by temporarily Visual C++ 2012
> The new site almost immediately 503'd
As I said earlier, "Yet another suggestion. Go to the page https://helpx.adobe.com/coldfusion/kb/coldfusion-2016-update-14.html , scroll down to the "Troubleshooting" section and apply the suggestions for 503 errors."
To BKBK's point about 503's, while I wouldn't have thought this 503 (for CF and Tomcat, due to that June CF/Tomcat update) would result in the reference to the msvcr110.dll you mentioned, I suppose it could if you dug deep enough.
So to be clear, what he (and that page) refers to is the need (in some situations) to add an "address" attribute to the server.xml file for CF (Tomcat). Given that you said things worked for other instances and not this one, I'd have thought that if this new "address" was needed for one of your sites/instances, it would be needed for all. But maybe not. This is again why I pressed to know about the differences between them.
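One way to get at those differences is to pull the AJP connector settings out of each instance's server.xml and compare them side by side. A rough sketch: the attribute list is just the ones mentioned in this thread, and the function name and file locations are assumptions.

```python
import xml.etree.ElementTree as ET

# Attributes discussed in this thread; extend the tuple as needed.
ATTRIBUTES_OF_INTEREST = ("port", "address", "allowedRequestAttributesPattern")


def ajp_connector_settings(xml_text):
    """Return one dict per AJP <Connector>, with None for absent attributes."""
    root = ET.fromstring(xml_text)
    return [
        {name: conn.get(name) for name in ATTRIBUTES_OF_INTEREST}
        for conn in root.iter("Connector")
        if "AJP" in conn.get("protocol", "").upper()
    ]


if __name__ == "__main__":
    # Hypothetical file names: copies of server.xml from two instances.
    for label in ("working_server.xml", "failing_server.xml"):
        with open(label, encoding="utf-8") as handle:
            print(label, ajp_connector_settings(handle.read()))
```

An attribute that is set on the working instances but None on the failing one (or the reverse) would be a good lead.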
We shall see what you find and report next. 🙂 And yep, to BKBK's next point about the potential complexity of this problem.