I have found that on two of my ColdFusion 2021 servers, at random intervals, the CGI scope starts returning an empty struct. When this happens, the only solution seems to be a restart of the ColdFusion service.
There don't seem to be any relevant entries in any of the ColdFusion logs or the Windows Event Viewer.
Has anybody ever experienced this?
Hi BKBK and Charlie,
First I would like to thank you both for your help. I believe we found the root cause and it WAS code. There was a function to scrub passwords from structs so that we could safely create error handling emails with dumps of form, request and CGI and not have the password displayed. Below is from the developer:
I think this was a subtle difference between CF2016 and CF2021 in the way it handles copying structs inside a function. Let me explain a little further…
Looking a...
Which web server do you use? Any clues from the web server log and connector log?
A quick search on the web produced this: https://hostmedia.uk/client/knowledgebase/201204236/Missing-CGI-variables.html
Does it help?
We are using IIS on Windows Server 2019. Does the mod_jk.conf file exist for IIS servers as well?
That link is referring particularly to an Apache web server. For IIS, the relevant files are:
isapi_redirect.properties
workers.properties
uriworkermap.properties
Could you share these? Also, are there any related errors in isapi_redirect.log?
Thank you!
Looking in the isapi_redirect.log, I see a handful of errors like this:
[Mon Jan 03 12:54:21.127 2022] [8216:11024] [error] start_response::jk_isapi_plugin.c (1293): HSE_REQ_SEND_RESPONSE_HEADER failed with error=87 (0x00000057)
[Mon Jan 03 12:54:21.128 2022] [8216:11024] [error] isapi_write_client::jk_isapi_plugin.c (1480): WriteClient failed with 1229 (0x000004cd)
[Mon Jan 03 15:05:00.560 2022] [8216:7472] [error] start_response::jk_isapi_plugin.c (1293): HSE_REQ_SEND_RESPONSE_HEADER failed with error=87 (0x00000057)
[Mon Jan 03 15:05:00.561 2022] [8216:7472] [error] isapi_write_client::jk_isapi_plugin.c (1480): WriteClient failed with 1229 (0x000004cd)
[Mon Jan 03 16:53:26.891 2022] [8216:11384] [error] start_response::jk_isapi_plugin.c (1293): HSE_REQ_SEND_RESPONSE_HEADER failed with error=87 (0x00000057)
[Mon Jan 03 16:53:26.892 2022] [8216:11384] [error] isapi_write_client::jk_isapi_plugin.c (1480): WriteClient failed with 1229 (0x000004cd)
[Mon Jan 03 16:53:26.940 2022] [8216:12784] [error] start_response::jk_isapi_plugin.c (1293): HSE_REQ_SEND_RESPONSE_HEADER failed with error=87 (0x00000057)
[Mon Jan 03 16:53:26.941 2022] [8216:12784] [error] isapi_write_client::jk_isapi_plugin.c (1480): WriteClient failed with 1229 (0x000004cd)
[Mon Jan 03 17:06:58.430 2022] [8216:11384] [error] start_response::jk_isapi_plugin.c (1293): HSE_REQ_SEND_RESPONSE_HEADER failed with error=87 (0x00000057)
[Mon Jan 03 17:06:58.431 2022] [8216:11384] [error] isapi_write_client::jk_isapi_plugin.c (1480): WriteClient failed with 1229 (0x000004cd)
[Tue Jan 04 00:09:26.788 2022] [8216:13296] [error] start_response::jk_isapi_plugin.c (1293): HSE_REQ_SEND_RESPONSE_HEADER failed with error=87 (0x00000057)
[Tue Jan 04 00:09:26.789 2022] [8216:13296] [error] isapi_write_client::jk_isapi_plugin.c (1480): WriteClient failed with 1229 (0x000004cd)
And below are the contents of each of the files you mentioned.
isapi_redirect.properties:
extension_uri= /jakarta/isapi_redirect.dll
log_file= D:\CF_Path\config\wsconfig\1\isapi_redirect.log
log_level= info
worker_file= D:\CF_Path\config\wsconfig\1\workers.properties
worker_mount_file= D:\CF_Path\config\wsconfig\1\uriworkermap.properties
iis_buffer_enable= true
auth_complete= 1
iis_skip_custom_errors_enable= false
workers.properties:
worker.list=cfusion
heartbeat_interval=30
heartbeat_limit=90
worker.cfusion.type=ajp13
worker.cfusion.host=localhost
worker.cfusion.port=8020
worker.cfusion.connection_pool_size=500
worker.cfusion.connection_pool_timeout=60
worker.cfusion.max_reuse_connections=250
worker.cfusion.monitoringsecret=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
worker.cfusion.secret=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
uriworkermap.properties:
/cfformgateway/* = cfusion
/CFFormGateway/* = cfusion
/flex2gateway/* = cfusion
/flex2gateway = cfusion
/cffileservlet/* = cfusion
/CFFileServlet/* = cfusion
/cfform-internal/* = cfusion
/flashservices/gateway/* = cfusion
/flex-internal/* = cfusion
/rest/* = cfusion
/restapps/* = cfusion
/mcs/* = cfusion
/mcs = cfusion
/__cf_connector_heartbeat__ = cfusion
/cfapiresources/* = cfusion
/*.mxml = cfusion
/*.as = cfusion
/*.cfm = cfusion
/*.CFM = cfusion
/*.Cfm = cfusion
/*.cfm/* = cfusion
/*.CFM/* = cfusion
/*.Cfm/* = cfusion
/*.swc = cfusion
/*.cfml = cfusion
/*.CFML = cfusion
/*.Cfml = cfusion
/*.cfml/* = cfusion
/*.CFML/* = cfusion
/*.Cfml/* = cfusion
/*.cfc = cfusion
/*.CFC = cfusion
/*.Cfc = cfusion
/*.cfc/* = cfusion
/*.CFC/* = cfusion
/*.Cfc/* = cfusion
/*.cfr = cfusion
/*.CFR = cfusion
/*.Cfr = cfusion
/*.cfswf = cfusion
/*.CFSWF = cfusion
/*.Cfswf = cfusion
/*.sws = cfusion
/*.jsp = cfusion
/*.hbmxml = cfusion
!/CFIDE* = cfusion
Hi @Blair22505870i4ma, thanks for sharing the 3 configuration files. Their content looks all right. As far as I can see, we can rule out isapi_redirect.properties, workers.properties and uriworkermap.properties as the cause of the problem.
The errors in isapi_redirect.log are interesting. They are recurrent and invariably occur as a pair. They may or may not be related to your issue, but are worth a look anyway.
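They are:
HSE_REQ_SEND_RESPONSE_HEADER failed with error=87 (0x00000057)
WriteClient failed with 1229 (0x000004cd)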
The first tells me your application might be using an invalid response header. In any case, a response header that Tomcat considers invalid.
The second tells me there was an attempt to write to a nonexistent or closed connection. An explanation I found on the web says this could happen
"...when the connection between the client and IIS can no longer be used to write back the response. This could either mean your response generation is very slow, so that users have already reloaded the page or changed to another page, or you have a network stability problem."
[ https://www.mail-archive.com/dev@tomcat.apache.org/msg90955.html ]
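To check how often these errors occur, and whether they really always come as a pair on the same thread, the log can be tallied with a short script. This is a minimal Python sketch, not part of the connector tooling. The function names `tally_isapi_errors` and `always_paired` and the regular expression are assumptions based only on the log lines posted above. The names in `KNOWN_CODES` are the standard Windows meanings of error codes 87 and 1229.

```python
import re
from collections import Counter

# Matches error lines of the shape posted above, for example:
# [Mon Jan 03 12:54:21.127 2022] [8216:11024] [error] start_response::jk_isapi_plugin.c (1293): ... failed with error=87 (0x00000057)
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+\[(?P<pid>\d+):(?P<tid>\d+)\]\s+\[error\]\s+"
    r"(?P<func>\w+)::\S+\s+\(\d+\):\s+.*?failed with (?:error=)?(?P<code>\d+)"
)

# Standard Windows system error codes seen in the posted log.
KNOWN_CODES = {
    87: "ERROR_INVALID_PARAMETER",
    1229: "ERROR_CONNECTION_INVALID",
}


def tally_isapi_errors(lines):
    """Count error lines per (function name, Windows error code)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[(m.group("func"), int(m.group("code")))] += 1
    return counts


def always_paired(lines):
    """True if every error 87 is directly followed by a 1229 on the same thread."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            events.append((m.group("tid"), int(m.group("code"))))
    if len(events) % 2:
        return False
    return all(
        first[0] == second[0] and first[1] == 87 and second[1] == 1229
        for first, second in zip(events[0::2], events[1::2])
    )
```

To use it, pass the lines of isapi_redirect.log (for example, the result of `open(path).readlines()`) to either function. Comparing the timestamps of the counted lines with the times at which the CGI scope went empty would show whether the two are related at all.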
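From the above considerations, my suggestions would then be the following.
In workers.properties, increase the connection pool timeout (in seconds):
worker.cfusion.connection_pool_timeout=120
In server.xml, set a matching connectionTimeout (in milliseconds) on the AJP connector:
<Connector packetSize="65535" protocol="AJP/1.3" port="8020" redirectPort="8453" secret="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" maxThreads="500" connectionTimeout="120000" tomcatAuthentication="false"/>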
Hi BKBK,
In the environment in question, we actually have three servers in total. One of them has NOT experienced any issues with the CGI scope, while the other two have been affected.
Comparing the response headers in IIS, I found that the two affected servers have one additional response header, which the third server does not have.
X-Powered-By=ASP.NET
I am not sure how that would affect the CGI scope, but I will try removing this response header and see what happens.
I will also try bumping up the connectionTimeout from 60 to 120 and see if that helps. Is that something that has to be edited directly in the two files referenced, or can it be modified inside the CF Admin?
Thanks for the update.
I will also try bumping up the connectionTimeout from 60 to 120 and see if that helps. Is that something that has to be edited directly in the two files referenced, or can it be modified inside the CF Admin?
By @Blair22505870i4ma
In the 2 files, using the settings I mentioned.
Then restart ColdFusion for the change to take effect.
X-Powered-By=ASP.NET
I am not sure how that would affect the CGI scope, but I will try removing this response header and see what happens.
By @Blair22505870i4ma
You might be interested in Pete Freitag's post on removing the header X-Powered-By=ASP.NET: https://www.petefreitag.com/item/722.cfm
Blair, I'd take things in a different direction than the other discussion about the web server configuration files.
Assuming that doesn't get you to a solution, let's go back to your first point. You say that "at random intervals, the CGI scope starts returning an empty struct". Tell us more:
As for further diagnosis:
Finally, to answer your question, I have not seen this before myself. It sounds interesting, but as you can see from the above, I am used to being dropped in "behind the fire line" to help put out the fire, even if I've "never been in that forest before". 🙂 I hope the thoughts may be helpful, again if that other line of discussion doesn't pan out.
Hi Charlie - Thank you for your response!
As you can see, I am still trying a few things that BKBK suggested. One other thing to note is that we had previously removed the values from "Minimum JVM Heap Size" and "Maximum JVM Heap Size" in the Java and JVM settings screen in the CF Administrator, in an attempt to have CF/Java manage this automatically.
A few days ago, we reverted this change, adding values back into those fields in the CF Admin. We have not experienced any issues since then, but it's still possible that this is a coincidence.
With that said, let me answer a few of your questions to help give a more complete picture.
1. As I mentioned in my most recent reply to BKBK, we actually have 3 CF servers in this environment. Two of them are load balanced and are experiencing the issues with the CGI scope. The third one is NOT load balanced and is not having any issues with the CGI scope.
2. These servers are used primarily for a third-party commercial CMS product based on ColdFusion. When the issue surfaces, we find that certain elements in that CMS start to render incorrectly and throw errors relating to missing variables which the CMS should always have available. After speaking with the CMS vendor, we learned that those variables are pulled from the CGI scope. When the issue was happening, we found that dumping the CGI scope in a test file produced an empty struct.
3. When I say it is at random intervals, what I mean is that the issue keeps happening, sometimes on one of the two load balanced servers and sometimes on both. Sometimes it happens after a few days, and sometimes it happens after several hours.
4. When this issue does happen, it stays in that state until we restart the CF service. It does appear that all pages in all apps on the affected server remain in this state until the CF restart.
5. Another interesting note is that we also included a cfdump of the request scope in our test file. When the issue occurs and we load the test file in a browser, both the request scope and the CGI scope dump empty structs.
3. When I say it is at random intervals, what I mean is that the issue keeps happening, sometimes on one of the two load balanced servers and sometimes on both. Sometimes it happens after a few days, and sometimes it happens after several hours.
By @Blair22505870i4ma
Ehrm, load balanced servers?
Then I expected to see something like the following in the workers.properties file:
# the list of workers
worker.list= instance1, instance2
# the load balancer
worker.loadbalancer.type=lb
# worker "instance1" will talk to Tomcat listening on machine localhost
# at port 8020 using 2.5 lb factor
worker.instance1.host=127.0.0.1
worker.instance1.port=8020
worker.instance1.lbfactor=2.5
# worker "instance2" will talk to Tomcat listening on machine www.xyz.com
# at port 8009 using 3.5 lb factor
worker.instance2.host=www.xyz.com
worker.instance2.port=8009
worker.instance2.lbfactor=3.5
The load balancing is all done through a separate hardware device managed by our Client/Server department, so I don't think there need to be any special provisions inside ColdFusion to accommodate the load balancing.
In view of the new information, I would now agree with Charlie's point that headers and connectors are unlikely to be the cause of the problem. The new information that sways me is the missing request scope. What if the load balancer did it?
The missing request and CGI scopes suggest to me that, when the issue occurs, the affected ColdFusion server might not be receiving traffic from the load balancer. So I would now include the load balancer in the list of suspects.
Could you
Then I wondered: if it is the load balancer, why does the problem go away when you restart the affected ColdFusion instance?
Thanks for the clarifications, Blair. I do still think this all really has nothing to do with headers coming into the request, let alone the web server connector (but perhaps the other efforts here will prove fruitful, and prove me wrong). The mention of a load balancer as another part of the equation could seem compelling...
But now that you say that the request scope is also found (randomly) to be unexpectedly empty, it reinforces my sense that it's none of the above.
Instead, I'm inclined to wonder if there may well be some sort of memory error. I saw your comment about setting the heap. What was the max you set, when you did that? (There's no way to say without LOTS more info what yours SHOULD be. I just am curious to hear what you may have picked. And I'm not suggesting that this "memory error" I refer to necessarily has to do with that max heap setting.)
Can you check the coldfusion-error.log (in the logs folder), and in that search for the phrase "outofmemory"? If you have any, are they at or near the time of these problems? Indeed, are they frequent (recently)? And in either case, what KIND of oom error is it? That will usually appear as another word or phrase after that word "outofmemory". It MAY say "heap", or it MAY talk about "GC overhead limit exceeded", but it may just as well refer to something UNRELATED to the heap, as in "metaspace".
Before we go any further (especially with speculation related to what I say here), let's hear first if your coldfusion-error.log does indeed have any oom errors. (And note that when you open that file, pay attention to whatever datetime stamp you find on lines at the TOP of the file. That will indicate how far back in time this log covers. It may be only hours or days, or it could be weeks or months. Note that when the log reaches a limit (of 5 MB, iirc), it rotates and keeps several older copies (again, I think about 10). It can thus be useful sometimes to look beyond just the latest coldfusion-error.log.)
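The search described above can also be scripted, which helps when there are several rotated logs to go through. This is a minimal Python sketch under stated assumptions: the function name `classify_oom` is hypothetical, and the only thing it relies on is the standard JVM message format `java.lang.OutOfMemoryError: <detail>`, where the detail is typically "Java heap space", "GC overhead limit exceeded" or "Metaspace".

```python
import re
from collections import Counter

# Case-insensitive, so it also finds what a search for "outofmemory" would find.
# The optional text after the colon names the kind of out-of-memory condition.
OOM_RE = re.compile(r"OutOfMemoryError(?::\s*(?P<kind>[^\r\n]+))?", re.IGNORECASE)


def classify_oom(lines):
    """Count OutOfMemoryError occurrences per kind in an iterable of log lines."""
    kinds = Counter()
    for line in lines:
        m = OOM_RE.search(line)
        if m:
            kind = (m.group("kind") or "").strip() or "unspecified"
            kinds[kind] += 1
    return kinds
```

To use it, open each coldfusion-error log in turn (for example with `open(path, errors="replace")`) and pass the file object to `classify_oom`. An empty result for every file covering the time of an incident would suggest that an out-of-memory error is not what empties the CGI scope.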
Charlie and BKBK,
Thank you so much for all your thoughtful responses.
Thankfully, we have not had any issues ever since last week when we added the values back into the Java memory min/max fields in the CF Administrator.
We are going to keep a close eye on things, and if we experience the issue again, I will absolutely return to this thread and proceed with your remaining suggestions to continue troubleshooting.
For now though I will reluctantly say that it has been solved 🙂
Thanks again!!
So, of course, I spoke too soon. The issue occurred on one of our servers again this afternoon.
This time, the request scope WAS available. This time it was just the CGI scope which was empty.
When the issue occurred, we looked at the coldfusion-error.log (and all other ColdFusion log files, for that matter), searched for the phrase "outofmemory", and did not come up with any results.
There did not seem to be anything significant in any of the ColdFusion logs around the time of the issue. We also checked the Windows Event Viewer, the Solr logs, FusionReactor... everywhere we looked, we did not see any significant errors or evidence of an issue.
Could you
Going back to basics, a new idea. What if ColdFusion has nothing to do with it and the problem is caused by the combination of browser and web server?
To test this hypothesis, use Curl. For example, I proceeded as follows:
<cfdump var="#cgi#">
curl http://localhost:8500/cgiTest.cfm -o C:\Users\BKBK\Desktop\cgiTest.html
Hi, I am Blair's teammate. I am able to reproduce the empty CGI struct with the code below, which permanently wipes the CGI struct on all future requests until CF is restarted. But I can't find anywhere in our code where we would be doing such a crazy thing:
<cfset StructClear(cgi)>
<cfoutput>#structCount(cgi)#</cfoutput> <!--- this returns 46 even though cgi is now empty --->
<cfdump var="#cgi#">
Or maybe this is just one way that cgi can be emptied?
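One general way a shared structure can be changed without an explicit StructClear call is through a helper that modifies the struct it was handed rather than a copy of it. The sketch below is only an analogy in Python, not CFML, and not the code from this application. The function names and sample keys are hypothetical, and it says nothing about how ColdFusion implements the CGI scope. In CFML, the comparable distinction would be between working on the struct itself and working on the result of duplicate().

```python
import copy


def scrub_passwords_shared(scope):
    """Scrub in place: `scope` is the caller's own dict, not a copy."""
    for key in list(scope):
        if "password" in key.lower():
            scope[key] = "[removed]"
    return scope


def scrub_passwords_copied(scope):
    """Scrub a deep copy, leaving the caller's dict untouched."""
    safe = copy.deepcopy(scope)
    for key in list(safe):
        if "password" in key.lower():
            safe[key] = "[removed]"
    return safe


# Hypothetical stand-in for a scope that other code keeps reading.
shared = {"user": "blair", "password": "s3cret"}

dump_a = scrub_passwords_copied(shared)  # `shared` still holds the real value
dump_b = scrub_passwords_shared(shared)  # `shared` itself has now been changed
```

After the second call, every other reader of `shared` sees the scrubbed value, even though the caller only wanted a safe dump. Whether a helper receives the original or a copy is therefore worth checking in any code that touches a scope shared across requests.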
Hi @gabrieldavis321 , have you read my last post? Did you do the Curl test I suggested?
Hi BKBK,
We haven't yet had a chance to try because we have to wait for the issue to happen again, but I am not sure that I follow the logic... we use the same web browser and IIS when checking ... so why would they behave differently when the issue occurs and when we have restarted CF to fix the issue? But we will try next time the issue arises.