We have been trying to pass an external PCI scan and noticed some server lockups after starting a scan. We are scanning a couple hundred IP addresses, which all resolve to the same servers. The scans actively probe for vulnerabilities on the box, one of which is Flash Remoting. When we look at the Apache /server-status page, it shows a ton of long-running flex2gateway processes. For instance:
|W||4.07||163840||0||0.0||57.76||57.76||x.x.x.101||WebNode2.ambassador.int||POST /flex2gateway/http HTTP/1.1|
As you can see, this POST request has been running for 163840 seconds, or nearly two days. Since it seems these POST requests never complete, even though the client has long since disconnected, they simply stack up until the server's max number of child processes has been reached, effectively killing our webserver.
When I try to restart the clustered coldfusion instances one at a time, these POST requests do not die off.
If I stop both clustered CF instances, the requests complete (or get killed).
If I reload or restart apache, the requests are gone as well.
strace gives me nothing useful:
[root@WebNode1 ~]# strace -p 34025
Process 34025 attached - interrupt to quit
pstack gives a little more, but nothing that looks obvious to me:
[root@WebNode1 ~]# pstack -p 34025
Usage: pstack <process-id>
[root@WebNode1 ~]# pstack 34025
#0 0x00007fdd40444740 in __read_nocancel () from /lib64/libpthread.so.0
#1 0x00007fdd33efe2e6 in jk_tcp_socket_recvfull () from /opt/coldfusion10/config/wsconfig/1/mod_jk.so
#2 0x00007fdd33f1b68d in ajp_connection_tcp_get_message () from /opt/coldfusion10/config/wsconfig/1/mod_jk.so
#3 0x00007fdd33f1ceea in ajp_get_reply () from /opt/coldfusion10/config/wsconfig/1/mod_jk.so
#4 0x00007fdd33f20308 in ajp_service () from /opt/coldfusion10/config/wsconfig/1/mod_jk.so
#5 0x00007fdd33ef8f5d in jk_handler () from /opt/coldfusion10/config/wsconfig/1/mod_jk.so
#6 0x00007fdd41b92cd0 in ap_run_handler ()
#7 0x00007fdd41b9658e in ap_invoke_handler ()
#8 0x00007fdd41ba1c50 in ap_process_request ()
#9 0x00007fdd41b9eac8 in ?? ()
#10 0x00007fdd41b9a7d8 in ap_run_process_connection ()
#11 0x00007fdd41ba6ad7 in ?? ()
#12 0x00007fdd41ba6dea in ?? ()
#13 0x00007fdd41ba7a6c in ap_mpm_run ()
#14 0x00007fdd41b7e9b0 in main ()
I don't know what that tells us exactly, but I'm leaning toward a hang-up between Apache and Tomcat.
Any suggestions on where or how to troubleshoot this issue?
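Since frames #1 through #4 show the Apache child blocked in ajp_get_reply, waiting on a read from the AJP socket, one thing that may be worth testing is a reply timeout on the mod_jk worker. This is only a sketch: the worker name "cfusion" is a placeholder (use whatever name appears in your workers.properties), the values are arbitrary, and the connector build that ships with ColdFusion 10 may not behave exactly like stock mod_jk.

```apache
# Assumption: "cfusion" is a hypothetical worker name; substitute your own.
# reply_timeout is in milliseconds: stop waiting if the backend sends
# nothing for 60 seconds, instead of blocking the Apache child forever.
JkWorkerProperty worker.cfusion.reply_timeout=60000

# socket_timeout is in seconds and applies to the AJP socket itself.
JkWorkerProperty worker.cfusion.socket_timeout=90
```

The same two properties can instead be set directly in workers.properties (without the JkWorkerProperty prefix). Keep in mind that a reply timeout also cuts off legitimately slow requests, so the value needs to sit above your longest expected response time.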
I removed clustering by editing the uriworkermap.properties file and pointing /flex2gateway and /flex2gateway/* to a single instance, and then ran the PCI scan again. It still seems to hang. I'm surprised there are no other complaints about this out there on the interwebs. I can't be the only one.
OK, I did a little more testing from a Linux CLI using curl, and I find that if I POST to /flex2gateway/<any string> it will hang indefinitely. A normal GET request results in a 404, but a POST will hang indefinitely. What's more, posting to just /flex2gateway/ seems to perform normally (some kind of binary data connection). It's only if I put something in the path after /flex2gateway/ that it hangs. It behaves the same if I hit one instance specifically, as opposed to going through the cluster, so that eliminates Apache as the problem. I also notice a hang when posting to /flex-internal/ and /flex-internal/<some string>
Any clue as to why this might act this way?
On a test server, I have removed the wildcard from the uriworkermap.properties file, so it now only matches "/flex2gateway" and "/flex2gateway/". Unfortunately I'm still seeing the occasional hung apache worker.
Anyone have any leads on this issue? I don't mind doing the research, I've just exhausted the limits of my Google Fu.
WWWWWWW_W_W_W__W__W__WW_W_...................................... ................................................................ ................................................................ ................................................................
"_" Waiting for Connection
"S" Starting up
"R" Reading Request
"W" Sending Reply
"K" Keepalive (read)
"D" DNS Lookup
"C" Closing connection
"L" Logging
"G" Gracefully finishing
"I" Idle cleanup of worker
"." Open slot with no current process
|0-0||8727||0/12/12||W||0.03||4572||0||0.0||0.05||0.05||10.10.2.201||qc.company.int||POST /flex2gateway HTTP/1.1|
|1-0||8728||0/11/11||W||0.03||4358||0||0.0||0.18||0.18||10.10.2.201||qc.company.int||POST /flex2gateway HTTP/1.1|
|2-0||8729||0/38/38||W||0.04||3910||0||0.0||1.11||1.11||10.10.2.201||qc.company.int||POST /flex2gateway HTTP/1.1|
|3-0||8730||0/27/27||W||0.03||4064||0||0.0||0.79||0.79||10.10.2.201||qc.company.int||POST /flex2gateway HTTP/1.1|
|4-0||8731||0/16/16||W||0.03||4354||0||0.0||0.12||0.12||10.10.2.201||qc.company.int||POST /flex2gateway HTTP/1.1|
|5-0||8732||0/7/7||W||0.02||4564||0||0.0||0.02||0.02||10.10.2.201||qc.company.int||POST /flex2gateway HTTP/1.1|
|6-0||8733||0/8/8||W||0.02||4673||0||0.0||0.01||0.01||10.10.2.201||qc.company.int||POST /flex2gateway HTTP/1.1|
|7-0||8734||0/386/386||_||0.37||4||0||0.0||6.49||6.49||10.10.2.212||www.company.qc||GET /marketingpages/images/login_over.jpg HTTP/1.1|
|8-0||9422||0/10/10||W||0.02||4564||0||0.0||0.04||0.04||10.10.2.201||qc.company.int||POST /flex2gateway HTTP/1.1|
|9-0||10112||0/393/393||_||0.37||6||0||0.0||14.59||14.59||10.10.2.212||www.company.qc||GET /marketingpages/images/box_onesource.jpg HTTP/1.1|
|10-0||10468||0/321/321||W||0.32||846||0||0.0||4.42||4.42||10.10.2.212||qc.company.int||POST /flex2gateway HTTP/1.1|
|11-0||10470||0/398/398||_||0.38||6||0||0.0||12.80||12.80||10.10.2.212||www.company.qc||GET /marketingpages/images/home_eco.jpg HTTP/1.1|
|12-0||10471||0/340/340||W||0.32||837||0||0.0||4.99||4.99||10.10.2.212||qc.company.int||POST /flex2gateway/ HTTP/1.1|
|13-0||10544||0/404/404||_||0.34||6||0||0.0||5.21||5.21||10.10.2.212||www.company.qc||GET /marketingpages/images/box_top.jpg HTTP/1.1|
|14-0||10592||0/353/353||_||0.40||6||12||0.0||14.10||14.10||10.10.2.212||www.company.qc||GET /?login HTTP/1.1|
|15-0||10648||0/296/296||W||0.31||800||0||0.0||3.82||3.82||10.10.2.212||qc.company.int||POST /flex2gateway/ HTTP/1.1|
|16-0||12382||0/339/339||_||0.33||6||0||0.0||2.85||2.85||10.10.2.212||www.company.qc||GET /marketingpages/images/logo_sourceone.jpg HTTP/1.1|
|17-0||12387||0/336/336||_||0.34||6||0||0.0||5.06||5.06||10.10.2.212||www.company.qc||GET /marketingpages/images/logo_onesource.jpg HTTP/1.1|
|18-0||12388||0/265/265||W||0.25||839||0||0.0||2.87||2.87||10.10.2.212||qc.company.int||POST /flex2gateway/ HTTP/1.1|
|19-0||12389||0/323/323||_||0.31||0||0||0.0||4.82||4.82||10.10.2.212||www.company.qc||GET /marketingpages/lib/dimming.js HTTP/1.1|
|20-0||12390||0/336/336||_||0.31||4||0||0.0||5.24||5.24||10.10.2.212||www.company.qc||GET /marketingpages/lib/superfish.js HTTP/1.1|
|21-0||12391||0/289/289||W||0.27||805||0||0.0||2.49||2.49||10.10.2.212||qc.company.int||POST /flex2gateway/ HTTP/1.1|
|22-0||12392||0/281/281||W||0.27||831||0||0.0||3.17||3.17||10.10.2.212||qc.company.int||POST /flex2gateway HTTP/1.1|
|23-0||14750||0/41/41||_||0.04||6||0||0.0||0.92||0.92||10.10.2.212||www.company.qc||GET /marketingpages/images/close.jpg HTTP/1.1|
|24-0||14751||0/43/43||W||0.04||0||0||0.0||1.21||1.21||10.10.2.36||qc.company.int||GET /server-status HTTP/1.1|
|25-0||14752||0/40/40||_||0.04||6||0||0.0||0.96||0.96||10.10.2.212||www.company.qc||GET /marketingpages/images/box_sourceone.jpg HTTP/1.1|
Make sure you have the following in one of your config files:
# enable Flex Gateway
JkMount /*.cfm ajp13
JkMount /*.cfc ajp13
JkMount /*.do ajp13
JkMount /*.jsp ajp13
JkMount /*.cfchart ajp13
JkMount /*.cfres ajp13
JkMount /*.cfm/* ajp13
JkMount /*.cfml/* ajp13
If you add this to the end of the mod_jk.conf file, just be careful when updating your connector in the future, because it may remove the lines. These commands are required to get the flex2gateway working in CF10. Without these lines, we've seen the exact same behavior you're describing.
Hope this helps!
Thanks for the response. Where exactly did you need to add this block of code? I tried adding it to the end of the mod_jk.conf file, as well as adding it to the default virtual host block in the httpd.conf files. Neither seems to have helped when testing. Thanks.
We have it in our mod_jk.conf file, but be careful when updating the connector because it may remove the code.
Make sure you've restarted Apache/ColdFusion after adding the lines as well.
You might want to return your uriworkermap.properties to its original version.
Here's the thread where I originally found the entries that needed to be added:
Maybe you can find more info from someone in that post.
Thanks Dan, but I think we're talking about different issues. We are well past the 404 problem. That was solved by an alternate fix: adding the following lines to the uriworkermap.properties file:
/flex2gateway/* = CFCluster
/flex2gateway = CFCluster
My problem is not an issue getting flex2gateway working; it works just fine. The problem comes up primarily during a PCI scan, when the scan attempts to POST data to "http://10.x.x.x/flex2gateway/http" and the worker hangs indefinitely. I can recreate the issue using curl like so: curl --data "param1=value1&param2=value2" http://10.x.x.x/flex2gateway/http
I have no such issue if I post to http://10.x.x.x/flex2gateway/ without the /http path.
I have gotten around this problem by denying access to the /flex2gateway/http and /flex2gateway/httpsecure paths in the Apache config, since these paths are not used, nor are they even found.
Deny from all
Deny from all
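As a sketch of how such a deny rule can be written in full: the block below assumes <Location> containers and Apache 2.2 access-control syntax. Both are assumptions, since the exact directives wrapped around the two Deny lines are not shown here.

```apache
# Assumption: <Location> containers, chosen because these URIs do not
# correspond to directories on disk. Apache 2.2 syntax shown; on 2.4
# the equivalent inside each block would be "Require all denied".
<Location /flex2gateway/http>
    Order allow,deny
    Deny from all
</Location>

<Location /flex2gateway/httpsecure>
    Order allow,deny
    Deny from all
</Location>
```

With this in place Apache answers those URIs with a 403 itself, so the request is never handed to mod_jk and cannot tie up a worker waiting on ColdFusion.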
I think the problem is similar to what you see when Flex isn't configured properly. It would appear Apache is handing off the request and then waiting for ColdFusion to respond, but it doesn't know how to handle the resource. I wonder if there's something in the web.xml that needs to be updated as well so that CF knows how to handle the /flex2gateway/http and /flex2gateway/httpsecure URIs.