Apache POST flex2gateway never closes or times out, reaches max child processes

Explorer ,
Nov 06, 2014

We have been trying to pass an external PCI scan, and noticed some server lockups after starting a scan. We are scanning a couple hundred IP addresses, all of which resolve to the same servers. The scans actively probe the box for vulnerabilities, one of which is Flash Remoting. When we look at the Apache /server-status page, it shows a large number of long-running flex2gateway requests. For instance:

Srv   PID   Acc          M  CPU   SS      Req  Conn  Child  Slot   Client     VHost                    Request
22-4  4466  0/3817/3817  W  4.07  163840  0    0.0   57.76  57.76  x.x.x.101  WebNode2.ambassador.int  POST /flex2gateway/http HTTP/1.1

As you can see from the SS column (seconds since the start of the most recent request), this POST request has been running for 163,840 seconds, or nearly two days. These POST requests never seem to complete, even though the client has long since disconnected, so they simply stack up until the server's maximum number of child processes has been reached, effectively killing our web server.

When I restart the clustered ColdFusion instances one at a time, these POST requests do not die off.

If I stop both clustered CF instances, the requests complete (or get killed).

If I reload or restart Apache, the requests are gone as well.

strace gives me nothing useful:

[root@WebNode1 ~]# strace -p 34025

Process 34025 attached - interrupt to quit

read(185,
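strace only shows the child sitting in a blocking read() on file descriptor 185. On Linux you can at least find out what that descriptor is: /proc/<pid>/fd/<fd> is a symlink that reads socket:[inode] for a socket, and that inode can be matched against /proc/net/tcp to get the local and remote addresses. This is a minimal sketch that assumes Linux, IPv4 only (it does not read /proc/net/tcp6) and enough privileges (usually root) to inspect the Apache child; the function names are hypothetical.

```python
from __future__ import annotations

import os
import socket
import struct


def describe_fd(pid: int, fd: int) -> str:
    """Return the target of /proc/<pid>/fd/<fd>, e.g. 'socket:[12345]' (Linux only)."""
    return os.readlink(f"/proc/{pid}/fd/{fd}")


def socket_inode(link: str) -> str:
    """Extract the inode number from a 'socket:[12345]' link target."""
    if not (link.startswith("socket:[") and link.endswith("]")):
        raise ValueError(f"not a socket: {link!r}")
    return link[len("socket:["):-1]


def _decode_addr(field: str) -> tuple[str, int]:
    """Decode an 'IPHEX:PORTHEX' field from /proc/net/tcp."""
    ip_hex, port_hex = field.split(":")
    # The kernel prints the IPv4 address as a raw 32-bit value in host byte
    # order, so pack it back with native ('=') byte order to recover the bytes.
    ip = socket.inet_ntoa(struct.pack("=I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def tcp_endpoints(inode: str) -> tuple[tuple[str, int], tuple[str, int]] | None:
    """Return ((local_ip, local_port), (remote_ip, remote_port)) for an IPv4 socket inode."""
    with open("/proc/net/tcp") as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            if fields[9] == inode:  # the 10th column is the socket inode
                return _decode_addr(fields[1]), _decode_addr(fields[2])
    return None
```

For the child above, describe_fd(34025, 185) run as root would show whether fd 185 is a socket; if it is, tcp_endpoints(socket_inode(...)) gives the remote address and port, which you can compare with the AJP port configured for the ColdFusion instance. `lsof -p 34025` reports similar information if it is installed.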

pstack gives a little more, but nothing that looks obvious to me:

[root@WebNode1 ~]# pstack -p 34025
Usage: pstack <process-id>

[root@WebNode1 ~]# pstack 34025
#0  0x00007fdd40444740 in __read_nocancel () from /lib64/libpthread.so.0
#1  0x00007fdd33efe2e6 in jk_tcp_socket_recvfull () from /opt/coldfusion10/config/wsconfig/1/mod_jk.so
#2  0x00007fdd33f1b68d in ajp_connection_tcp_get_message () from /opt/coldfusion10/config/wsconfig/1/mod_jk.so
#3  0x00007fdd33f1ceea in ajp_get_reply () from /opt/coldfusion10/config/wsconfig/1/mod_jk.so
#4  0x00007fdd33f20308 in ajp_service () from /opt/coldfusion10/config/wsconfig/1/mod_jk.so
#5  0x00007fdd33ef8f5d in jk_handler () from /opt/coldfusion10/config/wsconfig/1/mod_jk.so
#6  0x00007fdd41b92cd0 in ap_run_handler ()
#7  0x00007fdd41b9658e in ap_invoke_handler ()
#8  0x00007fdd41ba1c50 in ap_process_request ()
#9  0x00007fdd41b9eac8 in ?? ()
#10 0x00007fdd41b9a7d8 in ap_run_process_connection ()
#11 0x00007fdd41ba6ad7 in ?? ()
#12 0x00007fdd41ba6dea in ?? ()
#13 0x00007fdd41ba7a6c in ap_mpm_run ()
#14 0x00007fdd41b7e9b0 in main ()

I don't know exactly what that tells us, but I suspect the hang-up is between Apache and Tomcat.
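Read from the bottom up, the stack shows Apache's request handler (ap_run_handler) calling mod_jk's jk_handler, which is inside ajp_service → ajp_get_reply → jk_tcp_socket_recvfull, with frame #0 a blocking read(). That is consistent with the strace output: the Apache child appears to be blocked waiting for ColdFusion (Tomcat) to answer over the AJP connection. If many children are stuck, a small script can triage their pstack dumps. This is a sketch that assumes only the frame format shown above; the helper names are hypothetical.

```python
from __future__ import annotations

import re

# Matches pstack frame lines such as
#   #3  0x00007fdd33f1ceea in ajp_get_reply () from /opt/coldfusion10/config/wsconfig/1/mod_jk.so
#   #9  0x00007fdd41b9eac8 in ?? ()
FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9a-fA-F]+ in (\S+) \(\)(?: from (\S+))?")


def parse_frames(text: str) -> list[tuple[int, str, str | None]]:
    """Return (frame number, function, library or None), innermost frame first."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((int(m.group(1)), m.group(2), m.group(3)))
    return sorted(frames, key=lambda f: f[0])


def innermost_module_frame(text: str, module: str = "mod_jk.so") -> str | None:
    """Name of the innermost function that lives in `module`, if any."""
    for _, func, lib in parse_frames(text):
        if lib is not None and lib.endswith(module):
            return func
    return None


def waiting_on_ajp_reply(text: str) -> bool:
    """True if frame #0 is a read() and mod_jk's ajp_get_reply is further up the stack."""
    frames = parse_frames(text)
    if not frames:
        return False
    in_read = "read" in frames[0][1]  # e.g. __read_nocancel
    return in_read and any(func == "ajp_get_reply" for _, func, _ in frames)
```

Run against each hung PID's pstack output, a child that reports waiting_on_ajp_reply() as True is stuck on the ColdFusion side rather than inside Apache itself.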

Any suggestions on where or how to troubleshoot this issue?

Explorer ,
Nov 06, 2014

I removed clustering by editing the uriworkermap.properties file and pointing /flex2gateway and /flex2gateway/* to a single instance, and then ran the PCI scan again. It still seems to hang. I'm surprised there are no other complaints about this out there on the interwebs. I can't be the only one.

Explorer ,
Nov 07, 2014

OK, I did a little more testing from a Linux CLI using curl, and I find that if I POST to /flex2gateway/<any string> it hangs indefinitely. A normal GET request results in a 404, but a POST hangs indefinitely. What's more, posting to just /flex2gateway/ seems to perform normally (some kind of binary data connection). It's only if I put something in the path after /flex2gateway/ that it hangs. It behaves the same if I hit one instance directly, as opposed to going through the cluster, so that eliminates Apache as the problem. I also see a hang when posting to /flex-internal/ and /flex-internal/<some string>.

Any clue as to why this might act this way?

Explorer ,
Nov 10, 2014

On a test server, I have removed the wildcard from the uriworkermap.properties file, so it now only matches "/flex2gateway" and "/flex2gateway/". Unfortunately, I'm still seeing the occasional hung Apache worker.

Anyone have any leads on this issue? I don't mind doing the research; I've just exhausted the limits of my Google-fu.


Apache Server Status for 10.10.10.205

Server Version: Apache/2.2.15 (Unix) DAV/2 PHP/5.3.3 mod_ssl/2.2.15 OpenSSL/1.0.1e-fips mod_wsgi/3.2 Python/2.6.6 mod_jk/1.2.32 mod_perl/2.0.4 Perl/v5.10.1
Server Built: Oct 16 2014 14:48:21

Current Time: Monday, 10-Nov-2014 16:49:22 EST
Restart Time: Monday, 10-Nov-2014 15:25:16 EST
Parent Server Generation: 0
Server uptime: 1 hour 24 minutes 6 seconds
Total accesses: 5313 - Total Traffic: 98.4 MB
CPU Usage: u3.97 s1.26 cu0 cs0 - .104% CPU load
1.05 requests/sec - 20.0 kB/second - 19.0 kB/request
15 requests currently being processed, 11 idle workers
WWWWWWW_W_W_W__W__W__WW_W_...................................... ................................................................ ................................................................ ................................................................ 

Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process
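To watch for the pile-up without eyeballing the page, the scoreboard string can be tallied using the key above (mod_status also prints it on a "Scoreboard:" line at /server-status?auto, which is easier to fetch from a script). A minimal sketch; fetching the page is left out, and characters outside the key (such as the spaces in the paste above) are ignored. Applied to the scoreboard above it gives 15 "Sending Reply" and 11 "Waiting for Connection", matching the "15 requests currently being processed, 11 idle workers" line.

```python
from __future__ import annotations

from collections import Counter

# Scoreboard characters, taken from mod_status's "Scoreboard Key" above.
SCOREBOARD_KEY = {
    "_": "Waiting for Connection",
    "S": "Starting up",
    "R": "Reading Request",
    "W": "Sending Reply",
    "K": "Keepalive (read)",
    "D": "DNS Lookup",
    "C": "Closing connection",
    "L": "Logging",
    "G": "Gracefully finishing",
    "I": "Idle cleanup of worker",
    ".": "Open slot with no current process",
}


def tally_scoreboard(scoreboard: str) -> dict[str, int]:
    """Count workers per state; characters not in the key are ignored."""
    counts = Counter(ch for ch in scoreboard if ch in SCOREBOARD_KEY)
    return {SCOREBOARD_KEY[ch]: n for ch, n in counts.items()}
```

A rising "Sending Reply" count while "Open slot with no current process" falls toward zero is the pattern described at the top of this thread, where the hung POSTs eventually use up every child.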

Srv   PID    Acc        M  CPU   SS    Req  Conn  Child  Slot   Client       VHost           Request
0-0   8727   0/12/12    W  0.03  4572  0    0.0   0.05   0.05   10.10.2.201  qc.company.int  POST /flex2gateway HTTP/1.1
1-0   8728   0/11/11    W  0.03  4358  0    0.0   0.18   0.18   10.10.2.201  qc.company.int  POST /flex2gateway HTTP/1.1
2-0   8729   0/38/38    W  0.04  3910  0    0.0   1.11   1.11   10.10.2.201  qc.company.int  POST /flex2gateway HTTP/1.1
3-0   8730   0/27/27    W  0.03  4064  0    0.0   0.79   0.79   10.10.2.201  qc.company.int  POST /flex2gateway HTTP/1.1
4-0   8731   0/16/16    W  0.03  4354  0    0.0   0.12   0.12   10.10.2.201  qc.company.int  POST /flex2gateway HTTP/1.1
5-0   8732   0/7/7      W  0.02  4564  0    0.0   0.02   0.02   10.10.2.201  qc.company.int  POST /flex2gateway HTTP/1.1
6-0   8733   0/8/8      W  0.02  4673  0    0.0   0.01   0.01   10.10.2.201  qc.company.int  POST /flex2gateway HTTP/1.1
7-0   8734   0/386/386  _  0.37  4     0    0.0   6.49   6.49   10.10.2.212  www.company.qc  GET /marketingpages/images/login_over.jpg HTTP/1.1
8-0   9422   0/10/10    W  0.02  4564  0    0.0   0.04   0.04   10.10.2.201  qc.company.int  POST /flex2gateway HTTP/1.1
9-0   10112  0/393/393  _  0.37  6     0    0.0   14.59  14.59  10.10.2.212  www.company.qc  GET /marketingpages/images/box_onesource.jpg HTTP/1.1
10-0  10468  0/321/321  W  0.32  846   0    0.0   4.42   4.42   10.10.2.212  qc.company.int  POST /flex2gateway HTTP/1.1
11-0  10470  0/398/398  _  0.38  6     0    0.0   12.80  12.80  10.10.2.212  www.company.qc  GET /marketingpages/images/home_eco.jpg HTTP/1.1
12-0  10471  0/340/340  W  0.32  837   0    0.0   4.99   4.99   10.10.2.212  qc.company.int  POST /flex2gateway/ HTTP/1.1
13-0  10544  0/404/404  _  0.34  6     0    0.0   5.21   5.21   10.10.2.212  www.company.qc  GET /marketingpages/images/box_top.jpg HTTP/1.1
14-0  10592  0/353/353  _  0.40  6     12   0.0   14.10  14.10  10.10.2.212  www.company.qc  GET /?login HTTP/1.1
15-0  10648  0/296/296  W  0.31  800   0    0.0   3.82   3.82   10.10.2.212  qc.company.int  POST /flex2gateway/ HTTP/1.1
16-0  12382  0/339/339  _  0.33  6     0    0.0   2.85   2.85   10.10.2.212  www.company.qc  GET /marketingpages/images/logo_sourceone.jpg HTTP/1.1
17-0  12387  0/336/336  _  0.34  6     0    0.0   5.06   5.06   10.10.2.212  www.company.qc  GET /marketingpages/images/logo_onesource.jpg HTTP/1.1
18-0  12388  0/265/265  W  0.25  839   0    0.0   2.87   2.87   10.10.2.212  qc.company.int  POST /flex2gateway/ HTTP/1.1
19-0  12389  0/323/323  _  0.31  0     0    0.0   4.82   4.82   10.10.2.212  www.company.qc  GET /marketingpages/lib/dimming.js HTTP/1.1
20-0  12390  0/336/336  _  0.31  4     0    0.0   5.24   5.24   10.10.2.212  www.company.qc  GET /marketingpages/lib/superfish.js HTTP/1.1
21-0  12391  0/289/289  W  0.27  805   0    0.0   2.49   2.49   10.10.2.212  qc.company.int  POST /flex2gateway/ HTTP/1.1
22-0  12392  0/281/281  W  0.27  831   0    0.0   3.17   3.17   10.10.2.212  qc.company.int  POST /flex2gateway HTTP/1.1
23-0  14750  0/41/41    _  0.04  6     0    0.0   0.92   0.92   10.10.2.212  www.company.qc  GET /marketingpages/images/close.jpg HTTP/1.1
24-0  14751  0/43/43    W  0.04  0     0    0.0   1.21   1.21   10.10.2.36   qc.company.int  GET /server-status HTTP/1.1
25-0  14752  0/40/40    _  0.04  6     0    0.0   0.96   0.96   10.10.2.212  www.company.qc  GET /marketingpages/images/box_sourceone.jpg HTTP/1.1

Community Beginner ,
Nov 30, 2014

Make sure you have the following in one of your config files:

# enable Flex Gateway
<IfModule jk_module>
    JkMount /*.cfm ajp13
    JkMount /*.cfc ajp13
    JkMount /*.do ajp13
    JkMount /*.jsp ajp13
    JkMount /*.cfchart ajp13
    JkMount /*.cfres ajp13
    JkMount /*.cfm/* ajp13
    JkMount /*.cfml/* ajp13
    JkMountCopy all
</IfModule>

If you add this to the end of the mod_jk.conf file, just be careful when updating your connector in the future, because it may remove the lines. These commands are required to get the flex2gateway working in CF10. Without these lines, we've seen the exact same behavior you're describing.

Hope this helps!

Explorer ,
Dec 01, 2014

Thanks for the response.  Where exactly did you need to add this block of code?  I tried adding it to the end of the mod_jk.conf file, as well as adding it to the default virtual host block in the httpd.conf files.  Neither seems to have helped when testing.  Thanks.

Community Beginner ,
Dec 01, 2014

We have it in our mod_jk.conf file, but be careful when updating the connector because it may remove the code.

Make sure you've restarted Apache/ColdFusion after adding the lines as well.

You might want to return your uriworkermap.properties to its original version.

Here's the thread where I originally found the entries that needed to be added:

Re: Coldfusion 10 + Apache + Flex2gateway + Debian/Linux

Maybe you can find more info from someone in that post.

Explorer ,
Dec 01, 2014

Thanks Dan, but I think we're talking about different issues. We are well past the 404 problem, which was solved by an alternate fix: adding the following lines to the uriworkermap.properties file:

/flex2gateway/* = CFCluster

/flex2gateway = CFCluster
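The effect of the wildcard entry can be checked without a server: with /flex2gateway/* mapped, any URI under /flex2gateway/ (including /flex2gateway/http, which the scan posts to) is sent to CFCluster, while with only the exact entries from the Nov 10 test it is not forwarded to ColdFusion at all. This sketch approximates mod_jk's * and ? wildcards with Python's fnmatch.fnmatchcase and prefers exact matches, then the longest wildcard pattern; that precedence is an assumption, and mod_jk's other rule options are not modelled. The WITH_WILDCARD and EXACT_ONLY names are hypothetical.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def worker_for(uri: str, rules: dict[str, str]) -> str | None:
    """Return the worker a URI would be forwarded to, or None if no rule matches.

    Approximation of uriworkermap.properties matching (an assumption, not
    mod_jk's code): exact patterns win, otherwise the longest matching
    wildcard pattern ('*' or '?') is used. fnmatchcase is case-sensitive
    and its '*' also matches '/'.
    """
    if uri in rules:
        return rules[uri]
    matches = [p for p in rules if ("*" in p or "?" in p) and fnmatchcase(uri, p)]
    return rules[max(matches, key=len)] if matches else None


# The two mappings discussed in this thread.
WITH_WILDCARD = {"/flex2gateway/*": "CFCluster", "/flex2gateway": "CFCluster"}
EXACT_ONLY = {"/flex2gateway": "CFCluster", "/flex2gateway/": "CFCluster"}
```

For example, worker_for("/flex2gateway/http", WITH_WILDCARD) returns "CFCluster", whereas worker_for("/flex2gateway/http", EXACT_ONLY) returns None. It does not explain the hangs on /flex2gateway itself in the Nov 10 status table, since that URI is mapped either way.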

My problem is not getting flex2gateway working; it works just fine. The problem we see comes up primarily during a PCI scan, when the scan attempts to post data to "http://10.x.x.x/flex2gateway/http" and the worker hangs indefinitely. I can recreate the issue using curl like so:

curl --data "param1=value1&param2=value2" http://10.x.x.x/flex2gateway/http

I have no such issue if I post to http://10.x.x.x/flex2gateway/ without the /http path.
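To test candidate paths without tying up a terminal, the same POST can be sent with a client-side timeout so that a hang is reported instead of waiting forever (with curl itself, --max-time does the same job). This is a minimal sketch using only the Python standard library; probe_post is a hypothetical helper, the 10-second default is arbitrary, and 10.x.x.x stays a placeholder for your server. Note that a client-side timeout only frees the client: as described earlier in the thread, the Apache child can keep waiting on ColdFusion after the client goes away.

```python
import socket
import urllib.error
import urllib.request


def probe_post(url: str, data: bytes, timeout: float = 10.0) -> str:
    """POST `data` to `url`; return "HTTP <status>" or "timeout" instead of hanging.

    Any HTTP status, including 4xx/5xx, counts as a reply. A proxy from the
    environment is deliberately bypassed so the request goes straight to `url`.
    """
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    req = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    try:
        with opener.open(req, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:  # must come before URLError (it is a subclass)
        return f"HTTP {err.code}"
    except socket.timeout:  # no response within `timeout` seconds
        return "timeout"
    except urllib.error.URLError as err:  # timeouts while connecting/sending arrive wrapped
        if isinstance(err.reason, socket.timeout):
            return "timeout"
        raise


# Example (10.x.x.x is the placeholder used in this thread):
# for path in ("/flex2gateway/", "/flex2gateway/http"):
#     print(path, probe_post("http://10.x.x.x" + path, b"param1=value1&param2=value2"))
```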

I have worked around this problem by denying access to /flex2gateway/http and /flex2gateway/httpsecure in the Apache config, since these paths are not used, nor are they even found:

<Location /flex2gateway/http>
    Order deny,allow
    Deny from all
</Location>

<Location /flex2gateway/httpsecure>
    Order deny,allow
    Deny from all
</Location>

Community Beginner ,
Dec 01, 2014

Gotcha.

I think the problem is similar to what you see when Flex isn't configured properly. It would appear Apache is handing off the request and then waiting for ColdFusion to respond, but ColdFusion doesn't know how to handle the resource. I wonder if there's something in web.xml that needs to be updated as well so that CF knows how to handle the /flex2gateway/http and /flex2gateway/httpsecure URIs.
