Here's the deal, we have a CF web service which generates PDF
files. You pass it HTML and the path to a file server and it writes
a PDF there. Pretty simple. We have a Java app which generates
appraisals on the fly and uses the CF web service to make a PDF.
The problem: Once or twice a day a request would just hang.
CPU usage would stay low, but the thread would just sit there. This
was on CF 7.0.2 Standard which means cfdocument was single threaded
and all other cfdocuments on the server would get in line and wait.
We now have CF 7.0.2 Enterprise which multi-threads
cfdocument, but it hasn't solved the initial problem.
These hung threads will run for hours EVEN THOUH the timeout
is 100 seconds. Furthermore, I CANNOT kill the thread with
SeeFusion. I have to restart CF.
So. I started digging. I would observe stack traces for the
hung threads. Every time the thread hung it would be fetching
images in the HTML. I knew this because the top of the thread stack
always looks like this:
"jrpp-80" waiting for monitor entry
- waiting on <147> a
com.icesoft.util.trackers.RenderDoneTracker)
at
com.icesoft.util.trackers.RenderDoneTracker.propertyChange(RenderDoneTracker.java:104)
at ice.storm.StormBase.add(OEAB)
at ice.storm.StormBase.sendViewportMessage(OEAB)
at ice.storm.StormBase.stopViewportLoading(OEAB)
at ice.storm.StormBase.stopLoading(OEAB)
at
coldfusion.document.DocumentProcessor.processContent(DocumentProcessor.java:149)
at
coldfusion.document.DocumentProcessor.ProcessContent(DocumentProcessor.java:59)
at
coldfusion.tagext.lang.DocumentTag.processContent(DocumentTag.java:1235)
at
coldfusion.tagext.lang.DocumentTag.doAfterBody(DocumentTag.java:1190)
at
cffilemanagement2ecfc930489299$funcCREATEPDF.runFunction(D:\Inetpub2\CustomTags\CFC\machII\tempest\filemanagement.cfc:122)
and there will be another thread every time that looks like
this:
"Image Fetcher 2" runnable
- locked <142> (a java.io.BufferedInputStream)
- locked <144> (a
sun.net.
www.protocol.http.HttpURLConnection)
- locked <147> (a
com.icesoft.util.trackers.RenderDoneTracker)
at java.net.SocketInputStream.socketRead0(Native Method)
at
java.net.SocketInputStream.read(SocketInputStream.java:129)
at
java.io.BufferedInputStream.fill(BufferedInputStream.java:183)
at
java.io.BufferedInputStream.read1(BufferedInputStream.java:222)
at
java.io.BufferedInputStream.read(BufferedInputStream.java:277)
at
sun.net.
www.http.HttpClient.parseHTTPHeader(HttpClient.java:760)
at
sun.net.
www.http.HttpClient.parseHTTP(HttpClient.java:711)
at
sun.net.
www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:635)
at java.net.URLConnection.getContent(URLConnection.java:585)
at java.net.URL.getContent(URL.java:927)
at
coldfusion.document.DocumentImageRetriever.processImageSrc(DocumentImageRetriever.java:117)
at
coldfusion.document.DocumentImageRetriever.fetch(DocumentImageRetriever.java:69)
Notice how the first thread is waiting on thread 147 which is
locked by the Imagefetcher thread.
Ok, so images are my problem. But which ones?
I started by loading the HTML into IE while running MS
Fiddler to look for 404's or other errors but there were none. My
cfdocument problems were sporadic and random so it was something
which would work most of the time but not every time.
I waited for the problem to happen again and used TCPView.exe
from SYSInternals to view all my open TCP connections. Sure enough
I had an open TCP connection every time from jrun.exe to the same
remote server. The HTML being generated referenced an image on this
remote server.
I also found at this point, that if I used TCPView to kill
the TCP connection, the thread would "un-hang"-- proving that
cfdocument was waiting on the image to return.
Now I fired up Ethereal, which is a packet sniffer and had it
log all TCP traffic to and from that server for two days until a
request hung again. I waited 15 minutes, then killed the TCP
connection (which coaxed the thread into finishing) and then looked
at my network trace. Sure enough, at the same time the cfdocument
started, the following TCP conversation happened between JRUN and
the remote server:
JRUN: SYN (read: you there?)
Remote: SYN ACK (read: yeah, why?)
JRUN: ACK (read: just checking.)
JRUN: HTTP GET host/path/image_name (read: hey, can I have
this image)
Remote: ACK (read: sure, coming right up)
That was the last TCP frame in the stream until I killed it
15 minutes later. The remote server never responded, and the
ImageFetcher function just sat there by the phone at
"java.net.SocketInputStream.socketRead0" till kingdom come.
Now, I have no control over the remote server, so don't even
suggest I try to fix their problems. Anyone outside my firewall is
outside of my control.
I want to know:
1) Why the Cold Fusion Timeout isn’t observed
2) Why I can't kill the thread manually with SeeFusion
3) Why the HECK cfdocument has no internal timeout for HTTP
requests. (I let one run for 24 hours before and it just sat there)
4) Is this a known issue?
5) Is there a bug request in for it?
6) When will it be fixed?
7) Where the heck are Spain's plains located, and why
won’t the rain leave it alone?
If you have any information, think you have seen this
problem, would like more information, or are just trolling the
forums late at night looking for conversation, please weigh in on
this.
Thanks!
~Brad