Anyone else have an issue with CF 2021 where a long-running scheduled task runs multiple times?
I've narrowed it down to only occurring using a JDK higher than 11.0.12.
I've been replicating this by scheduling a task to a page with the following content:
<cfset sleep(360000)>
<cfmail to="youremail@email.edu" from="testtask@email.edu" subject="test task" type="html">
<cfdump var='#timeFormat(now(), "HH:mm:ss")#'/>
</cfmail>
We just upgraded to 2023 and this is still happening.
Well, I have not succeeded in replicating the test that the OP offered. As such, whatever it is that you and others are experiencing (even if you may find others confirming it), it would seem there must be something environmental affecting it.
So let's talk about a few things:
Some more background: since CF 10, CF scheduled tasks have run under the covers on an open-source library called Quartz (with config files in the cfusion/quartz folder). And for some number of releases, there has indeed been a problem where the default timeout for tasks (in Quartz) is 60, which means the task gets kicked off (and runs, wherever the URL says to run, for as long as it needs to run), but the Quartz task itself "fails", as is logged in CF's scheduler.log (if you have enabled logging of scheduled tasks, in the CF Admin logging settings page).
I am working up a blog post on how to enable additional logging from the quartz perspective. That too can help in better assessing if it's true (or not) that this quartz scheduler in CF is responsible for this seeming double-execution of tasks that you guys are observing.
There may well be another known reason for that seeming to happen, and maybe someone else will chime in with more thoughts. I wanted to offer the above in the meantime.
Hi Charlie, I'm actually the OP as well. My username changed for some reason since my initial post.
We've been stuck on CF21 with Java 11.0.2 this entire time. Any newer version of Java triggers this behavior, no matter what JDK variant (Oracle, Amazon, Adoptium). Now that CF21 is getting close to EOL, I've upgraded our test server to CF23 and got the same behavior.
I've since modified the sample code:
<cfset starttime = '#timeFormat(now(), "HH:mm:ss")#'>
<cfset sleep(360000)>
Start time : <cfoutput>#starttime#</cfoutput> <br>
End time: <cfdump var='#timeFormat(now(), "HH:mm:ss")#'/>
<cfmail to="testuser@email.com" from="testtask@email.com" subject="test task" type="html">
Start time : <cfoutput>#starttime#</cfoutput> <br>
End time: <cfdump var='#timeFormat(now(), "HH:mm:ss")#'/> <br>
<cfdump var="#cgi.CERT_SERVER_ISSUER#">
</cfmail>
Running this page once via the task scheduler produces 3 emails. Each one has a start time 3 minutes after the previous one, and the end time is, as expected, 6 minutes after the start time. The ColdFusion scheduler log shows the task kicking off once. I did try changing the default task retry to 0 instead of 3, with no luck. The IIS access logs also show the page getting hit only once.
I was going to try setting up a linux environment to see if it happens there
I was looking at the wrong IIS log. It does show up multiple times in the IIS access logs.
I'd recommend you next add to your cfmail code there a dump of the entire cgi scope, to see more about the source of the request. Then add also a dump of the server scope, to see more about the cf instance running the request. Most important, compare those dumps between each of the different emails you get. We may find a key difference that explains this.
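One way to act on that suggestion is sketched below. It reuses the test page from earlier in the thread and adds the two dumps to the email body. The third dump, of getHTTPRequestData().headers, is an addition not mentioned above; it assumes that headers added by a proxy or load balancer might help tell the requests apart. The email addresses are the placeholders already used in the thread.

```cfm
<!--- Diagnostic variant of the test page: mail the CGI and server scopes --->
<cfset startTime = timeFormat(now(), "HH:mm:ss")>
<!--- Simulate a task that runs for 6 minutes --->
<cfset sleep(360000)>
<cfmail to="testuser@email.com" from="testtask@email.com" subject="test task diagnostics" type="html">
    Start time: #startTime# <br>
    End time: #timeFormat(now(), "HH:mm:ss")# <br>
    <!--- Who made the request, and through which web server --->
    <cfdump var="#cgi#" label="CGI scope">
    <!--- Which CF instance handled the request --->
    <cfdump var="#server#" label="Server scope">
    <!--- Assumption: raw request headers may reveal proxy-added values --->
    <cfdump var="#getHTTPRequestData().headers#" label="Request headers">
</cfmail>
```

Comparing the emails side by side (remote address, user agent, any forwarded-for style headers) should show whether the repeat requests come from the same source as the first one.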
Finally, if the pressure is on and you "just need this solved", I'm confident I could resolve this in a screenshare session together. We may not even need an hour. You won't pay for time you don't find valuable. More on my rates, approach, satisfaction guarantee, online calendar and more at carehart.org/consulting.
Or I and others here can keep pressing on here. This can be diagnosed, understood, and resolved, I'm sure of it. You need not limp along on an old CF and Java version.
We have had similar issues in the past with IIS.
When the application pool recycles, it will cause the running task to rerun - I am not exactly sure why.
We used to have a long-running task overnight, and the default IIS app pool recycle interval is 1740 minutes (29 hours). Eventually the recycle comes back round to the time of night we ran the schedule, and we found the task ran twice.
CF showed it running once, IIS twice.
We changed our app pool settings to recycle at a specific time and the issue went away as the time was outside our schedule.
I would check the app pool recycling time, or even the logs, to see if it's crashing and causing the app pool to recycle.
@pichardov and @w49369461 , please share your:
Copy link to clipboard
Copied
Windows Server 2019
Coldfusion 21 Update 19 only if Java version is higher than 11.0.12.
Coldfusion 23 Update 15.
Thanks for the information about your Operating System. There is a question to answer about the CFM page called by the scheduled task. What is the following line all about?
<cfset sleep(360000)>
It doesn't really make sense to me. What happens when you remove the sleep line? In other words, what happens when the task is given enough time to run?
You are after all in the realm of scheduler threads which execute time-based tasks. So you could just configure the scheduled task to run every six minutes.
Please share the current settings of the scheduled task.
The sleep line is to simulate a scheduled task that runs for longer than 5 minutes.
Did you check the IIS application pool as per my last comment?
The sleep line is to simulate a scheduled task that runs for longer than 5 minutes.
By @pichardov
I guessed as much. But two questions come to mind:
That aside, question 1 leads me to an idea. What if you enclose, within a named lock, the code in the CFM page:
<!--- CFM page run by scheduled task --->
<!--- The lock. Timeout is 360 seconds. --->
<cflock timeout="360" name="MySchedulerLock" type="exclusive">
<!--- Here goes the long-running code --->
</cflock>
Does the issue still occur?
I arrived at the idea as follows. Let's assume that what you have discovered is a bug. Then the bug is probably caused not by a change in Java but by a change in ColdFusion. More likely by a change in how ColdFusion works with different Java versions. I say this for three reasons:
Internally ColdFusion uses a scheduler built on top of Java's ScheduledExecutorService. The class that implements the service is ScheduledThreadPoolExecutor. The ScheduledThreadPoolExecutor does not monitor or kill long-running tasks by default. It implements two policies to cope with delay. They are: scheduleWithFixedDelay (wait for the task to finish) and scheduleAtFixedRate (allow other tasks to launch anyway, causing overlap).
I don't know the actual policy that ColdFusion uses (fixed delay or fixed rate), as it is not documented. But, because of the bug you've found, I am guessing scheduleAtFixedRate.
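With scheduleAtFixedRate, blocked or stalled input/output (such as from cfmail or cfquery connections) could allow other scheduler threads to run anyway. One caveat: per the Java documentation, when a single task scheduled with scheduleAtFixedRate takes longer than its period, subsequent runs of that task start late but do not execute concurrently. So if ColdFusion's scheduler does use scheduleAtFixedRate, any overlap would have to come from a different task or from a separately triggered request, not from the same scheduled run.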
But you may say, "The issue still occurs when there is just one scheduled task registered in ColdFusion. So there is no question of scheduled tasks overlapping."
The answer to that is:
Copy link to clipboard
Copied
I have looked into the issue some more, and continue to do so. As far as Java changes go, I have been unable to find any explicit acknowledgement from Oracle or OpenJDK saying something like, "We changed ScheduledThreadPoolExecutor behavior in 11.0.13+". Therefore, the behaviour you observed (ColdFusion scheduled tasks overlapping under JDK 11.0.13 or newer) seems to be an undocumented runtime behavior change rather than a documented API change. The research continues.
I did a lot of troubleshooting and finally narrowed it down and have a workaround.
We offload SSL termination to an F5 load balancer. The issue occurs when the scheduled task URL hits the F5 first, so going F5 -> IIS -> CF. Still not sure why this only happens with newer Java though.
Since we have a lot of scheduled tasks and I didn't want to update all of them, I hardcoded the IP of the server running CF in its hosts file. So all HTTP requests initiated on the CF server itself will go to itself instead of going over to the F5 first.
I did have to create a self-signed IIS cert with the same name as the cert on the F5 and trust it in the Java keystore. Initially I tried a self-signed cert with the hostname of the CF server but, for whatever reason, that fails the SSL check even when trusted in the Java keystore.
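For anyone wanting to try the same workaround, a hosts file entry along these lines is roughly what is described above. This is only a sketch: the IP address and host name are hypothetical placeholders, and it assumes the name being mapped is the one used in the scheduled task URLs (the name that would otherwise resolve to the F5).

```
# C:\Windows\System32\drivers\etc\hosts on the ColdFusion server
# Hypothetical values: replace with the CF server's own IP and the
# host name that the scheduled task URLs use.
10.0.0.25    tasks.example.edu
```

With an entry like this, requests that the CF server makes to that host name resolve to the server itself rather than to the load balancer.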
Thanks for the update. Interesting problem and solution. It also explains why this is not a problem commonly observed. In fact, it may still be possible someone else will come here experiencing the problem, where yours won't be their reason and solution. But at least it's solved for you, as the OP (though you acknowledged having used a different handle then).
As for @paule12345's report (which arrived literally the same minute as your "solution" here), that really seems a different issue, as I'll discuss in reply to that.
Thanks, @pichardov , for sharing your workaround.
That's a weird one. But it works, that's the main thing.
I'm wondering if your issue is related to TLS support in the different Java versions and communication with the F5. TLS 1.0 and 1.1 are disabled by default beginning with JDK 11.0.11:
https://www.petefreitag.com/blog/tlsv1-tlsv1-1-disabled-java/
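To see what a given JVM allows, a small diagnostic page like the following could be run on the CF server under each Java version and the results compared. It is a sketch only: it reads standard Java classes through createObject, and it assumes the JVM-wide defaults are what matter here. ColdFusion's own HTTP client may apply further settings that this page does not show.

```cfm
<cfscript>
    // JVM version that ColdFusion is actually running on
    jvmVersion = createObject("java", "java.lang.System").getProperty("java.version");

    // Security property listing disabled protocols and algorithms
    // (TLSv1 and TLSv1.1 appear here on JVMs that disable them by default)
    disabledAlgorithms = createObject("java", "java.security.Security")
        .getProperty("jdk.tls.disabledAlgorithms");

    // Protocols enabled by default on the JVM's default SSLContext
    enabledProtocols = createObject("java", "javax.net.ssl.SSLContext")
        .getDefault()
        .getDefaultSSLParameters()
        .getProtocols();

    writeDump(var = jvmVersion, label = "java.version");
    writeDump(var = disabledAlgorithms, label = "jdk.tls.disabledAlgorithms");
    writeDump(var = enabledProtocols, label = "Default enabled TLS protocols");
</cfscript>
```

If the protocol lists differ between the working and failing Java versions, that would support the TLS explanation; the F5 side would then need checking for which protocols and ciphers it accepts.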
Great point, Paul. In fact, I'd not considered that since the original post mentioned the problem as going past 11.0.2. But I see now that @pichardov had in fact mentioned last week that it was "only if Java version is higher than 11.0.12". So you're very likely spot on.
But that's not your own issue below, I assume. Is that right?
The original was a typo. Our CF 2021 server is stuck on 11.0.12
Wow, that's too bad. One of us might have connected the dots far sooner...though the interaction with the F5 may not have been something anyone would have anticipated.
Anyway, have you considered the workarounds in Pete's post that Paul pointed to?
Note also he was writing in advance of the change. See my comments from a couple weeks later, with a link to my own Apr 2021 blog post on the problem and solution (documented by Oracle when that jvm update came out).
FWIW, I continued to warn people of that change with the next several subsequent jvm updates, as it was a doozy for some--and tragically impacting your/CF's ability to talk to things you might have zero control over.
@pichardov , The connection you've found between the issue and SSL/certificates may explain something else. It probably explains why the issue happens only when ColdFusion 2021's Java version is 11.0.13 or higher.
The following scenario is plausible:
From Java 11.0.13 onwards, SSL/TLS/certificate handling and SAN/hostname verification might have become stricter than before. The changes in security in Java 11.0.13 include:
- Program fails when using JDK addressed by UNC path and using Security Manager.
- Update keytool to create AKID from the SKID of the issuing certificate as specified by RFC 5280.
- Remove IdenTrust certificate.
- NullPointerException in JKS keystore.
- Allow initialization of SunPKCS11 with NSS when there are external FIPS modules in the NSSDB.
- Update the default enabled cipher suites preference.
- Improve SSL session cache performance and scalability.
- Update Apache Santuario (XML Signature) to version 2.2.1.
Why the security changes in Java could cause multiple runs of a long-running scheduled task:
In any case, in spite of the scheduled task issue, I would strongly advise you to install the latest update of ColdFusion. The pros of being up-to-date outweigh the cons of the issue by far.
We observed this behavior on CF2021 when doing bulk email sending, no scheduled task involved. When we would send, say, a 10,000 email blast via the SMTP service on the local Windows 2019 server, after X number of minutes the CFM doing the email sending would start running again. I could see it happening in FusionReactor monitor in real time. Our solution at the time was running smaller email batches, say, no more than 5000 at a time. We now use a third party cloud based email sender (SMTP2GO) and don't have the issue anymore because their system processes each SMTP transaction much faster.
@paule12345, your situation sure sounds quite different than what @pichardov had experienced. In fact, your comment here arrived at the very same minute as theirs, offering their workaround.
As for your saying that a cfm page that created mails started running again, if you are thinking there's a connection there, I'll note that the process in cfmail will by default simply put emails into the cf mail spool folder. Cfmail itself doesn't talk to the smtp server.
Instead it's cf itself (a background thread) that wakes up (by default every 15 seconds) to find any such spooled mail to be delivered, and THAT'S what talks to the smtp server. Any failure or hangup in that process would NEVER (itself) cause the cf page doing the cfmails to run again.
It's great to hear you have FR. With that, you should definitely be able to track down the "repeat offender", finding such details as what IP address kicked it off, what client user agent that was, what web server processed it, what headers came in from the client or web server, and more.