We have an application that uses CFLDAP over port 636 to authenticate users against Active Directory. We are getting the following error: An error has occurred while trying to execute query :xxx.yyy.zzzz:636.
The server is running CF2021 Enterprise on Windows Server 2016.
I can get it to work, seemingly at random, by rebooting the server or by stopping and starting the CF Application service. It might start working on the second, third, or fourth reboot, etc. Once it is working, it is fine until the monthly patch reboots, and then the failure process starts all over again. We also have a CF2018 server, also on Windows Server 2016, and it does not have the issue.
Here is what I have tried, all with no long-term luck in fixing the issue:
The error output is not very helpful.
No entries in Windows, Apache or CF logs when the error occurs.
CFCATCH doesn't provide anything useful.
I feel like this is cert related, but I can't find any way to further diagnose the error above or get any deeper details.
Thoughts/Suggestions?
Pardon the typos.
Hi @BeRadB
Can you add "-Djavax.net.debug=ssl" to jvm.config, run the test case again with SSL enabled, and share the logs with me?
Sure, but which log files would you like?
coldfusion-error.log. You can DM the log because it is going to have a lot more information.
BeRadB, thanks for that. So I gather this is indicating you're running Java 11.0.1, and at least that is indeed before the change that blocked access to servers not supporting at least TLS 1.2.
That said, it's not clear if you are saying that you HAVE tried a later Java version. 11.0.1 is from 2018. It may well be that updating to 11.0.10 (before the update preventing access to TLS 1.1) or even 11.0.11 or 12 (the two latest updates) may help you overcome SOME aspect of the problem of CF communicating to your LDAP server.
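Since the question is which Java version is really in play, here is a minimal sketch (not something posted in this thread) that prints the standard system properties identifying a running JVM. The class name `JvmIdentity` is made up for illustration. Run standalone, it only describes the `java` binary that launched it, so the assumption is that you start it with the binary under the Java home that CF's jvm.config points to.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports which JVM is actually executing this code.
class JvmIdentity {

    /** Collects the standard system properties that identify the running JVM. */
    static Map<String, String> describe() {
        Map<String, String> info = new LinkedHashMap<>();
        String[] keys = {"java.version", "java.vendor", "java.home", "java.vm.name"};
        for (String key : keys) {
            info.put(key, System.getProperty(key));
        }
        return info;
    }

    public static void main(String[] args) {
        // One "key = value" line per property, in insertion order.
        describe().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The same properties can be read from inside ColdFusion itself through `java.lang.System`, which is probably the more direct check, because it reports the JVM that the CF service is using at that moment.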
I also notice you show using Windows 2012. I wonder if there could be any connection with that, though I am inclined to think there should not be. I'm pointing it out for the sake of anyone following along and wanting to help.
One last thing, for the sake of completeness. It's clear you didn't get the info from a java -version command, which seemed to be what dgcotton was showing. You DO show more than just the jvm info but also CF info.
That said, I must say I can't tell for sure WHERE you got it. Can you tell us? It doesn't match the layout of fields I see in a CF2018 display of that sort of info in the "settings summary page" or the "system information" page (the "i" icon in the top right).
FWIW, in my 2018 display of the former, the first few fields are these:
Server Product: ColdFusion
Version: 2018,0,11,326016
Edition: Enterprise
Operating System: Windows 10
OS Version: 10.0
Update Level: /D:/ColdFusion2018/cfusion/lib/updates/chf20180011.jar
Adobe Driver Version: 5.1.4 (Build 0001)
Tomcat Version: 9.0.41.0
JVM Details
Java Version: 11.0.11
And in the "system info" page they look like this:
Server Product: ColdFusion 2018
Version: 2018.0.11.326016
Tomcat Version: 9.0.41.0
Edition: Enterprise
Operating System: Windows 10
OS Version: 10.0
Update Level: D:/ColdFusion2018/cfusion/lib/updates/chf20180011.jar
Adobe Driver Version: 5.1.4 (Build 0001)
JVM Details
Java Version: 11.0.11
Note how your list of fields (and even the format of some of the values) doesn't really match what I show. I just find that curious.
To be clear, I didn't say you needed to provide the whole page. I simply asked where you had gotten the info you shared (with your system's settings) since it didn't match the fields shown in the two approaches I mentioned.
And what you have shared now also does not match what you shared before. Note for instance you now show a line indicating "update level", like mine did. Your previous info did not. So again, just wondering where you got the info for your reply yesterday. Not critical. Just curious.
As for your error log and its ssl debugging, we can see that you are getting "PKIX path building failed". That can have different causes and solutions.
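To narrow down a "PKIX path building failed" outside of CFLDAP, one option is a small standalone TLS handshake against the LDAPS port. The following is a sketch under assumptions, not anything posted in this thread: the class name `LdapsHandshakeProbe` and the default host `ldap.example.com` are placeholders, and it assumes you run it with the same Java installation (and therefore the same default cacerts truststore) that CF is configured to use.

```java
import java.security.cert.CertPathBuilderException;
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical probe: attempts a TLS handshake and reports certificate-path failures.
class LdapsHandshakeProbe {

    /** Returns true if any exception in the cause chain is a certificate path building failure. */
    static boolean isCertPathFailure(Throwable error) {
        for (Throwable cause = error; cause != null; cause = cause.getCause()) {
            if (cause instanceof CertPathBuilderException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Placeholder host; pass the real AD host and port as arguments.
        String host = args.length > 0 ? args[0] : "ldap.example.com";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 636;

        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, port)) {
            socket.startHandshake();
            System.out.println("Negotiated protocol: " + socket.getSession().getProtocol());
            // Print the chain the server presented, leaf certificate first.
            for (Certificate cert : socket.getSession().getPeerCertificates()) {
                if (cert instanceof X509Certificate) {
                    X509Certificate x509 = (X509Certificate) cert;
                    System.out.println(x509.getSubjectX500Principal()
                            + " issued by " + x509.getIssuerX500Principal());
                }
            }
        } catch (Exception e) {
            String kind = isCertPathFailure(e) ? "Certificate path failure: " : "Other failure: ";
            System.out.println(kind + e);
        }
    }
}
```

If the handshake succeeds here but CFLDAP still fails, that would point away from the truststore and toward something specific to the CF service. If it fails the same way, the printed exception shows whether the certificate chain is the issue. Adding `-Djavax.net.debug=ssl` to the command line gives the same handshake trace discussed earlier in the thread.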
Can you clarify whether you've run with CF pointing to a later Java than that 11.0.1? I have blogged previously with more about how that alone can solve problems of CF being unable to call out to something via https/tls. Note how it differs from common suggestions to import new certs. That may NOT be needed, though sometimes it may be. Try updating the jvm first. It's helped many.
Finally, note that the failure can happen "suddenly", when "nothing's changed" on your end, because of a change on the server you're CALLING rather than on yours. And that may require this jvm change in CF.
I realize others here have said that trying a new jvm has not helped, but note how I've pressed to make sure we're really talking about the jvm that CF is using. It's easy to misinterpret or misstate this info, and when we're trying to solve things by words (rather than a live consulting session) it's just critical to make sure we're all on the same page.
But some
No worries, will try a lower jvm version. It might be a few days before I can do so; I will let you know.
I have tried to install an older version of the JRE from Oracle, but when I restart CF2018 I get an error saying it can't start due to an error. I tried different versions of the JRE, but got the same thing over and over. The only JRE that works is the one that shipped with 2018.
That's not an unusual problem (that cf won't start after you try to update the jvm it uses). It's like an asserted "minor surgery" that goes wrong because of an infection or other unexpected problem. And I did a blog post addressing the most common reasons and how to avoid or recover from them:
https://www.carehart.org/blog/client/index.cfm/2014/12/11/help_I_updated_CFs_JVM_and_it_wont_start
Let us know if you get going or have questions (or I can help directly, likely in minutes).
I have done this before for lower CF versions but never had this issue. I did notice that in the JRE installs I downloaded from Oracle there is no jre folder in the folder structure, unlike the default install; I wasn't sure if that is correct. I pointed the Java home path to the root folder and tried subfolders as well, but no go. I tried different versions and all the installs were the same. I also used that post of yours as a checklist.
Also, I should add that I have tried sending the requests to a load-balanced AD server address as well as hardcoding a specific AD server. This did not resolve the issue.
Again, without changing any code, uninstalling and reinstalling allows it to work for the time being.
When you say "uninstalling and reinstalling" works for the time being, do you mean all of CF? That's how this reads. And is that when the failing one has been running update 1, and uninstalling the update (alone) did not help? (I am answering these comments in the threaded interface of the UI, so if anyone gets this as an email, my reply here is about a different/later comment from dgcotton yesterday.)
And I hope you, dgcotton, might see my reply just now to your earlier note (where I talk about uninstalling update 1, if that's been an issue for you).
If you may say that a failing instance even did not yet have update 1, but uninstalling and reinstalling CF "fixed the problem", I would wonder if you had done ANYTHING else on the CF server that failed, after installing it, that perhaps you did not do after RE-installing it. For instance, if you may have run the import wizard or CAR mechanism to pull in settings from some earlier CF instance (which could contribute perhaps to why things failed) the first time, but you maybe did NOT perform that step when you RE installed CF.
Again, we're left to grasp at straws based on the info you and BeRadB are sharing. It may seem that things are deterministic (it works with CF version X and update y, but fails with update z), but my read of this thread is that it's not that simple. I could be wrong.
That's why I'm trying to get you guys to be as specific as possible. I appreciate that it's painful. You guys "just want things to work", and you may expect instead that "Adobe should just fix the problem". I just don't think they (or we trying to help here) can yet put a finger on what is clearly a recreatable problem.
And this sort of challenge is all the MORE difficult when we're talking about your challenges calling into an LDAP server which is almost certainly behind a firewall, so that we can't ourselves try to demonstrate whether WE can call it.
And we really can't reasonably expect that we could possibly even RECREATE the same LDAP server on our own end, as there's so much to the equation of things being "equal" to yours, from the LDAP server version, to the OS version (of the ldap server), to the TLS settings of that server, and so on.
But we will all fight on, because that's what we do here. We want to help, if we can. And knotty problems like this are indeed challenging, for better or worse. (Sometimes, though, there really is no substitute for having someone work with you directly, remotely, on your server, with eyes on things, so that there's less need to ask about things and follow-up with different things based on what's reported. That can of course be really tedious in text.) Still, as you can see, we're trying.
Charlie, I feel we are going in circles. With all due respect, and there is a lot of respect here, I have outlined in great detail the steps I have taken to try to reproduce and/or resolve the issue. I have 13+ years working with CF and have installed and upgraded MANY a server over the years.
I am not attributing the failure to the update at all. I am simply pointing out the incremental steps taken to try to identify when the problem is introduced. I have many, many years of troubleshooting under my belt and know that troubleshooting 101 says to take things in an incremental fashion.
In my original posts I shared that the very first production server for CF2021 was up and running for some time (around 4 months). Only after a monthly OS patch reboot did the problem first appear. Thinking that one of the OS patches had affected the server, I started working backwards. I found that, without uninstalling anything, after a random number of reboots (sometimes 2, maybe even 5), done simply to look at what was logged on startup, it would start working, but only until the next monthly reboot, when it would fail again.
To leave the production server alone so as not to impact our users, I stood up three other servers (based on an image provided by the IT team) and set up each one very systematically and incrementally. In all three cases they would fail AT SOME POINT. None failed at the same exact step in the process. I was using as simple a page with CFLDAP as you can get, so that no other code could be affecting the operation.
The last thing I tried was uninstalling CF2021 on one of the failing servers and running the same installer as before. I made no changes to the Apache config, which I had left untouched when uninstalling CF2021. The result was that a previously failing server was now operational. I chose NOT to apply update 1, so as to test whether some other factor was affecting the server. I am presently restarting the server on a regular basis and testing the app to see if it fails; it has not. I do know that on the other failing servers I tried uninstalling update 1 and it made no difference.
I have tried both installing and uninstalling update 1 via the UI and the command line; neither approach made a difference in the outcome.
I have worked with our Active Directory architect and included the results here. I have added the verbose logging to the jvm config file and included those results. I have tried multiple versions and "brands" of the JRE, verifying all along that the path matched what was shown in the console UI.
I have found NO pattern yet that makes me confident we are close to understanding the root cause.
In every case of a failed instance there is nothing in the CF logs, the services start without issue, and the apps work without issue apart from the LDAP calls. It has been an extremely frustrating effort.
I am more than willing to have a screen share session as I shared with Adobe in my ticket to them but no response.
Please feel free to reach out to me outside of this thread to arrange a screen share session if you wish. My email is dgcotton@ucdavis.edu
- Dan
Sorry you feel things are going in circles. Perhaps others may be more helpful here, and perhaps some of your wrap-up here may be helpful to folks as well. It could be that you've concluded correctly that there's no pattern at all, and in that case it may be hard for Adobe to resolve the problem, unless they somehow luck into recreating it, perhaps with info from BeRadB or others.
Still, I'll reach out directly. Again sometimes things can just go more smoothly directly. And we may even connect a dot that you have not, even with all your experience. It happens for me daily in my work with others of such caliber. Sometimes just having a second set of eyes on a problem can be valuable, as I'm sure you've experienced.
Of course, we may not find anything new. Your note suggests you really have considered all possibilities. To be clear, I don't expect folks to pay for time with me that they don't find valuable. So you have nothing to lose other than perhaps an hour of our looking into things together.
On the off chance that you may not readily see this, I'll email you next (especially if you may want to meet today even), but for others following along, you can arrange that via my online calendar on my consulting page, at carehart.org (with info on my rates, approach, satisfaction guarantee, and more).
Finally, to be clear, I don't work for Adobe, so if they have not reached out to help, my reaching out is unrelated to them. Perhaps they may as well, especially if somehow I don't offer much help.
Update: I have engaged with an Adobe technical rep. We have not yet found the root cause of the issue, but I will post back to this thread once we do.
Hey Dan, any update on that help Adobe offered last week?
Nothing yet.
Did you ever find a solution to the problem with CFLDAP?
Adobe has provided a hot fix, suggested installing JDK 11.0.13, requested additional debug settings in the jvm.config file, and suggested applying update 2 for 2021. None of these have addressed the issue.
100% of the time, if I restart the server (Windows 2016), the application using the secure port for CFLDAP fails to authenticate users. Roughly 75% of the time, if I just restart the CF application service, the application authenticates properly.
@Dgcotton , Hats off to you, man. You have shown great patience in persevering with this cfldap issue.
I just got an idea. Consider it a test or proof of concept. If only for the sake of ruling things out.
We've been talking updates and upgrades right from the beginning of this thread. Namely, of ColdFusion, TLS and Java. We've so far ignored the one silent partner always lurking in the background: Windows Server 2016. What if some part of the server is too old for the changes we're making?
The idea to test:
-Djdk.tls.client.protocols=TLSv1.2,TLSv1.1,TLSv1.0 -Dhttps.protocols=TLSv1.2,TLSv1.1,TLSv1.0
Restart ColdFusion..
Do your CFLDAP thing. Does it work?
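To see what effect such flags have, the following minimal sketch (an illustration, not part of the original suggestion) prints the TLS protocols the running JVM enables and supports by default. The class name `TlsProtocolReport` is made up. Two caveats, both worth verifying on your own JVM: Java's standard protocol names are `TLSv1`, `TLSv1.1`, `TLSv1.2` and `TLSv1.3`, so the `TLSv1.0` spelling may not be recognized; and on recent Java 11 updates TLS 1.0 and 1.1 are also listed in the `jdk.tls.disabledAlgorithms` security property, which a `-D` flag alone may not override.

```java
import java.security.NoSuchAlgorithmException;
import java.security.Security;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

// Hypothetical helper: reports the JVM's default TLS protocol configuration.
class TlsProtocolReport {

    private static SSLContext defaultContext() {
        try {
            return SSLContext.getDefault();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    /** Protocols the default SSLContext enables for new connections. */
    static List<String> enabledProtocols() {
        return Arrays.asList(defaultContext().getDefaultSSLParameters().getProtocols());
    }

    /** Protocols the default SSLContext is able to support at all. */
    static List<String> supportedProtocols() {
        return Arrays.asList(defaultContext().getSupportedSSLParameters().getProtocols());
    }

    public static void main(String[] args) {
        System.out.println("jdk.tls.client.protocols   = " + System.getProperty("jdk.tls.client.protocols"));
        System.out.println("enabled by default         = " + enabledProtocols());
        System.out.println("supported                  = " + supportedProtocols());
        // Security property from java.security; protocols listed here are refused.
        System.out.println("jdk.tls.disabledAlgorithms = " + Security.getProperty("jdk.tls.disabledAlgorithms"));
    }
}
```

Running it once without the flags and once with them (using the same Java installation CF points to) should show whether the flags change the enabled list at all.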
@BKBK Thanks. It's been tough. Fortunately we haven't converted to CF2021 across the board, and I am still able to run CF2018 without issue.
I will see if our DevOps team can stand up a Windows 2019 server and report back the findings.
I have also faced the same issue.
I don't know how to resolve it.
@cartlon219870490rg3 What are your server and system specs? Similar to mine? What OS are you running?