ORA-01012: Not Logged On Error after CF2018 upgrade

Report · Jun 25, 2019

After CF2018 Upgrade from CF11, I get an intermittent ORA-01012:Not logged on error. Anyone know what's causing this? Note: The obvious question, am I logged in? Yes, via the Oracle Datasource in CFAdmin.

Report · Jun 25, 2019

That's a pretty obscure one (not one I've seen before, and I follow these forums closely) . Since it's so very specific, I'd recommend first we consider if this may be due to an error in UPGRADING cf2018.

(I know you say you upgraded cf11 to 2018. I don't mean THAT upgrade. As you know--if you did the install--that's not an upgrade itself but a side by side install.)

Instead I mean any most recent update you may have appled to CF 2018 after it was installed. There's a specific update log to check for errors. I have a blog post with more details here:

Having problems after applying a CF update? What to check, and how to recover!

It should take only a minute to check.

If you confirm that's not the issue, let us know and folks can try to dig still further with you. In that care, do let us know what specific version of oracle and what update of cf that you are on. Also are you using the Adobe oracle driver or one you obtained? If the latter, whose and what version is it?

/Charlie (troubleshooter, carehart. org)

Report · Feb 25, 2021

We are having the same issue with intermittent ORA-01012 Not logged on errors. It happens about once a day and then keeps happening until we reset the connection, usually by just saving the datasource page in CFadmin. I wanted to post because in our case, we moved from Oracle 12.1 to Oracle 12.2. The CF side stayed the same. We are on CF2018 2018,0,10,320417 using JDK 11.0.9. We also tried JDK 12.0.4. It seems specific to the connection or session as we have 2 other applications hitting the same database and they don't see any issues when it is happening. I am still working on reproducing it somewhere other than prod. We do not see any cursor limits being hit, and any attempts to lower the cursor limits in test environments leads to the expected ORA-1000 maximum cursor limit error, not ORA-01012. I'll post back when I get to the bottom of it.

Report · Feb 25, 2021

Did you raise the idle time as proposed by the OP in a later reply below, https://community.adobe.com/t5/coldfusion/ora-01012-not-logged-on-error-after-cf2018-upgrade/m-p/104...

/Charlie (troubleshooter, carehart. org)

Report · Feb 26, 2021

Yes, the user having the issue runs with the default profile which has an idle_time of unlimited. We also increased the cursor limit to 5000, but testing during outages show that open cursors, sessions, and processes are all staying in normal ranges during the outages, no where near the limits.

Report · Feb 26, 2021

Can you clarify what you mean by "the user having issues" and how they "run with the default profile"? Are you really referring to the end user? Can you explain how whoever they are would relate to how CF then passes the query to Oracle?

Or were you referrring to the user that is listed as the username in the CF Admin DSN for this Oracle db?

Or might it be that you have NO user listed for the username in the DSN?

I'm only pressing these points to help ensure that you really look at the right things to solve the problem. You could be shining your flashlight right past the culprit. (I have never solved this specific error, so I can do no more than help you look into the darkness with my own flashlight.)

/Charlie (troubleshooter, carehart. org)

Report · Feb 26, 2021

Thanks for the help. To clarify. The application in question has several datasources and there are 3 applications hitting the same oracle database. All cases of the ORA-01012 'Not Logged On' error have been for a single datasource in a single CF application. The 'User' that I am refering to is the 'user name' listed in the CF admin for the Oracle datasource that is experiencing the issue. Running SELECT USERNAME, PROFILE, ACCOUNT_STATUS FROM DBA_USERS; shows that the user has the default profile. The default profile has Idle_time of unlimited as determined by select * from DBA_PROFILES where resource_name = 'IDLE_TIME';

The other applications use the same exact connection info in their CF datasource settings, including the user name. Those applications have never run into this issue. They are under much less load and are not hitting all of the same procs/packages etc. They are also located on a different server. The other apps continue to run fine when we see the ORA-01012 repeatedly in the logs for app1. I should clarify that when experiencing this, most web pages return errors, but a few succeed. I have assumed that there is some sort of limit being reached such as connections, sessions, processes, and that some of the items reaching their limit are being recycled to allow the few successes. It is possible that it is changing connections as well. I have tested those three and they don't seem to be the culprit.

What I would really like to know is what is CF doing under the hood with the database connection loggon. If it recieves 'Not Logged On' as a response, is it just returning that to the application, or is it trying to re-log on? If it is and failing, is that logged anywhere? It seems like Not Logged On would be a frequent response that it would have handling built for. Thanks again.

Report · Feb 28, 2021

Thanks for the clarifications. Problems like this can indeed be frustrating.

As for knowing what's really going on, sadly it's tough. I will say that FusionReactor or the PMT (in CF2018 and above) are able to at leat show things like counts of connections. And they can let you stack trace or profile requests to see what they're doing at a very low level, but it can be hard to connect such info to the resolution of a problem like this.

It is indeed interesting that some requests don't get the error. Consider that it could be that those are using cached queries and so may not be really talking to the database. Again, cf monitoring tools can confirm that.

HTH.

/Charlie (troubleshooter, carehart. org)

Report · Jun 25, 2019

Apparently the Secure Profile tightened security too much. We reinstalled without that option and added our security settings from the cfadmin. Problem seems to have gone away.

Report · Jun 25, 2019

Hmm. This may not be "good news", but I will share it for others who may come across this problem and solution.

Were you aware that you didn't need to "reinstall [CF] without that option"? You could have just gone to the CF Admin Security section, chosen "secure profile", and clicked the checkbox to turn it off.

You mentioned having come from CF11. Perhaps you hadn't noticed it, but that feature was in 11 as well (added then, because in CF10 instead it WAS indeed only an installation option. You may have been operating under that old presumption.)

Glad you got things sorted, of course.

I wonder if it had to do with the limitation of what operations could be done in the datasource (as set in the "advance settings" feature of the CF Admin). If so, then simply editing those settings for the DSN in question may also have been sufficient.

I add this in case you (or someone there) might want to retry turning on secure profile (for its strong security benefits), to see if a change in the DSN lets it work while all the other sec profile features would benefit you. FWIW, note also that everything in the sec profile is an individual CF Admin setting that can be controlled individually, albeit from a separate admin page for each (which could be improved).

/Charlie (troubleshooter, carehart. org)

Report · Jun 28, 2019

This may be an odd recovery action, but to get rid of the ORA-01012 error, we had to make the Oracle idle time setting unlimited, and then we got Open_Cursor limit exceeded. It was 300, we gradually adjusted it to 5000 so prevent the open cursor limit error. We were getting cursor counts in the high 3000s. That seems to be giving us an error-free site.

Yes, a good follow-on action for us would be to find out what is causing cursors to remain open but for now our goal was to return the site to a usable configuration.

Report · Feb 28, 2021

@cbeckwith and @mattInVail

You have observed that the issue:

seems to affect a specific datasource;
occurs only intermittently;
is related to connection idle-timeout

That leads me to wonder whether, for that specific db-user/datasource:

the Oracle driver is creating many more connections than needed;
some of the connections in the connection-pool go stale;
once in a while, ColdFusion selects a stale connection from the connection-pool.

If so, then it might help to:

restrict the maximum number of connections for the specific datasource to, say, 5;
reduce the time interval in which connections are checked, from the default 7 minutes to, say, 2 minutes;
do 1. and 2., experimenting with your own choice of values.

Report · Feb 28, 2021

And if that doesn't help, consider whether perhaps the validation query feature (in the dsn advanced settings) might help.

Problems like this are indeed tough.

/Charlie (troubleshooter, carehart. org)

Report · Mar 01, 2021

That is my feeling as well. I tried editing the settings as you described on a test environment, but in the oposite direction. I would realy like to be able to recreate the issue so that I can know that I am fixing it. Even with large intervals and 2 connections, I wasn't able to force any errors.

While digging through the error logs, I found a lot of java.lang.threadeath errors. I have recently implemented cfthread "terminate" in the code similar to that described here. https://www.bennadel.com/blog/3532-time-boxing-a-cfthread-and-then-terminating-it-if-it-takes-too-lo... . I also read about some of the dangers of how terminate is handled, https://stackoverflow.com/questions/57811387/coldfusion-terminate-cfthread-in-separate-req . This all adds up for what would force connections to not close/recycle correctly. I am currently trying to prove this theory.

Report · Mar 11, 2021

Hey, Matt. Any update on things? Also, are you still thinking the "not logged on" errors are due to "connections not closing/recycling correctly"? And if so, that's interesting to hear that doing cfthread termination as you describe might be an underlying cause. (It would also explain why this is not something that's come up often, at least that I've ever seen.) Would be great to hear if you confirmed things or, better, may have been able to resolve them.

/Charlie (troubleshooter, carehart. org)

Report · Mar 11, 2021

We have not gotten to the bottom of this yet. We continue to see it in prod, but cannot reproduce it in test environments including one that has a full copy of prod code and data. We still plan on changing the cfthread 'terminate' to jgetPageContext().getFusionContext().getUserThreadTask(myThreadName ).cancel() but we haven't moved it to prod yet.

At the moment, I am working on an efective band-aid. I am trying to catch all of the ORA-01012 errors in the application onError() so that I can 1) log details of the moment and 2) programatically reset the datasource to imediately fix the problem.

I am taking my first attempt at using the CFAdmin api for part 2. Seems pretty handy. I can get the array of datasources and loop through them. I am trying to figure out the best way to force it to reset the connection pool, or the whole datasource the same way that clicking save on the CFAdmin datasource edit page resets them. If I can do that, I think that I can fully mitigate the problem while I work on logging the issue to get to the root cause.

I will be sure to post here when I get to the root of this.

Report · Mar 12, 2021

I don't see what cfthread has got to do with it. Unless you're running queries within asynchronous threads. Which is usually troublesome.

Could you share the thread code you suspect?

@mattInVail : >We continue to see it in prod, but cannot reproduce it in test environments including one that has a full copy of prod code and data.

The explanation could simply be the difference in traffic between the two environments. This explanation becomes more likely when threads are involved.

> ... programatically reset the datasource to imediately fix the problem.

> ... using the CFAdmin api for part 2. Seems pretty handy. I can get the array of datasources
> ... and loop through them. I am trying to figure out the best way to force it to reset the connection pool...

That could lead to unpredictable behaviour or incorrect results in your application.

> ... We still plan on changing the cfthread 'terminate' to

> jgetPageContext().getFusionContext().getUserThreadTask(myThreadName ).cancel()

> but we haven't moved it to prod yet.

To programmatically terminate a thread is an exceptional event. Whenever you have to do so, there is a high chance that you can improve the code by other means. In my experience, at least.

Report · Mar 19, 2021

The threads were a wild goose chase. I finally figured out how to reproduce the ORA-01012 error in our test environment.

The CF datasource connects with Oracle user userA. userA is assigned to profile testProfile in Oracle. As dba run "alter profile testProfile limit idle_time 1";
Open a page in the application imediately after change; result=successful dba calls on the page
wait >1 ming.
Call the same page. Result = "Macromedia][Oracle JDBC Driver][Oracle]ORA-02396: exceeded maximum idle time, please connect again".
Call the same page a 3rd time. Resutl = "[Macromedia][Oracle JDBC Driver][Oracle]ORA-01012: not logged on".
All future calls result in ORA-01012.

This seems like a bug to me. Shouldn't CF (or the driver it is using) recognize that it no longer has a valid connection and create a new one? It is possible that it is trying to do this and failing, but I haven't found where that might be logged. I also turned on the "enable connection validation" checkbox in the CFadmin with no effect. I'm guessing that only applies to creating a connection, so it just lets this bad connection continue to be used.

The real scenario is even weirder because the connection to the database doesn't have an idle_limit, but it hits this same scenario when it calls a quiry involving a database link to a database where there is a limit. If that query fails, then all future queries even if they only involve the database with no idle_limit.

Now that I can reproduce the issue, I can do some more effective testing to find a solution.

Report · Mar 20, 2021

This is unpleasant, but honestly not really that surprising. ColdFusion uses JDBC to handle database connections, and each connection is kind of like a bridge with a person on one end (your database) and a person on the other (Java/CF/whatever). When the connection is set up, the Java person walks over the bridge and delivers a message to the database person. Connection successful! But in your case, later on, the database person wanders away for some reason. The Java person has no idea of this, of course, so when he's told to deliver another message, he walks across the bridge but the database person is no longer there. Connection fails! There's no ongoing communication between the two ends of the bridge, just messages sent and received as queries are executed. (I'm sure there's a lot of oversimplification in my example but it should be sufficient to describe the overall situation.)

My immediate recommendation, if this is a big deal and if you haven't already tried it, is just to disable database connections entirely and have CF create a new connection for every query, then tear it down after the query is done. Of course, there's a lot of overhead here and that's why CF administrators don't usually do this. But it should solve your problem here. This kind of problem is exactly why the option exists.

I hope this helps!

Dave Watts, Eidolon LLC

Report · Mar 21, 2021

My thoughts, too. Instructive analogy, Dave.

Report · Mar 21, 2021

Great debugging effort there, @mattInVail !

Report · Mar 22, 2021

Thanks everyone for your help. We did decide to turn off the "maintain connections" setting for this datasource and it did fix the issue. So far the performance impact seems minimal.

Looks like there is a patch out today https://helpx.adobe.com/coldfusion/kb/coldfusion-2018-update-11.html that may address the bug in the driver. I can't find the details, but is listed as fixing Orackle: User Logged Out Error. To use Dave's ananlogy, if you get to the end of the bridge and theres a 'No Trespassing' sign, you should really know to try a different bridge.