Inspiring
June 27, 2022
Answered

ColdFusion 2018 Docker image frequently slow

  • June 27, 2022
  • 5 replies
  • 1585 views

Hi

First post here, and I'm very new to CF/Java. I have inherited a somewhat legacy application that runs fine on Windows/IIS. However, when I deploy the same application in a Docker container (Linux), I run into some strange slowdowns.

The application works, but half the time (or more) it doesn't seem to be able to cache its classes and appears to compile them every time.

For testing, I remove the container and deploy everything again. Sometimes the system runs fine and is responsive after deployment. Once it's responsive, I can use the application all day without any slowdowns.

At other times (more often than not), when I deploy the container, it is slow from the first request and never picks up speed. It feels as if the classes are being created, or loaded into memory, on every click. The difference is huge: browsing through all the links in the application takes about 2 minutes normally, versus up to 2 hours for the same sequence when the container is not running properly.

 

There is no load on the server, as there is only one user. The same code works perfectly on our Windows deployment. What confuses me is the inconsistency of the issue and the lack of any apparent pattern.

 

Does anyone have any idea what's going on? I am not sure which logs to provide, but I can do so if someone can point me to what's needed.

 

I have tried :

  • downgrading through all the versions of the Docker images provided (ColdFusion 2018 Update 12 through 14)
  • upgrading to the latest Java version
  • changing the garbage collection algorithm, but the server doesn't start with anything other than -XX:+UseParallelGC (which I think is the default)

 

Thanks in advance

    This topic has been closed for replies.
    Correct answer Faraz25024317f5d8

    5 replies

    Faraz25024317f5d8
    Author, Correct answer
    Inspiring
    June 29, 2022

    So it seems this problem was introduced early on in the project by a bad JVM config.

    There was an entry in the config with -Dcom.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize=true. Removing it seems to resolve the issue.

     

    I believe someone familiar with JVM options would have found this issue easily, but the random nature of the issue threw me off. Hope the solution sticks 🙂
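A minimal sketch (in Java, since ColdFusion runs on the JVM) for confirming whether this flag is actually in effect after editing jvm.config. The class name `JaxbFlagCheck` is hypothetical. Run standalone, it only reports on its own JVM; to inspect ColdFusion's JVM, the same JDK calls would have to execute inside the server, for example from a CFML page through createObject("java", ...).

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: report whether the JAXB ClassTailor.noOptimize flag is active in this JVM.
final class JaxbFlagCheck {
    static final String FLAG = "com.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize";

    // Boolean.getBoolean is true only when the property exists and equals "true" (ignoring case).
    static boolean isNoOptimizeSet() {
        return Boolean.getBoolean(FLAG);
    }

    // Any -D argument for the flag that the JVM was actually launched with.
    static List<String> launchArgsMentioningFlag() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments().stream()
                .filter(arg -> arg.startsWith("-D" + FLAG))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(FLAG + " in effect: " + isNoOptimizeSet());
        System.out.println("matching launch arguments: " + launchArgsMentioningFlag());
    }
}
```

Checking the launch arguments as well as the property helps distinguish "the flag was removed from jvm.config" from "the flag is still being passed by some other startup script".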

    BKBK
    Community Expert
    June 29, 2022
     

    So it seems this problem was introduced early on in the project by a bad JVM config.

    There was an entry in the config with -Dcom.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize=true. Removing it seems to resolve the issue.

     


    By @Faraz25024317f5d8

     

    While I am glad that this resolves the issue, I am still surprised. You call -Dcom.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize=true a bad JVM config. Well, it isn't. It is a good JVM config. 🙂 In fact, if your application does a lot of XML processing, the setting can be indispensable.

     

    Now that you mention it, could you please show us all the JVM configs? You say you've moved the application from Windows to Linux. You might have carried over certain JVM settings that are not relevant to Linux. 

    BKBK
    Community Expert
    June 29, 2022

    A test of yet another idea:

    1) Return the following setting back to jvm.config:

    -Dcom.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize=true

     

    2) Add to jvm.config the revised setting for Linux random number generation:

    -Djava.security.egd=file:/dev/./urandom

     

    3) Restart ColdFusion.
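To see what step 2 changes, the sketch below (an illustration, not something from the thread) prints the `java.security.egd` system property, the `securerandom.source` security property, and how long a freshly constructed `SecureRandom` takes to produce its first bytes. A consistently large delay on the Linux host could suggest entropy-related blocking; a small one makes that explanation less likely. The class and method names are hypothetical, and, as with any such check, it has to run inside ColdFusion's JVM to describe that JVM.

```java
import java.security.SecureRandom;
import java.security.Security;

// Sketch: show which SecureRandom this JVM picks and how long its first use takes.
final class EntropyCheck {
    // Constructs a default SecureRandom, forces seeding with a first nextBytes() call,
    // and describes the algorithm and the elapsed time.
    static String describeDefault() {
        long start = System.nanoTime();
        SecureRandom random = new SecureRandom();
        random.nextBytes(new byte[16]);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return random.getAlgorithm() + " produced its first bytes in " + elapsedMs + " ms";
    }

    public static void main(String[] args) {
        // System property, e.g. set in jvm.config as -Djava.security.egd=file:/dev/./urandom
        System.out.println("java.security.egd = " + System.getProperty("java.security.egd"));
        // Security property read from the JDK's java.security file
        System.out.println("securerandom.source = " + Security.getProperty("securerandom.source"));
        System.out.println(describeDefault());
    }
}
```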

    Inspiring
    June 29, 2022

    Thanks for the input, @Dave Watts. I am not sure how well CF supports Linux in general. It could be one of the libraries, but would you know whether it's actually not CF but the way httpd works with Tomcat? I'm really just grasping at straws, because dumping execution timestamps in the application code itself shows that the application slows down and takes longer to process.

     

    My other guess is the database layer or Hibernate creating these random issues, but once again that's grasping at straws.

    Charlie Arehart
    Community Expert
    June 29, 2022

    Faraz, I really doubt anyone is going to find the magic bullet for you here, using the modest info you are able to share this way. Again there is far more you can directly and specifically diagnose, but that too can't be shared effectively here.

     

    So if you want to make the problem go away, I can help via remote screenshare. If I can't, you won't pay for time that you don't find valuable. See my first comment for more.

     

    And once we solve it, you could relate as much as you'd like to help others who may find this thread. It may well be just one thing, but the challenge is finding it. And that may not take long, together. 

    /Charlie (troubleshooter, carehart.org)
    Inspiring
    June 29, 2022

    Thanks for the option, mate. I will get back to you.

    We are also in the middle of evaluating Lucee, so we will have to see how that goes.

    I'll keep at it in the background and post here if I find a solution.

    Inspiring
    June 28, 2022

    Thanks @Charlie Arehart and @Dave Watts for your replies.

    I should clarify a few things:

    - I only updated to the latest minor version of Java. I did not go to Java 12.

    - Please ignore the garbage collection issue. I realised I made a mistake in the settings. However, as suggested, changing the GC doesn't help.

    - When I say compilation, it just shows my ignorance of Java/CF. All I mean is that it seems to be loading everything from scratch.

    For diagnostics, I have FusionReactor installed. All I can tell from it is that heap memory usage is unusual when it's slow.

     

    Each dip occurs after I click a page, and then heap usage makes this saw-tooth pattern until the page is loaded. However, when the server is working fine, I get a simple straight line for heap usage.

    It is slightly different when I change the GC to G1, and the server is still slow.
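FusionReactor already charts this, but a rough way to capture the same numbers as text (for example, to line them up against the execution timestamps mentioned elsewhere in this thread) is to sample the heap through the standard `MemoryMXBean`. This is a sketch that assumes it runs inside the JVM being observed; `HeapSampler` and its parameters are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Sketch: sample heap usage at a fixed interval, printing one timestamped line per sample.
final class HeapSampler {
    private static final long MB = 1024L * 1024L;

    // Takes up to `samples` readings `intervalMs` apart and returns used heap in MB per reading.
    // If the thread is interrupted, sampling stops early and the remaining entries stay 0.
    static long[] sampleUsedMb(int samples, long intervalMs) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        long[] usedMb = new long[samples];
        for (int i = 0; i < samples; i++) {
            MemoryUsage heap = memory.getHeapMemoryUsage();
            usedMb[i] = heap.getUsed() / MB;
            // getMax() is -1 when no maximum is defined, which prints here as 0
            System.out.printf("%tT used=%d MB committed=%d MB max=%d MB%n",
                    System.currentTimeMillis(), usedMb[i], heap.getCommitted() / MB, heap.getMax() / MB);
            if (i < samples - 1) {
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt status and stop early
                    break;
                }
            }
        }
        return usedMb;
    }

    public static void main(String[] args) {
        sampleUsedMb(30, 1000); // about 30 seconds of one-second samples
    }
}
```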

     

    Following @Dave Watts's suggestion, I installed CF on a fresh EC2 instance (Amazon Linux 2), and exactly the same thing happens. The server was responsive and fast on first boot and then slowed down when I restarted CF. I did run into a small issue with a few errors, but resolved it by setting heartbeat_interval to 0, as per CF2018 sporadic Crash.

     

    I am fairly comfortable with docker containers and linux though new to Java/CF as mentioned earlier.

     

    Thanks again for any help

    BKBK
    Community Expert
    June 28, 2022

    Hi @Faraz25024317f5d8 , from what you've shared, my thinking is as follows:

    1.  Choosing between ParallelGC and G1GC wouldn't make much difference. In fact, in your case, ParallelGC might even perform slightly better than G1GC. That is because G1GC is optimized for heaps well above 4 GB, whereas your application is using 3 GB or less. So, stay with ParallelGC.
    2.  Ensure that the settings Xms and Xmx are the same. For example, -Xms8192m -Xmx8192m -XX:+UseParallelGC
      This means that the Java Virtual Machine will start with its heap already at the maximum size.
    3.  The inclines in the saw-tooth pattern suggest rising memory usage. ColdFusion would then be creating objects in memory, for example. The dips in the pattern suggest garbage collection.
      So, the saw-tooth could represent a period when your application repeatedly creates objects in memory, which are then garbage-collected. Think, for example, of a function that dynamically populates structs. The more structs there are and the larger they are, the higher the memory usage. Suppose such a struct is var-scoped. Then the object representing it becomes eligible for garbage collection once the function returns.
      A similar pattern may also result from establishing database connections, running queries and then disposing of the objects afterwards.
      So, the saw-tooth is part and parcel of the workings. You shouldn't worry about it when it is well below maximum memory, as in your case.
    4.  However, as you already guessed, the saw-tooth may also be the result of a page being compiled. Compilation uses objects which are subsequently garbage-collected. Assuming that compilation is the issue, caching might help.
      To test this idea, switch caching on in the ColdFusion Administrator (Server Settings > Caching).
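Items 2 and 3 above can be observed with the standard JDK management beans. The sketch below is illustrative only: `buildTemporaryStruct` is a hypothetical stand-in for a CFML function that fills a var-scoped struct (it uses a Java `HashMap`, not ColdFusion's struct implementation), and the demo counts collections before and after a burst of such calls. It also prints the initial and maximum heap sizes, which should be roughly equal when -Xms and -Xmx are set to the same value.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.HashMap;
import java.util.Map;

// Sketch: short-lived allocations become garbage when the method returns,
// so repeated calls give the collector work to do (the saw-tooth).
final class SawToothDemo {
    private static final long MB = 1024L * 1024L;

    // Hypothetical stand-in for a CFML function that fills a local (var-scoped) struct.
    static int buildTemporaryStruct(int entries) {
        Map<String, String> struct = new HashMap<>();
        for (int i = 0; i < entries; i++) {
            struct.put("key" + i, "value" + i);
        }
        return struct.size(); // the map is unreachable once this method returns
    }

    // Total collections so far across all collectors; -1 ("not available") counts as 0.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // With -Xms equal to -Xmx these two values should be roughly the same.
        System.out.println("initial heap: " + heap.getInit() / MB + " MB, max heap: " + heap.getMax() / MB + " MB");

        long before = totalCollections();
        long entries = 0;
        for (int call = 0; call < 2_000; call++) {
            entries += buildTemporaryStruct(1_000);
        }
        System.out.println("entries created: " + entries);
        System.out.println("collections during the loop: " + (totalCollections() - before));
    }
}
```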

       

    Inspiring
    June 29, 2022

    Hi @BKBK, thanks for the input. All cache settings are properly set. I am not that concerned about the saw-tooth in FR, as it was just something I noticed; I don't think it's indicative of anything. The real problem is the intermittent nature of the problem, which makes it harder to pinpoint. If it were always present, it would be easier to track. The fact that it sometimes randomly behaves fine is what's confusing.

    Community Expert
    June 27, 2022

    My answer will be shorter than @Charlie Arehart's. It seems like you haven't been able to narrow down the problem very much. This isn't your fault, but basically you're presenting this as "I'm running on Docker and having this problem." I suggest you try a broader set of troubleshooting measures to see if you can find the root cause. What happens if you just run the app on Linux, for testing purposes? Just throw up an EC2 instance in AWS and put it on there, if you can do that; it'll save you a lot of OS configuration work. Put your container in a different Docker environment and see what happens.

     

    As for upgrading Java, I recommend upgrading to the latest minor version. This usually isn't listed in the CF docs as a supported Java version, but it's always worked for me. Unfortunately, that doesn't work for major versions. So, for example, if your version of CF supports Java 11.0.14 you can safely upgrade it to 11.0.15 (a minor version) but not to 12 or higher (a major version).
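The rule of thumb above can be written as a small check using `Runtime.Version` (Java 10 and later). In JDK version-string terms (JEP 322), the number called the minor version here, the 15 in 11.0.15, is the update number, and the leading 11 is the feature release. The helper name is hypothetical, and it encodes only the rule stated in this post, not Adobe's official support matrix.

```java
// Sketch of the rule of thumb: allow a newer update within the same feature release only.
final class JavaUpgradeRule {
    // True when `candidate` has the same feature (major) number as `current` and is not older.
    static boolean isSameFeatureUpgrade(String current, String candidate) {
        Runtime.Version cur = Runtime.Version.parse(current);
        Runtime.Version cand = Runtime.Version.parse(candidate);
        return cand.feature() == cur.feature() && cand.compareToIgnoreOptional(cur) >= 0;
    }

    public static void main(String[] args) {
        Runtime.Version running = Runtime.version();
        System.out.println("running feature release " + running.feature() + ", update " + running.update());
        System.out.println("11.0.14 -> 11.0.15 allowed: " + isSameFeatureUpgrade("11.0.14", "11.0.15"));
        System.out.println("11.0.14 -> 12 allowed: " + isSameFeatureUpgrade("11.0.14", "12"));
    }
}
```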

     

    Also, I doubt this has anything to do with compilation. CF actually started as an interpreter, not a compiler, and it doesn't do everything exactly as an ideal compiler would. In fact, you should be able to run CF fine as a single user without compiling anything.

     

    Finally, I have no idea why you wouldn't be able to change the GC algorithm to something other than parallel GC - CF itself does support all kinds of GC options. But you might just want to take a look at the jvm.config and the startup log after you've relaunched in that Docker instance to see if your change picked up.
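Besides reading jvm.config and the startup log, one way to see whether a change was picked up is to ask the running JVM directly. A sketch, assuming it runs in the same JVM as ColdFusion (standalone, it describes only itself): it prints the JVM's launch arguments and the names of the active collectors. With ParallelGC these are typically "PS Scavenge" and "PS MarkSweep"; with G1, "G1 Young Generation" and "G1 Old Generation". The class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Sketch: show what the running JVM was actually started with.
final class JvmStartupCheck {
    // Arguments passed to the JVM itself (not to main()), e.g. -Xmx and -XX flags.
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    // Names of the collectors in use, which reveal the GC algorithm actually selected.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("java.version: " + System.getProperty("java.version"));
        jvmArguments().forEach(arg -> System.out.println("arg: " + arg));
        System.out.println("collectors: " + collectorNames());
    }
}
```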

     

    Dave Watts, Eidolon LLC

    Charlie Arehart
    Community Expert
    June 27, 2022

    Faraz, this is certainly an interesting challenge, and there could be many explanations for it. Before I share any "guesses", I'd ask first:

    • Though I doubt that changing the GC algorithm will be the solution (it rarely is), there's no reason you can't change it. So when you say "the server doesn't start with anything other than" ParallelGC, do you mean the container won't start? That may have more to do with HOW you are changing it. But again, I doubt changing it is THE solution. 
    • what makes you think it's about compilation? Even if every request WAS being compiled, I'd not expect that to add MINUTES (or even seconds) to a request. Do you have some evidence to suggest that's the issue?

     

    Since it's sometimes slow from the start, even with "no load", yet in other cases works fine "all day", that would seem to suggest there's no CF configuration problem.

     

    I really think the best way to solve your issue is to get diagnostics in place, to tell you WHY any one request is slow, as well as what else is going on when this happens.

     

    And sadly, the logs may or may not help in this case. First, have you looked at ALL the CF logs within the container (those recently modified)? As you may know, docker logs shows only the equivalent of coldfusion-out.log and coldfusion-error.log, not the other CF logs. Second, in a case like you're describing, the logs often won't help anyway, as some problems don't lead to anything showing in the CF logs, or the logs don't show what we need to actually diagnose the problem. 
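As a small aid to the "recently modified" check, the sketch below lists a directory's files newest first. The default path /opt/coldfusion/cfusion/logs is an assumption about where the image keeps its CF logs; pass the real directory as the first argument if it differs. `RecentLogs` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: list files in a log directory, most recently modified first.
final class RecentLogs {
    // Assumed location; override with the first command-line argument.
    static final String DEFAULT_DIR = "/opt/coldfusion/cfusion/logs";

    static List<Path> newestFirst(Path dir) {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile)
                    .sorted(Comparator.<Path>comparingLong(RecentLogs::lastModifiedMillis).reversed())
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static long lastModifiedMillis(Path file) {
        try {
            return Files.getLastModifiedTime(file).toMillis();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : DEFAULT_DIR);
        for (Path file : newestFirst(dir)) {
            System.out.println(Instant.ofEpochMilli(lastModifiedMillis(file)) + "  " + file.getFileName());
        }
    }
}
```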

     

    And as you may know, there are various solutions that can help do that (some better than others, depending on the problem), from traditional Java diagnostic tools, to popular APMs (like New Relic, Datadog, and Dynatrace), to more CF-specific tools like the CF PMT (new since CF2018) or FusionReactor.
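Among the traditional Java diagnostics, a thread dump is one of the simpler ones: taken a few times, seconds apart, while a request is slow, it shows where request threads are spending their time. The JDK's jstack or `jcmd <pid> Thread.print` do this from outside the process; the sketch below does the same from inside through `ThreadMXBean`. It is an illustration only, and `ThreadSnapshot` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Sketch: text snapshot of every live thread with its full stack trace.
final class ThreadSnapshot {
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip locked-monitor and ownable-synchronizer details to keep it cheap
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```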

     

    All these can be used with containers also, though there can be new challenges doing that, even for folks familiar with the tools (and with running cf in a container). There are also tools specific to your container runtime (Docker) and to the container and host OS which could help, depending on where the problem is (may be cf, may be docker/container config, may be host resource issues). All that's a lot to consider. 

     

    And I'm not saying one has to assess them ALL. But I'm saying that one would investigate a few key diagnostics, and go from there to consider others. 

     

    But if one may be new to BOTH running cf in containers AND using such diagnostic tools, I'd argue that most folks in that situation would have a really tough time going it alone.  There are just so many variables, from container issues to tool use to interpreting what the tools report. Or they may overlook a vital clue. And there's just no way to convey here all one would need to consider.

     

    Then there are questions of how you went about setting up your images/containers. There's no "one way", and any one choice could be at issue here. Of course, the fact that things change for you midstream is another wrinkle that only complicates the diagnosis. 

     

    So while it's possible that someone else will chime in with JUST the right answer based solely on what you've shared here (or folks may start tossing out guesses of things to try), given all the above I think the fastest and most effective way to resolve your specific problem would be to have me (or someone similarly experienced) join you in a remote screenshare session (like Zoom), where we could work through these diagnostics together.

     

    And it may surprise you to hear that I think we could be done within an hour (maybe less, maybe more), since I do this sort of cf troubleshooting daily. And most folks learn a lot as we go. You can learn about my rates, approach, satisfaction guarantee, and more at carehart.org/consulting. I offer there also my online calendar, with slots today and any day this week or coming ones.

     

    I wish I had just "the answer" for you. If you follow these forums, you'll see I do usually reply with that, or with some specific things to consider. But there are just far too many variables in this scenario for me to recommend any without the diagnostics above. There is likely ONE problem and solution. The challenge is finding it, and I would look forward to helping solve this for you. 

    /Charlie (troubleshooter, carehart.org)