Performance Monitoring Toolset keeps crashing

Explorer ,
Jul 18, 2019

I've been having lots of problems with the Performance Monitoring Toolset (PMT) since putting my CF2018 Enterprise intranet into production about a month ago. I have the PMT installed on my Test/Dev server (Windows Server 2019, 64 GB RAM, running a Development instance of CF2018 plus the PMT), and it's monitoring my Dev site plus my production site on another server (same hardware/setup).

What seems to keep happening every couple of days or so is that the Elasticsearch datastore service that underlies the PMT crashes, which initially makes it impossible to log in. When I find that, I restart the Datastore service, then the PMT service itself, but the PMT service won't start because it fails to connect to the Datastore. After several attempts, I usually get frustrated and uninstall and reinstall the whole thing, which is clearly not a viable long-term solution.

The datastore/logs files show stuff like:

org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed

java.lang.OutOfMemoryError: Java heap space

I recognize the latter, so on my most recent install I bumped the heap size up to 4 GB, thinking maybe the default just isn't enough to keep track of my workload (seems unlikely).

The PMT logs show this when it won't start:

[ERROR] 2019-06-27 11:45:52.658 com.adobe.pms.es.client.ElasticSearchClient - Datastore Service not available. Retrying to connect...

[ERROR] 2019-06-27 11:46:27.884 com.adobe.pms.es.client.ElasticSearchClient - Datastore Service not available. Shutting down Performance Management Suite...

This happens even though the Datastore service seems to be running and its own logs show it having recovered.

I was hoping there'd be more mentions of these issues online, or a patch, but I can't find anything related to it. It's concerning because I keep needing to see what's running on the Production server while the PMT is down, so I have no visibility into my server's active jobs. It seems like the PMT was put together a bit hastily on a new stack of technologies that don't exactly work well together, and no longer having the Server Monitor leaves us a bit blind as to what our servers are doing.

If anybody's had similar experience with the PMT or any insight on how to manage it, I'd sure appreciate it.

Adobe Community Professional ,
Jul 28, 2019

From what I've seen online, the combination of OutOfMemoryError and "org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed" suggests that your installation has too many shards for its heap. You can retrieve the Elasticsearch properties with, for example, a cfhttp GET request. See, for example,

https://discuss.elastic.co/t/elasticserach6-1-1-restart-and-i-got-all-shards-failed/117155

According to one of the links,

How many shards should I have in my Elasticsearch cluster? | Elastic Blog

"A good rule-of-thumb is to ensure you keep the number of shards per node below 20 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600 shards"

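To make the arithmetic in that rule of thumb concrete, here is a minimal Python sketch. The function names are made up for illustration, and the factor of 20 is only the guideline quoted above, not a hard Elasticsearch limit.

```python
# Rule of thumb quoted from the Elastic blog: roughly 20 shards per GB of heap.
SHARDS_PER_GB_HEAP = 20


def max_recommended_shards(heap_gb: float) -> int:
    """Approximate shard ceiling for one node with the given heap size in GB."""
    return int(heap_gb * SHARDS_PER_GB_HEAP)


def within_shard_budget(shard_count: int, heap_gb: float) -> bool:
    """True if the node's shard count stays within the rule-of-thumb ceiling."""
    return shard_count <= max_recommended_shards(heap_gb)


if __name__ == "__main__":
    # Compare the 2 GB and 4 GB heaps discussed in this thread with the blog's 30 GB example.
    for heap in (2, 4, 30):
        print(f"{heap} GB heap: about {max_recommended_shards(heap)} shards at most")
```

By this guideline, going from a 2 GB to a 4 GB heap doubles the shard budget for the node, which would fit with a larger heap buying some stability.
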
Explorer ,
Jul 29, 2019

Thanks, that's helpful. I think I stumbled into this solution (at least temporarily) when I managed to increase the JVM heap size from 2 GB to 4 GB, which I guess gives me room for more shards. (I found I had to use the ColdFusion2018PerformanceMonitoringToolset\datastore\bin\elasticsearch-service.bat manager GUI to change that value.) The PMT has now stayed running for 6 days. I am seeing older indexes, so I'll see how archiving works, but for now it seems stable. Hopefully there will soon be more information on how to better manage this tool, since it's pretty critical to enterprise operations running CF2018.

Adobe Community Professional ,
Jul 30, 2019

jarviswabi wrote

...I managed to increase the JVM heap size from 2 GB to 4 GB (found I had to use ColdFusion2018PerformanceMonitoringToolset\datastore\bin\elasticsearch-service.bat)

I agree with an increase in heap size. But do you mean perhaps C:\ColdFusion2018PerformanceMonitoringToolset\datastore\config\jvm.options ?

I think that is the best place to modify the heap size. Just change the settings from

-Xms2g

-Xmx2g

to

-Xms4g

-Xmx4g

Explorer ,
Jul 30, 2019

No, that's where I first tried changing it, but it seems like Elasticsearch wasn't honoring those settings.  Kept starting at 2G and never followed the 4G.  I found some reference to using the GUI in a Windows installation (sorry I misplaced the link), and that did the trick.

Adobe Community Professional ,
Jul 30, 2019

Strange. I changed my settings in jvm.options to

-Xms4g

-Xmx4g

restarted the Windows services for the PMT and the PMT Datastore, and it worked!

Adobe Community Professional ,
Jul 30, 2019

Guys, the difference you see may be based on how you start the PMT and/or datastore, whether as a service (perhaps in BKBK's case) or from the command line (in jarviswabi's case).

Can you each confirm?

Moving on, and FWIW, I'll also add that when having trouble with Elasticsearch or the PMT itself, it may be useful to note that both can be monitored by FusionReactor, since like CF they are Java-based. It might seem rather meta to monitor the CF monitor. 🙂 But I have seen value in it. (And note that FR has a 14-day free trial if you want to give it a go.)

And you may find value in keeping both FR and the PMT for monitoring CF, as each does things the other does not--and yes, it's OK to run both at once on CF.

Finally, for anyone using the PMT, do make sure it is the latest version. See my post:

Running the CF 2018 PMT? Have you manually applied the recent update to it?

/Charlie (server troubleshooter, carehart.org)

Explorer ,
Jul 30, 2019

Thanks Charlie.  I'm actually running the PMT as services (both the main one and the datastore service), but it still did not obey my changes to the jvm.options file.  I did confirm that I had originally downloaded and installed the latest updated version per your post, which was very helpful.  I'll remember to keep a close eye out for future updates.

Adobe Community Professional ,
Jul 30, 2019

Ah, I missed that you were running the elasticsearch-service.bat. I thought you were running just the elasticsearch.bat. My bad.

And as you mention the UI, I assume you mean you passed the manager arg to that bat file, which in Windows shows the UI for managing the service. (It may be familiar to those who have run the similar tool for managing Tomcat services, or Lucee as built on Tomcat and running as a service.)

It is indeed interesting to see BKBK saying he DID affect the ES (when started as a service) by editing that jvm.options file. It will be interesting to know more about that.

I will also throw out there that while the takeup of the PMT has been rather slow (at least judging by the relative paucity of questions and observations being shared about it), it should be of some benefit for folks seeking help to realize that the datastore (Elasticsearch) seems to be a pretty much bone-stock implementation, as far as I can tell.

So if you need help with something and can't find the answer in Adobe's docs and resources (or in the CF community), do look into more generic ES resources and docs, and do share what you learn, as other CFers would benefit from hearing about it. 🙂

/Charlie (server troubleshooter, carehart.org)

Adobe Community Professional ,
Aug 04, 2019

I looked into this further. The question of the JVM settings for Elasticsearch is more complex than I at first thought.

First, let's do away with a logical fallacy. My PMT installation was already working properly. So if it continued working after I changed the Xms and Xmx settings in jvm.options, that wouldn't mean that such a change had an effect. In fact, when I examined the file C:\ColdFusion2018PerformanceMonitoringToolset\datastore\logs\elasticsearch-service-x64-stdout.2019-08-04.log afterwards, I noticed that the settings in use were still Xmx=2g and Xms=2g.

Apparently, neither jvm.options nor elasticsearch-service.bat should be used to set the values of Xmx and Xms. The purpose of the file elasticsearch-service.bat is to create the start-up environment required during the Elasticsearch installation. The contents of the file elasticsearch-service.bat actually suggest that

  1. If JAVA_HOME is defined as an environment variable, the Elasticsearch installation will use that JVM.
  2. You may define the Xmx and Xms values via the environment variable ES_HEAP_SIZE.
  3. The start-up process writes the file jvm.options to disk.

Community Beginner ,
Jul 29, 2020

All,

 

I had been battling this one for a bit.  I was quite confused but figured it out.

 

System details

  • Running CF18 on Windows Server 2016.
  • Running the latest version of PMT as Charlie recommended

 

Problem

I was experiencing non-stop "com.adobe.pms.es.eshealth.ESStatCalculator - Heap utilization for Elasticsearch node PmV4wph is 91%" alerts, each followed by a "com.adobe.pms.es.eshealth.ESStatCalculator - Elasticsearch node PmV4wph recovered from high Heap utilization" alert.
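
For anyone wanting to gauge how often these alerts fire, here is a rough Python sketch that pulls the node name and percentage out of such lines. The regular expression is based only on the message text quoted above, and the sample lines are illustrative: the "[WARN]" level and the timestamps are assumptions modeled on the PMT log format shown earlier in the thread, not copied from a real log.

```python
import re

# Matches the alert text quoted above; any prefix (level, timestamp, class name) is ignored.
HEAP_ALERT = re.compile(r"Heap utilization for Elasticsearch node (\S+) is (\d+)%")


def heap_alerts(lines):
    """Return (node, percent) tuples for every heap-utilization alert in the given lines."""
    found = []
    for line in lines:
        match = HEAP_ALERT.search(line)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found


# Illustrative sample only; level and timestamps are assumed, not taken from a real PMT log.
SAMPLE = [
    "[WARN] 2020-07-29 10:00:00.000 com.adobe.pms.es.eshealth.ESStatCalculator - "
    "Heap utilization for Elasticsearch node PmV4wph is 91%",
    "[WARN] 2020-07-29 10:05:00.000 com.adobe.pms.es.eshealth.ESStatCalculator - "
    "Elasticsearch node PmV4wph recovered from high Heap utilization",
]

if __name__ == "__main__":
    print(heap_alerts(SAMPLE))
```

Feeding it the lines of a real PMT log file (for example via open(path)) would show whether the alerts stop once the heap is actually raised.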

 

What I tried to fix it

  1. I tried to increase the max JVM heap as indicated in the "ColdFusion2018PerformanceMonitoringToolset\datastore\config\jvm.options" file, setting -Xms4g and -Xmx4g.
  2. I restarted the ColdFusion 2018 Performance Monitoring Toolset Datastore Service multiple times and used Task Manager to watch the RAM consumption of the "Commons Daemon Service Runner", but it never went beyond ~2 GB.
  3. I went as far as deleting the jvm.options file. It didn't matter. The service doesn't use it.

 

What fixed it

I FINALLY found a Stack Overflow question:  https://stackoverflow.com/questions/28798845/how-to-set-memory-limit-to-elasticsearch-in-windows

  1. From a command prompt, navigated to the appropriate datastore\bin directory and ran the following: elasticsearch-service.bat manager
  2. Went to the Java tab.
  3. Updated the Initial and Maximum memory pool size input boxes.
  4. Applied.
  5. Restarted the service.
  6. Checked back in Task Manager, and the Commons Daemon Service Runner is now at 4 GB!! Voila!
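
As a cross-check on Task Manager, the running node can also be asked what maximum heap it actually started with. Below is a minimal Python sketch that reads heap_max_in_bytes from the standard Elasticsearch nodes-info response (GET /_nodes/jvm). The helper names are made up for illustration, and the datastore URL is an assumption you would need to adjust to your own PMT datastore host and port.

```python
import json
from urllib.request import urlopen


def heap_max_gb_by_node(nodes_info: dict) -> dict:
    """Map each node name to its configured maximum heap in GB, from a /_nodes/jvm response."""
    result = {}
    for node_id, node in nodes_info.get("nodes", {}).items():
        heap_bytes = node["jvm"]["mem"]["heap_max_in_bytes"]
        result[node.get("name", node_id)] = round(heap_bytes / 1024**3, 2)
    return result


def fetch_nodes_info(base_url: str) -> dict:
    """Fetch the nodes-info JVM section; base_url is the datastore's HTTP address."""
    with urlopen(f"{base_url}/_nodes/jvm", timeout=10) as resp:
        return json.load(resp)
```

Usage would look like heap_max_gb_by_node(fetch_nodes_info("http://localhost:9200")), where the address is a placeholder. If the reported value is still about 2 GB after a restart, the setting did not take effect, wherever it was changed.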

 

I hope this helps someone out there!  I struggled for a bit with it!!

Community Beginner ,
Jul 29, 2020

Oh, and for those who are curious, it looks like the version of Elasticsearch the PMT is running is Kibana 5.6.3. At the time of this writing, Elasticsearch is at 7.8.1.

That being said, there's no way CF could have packaged a 2020 Elasticsearch version in the CF 2018 release, which I totally get. Just a point to note. I'm not holding anything against CF.

Kibana 5.6.3 was released on October 10, 2017, so it was most likely a current version when they packaged up the PMT.

Adobe Community Professional ,
Jul 29, 2020

Great stuff. Thanks, Beeker, for both this and your next comment.

/Charlie (server troubleshooter, carehart.org)
