
Releases: scylladb/scylla-monitoring

Branch 2.2

19 Mar 08:44

New In 2.2

  • CQL optimization dashboard (#471)
  • Unified target files for Scylla and node_exporter (#378); a target-file sketch follows this list
  • Per machine (node_exporter related) dashboard added to Enterprise (#495)
  • Prometheus container uses the current user ID and group (#487)
  • Kill-all kills Prometheus instances gracefully (#438)
  • Start-all.sh now supports a --version flag (#374)
  • Remove the version from the dashboard names (#486)
  • Dashboards loaded from the API now set overwrite to true (#474)
  • Update alertmanager to 0.16 (#478)
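
As a rough sketch of what such a unified target file can look like, the snippet below uses the standard Prometheus file_sd format; the file name, addresses and labels are illustrative only, not necessarily what the stack ships.

    # example_servers.yml (illustrative name): a single file_sd list that both
    # the Scylla and node_exporter scrape jobs could point at
    - targets:
        - 10.0.0.1
        - 10.0.0.2
      labels:
        cluster: my-cluster
        dc: dc1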
Bug Fixes

  • Moved the node_exporter relabeling to metric relabeling (#497); a Prometheus sketch follows this list
  • Fixed units in foreground writes (#463)
  • Manager dashboard was missing a UUID (#505)
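
In Prometheus terms, relabel_configs rewrite a target's labels before the scrape, while metric_relabel_configs rewrite every scraped sample. The sketch below only illustrates that distinction; the job name, file name and rule are examples, not the exact change made in #497.

    scrape_configs:
      - job_name: node_exporter          # illustrative job name
        file_sd_configs:
          - files:
              - example_servers.yml      # hypothetical file from the sketch above
        # A rule placed under relabel_configs would act on target labels before
        # the scrape; under metric_relabel_configs it acts on each scraped sample.
        metric_relabel_configs:
          - source_labels: [__name__]
            regex: node_.*               # illustrative: tag node_exporter metrics
            target_label: exporter
            replacement: node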

Branch 2.1

10 Feb 11:08

Main changes:

  • Move to Grafana 5
  • Use local files for configuration and provisioning
  • Minor bug fixes

Branch 2.0

26 Dec 08:07
scylla-monitoring-2.0

Fixed a missing closing bracket in dropped view updates.

Branch 1.1.0

12 Aug 08:41
Pre-release
Disk usage should be per node (#360)

This series sets the disk pie-chart usage to be per node, so that the repeated
panel shows the per-server usage.

Signed-off-by: Amnon Heiman <[email protected]>
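
As an illustration only, a repeated Grafana panel can be driven by a per-node query along these lines; the metric names follow current node_exporter naming (older releases used node_filesystem_avail / node_filesystem_size without the _bytes suffix), and $node stands for whatever repeat variable the dashboard defines.

    # PromQL sketch: bytes used on a single node, one pie chart per repeated panel
    sum(node_filesystem_size_bytes{instance=~"$node"})
      - sum(node_filesystem_avail_bytes{instance=~"$node"})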

scylla-monitoring-1.0.0

05 Jul 14:37
Adding a new CPU dashboard (#336)

* Adding a new CPU dashboard

Replaces: enhance per server dashboard with useful metrics

Adding a new dashboard that specializes in CPU load
 - Adding a graph with foreground CPU utilization, i.e. the CPU used by request
   processing, excluding compaction, flushes and other background work. Users are
   usually scared of spikes, and even if we tell them that spikes are fine because
   they are the result of isolatable background processes, it is hard to *prove*
   that without further analysis. This graph will help.

 - Time spent in violations: a lot of the latency issues we have, especially at
   the higher percentiles, come from task quota violations. We now have a metric
   for this, and it will help us correlate latency spikes in time.

 - Client connections: in the past few months, this is *THE* top metric we have
   been looking at to detect problems. It hurts us a lot that it is not part of
   the main dashboard. (A hedged query sketch for this graph appears at the end
   of this note.)

In the process of doing the above, I am also doing my best to document the new
graphs. The text will appear in the tooltip in the top left corner of the graph.
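
As an example of the kind of query such a panel could be built on, here is a hedged PromQL sketch for the client connections graph; scylla_transport_current_connections is assumed to be the CQL connection gauge, and the actual dashboard query may differ.

    # PromQL sketch: open CQL client connections per node (metric name assumed)
    sum(scylla_transport_current_connections) by (instance)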