A collection of dashboards and their component parts for cloud service provider SaaS services.
- Create a new `.libsonnet` file in the `csp-mixin/signals` folder. This file contains the general settings of your dashboard and its panels, such as title, description, panel queries, legend template, discovery metric, variable definitions, and aggregation level.
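  For illustration only, a minimal signal file could look roughly like the sketch below. The field names follow the common signal library (`commonlib`) schema this mixin builds on, but the metric name, source key, and values are placeholder assumptions rather than the repo's actual definitions.

  ```jsonnet
  // signals/azurevm.libsonnet: hypothetical minimal example.
  function(this) {
    filteringSelector: this.filteringSelector,
    groupLabels: this.groupLabels,        // variable definitions; see the next step
    instanceLabels: this.instanceLabels,
    aggLevel: 'group',                    // aggregation level
    discoveryMetric: {
      // Placeholder metric name and source key.
      azuremonitor: 'azure_microsoft_compute_virtualmachines_percentage_cpu_average_percent',
    },
    signals: {
      cpuUtilization: {
        name: 'CPU utilization',
        type: 'gauge',
        unit: 'percent',
        description: 'CPU utilization of the virtual machine.',
        sources: {
          azuremonitor: {
            expr: 'azure_microsoft_compute_virtualmachines_percentage_cpu_average_percent{%(queriesSelector)s}',
            legendCustomTemplate: '{{resourceName}}',
          },
        },
      },
    },
  }
  ```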
- Define the dashboard variables by setting `groupLabels` and `instanceLabels` in your signal. You can use the global values defined in `azureconfig.libsonnet` or `gcpconfig.libsonnet`, or override them as needed. For example, you can initialize `groupLabels` with `groupLabels: this.groupLabels`, or override it with `groupLabels: ['job', 'resourceName']`.
- Add the new signal definition to the `config.libsonnet` file. For example: `azurevm: (import './signals/azurevm.libsonnet')(this),`.
- Create a new `.libsonnet` file in the `csp-mixin/panels` folder containing the configuration for each panel:

  ```jsonnet
  [panel_name]: this.signals.[signal_group].[signal_id].as[visualization_type]() + commonlib.panels.generic.[visualization_type].base.stylize(),

  # Example:
  avm_instance_count: this.signals.azurevmOverview.instanceCount.asStat() + commonlib.panels.generic.stat.base.stylize(),
  avm_cpu_utilization: this.signals.azurevm.cpuUtilization.asTimeSeries() + commonlib.panels.generic.timeSeries.base.stylize(),
  ```

  Use `asPanelMixin()` when you want to show multiple queries on the same panel. For example:

  ```jsonnet
  this.signals.azurevmOverview.diskReadOperations.asTimeSeries()
  + commonlib.panels.generic.timeSeries.base.stylize()
  + this.signals.azurevmOverview.diskWriteOperations.asPanelMixin()
  ```

  Note: you can prefix each panel name with the first letter of the provider followed by an abbreviation of the dashboard you are building, e.g. `avm_` for Azure Virtual Machine.
- Add all your row definitions to the `rows.libsonnet` file. You need to define:
  - The title of the row, if it has one.
  - The panel(s) that you want to show.
  - The width and height of each panel.

  See the example below:

  ```jsonnet
  avm_overview: [
    # The row definition is optional; you can show all panels together without grouping them by row.
    g.panel.row.new('Overview'),
    this.grafana.panels.azurevm.avm_instance_count
    + g.panel.timeSeries.gridPos.withW(12)
    + g.panel.timeSeries.gridPos.withH(5),
    this.grafana.panels.azurevm.avm_availability
    + g.panel.timeSeries.gridPos.withW(12)
    + g.panel.timeSeries.gridPos.withH(5),
    ...
  ]
  ```
- Add your dashboard definition to the `dashboards.libsonnet` file. You can define a single dashboard definition shared by GCP and Azure, or specify a dashboard for just one of the providers (Azure or GCP). See the example below:

  ```jsonnet
  + if csplib.config.uid == 'azure' then {
    [csplib.config.uid + '-virtualmachines.json']:
      local variables = csplib.signals.azurevm.getVariablesMultiChoice();
      g.dashboard.new(csplib.config.dashboardNamePrefix + 'Virtual Machines')
      + g.dashboard.withUid(csplib.config.uid + '-virtualmachines')
      + g.dashboard.withTags(csplib.config.dashboardTags)
      + g.dashboard.withTimezone(csplib.config.dashboardTimezone)
      + g.dashboard.withRefresh(csplib.config.dashboardRefresh)
      + g.dashboard.timepicker.withTimeOptions(csplib.config.dashboardPeriod)
      + g.dashboard.withVariables(variables)
      + g.dashboard.withPanels(
        g.util.grid.wrapPanels(
          csplib.grafana.rows.avm_overview
        )
      ),
  } else {},
  ```
- Run the following command from the `csp-mixin` folder to generate all the dashboards in `json` format, then import the one you are building into any Grafana instance. Check the `csp-mixin/dashout` folder, which will contain all the generated dashboards:

  ```sh
  mixtool generate dashboards -J vendor -d dashout mixin.libsonnet
  ```
- Lint and fix the modified files by running the following command from the root folder:

  ```sh
  make fmt
  ```
- GCP - https://cloud.google.com/monitoring/api/metrics_gcp#gcp-storage
  - The `quota` metrics are alpha, and don't seem to be fetched by Grafana Alloy even when enabled as a metrics prefix. Perhaps this needs to be enabled for a project?
  - The `replication` metrics are beta. The only metric being retrieved by Alloy is `replication/meeting_rpo`, which is consistently 1 for all buckets. It may not make sense to graph these metrics, but perhaps it's useful to have an alert?
  - The `storage` metrics (object_count, total_bytes) have a "v2" variant, which is beta. As such, this lib uses the (implied) v1 metrics, which are GA.
  - There are no latency metrics available.
- Azure - https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/microsoft-storage-storageaccounts-blobservices-metrics
  - `Availability` is an available metric. It may not make sense to graph it, but perhaps it is useful to have an alert?
  - There are latency metrics. In our test environment there is very little (no?) traffic. What I have observed is that E2E latency and server latency are the same value in our limited dataset. Perhaps this should only show a delta, i.e. when E2E is greater than server latency (see the sketch below).
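    A rough, hypothetical sketch of that idea; the metric names below are placeholders for whatever series Alloy actually exports for E2E and server latency:

    ```jsonnet
    {
      // Placeholder metric names; only illustrates graphing the amount by which
      // end-to-end latency exceeds server latency, clamped at zero.
      e2eLatencyOverhead: 'clamp_min(azure_blob_e2e_latency_ms{%(queriesSelector)s} - azure_blob_server_latency_ms{%(queriesSelector)s}, 0)',
    }
    ```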
  - Network throughput (ingress/egress) metrics for Azure are gauges, not counters. Right now the PromQL uses `rate`, which produces "odd" results. The other option, `deriv`, produces negative values with the available data, which is also suboptimal. We could just put the raw gauge value on the timeseries and call it a day (see the sketch below). 🤔
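    For comparison, the three candidate queries read roughly as below; `azure_storage_ingress_bytes` is a placeholder, not the real series name:

    ```jsonnet
    {
      local gauge = 'azure_storage_ingress_bytes{%(queriesSelector)s}',

      // Option 1: rate(): the current approach; treats the gauge like a counter and gives "odd" results.
      asRate: 'rate(%s[$__rate_interval])' % gauge,
      // Option 2: deriv(): per-second slope of the gauge; dips negative with the available data.
      asDeriv: 'deriv(%s[$__rate_interval])' % gauge,
      // Option 3: just plot the raw gauge value.
      asRaw: gauge,
    }
    ```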
- AWS - TODO