Observable Metrics

Overview

Hoist-core provides a central metrics infrastructure built on Micrometer, enabling applications to publish observable metrics to platforms such as Prometheus, Grafana, and Datadog. The system is designed to work transparently across Hoist's clustered architecture, with automatic namespace prefixing, default tags, and cluster-wide scrape support.

The framework automatically publishes a range of built-in metrics covering JVM health, JDBC connection pooling, WebSocket activity, client activity tracking, and Hoist monitor results. Applications can register their own custom metrics using the standard Micrometer API via MetricsService.registry.

Key capabilities

  • Central registry: MetricsService exposes a CompositeMeterRegistry that all meters register through.
  • Export registries: built-in support for Prometheus (pull-based) and OTLP (push-based), configured via soft config. Additional registries (e.g. Datadog) can be added programmatically.
  • Cluster-wide Prometheus scrape: a single endpoint can return metrics from all instances, each distinguished by an xh.instance tag.
  • Built-in metrics: JVM (memory, GC, threads, classloader, CPU), JDBC pool, WebSocket channels, client activity tracking, and Hoist monitor results are instrumented out of the box.
  • Admin Console: a cluster-wide metrics viewer is available via MetricsAdminController.

Source Files

| File | Location | Role |
|---|---|---|
| MetricsService.groovy | grails-app/services/io/xh/hoist/telemetry/ | Central Micrometer registry, namespace/tagging, export registries |
| MetricsConfig.groovy | src/main/groovy/io/xh/hoist/telemetry/ | Typed wrapper around xhMetricsConfig |
| MonitorMetricsService.groovy | grails-app/services/io/xh/hoist/monitor/ | Publishes Hoist monitor results as Micrometer metrics |
| TrackMetricsService.groovy | grails-app/services/io/xh/hoist/track/ | Client activity metrics from track log entries |
| MetricsAdminService.groovy | grails-app/services/io/xh/hoist/admin/ | Cluster-wide meter listing for admin UI |
| MetricsAdminController.groovy | grails-app/controllers/io/xh/hoist/admin/cluster/ | REST endpoint for admin metrics viewer |

MetricsService

File: grails-app/services/io/xh/hoist/telemetry/MetricsService.groovy

The central service for all Micrometer metrics in a Hoist application. Initialized early in the bootstrap sequence (before other services), it provides the CompositeMeterRegistry that all framework and application meters register through.

Registry and meter registration

Access the registry via metricsService.registry, a standard Micrometer CompositeMeterRegistry that supports all Micrometer meter builders directly. For the common cases, MetricsService exposes a family of methods that handle registration, default tags, distribution config, and name-prefixing from an optional owner BaseService's telemetryPrefix:

| Method | Use for |
|---|---|
| configureTimer(name, description?, tags?, percentiles?, slos?, publishHistogram?, minExpected?, maxExpected?, owner?, useNamePrefix?) | Configures distribution stats and default metadata for a named Timer (no concrete Timer registered). |
| registerTimer(name, description?, tags?, owner?, useNamePrefix?) | Registers a concrete Timer (uses distribution config from any prior configureTimer). |
| configureCounter(name, description?, tags?, owner?, useNamePrefix?) | Configures default metadata for a named Counter. |
| registerCounter(name, description?, tags?, owner?, useNamePrefix?) | Registers a concrete Counter for the name. |
| registerGauge(name, valueFn, description?, tags?, baseUnit?, owner?, useNamePrefix?) | Registers a Gauge whose value is read from valueFn on demand. |
| registerFunctionCounter(name, countFn, description?, tags?, baseUnit?, owner?, useNamePrefix?) | Registers a monotonically-increasing FunctionCounter from countFn. |

Pass owner: this from your service to have its telemetryPrefix prepended to the metric name and an xh.owner tag added automatically. Set useNamePrefix: false to opt out of prefixing when supplying a fully-qualified name.

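import io.xh.hoist.BaseService
import io.xh.hoist.telemetry.MetricsService

class MyService extends BaseService {

    // Prepended to the names of metrics registered with owner: this
    String telemetryPrefix = 'myService'
    MetricsService metricsService

    void init() {
        // The gauge value is read on demand by calling valueFn
        metricsService.registerGauge(
            name: 'queueDepth',
            description: 'Current items in processing queue',
            valueFn: { queueSize() },   // queueSize() is an app-defined method, not shown
            owner: this
        )
    }
}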

For meter shapes the above methods don't cover (e.g. DistributionSummary), use the underlying metricsService.registry directly with the Micrometer builder API. Remember to prefix the metric name yourself, e.g. "${telemetryPrefix}.myMeter".
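The naming rule for owner and useNamePrefix can be pictured with a small stand-alone sketch. The helper qualifiedName is hypothetical and is not part of MetricsService. It assumes the prefix and name are joined with a dot, as in the "${telemetryPrefix}.myMeter" example, and it ignores tagging and any other namespace handling the service may apply.

```groovy
// Hypothetical helper illustrating the naming rule; not Hoist's implementation.
// Assumes prefix and name are joined with '.', as in "${telemetryPrefix}.myMeter".
String qualifiedName(String name, String telemetryPrefix = null, boolean useNamePrefix = true) {
    // With no owner prefix, or when the caller opts out, the name is used as given.
    if (!useNamePrefix || !telemetryPrefix) return name
    return "${telemetryPrefix}.${name}".toString()
}

println qualifiedName('queueDepth', 'myService')                       // prints myService.queueDepth
println qualifiedName('myApp.orders.queueDepth', 'myService', false)   // prints myApp.orders.queueDepth
```

The same joining rule is what you would apply by hand when registering a meter directly against metricsService.registry.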

Default tags

All meters registered through the service automatically receive the following default tags:

  • xh.application: the application code (e.g. myApp)
  • xh.instance: the cluster instance name (e.g. e36ca82b)
  • xh.source: classifies the metric's origin ('hoist' or 'app')

Cluster-scoped metrics

Metrics tagged with xh.instance=cluster are only accepted on the primary instance. This prevents duplicate registration of cluster-level aggregates (such as overall monitor status) across multiple instances.
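As a rough model of this rule, the sketch below decides whether a meter would be accepted on a given instance. The function acceptsMeter and its parameters are hypothetical and exist only for illustration. What Hoist does with a rejected registration is not covered here.

```groovy
// Hypothetical model of the acceptance rule; not Hoist's implementation.
// instanceTag is the value of the meter's instance tag; isPrimary says whether
// the current server is the cluster's primary instance.
boolean acceptsMeter(String instanceTag, boolean isPrimary) {
    boolean clusterScoped = (instanceTag == 'cluster')
    // Instance-scoped meters are always accepted; cluster-scoped ones only on the primary.
    return !clusterScoped || isPrimary
}

println acceptsMeter('cluster', false)    // prints false
println acceptsMeter('e36ca82b', false)   // prints true
```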


Export Registries

Prometheus

When prometheusEnabled: true in xhMetricsConfig, a PrometheusMeterRegistry is added to the composite registry. Prometheus scrapes are served by calling metricsService.prometheusData(), which fans out to all cluster instances via Hazelcast, collects each instance's scrape output, and concatenates the results. Each metric already carries an xh.instance tag distinguishing its source.

Applications expose this via a simple controller:

import io.xh.hoist.BaseController
import io.xh.hoist.security.AccessAll

@AccessAll
class PrometheusController extends BaseController {

    def metricsService

    def index() {
        render(
            contentType: 'text/plain; version=0.0.4; charset=utf-8',
            text: metricsService.prometheusData()
        )
    }
}

Use this cluster-wide endpoint instead of the Spring default, /actuator/prometheus, which does not contain any Hoist metrics and is not configured by default.

Additional Prometheus configuration properties can be passed via the prometheusConfig map in xhMetricsConfig. Keys are mapped to Micrometer's PrometheusConfig properties (e.g. {"step": "PT30S"}).

OTLP

When otlpEnabled: true, an OtlpMeterRegistry is added for push-based export (e.g. to Grafana Cloud, New Relic, or any OTLP-compatible backend). Configuration properties are passed via otlpConfig (e.g. {"url": "https://otlp.example.com/v1/metrics", "step": "PT60S"}).

Local-development gating

OTLP export is suppressed by default when the app is running in local development, even when otlpEnabled: true in xhMetricsConfig. This avoids polluting a shared OTLP backend with developer-machine metrics during routine work. The same gating applies to trace export; see tracing.md.

To opt in, set the otlpEnabledInLocalDev instance config to 'true'. Local-development detection follows Utils.isLocalDevelopment, which reflects the Grails runtime mode (Environment.isDevelopmentMode() is true when started via bootRun and false in a deployed war). This is independent of the configured appEnvironment, so a deployed instance configured as Development is not affected by this flag.

When OTLP export runs in local dev, the deployment.environment.name resource attribute is suffixed with the OS username (e.g. Development-johndoe) so per-developer data can be distinguished in a shared backend. Override ClusterConfig.getOtelResourceAttributes() if your backend prefers a different scheme.
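The gating and naming rules in this section can be summarised in a short stand-alone sketch. Both functions are hypothetical models written for illustration, not Hoist code. The exact string comparison against 'true' and the use of the user.name system property for the OS username are assumptions.

```groovy
// Hypothetical model of the gating rule; not Hoist's implementation.
boolean otlpExportActive(boolean otlpEnabled, boolean isLocalDev, String otlpEnabledInLocalDev = null) {
    if (!otlpEnabled) return false            // disabled in xhMetricsConfig
    if (!isLocalDev) return true              // deployed instances are not gated
    return otlpEnabledInLocalDev == 'true'    // local dev needs the explicit opt-in
}

// Hypothetical: builds the deployment.environment.name value, suffixed in local dev.
// Reading the OS username from the user.name system property is an assumption.
String environmentName(String appEnvironment, boolean isLocalDev,
                       String osUser = System.getProperty('user.name')) {
    return isLocalDev ? "${appEnvironment}-${osUser}".toString() : appEnvironment
}

println otlpExportActive(true, true)                        // prints false
println otlpExportActive(true, true, 'true')                // prints true
println environmentName('Development', true, 'johndoe')     // prints Development-johndoe
```

Note that a deployed instance whose appEnvironment is Development passes isLocalDev as false in this model, so it exports normally and keeps the unsuffixed name.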

Adding custom registries

Applications can add additional export registries programmatically:

metricsService.registry.add(myDatadogRegistry)

Built-in Metrics

JVM metrics

Automatically bound at startup via Micrometer's standard binders:

| Metric prefix | Source | Description |
|---|---|---|
| jvm.memory.* | JvmMemoryMetrics | Heap and non-heap memory usage |
| jvm.gc.* | JvmGcMetrics | Garbage collection counts and pause times |
| jvm.threads.* | JvmThreadMetrics | Thread counts by state |
| jvm.classes.* | ClassLoaderMetrics | Loaded and unloaded class counts |
| system.cpu.* | ProcessorMetrics | CPU usage and available processors |

JDBC connection pool metrics

Published by ConnectionPoolMonitoringService via the Tomcat JDBC pool:

| Metric | Type | Description |
|---|---|---|
| jdbc.pool.size | Gauge | Total connections (active + idle) |
| jdbc.pool.active | Gauge | Active/in-use connections |
| jdbc.pool.idle | Gauge | Idle connections |
| jdbc.pool.waitCount | Gauge | Threads waiting for a connection |
| jdbc.pool.borrowed | Counter | Cumulative connections borrowed |
| jdbc.pool.returned | Counter | Cumulative connections returned |
| jdbc.pool.created | Counter | Cumulative connections created |
| jdbc.pool.released | Counter | Cumulative connections destroyed |
| jdbc.pool.reconnected | Counter | Connections re-established after failure |
| jdbc.pool.removeAbandoned | Counter | Connections removed due to abandonment |
| jdbc.pool.releasedIdle | Counter | Idle connections released by evictor |

WebSocket metrics

Published by WebSocketService:

| Metric | Type | Description |
|---|---|---|
| websocket.channels | Gauge | Active WebSocket channels |
| websocket.messages.sent | Counter | Messages sent successfully |
| websocket.messages.received | Counter | Messages received from clients |
| websocket.messages.sendErrors | Counter | Message send failures |
| websocket.sessions.opened | Counter | Sessions registered |
| websocket.sessions.closed | Counter | Sessions unregistered |

Monitor metrics

Published by MonitorMetricsService after each monitor evaluation cycle on the primary instance. For each configured monitor, three metrics are published:

| Metric | Type | Description |
|---|---|---|
| hoist.monitor.status.{code} | Gauge | Status severity (0=INACTIVE .. 4=FAIL) |
| hoist.monitor.value.{code} | Gauge | Current numeric metric value |
| hoist.monitor.executionTime.{code} | Timer | Execution time of the monitor check |

Each carries an xh.instance tag indicating which cluster instance ran the check, or cluster for aggregate status. Meters are automatically removed when monitors or instances are decommissioned.

See monitoring.md for full documentation of the Hoist monitoring system.

Client activity metrics

Published by TrackMetricsService, which subscribes to the xhTrackReceived Hazelcast topic on the primary instance. These metrics are cluster-scoped (xh.instance=cluster) and tagged with clientApp to distinguish activity from different client applications.

| Metric | Type | Description |
|---|---|---|
| xh.client.track.messages | Counter | All track log entries received |
| xh.client.track.errors | Counter | Client error track entries (category == 'Client Error') |
| xh.client.load.totalTime | Timer | Total app load elapsed time |
| xh.client.load.authTime | Timer | App load authentication phase duration |

Load timers are recorded only for App / Loaded track entries that include a timings map in their data payload, confirming they represent a standard Hoist client load event. Both timers emit percentile histograms, supporting server-side aggregation (e.g. p90/p99) in Prometheus and OTLP-receiving backends.

See activity-tracking.md for documentation of the track log system.


Configuration

xhMetricsConfig

| Property | Value |
|---|---|
| Type | json |
| Default | See below |
| Client Visible | No |
| Purpose | Metrics infrastructure configuration: export registries and namespace. |

Default value:

{
    "prometheusEnabled": false,
    "otlpEnabled": false,
    "prometheusConfig": {},
    "otlpConfig": {}
}

| Key | Type | Description |
|---|---|---|
| prometheusEnabled | Boolean | Enable the Prometheus export registry. Dynamic: takes effect on next config refresh. |
| prometheusConfig | Map | Additional Prometheus configuration properties (e.g. {"step": "PT30S"}). |
| otlpEnabled | Boolean | Enable the OTLP export registry. Dynamic. In local development, additionally gated; see Local-development gating. |
| otlpConfig | Map | OTLP configuration properties (e.g. {"url": "...", "step": "PT60S"}). |

When xhMetricsConfig is updated, the export registries are torn down and recreated with the new settings. This is handled by clearCaches() responding to the xhConfigChanged event.
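To make the shape of this config concrete, here is a minimal sketch of a typed wrapper that parses the JSON and falls back to the documented defaults for missing keys. The class MetricsSettings is hypothetical. It is loosely modelled on the role described for MetricsConfig and does not reproduce that class or its API.

```groovy
import groovy.json.JsonSlurper

// Hypothetical typed wrapper for illustration; not the actual MetricsConfig class.
class MetricsSettings {
    final boolean prometheusEnabled
    final boolean otlpEnabled
    final Map prometheusConfig
    final Map otlpConfig

    MetricsSettings(Map raw) {
        // Missing keys fall back to the documented default values.
        prometheusEnabled = raw.prometheusEnabled ?: false
        otlpEnabled = raw.otlpEnabled ?: false
        prometheusConfig = raw.prometheusConfig ?: [:]
        otlpConfig = raw.otlpConfig ?: [:]
    }

    // Parses a JSON string in the shape of xhMetricsConfig.
    static MetricsSettings parse(String json) {
        new MetricsSettings(new JsonSlurper().parseText(json) as Map)
    }
}

def settings = MetricsSettings.parse('{"prometheusEnabled": true, "prometheusConfig": {"step": "PT30S"}}')
println settings.prometheusConfig.step   // prints PT30S
```

Because the export registries are rebuilt whenever the config changes, a wrapper like this would be re-parsed on each refresh rather than cached across updates.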


Admin Console

MetricsAdminController provides a listMetrics endpoint that fans out to all cluster instances and returns a merged list of all registered meters. Each entry includes:

  • name: the fully-qualified metric name (with namespace prefix)
  • type: Micrometer meter type (GAUGE, COUNTER, TIMER, etc.)
  • value: the current value (interpretation depends on type)
  • count, max: for Timer/DistributionSummary types
  • description: human-readable description
  • baseUnit: unit of measurement
  • tags: all tags including xh.application, xh.instance, xh.source
  • stats: raw statistics map

This endpoint requires the HOIST_ADMIN_READER role.