metrics-governor

metrics-governor is a high-performance metrics governance proxy for OTLP and Prometheus Remote Write. Drop it between your apps and your backend to control cardinality, transform metrics in-flight, and scale horizontally — with zero data loss.

Any pipeline. Any backend. On-prem or cloud. Whether you're shipping metrics to Prometheus, Grafana Cloud, Datadog, Splunk, VictoriaMetrics, or any OTLP-compatible backend — metrics-governor sits in front and gives you governance powers that no collector, agent, or vendor provides out of the box.

Two native pipelines. Zero conversion. Zero allocation. OTLP stays OTLP. PRW stays PRW. Each protocol runs its own receive-process-export path with full feature parity, no conversion overhead, and zero-allocation serialization via vtprotobuf.

What's New

v1.2.0 — LLM/GenAI metric governance — Token budget tracking, gen_ai.* metric governance with limits rules, per-model/provider visibility. First Prometheus-native proxy to govern LLM observability metrics. Details
v1.0.1 — Memory optimization — GOGC tuning (200→100) + Green Tea GC + reduced buffer/queue allocation. Memory at 50k dps dropped 48% (37.5%→19.5%) with only +0.19pp CPU. Memory budget metrics added for operational visibility. Details
v1.0 stable release — All 15 deprecated CLI flags, legacy sampling metrics, and backward-compatibility shims removed. Clean, unified API surface.
vtprotobuf integration (v0.44) — Zero-allocation protobuf marshal/unmarshal via PlanetScale vtprotobuf with sync.Pool message reuse. Measured <1% CPU at 100k dps.
Pipeline performance (v1.0.1) — Lock-free atomic counters, single-shot zstd, pooled compression. Stats full-mode now viable for production.
3,100+ tests — Comprehensive coverage including race detector, vtprotobuf integration, and parity tests across all packages.

Migrating from v0.x? All deprecated flags have replacements — see DEPRECATIONS.md for the full migration table.

The Cardinality Problem — And Why It's Still Unsolved

Metric cardinality is the silent budget killer in observability. Every distinct combination of metric name and label values creates a separate time series. One unbounded label — a user ID, a request path, an ephemeral container name — can turn a single counter into millions of series, crushing your storage backend and exploding your costs.

What's missing across the industry is governance in transit — intelligence between your apps and your backend that knows who the offenders are, protects everyone else, and escalates gradually instead of cutting blindly. That's what metrics-governor does.

Comparison: Open-Source Collectors & Agents

How metrics-governor compares against the most common open-source metrics collectors and agents:

Feature	metrics-governor	OTel Collector	Grafana Alloy	vmagent	Vector	Prometheus	Cribl Stream
Cardinality Governance
Adaptive limiting (drop only top offenders)	✅	❌	❌	❌	❌	❌	❌
Tiered escalation (log→sample→strip→drop)	✅	❌	❌	❌	❌	❌	❌
Per-group / per-tenant quotas	✅	❌	❌	❌	❌	❌	⚠️
Dry-run mode for limits	✅	❌	❌	❌	❌	❌	⚠️
Dead rule detection	✅	❌	❌	❌	❌	❌	❌
Rule ownership labels (team routing)	✅	❌	❌	❌	❌	❌	❌
Processing
Static filter / drop	✅	✅	✅	✅	✅	✅	✅
Label transform (rename, regex, add/remove)	✅	✅	✅	✅	✅	✅	✅
Downsample (per-series temporal compression)	✅	❌	❌	⚠️	❌	❌	❌
Cross-series aggregation (avg, sum, p95)	✅	⚠️	⚠️	✅	⚠️	⚠️	⚠️
Classify (derive ownership labels)	✅	❌	❌	❌	❌	❌	⚠️
Pipeline
OTLP native (gRPC + HTTP)	✅	✅	✅	⚠️	⚠️	⚠️	✅
PRW native (no conversion)	✅	⚠️	⚠️	✅	✅	✅	✅
Persistent queue / zero data loss	✅	⚠️	⚠️	✅	✅	⚠️	✅
Consistent hash sharding	✅	❌	⚠️	⚠️	❌	❌	⚠️
Circuit breaker / backpressure	✅	⚠️	⚠️	⚠️	✅	⚠️	✅
Observability
LLM/GenAI metric governance	✅	❌	❌	❌	❌	❌	❌

Legend and notes

✅ Fully supported — ⚠️ Partial or limited — ❌ Not available
vmagent OTLP: experimental ingestion since v1.93, primarily PRW-focused
vmagent downsample: stream aggregation provides time-based aggregation, not per-series compression algorithms (LTTB, SDT, CV-based)
vmagent sharding: requires external hashmod relabeling across multiple instances
OTel Collector PRW: available via contrib receiver/exporter, involves internal conversion
OTel Collector aggregation: groupbyattrsprocessor provides basic grouping, not full statistical aggregation
OTel Collector persistent queue: file_storage extension, limited compared to dedicated disk queue
Grafana Alloy sharding: clustering mode with hash ring distribution
Vector OTLP: source and sink available, later addition to the platform
Vector aggregation: aggregate transform provides interval-based reduction, limited cross-series operations
Prometheus OTLP: receiver available since v2.47, recording rules provide aggregation (not in forwarding path)
Prometheus persistent queue: WAL-based remote write queue, limited durability guarantees
Cribl Stream quotas: routing by source/destination, not per-metric-group adaptive enforcement
Cribl Stream classify: data classification available, not metrics-ownership-specific

Comparison: Vendor Cardinality Management

How metrics-governor's in-transit governance compares against vendor-side cardinality management solutions:

Feature	metrics-governor	Datadog MwL	Grafana Adaptive Metrics	Splunk MPM	Chronosphere	New Relic
Where it runs	In transit (your infra)	Backend (SaaS)	Backend (SaaS)	Backend (SaaS)	Backend (SaaS)	Backend (SaaS)
Open source	✅	❌	❌	❌	❌	❌
Reduces volume before shipping	✅	❌	❌	❌	⚠️	❌
Adaptive limiting (top offenders only)	✅	❌	⚠️	❌	⚠️	❌
Tiered escalation	✅	❌	❌	❌	❌	❌
Tag allowlist / blocklist	✅	✅	✅	✅	✅	⚠️
Per-group / per-tenant quotas	✅	❌	❌	❌	✅	❌
Unused dimension detection	⚠️	✅	✅	✅	✅	⚠️
ML-based recommendations	❌	❌	✅	❌	⚠️	❌
Downsample / aggregate in-transit	✅	❌	❌	⚠️	❌	❌
Dead rule detection	✅	❌	❌	❌	❌	❌
Works with any backend	✅	❌	❌	❌	❌	❌
No vendor lock-in	✅	❌	❌	❌	❌	❌
Self-hosted / on-prem	✅	❌	❌	❌	⚠️	❌
LLM/GenAI metric governance	✅	❌	❌	❌	❌	❌

Legend and notes

✅ Fully supported — ⚠️ Partial or limited — ❌ Not available
Datadog Metrics without Limits: Decouples ingestion from indexing — all data is ingested (and billed), you choose which tags to keep queryable. Does not reduce data in transit.
Grafana Adaptive Metrics: ML-based recommendations for tag aggregation in Grafana Cloud. Suggestions only — requires manual approval. Cloud-only, not available on-prem.
Splunk MPM: Dimension utilization ranking (R0-R5), aggregation rules. Available in Splunk Observability Cloud only. Aggregation reduces stored MTS but doesn't reduce ingest volume.
Chronosphere: Control plane with aggregation rules and quotas. Available as SaaS and on-prem (limited). Reduces stored data but relies on Chronosphere's storage.
New Relic: Drop rules and data management. Limited cardinality-specific controls compared to dedicated governance tools.
metrics-governor unused dimension detection: Dead rule detection tracks stale rules; per-metric stats in full mode tracks cardinality per metric. Not ML-based discovery.

Universal Governance for Mixed Environments

Whether you're running legacy Prometheus Remote Write, migrating to modern OpenTelemetry, or operating both in parallel — metrics-governor provides a single governance layer across all your metrics traffic.

Bridge old and new — adopt OTel incrementally while maintaining full control over existing Prometheus infrastructure
Same rules, same protection — cardinality limits, processing rules, and alerting work identically across both protocols
Single pane of governance — one proxy, one config, one set of dashboards for your entire metrics pipeline regardless of protocol mix

Why metrics-governor?

Challenge	How metrics-governor Solves It
Cardinality explosions crush your backend	Adaptive limiting identifies and drops only the top offenders — well-behaved services keep flowing
All-or-nothing enforcement kills good data	Tiered escalation with graduated responses: log → sample → strip labels → drop
Raw volume too high for storage budget	Processing rules sample, downsample, aggregate, classify, transform, or drop metrics before they leave the proxy
Storage explosion from a noisy tenant	Multi-tenancy with per-tenant quotas and adaptive limits — detect tenants, enforce budgets, protect storage without blanket-dropping
No team accountability for metric costs	Rule ownership labels attach `team`, `slack_channel`, `pagerduty_service` to any rule for Alertmanager routing
Data loss during backend outages	Always-queue architecture with circuit breaker, persistent disk queue, and exponential backoff — zero data loss by default
Single backend can't keep up	Consistent sharding fans out to N backends via K8s DNS discovery with stable hash routing
No visibility into the metrics pipeline	Real-time stats, 13 production alerts, Grafana dashboards, and dead rule detection
Unpredictable costs from runaway services	Per-group tracking with configurable limits, dry-run mode, and ownership labels for team routing
Need team/severity labels derived from business values	Transform rules — build `severity`, `team`, `env` from metric names and label values
Stale rules pile up unnoticed	Dead rule detection tracks last-match time for every rule, with alerts for stale cleanup
Complex deployment planning	Interactive Playground generates Helm, app, and limits YAML from your throughput inputs

Architecture

View as text diagram (Mermaid)

```mermaid
flowchart LR
    subgraph Sources["&nbsp; Sources &nbsp;"]
        S1["OTLP gRPC / HTTP\nApps · Agents · SDKs"]:::source
        S2["PRW 1.0 / 2.0\nPrometheus · Grafana Agent"]:::source
    end

    subgraph MG["&nbsp; ⚡ metrics-governor &nbsp;"]
        direction TB
        subgraph OTLP["&thinsp; OTLP Pipeline &thinsp;"]
            direction LR
            O1(["Receive"]):::rx --> O2(["Process"]):::proc --> O3(["Limit"]):::limit
            O3 --> O4(["Queue"]):::queue --> O5(["Prepare"]):::prep --> O6(["Send"]):::send
            O6 -. "retry" .-> O4
        end
        subgraph PRW["&thinsp; PRW Pipeline &thinsp;"]
            direction LR
            P1(["Receive"]):::rx --> P2(["Process"]):::proc --> P3(["Limit"]):::limit
            P3 --> P4(["Queue"]):::queue --> P5(["Prepare"]):::prep --> P6(["Send"]):::send
            P6 -. "retry" .-> P4
        end
    end

    subgraph Backends["&nbsp; Backends &nbsp;"]
        B1["Collector · Mimir\nVictoriaMetrics · Grafana Cloud"]:::backend
        B2["Prometheus · Thanos\nVictoriaMetrics · Cortex"]:::backend
    end

    S1 -->|"gRPC :4317\nHTTP :4318"| O1
    S2 -->|"HTTP :9091"| P1
    O6 --> B1
    P6 --> B2

    classDef source fill:#3498db,stroke:#1a5276,color:#fff,stroke-width:2px
    classDef rx fill:#1abc9c,stroke:#0e6655,color:#fff,stroke-width:2px
    classDef proc fill:#9b59b6,stroke:#6c3483,color:#fff,stroke-width:2px
    classDef limit fill:#e74c3c,stroke:#922b21,color:#fff,stroke-width:2px
    classDef queue fill:#f39c12,stroke:#b7770a,color:#fff,stroke-width:2px
    classDef prep fill:#3498db,stroke:#1a5276,color:#fff,stroke-width:2px
    classDef send fill:#2ecc71,stroke:#1a8c4e,color:#fff,stroke-width:2px
    classDef backend fill:#2ecc71,stroke:#1a8c4e,color:#fff,stroke-width:2px

    style Sources fill:#eaf2f8,stroke:#2980b9,stroke-width:2px,color:#1a5276
    style MG fill:#f9f3e3,stroke:#d4a017,stroke-width:3px,color:#7d6608
    style OTLP fill:#e8f6f3,stroke:#1abc9c,stroke-width:1px,color:#0e6655
    style PRW fill:#fef5e7,stroke:#f39c12,stroke-width:1px,color:#b7770a
    style Backends fill:#eafaf1,stroke:#2ecc71,stroke-width:2px,color:#1a8c4e
```

Each pipeline runs independently: Receive → Process → Limit → Queue → Prepare → Send → Backend. Failed exports retry through the queue with circuit breaker protection.

Features

Receive — Dual Native Protocols

Protocol	Ports	Capabilities
OTLP gRPC	`:4317`	Full `ExportMetricsService`, TLS/mTLS, bearer token, gzip/zstd, vtprotobuf zero-alloc unmarshal
OTLP HTTP	`:4318`	Protobuf + JSON, gzip/zstd/snappy decompression, content negotiation, vtprotobuf pool reuse
PRW 1.0/2.0	`:9091`	Auto-detect version, native histograms, VictoriaMetrics mode, exemplars

Backpressure built in: capacity-bounded buffers return 429 / ResourceExhausted when full. Docs
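A minimal sketch of the backpressure idea above, assuming a channel-backed buffer and a plain HTTP handler. The names `boundedBuffer`, `tryPush`, and `handler`, the `/` path, and the 200 success status are hypothetical and not taken from the metrics-governor source; the real receivers may be structured differently.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// boundedBuffer is a hypothetical capacity-bounded buffer backed by a channel.
type boundedBuffer struct {
	ch chan []byte
}

func newBoundedBuffer(capacity int) *boundedBuffer {
	return &boundedBuffer{ch: make(chan []byte, capacity)}
}

// tryPush enqueues without blocking and reports false when the buffer is full.
func (b *boundedBuffer) tryPush(payload []byte) bool {
	select {
	case b.ch <- payload:
		return true
	default:
		return false
	}
}

// handler accepts a request body, or answers 429 when the buffer is full.
func (b *boundedBuffer) handler(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "cannot read body", http.StatusBadRequest)
		return
	}
	if !b.tryPush(body) {
		http.Error(w, "buffer full, retry later", http.StatusTooManyRequests)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	buf := newBoundedBuffer(2) // nothing drains the buffer in this demo
	for i := 0; i < 3; i++ {
		rec := httptest.NewRecorder()
		req := httptest.NewRequest(http.MethodPost, "/", strings.NewReader("payload"))
		buf.handler(rec, req)
		fmt.Println("request", i+1, "status", rec.Code)
	}
}
```

With a capacity of two and no consumer, the third request is rejected with 429. A gRPC receiver would map the same condition to `ResourceExhausted`.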

Supported backends:

Protocol	Backends
OTLP	OpenTelemetry Collector, Grafana Mimir, Cortex, VictoriaMetrics, ClickHouse, Grafana Cloud
PRW	Prometheus, VictoriaMetrics, Grafana Mimir, Cortex, Thanos Receive, Amazon Managed Prometheus, GCP Managed Prometheus, Grafana Cloud

Process — Unified Rules Engine

Six actions in a single ordered pipeline. The first matching terminal action wins; non-terminal actions (Transform, Classify) apply and evaluation continues:

Action	What It Does	Terminal?
Sample	Stochastic reduction (probabilistic or head-N)	Yes
Downsample	Per-series compression — 10 methods incl. adaptive CV-based, LTTB, SDT	Yes
Aggregate	Cross-series reduction with `group_by` — avg, sum, p95, stddev, and more	Yes
Transform	12 label operations — rename, regex replace, add, remove, keep, drop	No (chains)
Classify	Derive ownership labels (team, severity, priority) from metric metadata	No (chains)
Drop	Unconditional removal	Yes

Transform and Classify chain: both are non-terminal, so a metric can be classified into categories and have its labels transformed to match your storage schema in a single pass. Dead rule detection is always on: metrics track when rules stop matching, with an optional scanner and alert rules for stale rule cleanup. Docs
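One way to read the terminal and non-terminal semantics from the table above is sketched here. This is an interpretation, not the project's rules engine: the `rule` type, the `evaluate` function, and the example metric prefixes are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// rule is a hypothetical processing rule: a matcher plus a terminal flag.
type rule struct {
	name     string
	match    func(metric string) bool
	terminal bool
}

// evaluate walks the rules in order. Matching non-terminal rules are applied
// and evaluation continues; the first matching terminal rule ends the walk.
func evaluate(rules []rule, metric string) []string {
	var applied []string
	for _, r := range rules {
		if !r.match(metric) {
			continue
		}
		applied = append(applied, r.name)
		if r.terminal {
			break
		}
	}
	return applied
}

func main() {
	matchAll := func(string) bool { return true }
	rules := []rule{
		{name: "classify", match: func(m string) bool { return strings.HasPrefix(m, "http_") }},
		{name: "transform", match: matchAll},
		{name: "drop", match: func(m string) bool { return strings.HasPrefix(m, "debug_") }, terminal: true},
		{name: "sample", match: matchAll, terminal: true},
	}
	fmt.Println(evaluate(rules, "http_requests_total"))
	fmt.Println(evaluate(rules, "debug_heap_dump"))
}
```

In this sketch an `http_` metric passes through classify and transform before the sample rule ends evaluation, while a `debug_` metric is transformed and then dropped without ever reaching sample.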

Control — Intelligent Cardinality Governance

Adaptive Limiting — Drops only the top offenders, not everything. Per-group tracking by service, namespace, or any label combination. Tiered escalation: log → sample → strip labels → drop. Dry-run mode for safe rollouts
Cardinality Tracking — Three modes: Bloom filter (98% less memory — 1.2 MB vs 75 MB @ 1M series), HyperLogLog (constant 12 KB), Hybrid (auto-switches at threshold)
Bloom Persistence — Save/restore filter state across restarts, eliminating cold-start re-learning
Rule Ownership Labels — Attach team, slack_channel, pagerduty_service to any rule for Alertmanager routing
LLM/GenAI Token Budget Tracking — Monitor token consumption rates, budget burn, per-model/provider visibility. Govern gen_ai.* metrics with limits rules or dedicated tracker
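The memory figures for the tracking modes can be sanity-checked with textbook formulas. The sketch below assumes a standard Bloom filter sized for a 1% false-positive rate and a HyperLogLog with 2^14 six-bit registers; neither parameter is stated in this README, so treat both as assumptions rather than the project's actual settings.

```go
package main

import (
	"fmt"
	"math"
)

// bloomBits returns the optimal Bloom filter size in bits for n items at
// false-positive rate p, using m = -n * ln(p) / (ln 2)^2.
func bloomBits(n int, p float64) float64 {
	return -float64(n) * math.Log(p) / (math.Ln2 * math.Ln2)
}

// hllBytes returns the size of a dense HyperLogLog with 2^precision
// registers of 6 bits each.
func hllBytes(precision uint) int {
	return (1 << precision) * 6 / 8
}

func main() {
	// Assumed parameters: 1M series, 1% false positives, precision 14.
	fmt.Printf("bloom: %.2f MB\n", bloomBits(1_000_000, 0.01)/8/1e6)
	fmt.Printf("hll:   %d bytes\n", hllBytes(14))
}
```

Under those assumptions the Bloom filter comes out at roughly 1.2 MB for 1M series and the HyperLogLog at 12 KiB regardless of series count, which is consistent with the numbers quoted above.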

Export — High-Throughput Pipeline

Optimization	Impact	How
vtprotobuf	Zero-allocation marshal/unmarshal	PlanetScale vtprotobuf with `sync.Pool` message reuse — near-zero GC pressure
Pipeline Split	+60-76% throughput	CPU-bound preparers (NumCPU) compress, I/O-bound senders (NumCPU x 2) send HTTP
AIMD Batch Tuning	Auto-discovers optimal batch size	+25% after 10 successes, -50% on failure, HTTP 413 ceiling discovery
Adaptive Worker Scaling	1 to NumCPU x 4 workers	EWMA latency tracking, scale up on queue depth, halve on 30s idle
Async Send	Max network utilization	Semaphore-bounded concurrency: 4/sender, NumCPU x 8 global
Connection Pre-warming	Zero cold-start latency	HEAD requests at startup establish connection pools
String Interning	76% fewer allocations	Label deduplication across the hot path
Compression Pooling	80% fewer allocs	Reusable gzip/zstd/snappy encoder pools

Protect — Zero Data Loss Architecture

Always-Queue — All data flows through the queue (vmagent/OTel-inspired), eliminating flush-time blocking
Persistent Queue — FastQueue disk-backed with snappy compression, 256 KB buffered I/O, write coalescing — 128x fewer IOPS, 70% less disk I/O
Circuit Breaker — Three-state (closed/open/half-open) with CAS transitions, prevents cascading failures
Split-on-Error — Oversized batches auto-split on HTTP 413 from Mimir, Thanos, VictoriaMetrics, Cortex
Backpressure — Buffer returns 429/ResourceExhausted; percentage-based memory sizing (15% buffer, 15% queue)
Graceful Shutdown — Drains in-flight exports and persists queue state before termination
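The three-state circuit breaker with CAS transitions could look roughly like the sketch below. It assumes a simple failure-count threshold and omits the cooldown timer a production breaker would use before probing; the `breaker` type and its method names are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const (
	stateClosed int32 = iota
	stateOpen
	stateHalfOpen
)

// breaker is a hypothetical lock-free circuit breaker.
type breaker struct {
	state     atomic.Int32
	failures  atomic.Int32
	threshold int32
}

// recordFailure reopens a half-open breaker immediately, or trips a closed
// breaker once the failure count reaches the threshold.
func (b *breaker) recordFailure() {
	if b.state.CompareAndSwap(stateHalfOpen, stateOpen) {
		return
	}
	if b.failures.Add(1) >= b.threshold {
		b.state.CompareAndSwap(stateClosed, stateOpen)
	}
}

// tryProbe moves open to half-open. Only the caller that wins the CAS
// gets true, so at most one probe request is in flight.
func (b *breaker) tryProbe() bool {
	return b.state.CompareAndSwap(stateOpen, stateHalfOpen)
}

// recordSuccess closes a half-open breaker and clears the failure count.
func (b *breaker) recordSuccess() {
	b.state.CompareAndSwap(stateHalfOpen, stateClosed)
	b.failures.Store(0)
}

// String returns the current state name.
func (b *breaker) String() string {
	switch b.state.Load() {
	case stateOpen:
		return "open"
	case stateHalfOpen:
		return "half-open"
	default:
		return "closed"
	}
}

func main() {
	b := &breaker{threshold: 3}
	for i := 0; i < 3; i++ {
		b.recordFailure()
	}
	fmt.Println(b.String())
	fmt.Println(b.tryProbe(), b.tryProbe())
	b.recordSuccess()
	fmt.Println(b.String())
}
```

Because every transition is a compare-and-swap on a single integer, concurrent senders never need a mutex to agree on the breaker state.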

Scale — Horizontal and Hierarchical

Consistent Sharding — Hash ring with 150 virtual nodes per endpoint, K8s DNS discovery with automatic failover. Same series always routes to same backend (OTLP and PRW)
Two-Tier Architecture — DaemonSet edge (Tier 1) processes per-node, StatefulSet gateway (Tier 2) aggregates globally — 10-50x traffic reduction between the two tiers
Percentage-Based Memory — Buffer and queue sizes auto-scale with container resources via cgroup detection
Three Queue Modes — memory (fastest), disk (durable), hybrid (best of both)

Monitor — Full Observability

Real-Time Statistics — Per-metric cardinality, datapoints, and limit violations with three stats levels (none/basic/full)
13 Production Alerts — Zero-overlap design: DataLoss, ExportDegraded, QueueSaturated, CircuitOpen, OOMRisk, CardinalityExplosion, and more — each with runbooks
Dead Rule Detection — Always-on last-match tracking for processing and limits rules, with alert rules for stale rule cleanup
Grafana Dashboards — Operations and development dashboards included, auto-imported via provisioning
Health Endpoints — /live and /ready probes with per-component JSON status for Kubernetes

Deploy — Production Ready from Day One

Helm Chart — Full production chart with probes, ConfigMap sidecar, HPA-ready, alert rules integrated
Profiles — 6 presets (minimal, balanced, safety, observable, resilient, performance) — one flag to set 30+ parameters, tuned from measured vtprotobuf benchmarks
Hot Reload — SIGHUP reloads limits and processing rules without restart; ConfigMap sidecar for Kubernetes
Interactive Playground — Browser tool estimates resources, generates Helm/YAML/limits configs, recommends cloud storage classes
TLS/mTLS + Auth — Full TLS, mutual TLS, bearer token, basic auth, custom headers
Zero-Config Start — Works out of the box with sensible defaults; add limits and sharding when needed

Performance at a Glance

Measured comparison — governor vs OTel Collector vs vmagent (4-core, 1 GB, OTLP gRPC → HTTP):

Load	Tool	CPU avg	Memory avg	Ingestion
50k dps	metrics-governor (balanced)	4.51%	19.5%	99.25%
	OTel Collector	4.51%	15.3%	99.83%
	vmagent	2.94%	7.3%	99.90%
100k dps	metrics-governor (balanced)	6.47%	18.4%	99.53%
	OTel Collector	6.58%	9.3%	99.83%
	vmagent	16.70%	3.2%	99.83%

Governor scales sublinearly: CPU rises 1.43x (4.51% → 6.47%) for 2x load (50k → 100k dps). At 100k dps, governor uses less CPU than both OTel Collector and vmagent while providing governance features neither tool offers.

Optimization	Impact
vtprotobuf marshal/unmarshal	Zero allocations — `sync.Pool` message reuse, near-zero GC pressure
Pipeline split	+60-76% throughput — CPU-bound preparers + I/O-bound senders
Green Tea GC + GOGC=100	48% memory reduction vs the earlier GOGC=200 configuration (37.5% → 19.5% at 50k dps)
Cardinality memory (Bloom)	1.2 MB per 1M series (98% less than maps)
String interning	76% fewer allocations on the hot path
Disk I/O (buffered + coalesced)	128x fewer IOPS, 70% less disk I/O
Queue compression (snappy)	2.5-3x storage capacity
Two-tier traffic reduction	10-50x between DaemonSet and StatefulSet tiers

See Performance Guide and Benchmarks for methodology and full results.

Flexible Operating Modes

One binary, six profiles — choose durability, observability, cost efficiency, or raw throughput:

Priority	Queue Mode	Stats Level	Profile	Cost Efficiency	Trade-off
Maximum Safety	`disk`	`full`	`safety`	High	Full crash recovery + per-metric cost tracking
Durable + Observable	`hybrid`	`full`	`observable`	High	Disk spillover + full per-metric stats for cost visibility
Resilient	`hybrid`	`basic`	`resilient`	Medium	Memory-speed normally, disk spillover for spikes
High Throughput	`hybrid`	`basic`	`performance`	Low	Pipeline split + max throughput + adaptive tuning
Balanced (default)	`memory`	`basic`	`balanced`	Medium	Best performance with essential metrics
Minimal Footprint	`memory`	`none`	`minimal`	—	Smallest resource usage, pure proxy

Higher proxy resources (disk, CPU) can save 10–100x in backend SaaS costs by identifying and reducing expensive metrics before they reach your storage. See Cost Efficiency.

See Profiles and Performance Tuning for details.

Quick Start

```bash
# Start metrics-governor with adaptive limits
metrics-governor \
  -exporter-endpoint otel-collector:4317 \
  -limits-config limits.yaml \
  -limits-dry-run=false \
  -stats-labels service,env

# Point your apps at metrics-governor instead of the collector
# export OTEL_EXPORTER_OTLP_ENDPOINT=http://metrics-governor:4317
```

```yaml
# limits.yaml — adaptive limiting by service
rules:
  - name: "per-service-limits"
    match:
      labels:
        service: "*"
    max_cardinality: 10000
    max_datapoints_rate: 100000
    action: adaptive
    group_by: ["service"]
```

When cardinality exceeds 10,000, metrics-governor identifies which service is the top contributor and drops only that service's excess metrics — everyone else keeps flowing.
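The behaviour described above can be sketched as a small selection routine: total the per-group cardinality, and if it exceeds the limit, trim from the largest contributor first. This is a simplified illustration with a hypothetical function name and made-up service counts, not the selection algorithm metrics-governor actually implements.

```go
package main

import (
	"fmt"
	"sort"
)

// excessByGroup returns how many series to drop per group so the total
// falls back to limit, taking the excess from the largest groups first.
func excessByGroup(series map[string]int, limit int) map[string]int {
	total := 0
	groups := make([]string, 0, len(series))
	for g, n := range series {
		total += n
		groups = append(groups, g)
	}
	drops := map[string]int{}
	excess := total - limit
	if excess <= 0 {
		return drops // under the limit: nothing is dropped
	}
	// Largest group first; ties broken by name for determinism.
	sort.Slice(groups, func(i, j int) bool {
		if series[groups[i]] != series[groups[j]] {
			return series[groups[i]] > series[groups[j]]
		}
		return groups[i] < groups[j]
	})
	for _, g := range groups {
		if excess <= 0 {
			break
		}
		cut := series[g]
		if cut > excess {
			cut = excess
		}
		drops[g] = cut
		excess -= cut
	}
	return drops
}

func main() {
	// Hypothetical per-service series counts against the 10,000 limit.
	series := map[string]int{"checkout": 9000, "cart": 1500, "auth": 500}
	fmt.Println(excessByGroup(series, 10000))
}
```

Here the total is 11,000 series, so the 1,000 excess is taken entirely from `checkout`, the top contributor, while `cart` and `auth` are untouched.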

Playground

Plan your deployment in seconds. The interactive Playground estimates CPU, memory, disk I/O, and K8s pod sizing from your throughput inputs, and generates ready-to-use Helm, app config, and limits YAML — all in a single zero-dependency HTML page.

Open Playground | Source

Screenshots: Throughput Inputs (Simple & Advanced modes) · Resource Estimation (CPU, memory, disk, fit check) · Editable YAML (bidirectional sync with inputs) · Fit Check (pod override & resource validation)

Documentation

	Guide	Description
🚀	Installation	Source, Docker, or Helm chart
⚙️	Configuration	YAML config and CLI flags reference
📋	Profiles	6 presets: `minimal`, `balanced`, `safety`, `observable`, `resilient`, `performance`
📡	Receiving	OTLP gRPC/HTTP, PRW 1.0/2.0, backpressure
📡	PRW Protocol	PRW 1.0/2.0, native histograms, VictoriaMetrics mode
🔄	Processing Rules	Sample, downsample, aggregate, transform, classify, drop, dead rule detection
🏗️	Two-Tier Architecture	DaemonSet edge + StatefulSet gateway pattern
🎯	Limits	Adaptive limiting, tiered escalation, per-label limits, rule ownership
👥	Multi-Tenancy	Tenant detection (header/label/attribute), per-tenant quotas, priority-based enforcement
🔀	Sharding	Consistent hashing, K8s DNS discovery
📊	Statistics	Per-metric tracking, three stats levels
⚡	Export Pipeline	Pipeline split, batch tuning, adaptive scaling
⚡	Performance	Bloom filters, string interning, I/O optimization
🛡️	Resilience	Circuit breaker, persistent queue, backoff
📦	Queue	Memory, disk, hybrid queue modes
🔢	Cardinality Tracking	Bloom, HyperLogLog, Hybrid mode
💾	Bloom Persistence	Save/restore filter state across restarts
🚨	Alerting	13 alerts with runbooks, dead rule detection
🎯	SLOs	SLI definitions, error budgets, burn-rate alerts, health dashboard
🤖	LLM Governance	Token budget tracking, `gen_ai.*` metric governance, example configs
📊	Dashboards	Grafana operations and development dashboards
🏭	Production Guide	Sizing, HPA/VPA, DaemonSet, bare metal
🔧	Stability Tuning	Graduated spillover, load shedding, drain ordering, backpressure tuning
🏥	Health	Kubernetes liveness and readiness probes
🔄	Dynamic Reload	Hot-reload via SIGHUP with ConfigMap sidecar
🔐	TLS	Server/client TLS, mTLS
🔑	Auth	Bearer token, basic auth, custom headers
📦	Compression	gzip, zstd, snappy
🌐	HTTP Settings	Connection pools, timeouts, HTTP/2
📝	Logging	JSON structured logging
🖥️	Playground	Interactive deployment planner
🧪	Testing	Test environment, Docker Compose
🛠️	Development	Building, contributing
📜	Changelog	Release history with breaking changes
⚠️	Deprecations	Deprecation lifecycle, migration table

Contributing

Contributions welcome! See Development Guide for details.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

Apache License 2.0 — see LICENSE.

Support

Built with ❤️ for the observability community
