Status: implemented
This document describes the current shipped metrics system.
Prisma Streams now treats metrics as a profile on top of the durable stream engine, not as a second datastore.
That means:
- stream segments remain the durable source of truth
- metrics-specific serving artifacts are immutable companion objects in object storage
- query-time planning chooses the cheapest available path while preserving correctness
- local SQLite stores only bounded metadata and family progress, not an unbounded metrics head
The comparison that motivated this direction is captured in alternative-metrics-approach.md.
The shipped metrics system has three layers:
- canonical metric interval records in ordinary stream segments
- .agg rollup companions for aligned time windows
- .mblk per-segment metrics-block companions for non-aligned or non-rollup-eligible aggregate queries
The query planner for metrics streams is:
- use .agg when the query aligns with a configured rollup window and its filters are rollup-eligible
- otherwise use .mblk when coverage is present
- otherwise fall back to raw segment scan and WAL-tail scan for correctness
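The planner order above can be sketched as a small decision function. This is a minimal illustration, not the server's planner: the type names, the `PlanInput` fields, and the idea that coverage is already known as a boolean are all assumptions made for the example.

```typescript
// Hypothetical names throughout; this only mirrors the three-step order
// described in the list above.
type MetricsPlan = "agg" | "mblk" | "raw-scan";

interface PlanInput {
  alignsWithRollupWindow: boolean; // time range matches a configured rollup window
  filtersRollupEligible: boolean;  // filters only constrain rollup dimensions exactly
  aggCoverage: boolean;            // published .agg companions cover the range
  mblkCoverage: boolean;           // published .mblk companions cover the range
}

function choosePlan(q: PlanInput): MetricsPlan {
  // Fast path: aligned, rollup-eligible, and covered by published rollups.
  if (q.alignsWithRollupWindow && q.filtersRollupEligible && q.aggCoverage) {
    return "agg";
  }
  // Fallback accelerator: per-segment metrics blocks, when present.
  if (q.mblkCoverage) {
    return "mblk";
  }
  // Correctness path: raw segment scan plus WAL-tail scan.
  return "raw-scan";
}
```

Each step only narrows the cost of the query; the raw scan is always available, so a missing companion slows a query down but does not change its result.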
This borrows the important part of the MetricsDB philosophy:
- object-store-native immutable serving artifacts
- no large local metrics index tables
- query-time fanout/compute instead of a second always-resident metrics store
It does not introduce a separate primary metrics engine beside streams.
The server now ships a built-in metrics stream profile.
Installing this profile means:
- stream content type must be application/json
- JSON appends are normalized into a canonical metrics interval envelope
- the profile auto-installs a canonical schema registry
- the profile auto-installs default search fields and search.rollups
- the profile enables the .mblk metrics-block family in addition to the generic search families
The internal __stream_metrics__ system stream is automatically created with
this profile at startup.
Metrics do not bypass the core durable stream model.
The durable source of truth is still:
- SQLite WAL for acknowledged but not yet published data
- sealed stream segments for published data
- manifests and companion objects in object storage for published search state
Published metrics search and aggregation state is recovered through the same bootstrap model as the rest of the search system:
- schema registry object
- manifest search_families
- segment objects
- .agg companion objects
- .mblk companion objects
The shipped metrics profile currently stores interval summary records.
This is the canonical envelope written to __stream_metrics__ and accepted by
user streams that install the metrics profile:
{
"apiVersion": "durable.streams/metrics/v1",
"kind": "interval",
"metric": "tieredstore.append.bytes",
"unit": "bytes",
"metricKind": "summary",
"temporality": "delta",
"windowStart": 1761396000000,
"windowEnd": 1761396010000,
"intervalMs": 10000,
"instance": "12345-abcd12",
"stream": "orders",
"tags": { "env": "prod" },
"attributes": { "env": "prod" },
"dimensionPairs": ["env=prod"],
"dimensionKey": "env=prod",
"seriesKey": "summary|delta|tieredstore.append.bytes|bytes|orders|12345-abcd12|env=prod",
"count": 4,
"sum": 2048,
"min": 128,
"max": 1024,
"avg": 512,
"p50": 512,
"p95": 1024,
"p99": 1024,
"buckets": { "128": 1, "512": 2, "1024": 1 },
"summary": {
"count": 4,
"sum": 2048,
"min": 128,
"max": 1024,
"histogram": { "128": 1, "512": 2, "1024": 1 }
}
}
Notes:
- tags and attributes currently carry the same normalized dimension map
- dimensionPairs and dimensionKey are the flattened query/index shape
- seriesKey is the canonical routing key
- the record is still an ordinary JSON stream entry, not a separate metrics row store
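To make the relationship between tags, dimensionPairs, dimensionKey, and seriesKey concrete, here is a minimal sketch that reproduces the seriesKey of the example record. The field order is read off that one example. The function names, the sort order of pairs, and the "," separator for multiple pairs are assumptions, since the example carries only a single dimension.

```typescript
// Hypothetical helper types and functions, for illustration only.
interface SeriesIdentity {
  metricKind: string;
  temporality: string;
  metric: string;
  unit: string;
  stream: string;
  instance: string;
  tags: Record<string, string>;
}

function flattenDimensions(tags: Record<string, string>): {
  dimensionPairs: string[];
  dimensionKey: string;
} {
  // Assumption: pairs are sorted by key and joined with ","; the example
  // envelope has only one pair, so it does not confirm either choice.
  const dimensionPairs = Object.keys(tags)
    .sort()
    .map((key) => `${key}=${tags[key]}`);
  return { dimensionPairs, dimensionKey: dimensionPairs.join(",") };
}

function buildSeriesKey(id: SeriesIdentity): string {
  const { dimensionKey } = flattenDimensions(id.tags);
  // Field order follows the example seriesKey in the envelope above.
  return [
    id.metricKind,
    id.temporality,
    id.metric,
    id.unit,
    id.stream,
    id.instance,
    dimensionKey,
  ].join("|");
}

const exampleKey = buildSeriesKey({
  metricKind: "summary",
  temporality: "delta",
  metric: "tieredstore.append.bytes",
  unit: "bytes",
  stream: "orders",
  instance: "12345-abcd12",
  tags: { env: "prod" },
});
```

With the inputs from the example envelope, `exampleKey` equals the seriesKey shown there.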
The default metrics schema installs:
- exact keyword fields such as metric, unit, stream, instance, metricKind, temporality, seriesKey, dimensionKey, and dimensionPairs
- typed column fields such as windowStart, windowEnd, intervalMs, count, sum, min, max, avg, p95, and p99
- a default metrics rollup over windowStart
That means metrics streams can use:
- GET /v1/stream/{name}?filter=...
- POST /v1/stream/{name}/_search
- POST /v1/stream/{name}/_aggregate
Typical aggregate query:
{
"rollup": "metrics",
"from": "2026-03-25T10:00:00.000Z",
"to": "2026-03-25T11:00:00.000Z",
"interval": "1m",
"q": "metric:tieredstore.append.bytes stream:orders",
"group_by": ["metric", "stream"]
}
.agg remains the fast path for aligned windows.
Use it when:
- the time range lines up with a configured rollup interval
- the query only constrains rollup dimensions with exact filters
- published .agg coverage is available
This is ideal for charts, KPI tiles, and repeated dashboard queries.
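A dashboard client that wants to stay on the aligned path could check its time range before building the request. The sketch below is illustrative only: the helper names are hypothetical, and it assumes "aligned" means both ends of the range fall on epoch multiples of the interval, which the server may define differently. The request body reuses the fields from the typical aggregate query shown earlier.

```typescript
// Hypothetical client-side helpers; not part of the server API.
interface AggregateBody {
  rollup: string;
  from: string;
  to: string;
  interval: string;
  q: string;
  group_by: string[];
}

const MINUTE_MS = 60000;

// Assumption: alignment means both bounds are epoch multiples of the interval.
function isAligned(fromMs: number, toMs: number, intervalMs: number): boolean {
  return (
    intervalMs > 0 &&
    toMs > fromMs &&
    fromMs % intervalMs === 0 &&
    toMs % intervalMs === 0
  );
}

function buildAggregateRequest(
  metricsStream: string, // the stream that has the metrics profile installed
  fromMs: number,
  toMs: number,
  q: string,
): { path: string; body: AggregateBody } {
  return {
    path: `/v1/stream/${encodeURIComponent(metricsStream)}/_aggregate`,
    body: {
      rollup: "metrics",
      from: new Date(fromMs).toISOString(),
      to: new Date(toMs).toISOString(),
      interval: "1m",
      q,
      group_by: ["metric", "stream"],
    },
  };
}

const from = Date.UTC(2026, 2, 25, 10, 0, 0); // 2026-03-25T10:00:00.000Z
const to = Date.UTC(2026, 2, 25, 11, 0, 0);
const request = buildAggregateRequest(
  "app-metrics", // hypothetical user-created metrics stream
  from,
  to,
  "metric:tieredstore.append.bytes stream:orders",
);
const aligned = isAligned(from, to, MINUTE_MS);
```

The resulting `request.body` can be sent as JSON in a POST to `request.path`. A range that fails the alignment check is still a valid query; it is simply more likely to be served from .mblk or a raw scan.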
.mblk is the metrics-specific fallback accelerator.
Current properties:
- immutable mblk sections inside bundled per-segment .cix companions
- binary mblk2 section payloads loaded on demand from the bundled container
- the metrics-record payload is zstd-compressed when that reduces bytes, then inflated lazily on first fallback scan
- bundled companions are stored in object storage under streams/<hash>/segments/...cix
- local SQLite stores only bundled companion plan state and object keys
- mblk sections carry canonical metric interval summaries plus time-range metadata
Use it when:
- the query is not rollup-eligible
- the requested time range does not line up perfectly with a rollup window
- you still want to avoid decoding full JSON segments whenever possible
.mblk fits neatly beside .agg:
- .agg is for aligned precomputed windows
- .mblk is for ad hoc aggregate serving over canonical metrics records
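Interval summaries suit ad hoc aggregation because they are mergeable: counts and sums add, min and max combine, and histogram buckets add per bucket. The sketch below shows that merge for the summary shape used in the canonical envelope. It is an illustration under assumptions, not the .mblk scan code; the type and function names are hypothetical, and both inputs are assumed to be non-empty summaries.

```typescript
// Mirrors the "summary" object of the canonical envelope.
interface IntervalSummary {
  count: number;
  sum: number;
  min: number;
  max: number;
  histogram: Record<string, number>;
}

// Combines two non-empty summaries into one covering both intervals.
function mergeSummaries(a: IntervalSummary, b: IntervalSummary): IntervalSummary {
  const histogram: Record<string, number> = { ...a.histogram };
  for (const bucket of Object.keys(b.histogram)) {
    histogram[bucket] = (histogram[bucket] ?? 0) + (b.histogram[bucket] ?? 0);
  }
  return {
    count: a.count + b.count,
    sum: a.sum + b.sum,
    min: Math.min(a.min, b.min),
    max: Math.max(a.max, b.max),
    histogram,
  };
}

const merged = mergeSummaries(
  { count: 4, sum: 2048, min: 128, max: 1024, histogram: { "128": 1, "512": 2, "1024": 1 } },
  { count: 2, sum: 300, min: 100, max: 200, histogram: { "128": 1, "256": 1 } },
);
const mergedAvg = merged.sum / merged.count;
```

Averages are derived after merging (sum divided by count) rather than merged directly, which is why the record carries count and sum alongside avg.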
The current design improves cardinality handling in two important ways:
- query serving state is remote and immutable
- non-rollup aggregate queries no longer need large local SQLite projections
This means high-cardinality metrics primarily show up as:
- more bytes appended to the metrics stream
- more .mblk and .agg companion bytes in object storage
- more query-time scan/fanout work
They do not require a separate resident TSDB head or large local mutable series tables.
The internal emitter still maintains an in-memory per-series map for the flush interval in src/metrics.ts.
So the shipped system improves storage and query-path cardinality behavior more than ingest-path cardinality behavior.
That is deliberate for now. It keeps one durable stream model while avoiding a much larger rewrite of the internal metrics producer.
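The ingest-path cost comes from holding one accumulator per active series until the next flush. The following sketch shows that general pattern so the memory behavior is easy to see. It is not the code in src/metrics.ts; the class and field names are hypothetical.

```typescript
// Hypothetical per-series accumulator, for illustration only.
interface Accumulator {
  count: number;
  sum: number;
  min: number;
  max: number;
}

class IntervalSeriesMap {
  private series = new Map<string, Accumulator>();

  // Folds one observation into the accumulator for its series.
  record(seriesKey: string, value: number): void {
    const acc = this.series.get(seriesKey);
    if (acc === undefined) {
      this.series.set(seriesKey, { count: 1, sum: value, min: value, max: value });
      return;
    }
    acc.count += 1;
    acc.sum += value;
    acc.min = Math.min(acc.min, value);
    acc.max = Math.max(acc.max, value);
  }

  // Memory grows with the number of distinct series seen since the last flush.
  get activeSeries(): number {
    return this.series.size;
  }

  // Emits one summary per series and resets the map for the next interval.
  flush(): Array<{ seriesKey: string } & Accumulator> {
    const out: Array<{ seriesKey: string } & Accumulator> = [];
    this.series.forEach((acc, seriesKey) => {
      out.push({ seriesKey, ...acc });
    });
    this.series.clear();
    return out;
  }
}
```

In a structure like this, memory is bounded by the number of distinct series per flush interval rather than by total history, which is why the cardinality pressure is limited to the ingest path.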
The server emits operational interval summaries to __stream_metrics__.
Current behavior:
- flush interval: DS_METRICS_FLUSH_MS (default 10000; 0 disables)
- destination: __stream_metrics__
- content type: application/json
- profile: metrics
- routing key: canonical seriesKey
- installed registry: canonical schema only
- installed schema routing key: none
- installed search config: none
- background routing / lexicon / exact / bundled companion indexing: disabled
This is intentional. The internal metrics stream must not create its own heavy
search and aggregate backfill loop while the node is already under load.
Searchable and aggregatable metrics remain supported on normal user-created
metrics streams; the lean internal stream exists only for durable operational
event capture.
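The flush-interval setting listed above could be interpreted along these lines. Only the default of 10000 and the meaning of 0 come from this document; the function name and the handling of missing, unparseable, or negative values are assumptions for the sketch.

```typescript
const DEFAULT_FLUSH_MS = 10000;

// Returns the flush interval in milliseconds, or null when emission is disabled.
// `raw` stands for the value of the DS_METRICS_FLUSH_MS environment variable.
function resolveFlushMs(raw: string | undefined): number | null {
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_FLUSH_MS;
  }
  const parsed = Number(raw);
  // Assumption: unparseable or negative values fall back to the default.
  if (!Number.isFinite(parsed) || parsed < 0) {
    return DEFAULT_FLUSH_MS;
  }
  const ms = Math.floor(parsed);
  return ms === 0 ? null : ms; // 0 disables the emitter
}
```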
To correlate those interval records with the node's current configuration, use
GET /v1/server/_details.
This implementation emits interval summaries for:
- tieredstore.ingest.flush.latency
- tieredstore.ingest.sqlite_busy.wait
- tieredstore.ingest.queue.bytes
- tieredstore.ingest.queue.requests
- tieredstore.ingest.queue.capacity.bytes
- tieredstore.ingest.queue.capacity.requests
- tieredstore.backpressure.over_limit
- tieredstore.backpressure.current.bytes
- tieredstore.backpressure.limit.bytes
- tieredstore.backpressure.pressure
- process.rss.bytes
- process.rss.over_limit
- process.rss.current.bytes
- process.rss.max_interval.bytes
- process.heap.total.bytes
- process.heap.used.bytes
- process.external.bytes
- process.array_buffers.bytes
- process.memory.rss.anon.bytes
- process.memory.rss.file.bytes
- process.memory.rss.shmem.bytes
- process.memory.js_managed.bytes
- process.memory.js_external_non_array_buffers.bytes
- process.memory.unattributed.bytes
- process.memory.unattributed_anon.bytes
- process.memory.limit.bytes
- process.memory.pressure
- process.gc.forced.count
- process.gc.reclaimed.bytes
  - tags: kind=last|total
- process.gc.last_forced_at_ms
- process.heap.snapshot.count
- process.heap.snapshot.last_at_ms
- process.memory.high_water.bytes
  - tags: metric=<name>
- tieredstore.sqlite.memory.used.bytes
- tieredstore.sqlite.memory.high_water.bytes
- tieredstore.sqlite.pagecache.used
- tieredstore.sqlite.pagecache.high_water
- tieredstore.sqlite.pagecache.overflow.bytes
- tieredstore.sqlite.pagecache.overflow.high_water.bytes
- tieredstore.sqlite.malloc.count
- tieredstore.sqlite.malloc.high_water.count
- tieredstore.sqlite.open_connections
- tieredstore.sqlite.prepared_statements
- tieredstore.sqlite.high_water
  - tags: metric=<name>
- tieredstore.memory.subsystem.bytes
  - tags: kind=heap_estimates|mapped_files|disk_caches|configured_budgets|pipeline_buffers|sqlite_runtime, subsystem=<name>
- tieredstore.memory.subsystem.count
  - tags: subsystem=<name>
- tieredstore.memory.tracked.bytes
  - tags: kind=heap_estimate|mapped_file|disk_cache|configured_budget|pipeline_buffer|sqlite_runtime
- tieredstore.memory.high_water.bytes
  - tags: kind=runtime_total|runtime_subsystem, metric=<name>, subsystem_kind=<name> for kind=runtime_subsystem
- tieredstore.concurrency.limit
  - tags: gate=ingest|read|search|async_index, kind=configured|effective
- tieredstore.concurrency.active
  - tags: gate=ingest|read|search|async_index
- tieredstore.concurrency.queued
  - tags: gate=ingest|read|search|async_index
- tieredstore.upload.pending_segments
- tieredstore.upload.concurrency.limit
- tieredstore.auto_tune.preset_mb
- tieredstore.auto_tune.effective_memory_limit_mb
- tieredstore.objectstore.put.latency
- tieredstore.objectstore.get.latency
- tieredstore.objectstore.head.latency
- tieredstore.objectstore.delete.latency
- tieredstore.objectstore.list.latency
  - tags: artifact=manifest|schema_registry|routing_index|routing_key_lexicon|exact_index|segment|bundled_companion|stream_catalog|meta|unknown, outcome=ok|miss|error
- tieredstore.append.bytes
- tieredstore.append.entries
- tieredstore.read.bytes
- tieredstore.read.entries
- tieredstore.index.lag.segments
- tieredstore.index.build.queue_len
- tieredstore.index.builds_inflight
- tieredstore.index.build.latency
- tieredstore.index.runs.built
- tieredstore.index.compact.latency
- tieredstore.index.runs.compacted
- tieredstore.index.bytes.read
- tieredstore.index.bytes.written
- tieredstore.index.active_runs
- tieredstore.index.run_cache.used_bytes
- tieredstore.index.run_cache.entries
- tieredstore.index.run_cache.hits
- tieredstore.index.run_cache.misses
- tieredstore.index.run_cache.evictions
- tieredstore.index.run_cache.bytes_added
Use these endpoints to inspect metrics stream state:
- GET /metrics: lightweight process snapshot including current in-memory series count
- GET /v1/server/_details: configured cache / concurrency limits plus live gate, queue, upload, and detailed runtime memory state
- GET /v1/server/_mem: compact process-memory triage view including process breakdown, SQLite runtime counters, GC/high-water state, and bounded top-stream contributors
- GET /v1/stream/__stream_metrics__/_profile: current profile resource
- GET /v1/stream/__stream_metrics__/_schema: canonical metrics schema for the internal stream; this intentionally has no routingKey and no search section
- GET /v1/stream/__stream_metrics__/_index_status: current index state; the internal stream should report no configured routing, lexicon, exact, or bundled companion families
- GET /v1/stream/__stream_metrics__/_details: combined stream/profile/schema/index view
Not implemented today:
- a second dedicated metrics datastore beside streams
- a metrics-specific query language
- first-class OpenTelemetry metric-point ingest
- metrics-native exemplars or baggage/context semantics
- histogram/exponential-histogram canonical source records
- elimination of the in-memory flush-interval series map
The next natural expansion points are:
- broaden the metrics profile from interval summaries to first-class canonical metric points
- add profile-owned handling for Sum, Gauge, Histogram, and ExponentialHistogram
- use .mblk more aggressively for series discovery and non-rollup aggregate planning
- reduce ingest-path active-series memory pressure in the internal emitter
The runtime memory view is intentionally split into:
- process.*
  - direct process totals from process.memoryUsage()
- runtime.memory.subsystems.heap_estimates
  - bytes the server can currently attribute to in-process retained structures such as the ingest queue and in-memory index-run caches
  - index-run cache bytes are tracked against encoded run-object size so active routing runs can remain hot without the estimate expanding to full JS object overhead
- runtime.memory.subsystems.mapped_files
  - mmap-backed file bytes for cached segments, .lex, and bundled .cix caches
- runtime.memory.subsystems.disk_caches
  - on-disk cache occupancy for segment, run, lexicon, and companion caches
- runtime.memory.subsystems.configured_budgets
  - the configured caps those caches are meant to respect
Use these together when diagnosing high RSS:
- if process.heap.used.bytes is high and heap_estimates grows with it, the likely issue is retained JS-side state
- if mapped_files is large but heap.used stays modest, the process is likely file-backed rather than heap-heavy
- if RSS is high while both heap_estimates and mapped_files stay small, the remaining pressure is likely SQLite, Bun runtime, or other unattributed native allocations
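The three rules above can be written as a rough first-pass classifier. This is a sketch under stated assumptions: the snapshot shape, the function name, and the 50% and 10% share-of-RSS thresholds are all illustrative and do not come from the server. It also judges a single snapshot, whereas the first rule is really about growth over time, and it assumes the caller has already decided that RSS is high.

```typescript
// Hypothetical snapshot shape; values would come from the memory views above.
interface MemorySnapshot {
  rssBytes: number;
  heapUsedBytes: number;      // process.heap.used.bytes
  heapEstimatesBytes: number; // runtime.memory.subsystems.heap_estimates total
  mappedFilesBytes: number;   // runtime.memory.subsystems.mapped_files total
}

type Suspect =
  | "js-retained-state"
  | "file-backed"
  | "native-or-unattributed"
  | "inconclusive";

// Thresholds are illustrative assumptions, expressed as a share of RSS.
function triageRss(s: MemorySnapshot, largeShare = 0.5, smallShare = 0.1): Suspect {
  if (s.rssBytes <= 0) return "inconclusive";
  const heapShare = s.heapUsedBytes / s.rssBytes;
  const estimateShare = s.heapEstimatesBytes / s.rssBytes;
  const mappedShare = s.mappedFilesBytes / s.rssBytes;

  // High JS heap with attributable retained structures behind it.
  if (heapShare >= largeShare && estimateShare >= smallShare) {
    return "js-retained-state";
  }
  // Large mapped files while the JS heap stays modest.
  if (mappedShare >= largeShare && heapShare < largeShare) {
    return "file-backed";
  }
  // Neither tracked category explains the RSS.
  if (estimateShare < smallShare && mappedShare < smallShare) {
    return "native-or-unattributed";
  }
  return "inconclusive";
}
```

A result of "native-or-unattributed" corresponds to the last rule: the remaining pressure is likely SQLite, the Bun runtime, or other native allocations, and the SQLite runtime counters are the next place to look.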