feat(metrics): add bound instruments behind experimental feature flag#3421
feat(metrics): add bound instruments behind experimental feature flag#3421bryantbiggs wants to merge 11 commits intoopen-telemetry:mainfrom
Conversation
db9517e to
fb725ae
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3421 +/- ##
=======================================
+ Coverage 83.7% 84.0% +0.3%
=======================================
Files 126 126
Lines 25386 26592 +1206
=======================================
+ Hits 21255 22363 +1108
- Misses 4131 4229 +98 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
f8a03dc to
08649e8
Compare
|
#3420 is merged. @bryantbiggs can you rebase? The spec is progressing in parallel (open-telemetry/opentelemetry-specification#5050), so this is perfect time to get this merged! |
264009b to
e8cac00
Compare
91cb61a to
b0f4777
Compare
|
Pushed a round of fixes addressing all five comments; reply inline on the Two extras worth flagging that aren't tied to a comment: |
| @@ -37,6 +43,51 @@ where | |||
| } | |||
There was a problem hiding this comment.
Suggestion: bind directly to overflow tracker, eliminate the Fallback path
The current Fallback variant delegates every bound.add() call through the full unbound measure() path (sort → hash → RwLock → HashMap lookup). This creates several problems:
-
No performance guarantee. A
bind()call made before cardinality overflow gets ~2ns direct path. The same call made after overflow gets ~53ns — the user has no way to know which they got. The whole point ofbind()is a consistent performance guarantee. -
Unpredictable attribution. The Fallback handle calls
measure(value, attrs)each time, which means the data could end up in the overflow bucket, or in a dedicated entry if space frees up after delta eviction — and can flip between the two across collection cycles depending on what other unbound callers are doing. That's not "bound" to anything. -
Code complexity. The
Fallbackvariant,BoundFallbackHandle, and thefallback: Arc<dyn Measure<T>>parameter threaded throughMeasure::bind()all exist solely for this path.
Proposed alternative: When ValueMap::bind() hits the cardinality limit, bind directly to the overflow TrackerEntry instead of returning None. This gives the handle the same ~2ns direct write path, consistently lands in the overflow bucket, and removes the Fallback variant entirely. The trade-off is that data stays attributed to overflow permanently (even if space frees up), but:
bind()is typically done at startup before cardinality fills up — hitting overflow at bind time is already a degenerate case- If a user wants to recover, they can drop and re-
bind(), which will get a dedicated tracker if space is available - Consistent, predictable behavior is better than accidental self-healing that depends on other callers' patterns
There was a problem hiding this comment.
Pushed two commits.
fix(metrics): bind() at cardinality limit binds directly to overflow(commit 1)
ValueMap::bind_attrs now looks up (or lazily creates) the overflow TrackerEntry at the limit and returns a direct handle. The Fallback variants in BoundSumInner / BoundHistogramInner are removed. Bound handles hold Arc<TrackerEntry> and write directly for their lifetime — same atomic-update hot path as a normal bind. Drop-and-rebind to recover after delta eviction frees space. Spec-aligned with #5050: overflow attribution preserved (MUST), per-call lookup bypassed (SHOULD).
Bench (M4 Max, 3-attr key set):
| Before | After | |
|---|---|---|
| Counter_Bound_Delta | 1.80 ns | 1.84 ns |
| Counter_Bound_AtOverflow_Delta | ~50 ns | 1.84 ns |
| Histogram_Bound_Delta | 6.50 ns | 6.69 ns |
| Histogram_Bound_AtOverflow_Delta | ~58 ns | 6.59 ns |
Resolves points 1 and 2. New tests bound_*_overflow_persists_across_eviction_cycles lock in the predictability invariant.
refactor(metrics): extend direct-bind to all aggregators(commit 2)
Removes BoundFallbackHandle and the fallback parameter on Measure::bind entirely. LastValue, PrecomputedSum, and ExpoHistogram now have direct-bind impls matching the Sum/Histogram pattern. PrecomputedSum::bind is unreachable from user code today (async-only) but kept so the trait is uniform and future Gauge / Observable bind() extensions are mechanical. The rare RwLock-poisoned case yields a NoopBoundMeasure (drops silently, same as measure()'s own poison handling).
Tests added: bound ExponentialHistogram via View (delta, cumulative, NaN/inf filter, view filter at bind time, overflow attribution, persistence across delta eviction); cumulative bind-at-overflow for Counter and Histogram; LastValue / PrecomputedSum unit tests through the Measure trait; lock-poison test on ValueMap::bind.
The second commit is the full point-3 cleanup. Happy to keep it here or split into a follow-up if you'd rather land the correctness fix first
Add BoundCounter and BoundHistogram types that cache resolved aggregator references for a fixed attribute set. Created via Counter::bind() and Histogram::bind(), bound instruments bypass per-call attribute lookup for significant performance improvements (~28x for counters, ~9x for histograms). Architecture: - TrackerEntry.bound_count tracks live handles, preventing eviction - Direct/Fallback enum handles cardinality overflow gracefully - Unsupported aggregators (ExpoHistogram, LastValue, PrecomputedSum) fall back to unbound path instead of panicking All public API, traits, and internal plumbing are gated behind the experimental_metrics_bound_instruments feature flag. Includes 15 tests covering cumulative/delta temporality, overflow fallback, recovery after eviction, bound+unbound sharing, idle cycles, drop semantics, and empty attributes. Splits from open-telemetry#3392. Refs open-telemetry#1374.
cfg-gated imports must sort before non-gated imports of the same module per rustfmt's stable import ordering rules.
Add bound_count field to TrackerEntry to track live bound instrument handles. Entries with bound_count > 0 are never evicted during delta collection, ensuring bound handles always point to a live tracker. Moved from open-telemetry#3420 per review feedback — bound_count belongs with the bound instruments feature, not the delta collection refactor.
- Make no_attribute_tracker an Arc<TrackerEntry> so bind(&[]) returns a handle to the same tracker that measure(&[]) uses, preventing duplicate data points for empty attribute sets - Re-check bound_count under write lock in collect_and_reset() to prevent a TOCTOU race where bind() increments bound_count after collect() reads it as 0 under a concurrent read lock - Use Ordering::Release for has_been_updated stores in bound handles, matching the unbound measure() path
CI exercises bound instruments via --all-features, so the testing feature does not need to enable experimental_metrics_bound_instruments or pull in the tokio runtime additions.
Foundational OpenTelemetry guarantee is that telemetry must never crash the host application. A custom SyncInstrument impl that does not override bind() now returns a NoopBoundSyncInstrument and emits an otel_debug log, matching the silent-drop behavior of other default impls.
When bind() creates a new tracker, insert it under both the original and sorted attribute orderings, mirroring measure()'s insert pattern. Without this, an unbound measure() call with the same attrs in non-sorted order would miss the fast path and re-sort on every call when the bound side created the tracker first.
Switch from Box<dyn ...> to Arc<dyn ...> and derive Clone so a single bound state can be shared across threads or modules without re-binding. Drop semantics are unchanged: bound_count is decremented when the last clone is dropped, matching Rust's RAII expectations for handle types. Mirrors the existing Clone behavior on Counter and Histogram.
Tests:
* bound_histogram_empty_attributes_shares_with_unbound mirrors the
existing counter version for histograms.
* bound_counter_view_filters_attributes_at_bind_time and a histogram
analog confirm a view filters attributes at bind() time, with bound
and unbound recordings collapsing onto the same filtered data point.
Bench:
* Counter_Bound_With_View_Delta confirms view filtering happens at bind
time, leaving the hot path at unfiltered-bound speed. Updated bench
required-features and added a note that criterion does not enforce
regression on its own.
CHANGELOG entries for both crates updated to mention Clone, the noop
trait default, and the converged bound/unbound semantics.
a6ff02d to
de9da26
Compare
The previous Fallback path re-routed every bound.add() through the unbound measure() path, creating a ~25x perf cliff at the cardinality limit (~50ns vs ~1.8ns) and letting attribution drift across delta eviction cycles as space freed and refilled. ValueMap::bind() now looks up or lazily creates the overflow TrackerEntry at the limit and returns a direct handle to it. The bound handle writes directly to overflow via a single atomic update for its lifetime — perf parity with a normal bind. To recover after delta eviction frees space, drop the handle and rebind. Spec-aligned with open-telemetry/opentelemetry-specification#5050: overflow data lands in the otel.metric.overflow=true bucket (MUST) and per-call attribute lookup is bypassed (SHOULD). Bench (Apple M4 Max): Counter_Bound_AtOverflow_Delta: 1.82 ns Histogram_Bound_AtOverflow_Delta: 6.58 ns
Removes BoundFallbackHandle and the fallback parameter on Measure::bind, leaving every aggregator on the same direct-bind shape: bind() returns a BoundXxxHandle holding an Arc<TrackerEntry> that writes directly to the aggregator without per-call attribute lookup. LastValue, PrecomputedSum, and ExpoHistogram now match the Sum / Histogram pattern. PrecomputedSum's bind() is unreachable from user code today (async-only) but the impl exists so the trait is uniform and future Gauge / Observable bind() extensions are mechanical. The rare RwLock-poisoned case now yields a NoopBoundMeasure that drops measurements silently — mirroring measure()'s own poison handling. Test coverage added: - Bound ExponentialHistogram via View: delta with NaN/inf filter, bind-at-overflow attribution, persistence across delta eviction, view filter applied at bind time, and cumulative accumulation. - Cumulative bind-at-overflow tests for Counter and Histogram. - LastValue::bind and PrecomputedSum::bind unit tests exercising Measure / BoundMeasure traits and bound_count Drop semantics. - ValueMap::bind under a poisoned trackers RwLock returns None so the caller produces a NoopBoundMeasure rather than panicking.
de9da26 to
1f02cd8
Compare
Splits from #3392, per maintainer feedback. Built on top of #3420 (in-place delta collection, now merged). Refs #1374.
Summary
Adds
BoundCounterandBoundHistogramto the public API behind theexperimental_metrics_bound_instrumentsfeature flag. These types cache the resolved aggregator reference for a fixed attribute set, allowing subsequent measurements to bypass per-call sort, dedup, hash, and HashMap lookup entirely.Architecture
TrackerEntry bound_count
bound_count(introduced in #3420) tracks how many liveBoundCounter/BoundHistogramhandles reference an entry. Entries withbound_count > 0are never evicted during delta collection.Cardinality overflow handling
ValueMap::bind()returnsOption>—Nonewhen at the cardinality limit. Bound handle types use aDirect/Fallbackenum:Measure::call()pathFeature flag
All public API, traits, noop impls, and internal plumbing are gated behind
experimental_metrics_bound_instruments.Benchmark Results
Apple M4 Max, 16 cores (12 performance + 4 efficiency), macOS 15.4:
EC2 Counter Results (from #3392)
EC2 Histogram Results (from #3392)
Test Coverage
15 tests covering:
Test plan