Bound instruments#5050
Conversation
dashpole
left a comment
There was a problem hiding this comment.
Overall, this matches what I had in mind
|
Here was my original go prototype: open-telemetry/opentelemetry-go#7790 It was incomplete in a few ways, but the raw performance numbers are accurate. The baseline has improved due to other optimizations, so the comparison to baseline isn't accurate anymore. I also wouldn't count it as having "support of that SIG's maintainers" as well. |
…y-specification into bound-instruments
cijothomas
left a comment
There was a problem hiding this comment.
LGTM overall — clean, well-scoped addition.
A few minor comments inline. None are blockers for merging — happy to see this move forward and iterate on the details (lifecycle/unbind, delta temporality nuances) as the prototypes mature.
|
#618 Found this old discussion about UnBind(), - there was support for it. Jack suggested "finish()" might be a good alternative as well. |
Naming is hard, Unbind/Close/Finish/Unregister/etc. As stated there, we need to define what it means to use an "unregistered" instrument (bound). |
https://github.com/open-telemetry/opentelemetry-specification/pull/4702/changes This is the "finish" PR I was referring to. I think Jack meant this when he said "Finish()". |
|
@cijothomas Agreed, only proposed alternative names. |
dashpole
left a comment
There was a problem hiding this comment.
I would also prefer to return the same instrument interface to keep this simpler for users. But I'm OK with this as experimental to get feedback.
|
Thanks for the reviews / perspective @cijothomas, @bogdandrutu, @dashpole, @reyang! I think we have some learning to do around ergonomics and appropriate use which will play out best by shipping and gathering feedback on prototype implementations. I've seen a couple of similar perspectives, which I also share:
Let's work through these conversations (and any others the pop up), and try to shoot for landing something that allows us to ship prototypes which are directionally aligned, but with enough leeway that we can experiment with things like naming, ergonomics, and anything else we might want to solicit feedback on before stabilizing sometime in the future. |
|
#5050 (comment) |
MrAlias
left a comment
There was a problem hiding this comment.
A few spec-level concerns remain before this reads as standardized behavior across SDKs.
The main gap is that the current wording still leaves too much room for materially different implementations around bind-time view processing, cardinality/eviction guarantees, and the API contract of the returned bound handle. The prototypes are useful here because they show those choices are not hypothetical: Java, Rust, and Go are already exploring different shapes and lifetime behaviors.
I left the concrete points inline.
The previous Fallback path re-routed every bound.add() through the unbound measure() path, creating a ~25x perf cliff at the cardinality limit (~50ns vs ~1.8ns) and letting attribution drift across delta eviction cycles as space freed and refilled. ValueMap::bind() now looks up or lazily creates the overflow TrackerEntry at the limit and returns a direct handle to it. The bound handle writes directly to overflow via a single atomic update for its lifetime — perf parity with a normal bind. To recover after delta eviction frees space, drop the handle and rebind. Spec-aligned with open-telemetry/opentelemetry-specification#5050: overflow data lands in the otel.metric.overflow=true bucket (MUST) and per-call attribute lookup is bypassed (SHOULD). Bench (Apple M4 Max): Counter_Bound_AtOverflow_Delta: 1.82 ns Histogram_Bound_AtOverflow_Delta: 6.58 ns
The previous Fallback path re-routed every bound.add() through the unbound measure() path, creating a ~25x perf cliff at the cardinality limit (~50ns vs ~1.8ns) and letting attribution drift across delta eviction cycles as space freed and refilled. ValueMap::bind() now looks up or lazily creates the overflow TrackerEntry at the limit and returns a direct handle to it. The bound handle writes directly to overflow via a single atomic update for its lifetime — perf parity with a normal bind. To recover after delta eviction frees space, drop the handle and rebind. Spec-aligned with open-telemetry/opentelemetry-specification#5050: overflow data lands in the otel.metric.overflow=true bucket (MUST) and per-call attribute lookup is bypassed (SHOULD). Bench (Apple M4 Max): Counter_Bound_AtOverflow_Delta: 1.82 ns Histogram_Bound_AtOverflow_Delta: 6.58 ns
|
Pushed 12b46b8 to address a variety of the conversations here:
I marked a variety of conversations as resolved. Please re-open them if you don't think they're addressed or you disagree with the resolution. |
cijothomas
left a comment
There was a problem hiding this comment.
Re-approving with new changes. Looks solid, excited to see prototypes landing soon!
| recording operations on the returned bound instrument negates the performance benefits | ||
| of binding. | ||
|
|
||
| Measurements recorded on the bound instrument can be associated with the |
There was a problem hiding this comment.
Could this clarify how Context is associated with bound measurements?
The SDK section says a bound instrument should behave the same as calling the unbound instrument with the pre bound attributes, including exemplar behavior. Since exemplars can depend on Context, it would help to spell out the intended behavior here.
For example, should bound.add(value) behave like counter.add(value, boundAttributes) using the normal/current Context behavior? And if a language exposes an explicit Context overload for unbound recordings, should it also expose an equivalent bound recording operation?
This would make it easier for SDKs to keep bound and unbound behavior consistent.
| [Attribute processing](#measurement-processing) and [cardinality limit](#cardinality-limits) | ||
| evaluation MUST be performed at bind time. The resolved aggregator is fixed for the | ||
| lifetime of the bound instrument and does not change across collection cycles. | ||
|
|
There was a problem hiding this comment.
This model makes sense to me. One small clarification that might help implementers: once a bound instrument resolves to an aggregator at bind time, it should keep writing to that same aggregator even if later measurements with other attribute sets hit the cardinality limit or overflow bucket.
In other words, existing bound handles are not re evaluated or rerouted later; only new binds/new attribute sets would be affected by the current cardinality state.
Resolves #4126.
See issue for full details, but some of the key bits:
There have been a few prototypes built:
I'm particularly interested to here from @dashpole and @cijothomas if this aligns with their prototypes and ideas.
Other interest has been expressed in .NET and Erlang, but no prototypes that I'm aware of.