feat: add apollo.router.operations.rhai.duration histogram metric (copy #9072) #9168
mergify[bot] wants to merge 4 commits into `dev`
Emits a new `apollo.router.operations.rhai.duration` metric (f64 histogram, unit: s) for every Rhai script callback execution across all pipeline stages.

Attributes:

- `rhai.stage`: `RouterRequest`, `RouterResponse`, `SupergraphRequest`, `SupergraphResponse`, `ExecutionRequest`, `ExecutionResponse`, `SubgraphRequest`, `SubgraphResponse`
- `rhai.succeeded`: bool; `false` if the script threw an `EvalAltResult`
- `rhai.is_deferred`: bool; `true` for `@defer` stream-chunk callbacks (Supergraph/Execution response stages only; the Router-stage deferred path activates automatically once #3642 is resolved)

A dedicated `RhaiStage` enum is defined in `rhai/mod.rs` rather than reusing `PipelineStep` from the coprocessor protocol, which avoids semantic coupling to coprocessor-only variants (`ConnectorRequest`/`ConnectorResponse`).

Uses `f64_histogram_with_unit!` per the `dev-docs/metrics.md` guidance (the coprocessor uses the deprecated `f64_histogram!`; migrating it is a separate task).

The `rhai.succeeded` attribute intentionally follows the coprocessor convention rather than OTel's `error.type`, for symmetry.

No separate operation count metric is added; the histogram datapoints are directly countable per attribute combination.

Also documents the new metric in `standard-instruments.mdx`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 12ed389)
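As a rough illustration of the shape described above, the sketch below defines a stage enum and times a single callback. Only the variant names come from this PR; the `as_str` mapping, `TimedOutcome`, and `time_callback` are hypothetical, and the real recording goes through the router's `f64_histogram_with_unit!` macro, which is not reproduced here.

```rust
use std::time::Instant;

/// Pipeline stages at which a Rhai callback can run.
/// Hypothetical sketch: only the variant names come from the PR description.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RhaiStage {
    RouterRequest,
    RouterResponse,
    SupergraphRequest,
    SupergraphResponse,
    ExecutionRequest,
    ExecutionResponse,
    SubgraphRequest,
    SubgraphResponse,
}

impl RhaiStage {
    /// Value used for the `rhai.stage` attribute.
    pub fn as_str(self) -> &'static str {
        match self {
            RhaiStage::RouterRequest => "RouterRequest",
            RhaiStage::RouterResponse => "RouterResponse",
            RhaiStage::SupergraphRequest => "SupergraphRequest",
            RhaiStage::SupergraphResponse => "SupergraphResponse",
            RhaiStage::ExecutionRequest => "ExecutionRequest",
            RhaiStage::ExecutionResponse => "ExecutionResponse",
            RhaiStage::SubgraphRequest => "SubgraphRequest",
            RhaiStage::SubgraphResponse => "SubgraphResponse",
        }
    }
}

/// Hypothetical: a callback's result plus the values the metric needs.
pub struct TimedOutcome<T, E> {
    pub result: Result<T, E>,
    /// Elapsed wall-clock time in seconds, the histogram's unit.
    pub duration_secs: f64,
    /// Value for `rhai.succeeded`: false when the callback returned an error.
    pub succeeded: bool,
}

/// Runs `callback` once and measures how long it took.
pub fn time_callback<T, E>(callback: impl FnOnce() -> Result<T, E>) -> TimedOutcome<T, E> {
    let start = Instant::now();
    let result = callback();
    let duration_secs = start.elapsed().as_secs_f64();
    let succeeded = result.is_ok();
    TimedOutcome { result, duration_secs, succeeded }
}
```

In this sketch the duration comes from `Instant::elapsed().as_secs_f64()`, which matches the histogram's unit of seconds.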
@mergify[bot]: Thank you for submitting a pull request! Before we can merge it, you'll need to sign the Apollo Contributor License Agreement here: https://contribute.apollographql.com/
✅ Docs preview ready

The preview is ready to be viewed.

File changes: 0 new, 1 changed, 0 removed
Build ID: 36a23a14747cf435eebcb9a9
URL: https://www.apollographql.com/docs/deploy-preview/36a23a14747cf435eebcb9a9
So we don't have to do it at every callsite. Also, I removed the `rhai.is_deferred` property from the request side, where it's always `false` and doesn't really make sense. There is no such thing as deferred data on the request side.
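One way to centralize the attribute logic, sketched under assumptions: a single hypothetical helper, `rhai_attributes`, takes an optional response chunk, so request stages pass `None` and never emit `rhai.is_deferred`. The `ResponseChunk::Primary` and `ResponseChunk::Stream` names appear in the review thread; the helper and its string-valued attributes are illustrative only.

```rust
/// Which part of a response a callback is handling.
/// Variant names follow the review thread; the rest is a hypothetical sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ResponseChunk {
    Primary,
    Stream,
}

/// Builds the metric attributes in one place so call sites only pass the
/// stage name, the outcome, and (for responses) the chunk kind.
/// `chunk` is `None` on request stages, so `rhai.is_deferred` is omitted there.
pub fn rhai_attributes(
    stage: &'static str,
    succeeded: bool,
    chunk: Option<ResponseChunk>,
) -> Vec<(&'static str, String)> {
    let mut attrs = vec![
        ("rhai.stage", stage.to_string()),
        ("rhai.succeeded", succeeded.to_string()),
    ];
    if let Some(chunk) = chunk {
        // Only streamed (deferred) chunks count as deferred.
        let is_deferred = chunk == ResponseChunk::Stream;
        attrs.push(("rhai.is_deferred", is_deferred.to_string()));
    }
    attrs
}
```

Making the chunk an `Option` lets the type itself express that request stages have no notion of deferred data.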
goto-bus-stop left a comment:

I might follow up with some cleanup of the macros here. They don't look necessary, and they're confusing to a girl like me.
- `apollo.router.operations.rhai.duration` - Time spent executing a Rhai script callback, in seconds
  - `rhai.stage`: string (`RouterRequest`, `RouterResponse`, `SupergraphRequest`, `SupergraphResponse`, `ExecutionRequest`, `ExecutionResponse`, `SubgraphRequest`, `SubgraphResponse`)
  - `rhai.succeeded`: bool
  - `rhai.is_deferred`: bool, present on response stages. `true` for `@defer` and subscription data chunks, `false` for the primary or initial response.
This should be null for a response that was not part of a deferred query.
```rust
let result = execute(
    &$rhai_service,
    $stage,
    Some(ResponseChunk::Primary),
```
Since this is for the deferred response, shouldn't this be `ResponseChunk::Stream`?
No, the macro name means the response can have deferred chunks, but it doesn't have to. An `@defer` response is made up of a primary response plus streamed deferred chunks.
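A minimal sketch of that distinction, assuming payloads arrive in order: the first payload of any response is `Primary`, and every later `@defer` payload is `Stream`. A query without `@defer` yields a single `Primary` payload, which fits the `Some(ResponseChunk::Primary)` call quoted above. `label_chunks` is a hypothetical helper, not router code.

```rust
/// Hypothetical: kind of each payload in a (possibly deferred) response.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ResponseChunk {
    Primary,
    Stream,
}

/// The first payload is always the primary response; every later payload
/// is a deferred chunk delivered on the stream.
pub fn label_chunks<T>(payloads: &[T]) -> Vec<ResponseChunk> {
    payloads
        .iter()
        .enumerate()
        .map(|(i, _)| if i == 0 { ResponseChunk::Primary } else { ResponseChunk::Stream })
        .collect()
}
```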
```rust
duration,
"rhai.stage" = stage,
"rhai.succeeded" = succeeded,
"rhai.is_deferred" = is_deferred
```
I'm curious what the utility of the `rhai.is_deferred` attribute is. I wouldn't expect a significant difference in duration, and I'm not sure what you'd do if there were one.
Either way, if you start seeing an increase in duration, I'd think you'd have to dig through spans to find the outliers.
Fair question, and we don't have it for the coprocessor metrics either. @theJC, do you miss having `.is_deferred` on the coprocessor metrics?
Not yet 😄. Only a couple of our clients have started using `@defer`, and we haven't yet had cases where we needed metrics specifically to diagnose or distinguish those responses.
It wouldn't break my heart if y'all wanted to simplify and eject that complexity/concern at this point.
Closes #9072
Summary
Design notes
No separate operation count metric: `dev-docs/metrics.md` calls for an `apollo.router.operations.rhai` counter alongside the histogram, following the coprocessor pattern. This PR intentionally omits it — the histogram datapoints are directly countable per attribute combination (`rhai.stage`, `rhai.succeeded`), so a parallel counter would be redundant. Consumers can derive counts from the histogram without the cardinality overhead of a separate instrument.
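To make the counting argument concrete, here is a toy in-memory histogram (hypothetical, not the OpenTelemetry SDK). Like an OTel explicit-bucket histogram, each attribute combination keeps a count and a sum alongside its bucket counts, so the number of callback executions per (`rhai.stage`, `rhai.succeeded`) pair can be read straight from the histogram.

```rust
use std::collections::HashMap;

/// One time series of the histogram: count, sum, and per-bucket counts.
#[derive(Debug, Clone)]
pub struct Series {
    pub count: u64,
    pub sum: f64,
    pub buckets: Vec<u64>,
}

/// Hypothetical toy histogram keyed by (rhai.stage, rhai.succeeded).
pub struct Histogram {
    bounds: Vec<f64>,
    series: HashMap<(String, bool), Series>,
}

impl Histogram {
    /// `bounds` are ascending bucket upper bounds; an implicit +Inf bucket follows.
    pub fn new(bounds: Vec<f64>) -> Self {
        Self { bounds, series: HashMap::new() }
    }

    /// Records one callback duration (in seconds) for an attribute combination.
    pub fn record(&mut self, value: f64, stage: &str, succeeded: bool) {
        let n_buckets = self.bounds.len() + 1;
        // Index of the first bound >= value, or the +Inf bucket if none.
        let idx = self
            .bounds
            .iter()
            .position(|b| value <= *b)
            .unwrap_or(n_buckets - 1);
        let s = self
            .series
            .entry((stage.to_string(), succeeded))
            .or_insert_with(|| Series { count: 0, sum: 0.0, buckets: vec![0; n_buckets] });
        s.count += 1;
        s.sum += value;
        s.buckets[idx] += 1;
    }

    /// Executions for one attribute combination, read from the histogram's
    /// own count: no separate counter instrument is involved.
    pub fn count(&self, stage: &str, succeeded: bool) -> u64 {
        self.series
            .get(&(stage.to_string(), succeeded))
            .map_or(0, |s| s.count)
    }
}
```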
`rhai.succeeded` attribute: Uses a positive boolean rather than OTel's `error.type` convention, intentionally mirroring the `coprocessor.succeeded` attribute on `apollo.router.operations.coprocessor` for symmetry.
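The two conventions differ in shape, which this hypothetical sketch shows side by side: `rhai.succeeded` is always present as a boolean, while an OTel-style `error.type` attribute is present only on failure and carries an error class. The function names and the caller-supplied `classify` closure are illustrative assumptions, not router code.

```rust
/// Shape used by this PR: a boolean that is always present.
pub fn succeeded_attribute<T, E>(result: &Result<T, E>) -> (&'static str, bool) {
    ("rhai.succeeded", result.is_ok())
}

/// OTel-style alternative: the attribute is only present on failure and
/// carries an error class, produced here by a caller-supplied `classify`.
pub fn error_type_attribute<T, E>(
    result: &Result<T, E>,
    classify: impl Fn(&E) -> String,
) -> Option<(&'static str, String)> {
    result.as_ref().err().map(|e| ("error.type", classify(e)))
}
```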
`f64_histogram_with_unit!` vs coprocessor: The coprocessor duration metric uses the deprecated `f64_histogram!` (no unit). This PR uses `f64_histogram_with_unit!` per the guidance in `dev-docs/metrics.md`. Migrating the coprocessor metric is a separate task (would be a breaking rename in Prometheus output).
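Why the rename would be breaking, sketched under an assumption: if the Prometheus exporter follows the OpenTelemetry-to-Prometheus naming rules, dots become underscores and a unit of `s` adds a `_seconds` suffix, so attaching a unit changes the exported name. `prometheus_name` is a simplified, hypothetical helper that handles only two units; whether the router's exporter applies exactly these rules is not confirmed here.

```rust
/// Hypothetical, simplified OTel-to-Prometheus metric naming: dots become
/// underscores, and a known unit is appended as a suffix unless the name
/// already ends with it.
pub fn prometheus_name(otel_name: &str, unit: Option<&str>) -> String {
    let mut name = otel_name.replace('.', "_");
    let suffix = match unit {
        Some("s") => Some("seconds"),
        Some("By") => Some("bytes"),
        _ => None,
    };
    if let Some(suffix) = suffix {
        let tail = format!("_{suffix}");
        if !name.ends_with(tail.as_str()) {
            name.push_str(&tail);
        }
    }
    name
}
```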
Test plan
🤖 Generated with Claude Code
This is an automatic copy of pull request #9072 done by Mergify.