Checks
Strands Version
v1.33
Python Version
3.12
Operating System
macOS
Installation Method
pip
Steps to Reproduce
- Install Strands Python SDK via pip
- Configure OTEL env vars to export traces to Langfuse:
  OTEL_EXPORTER_OTLP_ENDPOINT
  OTEL_EXPORTER_OTLP_HEADERS
  OTEL_EXPORTER_OTLP_PROTOCOL
  DISABLE_ADOT_OBSERVABILITY=true
  OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_tool_definitions
- Run an agent across multiple invocations in a multi-chat session
- Observe token usage reported in Langfuse per span
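For reference, the environment in the second step can be sketched in Python as below. The values are assumptions, since the report lists only the variable names: the host and keys are placeholders, and the `/api/public/otel` path, Basic auth header, and `http/protobuf` protocol follow Langfuse's documented OTLP setup rather than anything stated in this report.

```python
import base64
import os

# Placeholder credentials and host (assumptions): replace with real values.
LANGFUSE_PUBLIC_KEY = "pk-lf-placeholder"
LANGFUSE_SECRET_KEY = "sk-lf-placeholder"
LANGFUSE_HOST = "https://langfuse.example.com"

# Assumed auth scheme: HTTP Basic built from "public_key:secret_key".
credentials = f"{LANGFUSE_PUBLIC_KEY}:{LANGFUSE_SECRET_KEY}"
auth = base64.b64encode(credentials.encode()).decode()

# Assumed values for the three OTLP variables named in the report.
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = f"{LANGFUSE_HOST}/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {auth}"
os.environ["OTEL_EXPORTER_OTLP_PROTOCOL"] = "http/protobuf"

# These two values are taken verbatim from the report.
os.environ["DISABLE_ADOT_OBSERVABILITY"] = "true"
os.environ["OTEL_SEMCONV_STABILITY_OPT_IN"] = "gen_ai_tool_definitions"
```

Set these before the agent and its telemetry are initialized so the exporter picks them up.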
Expected Behavior
Each OTEL span should report token usage for that specific invocation only. For example, in a session with 10 invocations of ~100k tokens each:
- Request 1 → 100k tokens
- Request 2 → 100k tokens
- Request 3 → 100k tokens
Total: ~1M tokens
Actual Behavior
Each OTEL span reports the session-lifetime accumulated_usage instead of per-invocation usage. For example:
- Request 1 → 100k tokens ✅
- Request 2 → 200k tokens ❌ (should be 100k)
- Request 3 → 300k tokens ❌ (should be 100k)
Langfuse then sums these per-span values, so every earlier request is counted again in each later span, and the session's token counts and cost estimates are heavily inflated. Additionally, reset_usage_metrics() does NOT reset accumulated_usage. This appears to be intentional per the test suite, but it means there is no workaround.
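To make the size of the inflation concrete, here is a small self-contained calculation for the 10-request example. It assumes every request uses exactly 100k tokens and that the backend simply adds up the per-span values. It does not call Strands or Langfuse.

```python
# Per-invocation token usage for a hypothetical 10-request session.
per_invocation = [100_000] * 10

# What each span reports today: the running session total.
reported = []
running_total = 0
for tokens in per_invocation:
    running_total += tokens
    reported.append(running_total)

expected_total = sum(per_invocation)  # total if spans carried per-invocation usage
inflated_total = sum(reported)        # total when cumulative values are summed

print(reported[:3])     # [100000, 200000, 300000]
print(expected_total)   # 1000000
print(inflated_total)   # 5500000
```

Under these assumptions the summed total is 5.5x the real usage after 10 requests, and the factor keeps growing with session length ((N + 1) / 2 for N equal requests).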
Additional Context
Root cause identified in src/strands/telemetry/tracer.py: The OTEL span reporter explicitly uses accumulated_usage when setting span attributes:
accumulated_usage = response.metrics.accumulated_usage
attributes.update({
"gen_ai.usage.prompt_tokens": accumulated_usage["inputTokens"],
"gen_ai.usage.input_tokens": accumulated_usage["inputTokens"],
"gen_ai.usage.output_tokens": accumulated_usage["outputTokens"],
...
})
In metrics.py, reset_usage_metrics() is intentionally designed NOT to clear accumulated_usage, as the test suite asserts:
# Verify accumulated_usage is NOT cleared
assert event_loop_metrics.accumulated_usage["inputTokens"] == 11
Possible Solution
In tracer.py, replace accumulated_usage with the latest invocation's usage:
agent_invocations[-1].usage
instead of:
response.metrics.accumulated_usage
This would report only the current invocation's token usage on each span.
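If the per-invocation record is not available at the point where the span attributes are set, another option might be to diff two accumulated_usage snapshots. The following is only a sketch of that idea: usage_delta and usage_attributes are hypothetical helper names, not part of Strands, and the snapshots are plain dicts with the inputTokens/outputTokens keys shown above.

```python
# Hypothetical helpers: neither function exists in Strands.
Usage = dict[str, int]


def usage_delta(before: Usage, after: Usage) -> Usage:
    """Return the tokens consumed between two cumulative snapshots."""
    return {key: after[key] - before.get(key, 0) for key in after}


def usage_attributes(usage: Usage) -> dict[str, int]:
    """Build span attributes using the attribute names quoted from tracer.py."""
    return {
        "gen_ai.usage.prompt_tokens": usage["inputTokens"],
        "gen_ai.usage.input_tokens": usage["inputTokens"],
        "gen_ai.usage.output_tokens": usage["outputTokens"],
    }


# Made-up cumulative totals before and after the second request.
before = {"inputTokens": 90_000, "outputTokens": 10_000}
after = {"inputTokens": 180_000, "outputTokens": 20_000}

attributes = usage_attributes(usage_delta(before, after))
print(attributes["gen_ai.usage.input_tokens"])   # 90000
print(attributes["gen_ai.usage.output_tokens"])  # 10000
```

The snapshot taken before each invocation would have to be kept per agent, so reading the latest invocation's usage directly, as proposed above, is likely the simpler change if that record is reliable.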
Related Issues
No response