-
Notifications
You must be signed in to change notification settings - Fork 198
fix: otel warning during tail sampling buffer replay #1632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
cyberksh
wants to merge
3
commits into
pydantic:main
Choose a base branch
from
cyberksh:tail-sampling
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+56
−2
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
When TailSamplingProcessor.push_buffer() replays buffered spans, it was calling on_start for spans that had already ended. This caused wrapped processors like DirectBaggageAttributesSpanProcessor to call span.set_attributes() on ended spans, triggering OpenTelemetry warnings. This fix tracks which spans have already ended and skips the on_start call for them, since setting attributes on ended spans has no effect anyway and the spans are still properly exported via on_end.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Contributor
Author
|
Reproducer: """
Reproducer for logfire tail sampling "Setting attribute on ended span" warning.
The warning occurs when:
1. Tail sampling buffers spans (not immediately sampled)
2. A later event triggers sampling (error, duration threshold)
3. push_buffer() calls on_start for already-ended spans
Run this script and make requests to trigger the warning.
"""
import asyncio
import logging
import time
# Configure logging BEFORE logfire to see warnings
logging.basicConfig(
level=logging.DEBUG,
format="%(levelname)s | %(name)s:%(funcName)s:%(lineno)d | %(message)s",
)
import logfire
from fastapi import FastAPI
import uvicorn
def tail_sampling_function(span_info: logfire.sampling.TailSamplingSpanInfo) -> float:
"""
Tail sampling that triggers on errors or long duration.
Most traces are buffered and discarded (rate=0.0).
Only traces with errors or duration > 0.5s are sampled.
"""
# Only sample if there's an error or duration > 0.5 seconds
if span_info.level >= "error":
return 1.0
if span_info.duration > 0.5:
return 1.0
# Don't sample normal requests - they stay buffered then get discarded
return 0.0
# Configure logfire with tail sampling
logfire.configure(
service_name="reproducer",
send_to_logfire=False,
console=logfire.ConsoleOptions(), # Enable console output to see traces
sampling=logfire.SamplingOptions(tail=tail_sampling_function),
)
app = FastAPI()
logfire.instrument_fastapi(app)
@app.get("/fast")
async def fast_endpoint():
"""Fast endpoint - won't trigger sampling, no warning."""
return {"status": "fast"}
@app.get("/slow")
async def slow_endpoint():
"""
Slow endpoint - triggers sampling after child spans have ended.
This WILL produce the warning because:
1. Child spans are created and end quickly
2. They get buffered
3. After 0.6s, duration threshold is exceeded
4. push_buffer() calls on_start for already-ended child spans
"""
# Create some child spans that end before the sampling decision
with logfire.span("child_span_1"):
await asyncio.sleep(0.05)
with logfire.span("child_span_2"):
await asyncio.sleep(0.05)
with logfire.span("child_span_3"):
await asyncio.sleep(0.05)
# Now wait to exceed duration threshold
# This triggers sampling AFTER child spans have ended
await asyncio.sleep(0.5)
return {"status": "slow", "message": "Check logs for 'Setting attribute on ended span' warning"}
@app.get("/error")
async def error_endpoint():
"""
Error endpoint - triggers sampling when error is logged.
Similar to slow, but triggers on error level instead of duration.
"""
with logfire.span("setup_span"):
await asyncio.sleep(0.05)
with logfire.span("processing_span"):
await asyncio.sleep(0.05)
# Log an error - this triggers sampling after child spans ended
logfire.error("Something went wrong!")
return {"status": "error logged"}
@app.get("/nested")
async def nested_endpoint():
"""
Deeply nested spans - more child spans = more warnings.
"""
async def do_work(depth: int):
if depth <= 0:
return
with logfire.span(f"nested_level_{depth}"):
await asyncio.sleep(0.02)
await do_work(depth - 1)
await do_work(5)
# Trigger sampling after all nested spans ended
await asyncio.sleep(0.5)
return {"status": "nested complete"}
if __name__ == "__main__":
print("""
Reproducer for "Setting attribute on ended span" warning.
Endpoints:
GET /fast - Fast request, no sampling, no warning
GET /slow - Slow request (0.6s), triggers sampling, PRODUCES WARNING
GET /error - Logs error, triggers sampling, PRODUCES WARNING
GET /nested - Nested spans + slow, PRODUCES MULTIPLE WARNINGS
To reproduce:
curl http://localhost:8000/slow
Watch the console for:
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
""")
uvicorn.run(app, host="0.0.0.0", port=8000, log_level="debug")python reproducer.py
Reproducer for "Setting attribute on ended span" warning.
Endpoints:
GET /fast - Fast request, no sampling, no warning
GET /slow - Slow request (0.6s), triggers sampling, PRODUCES WARNING
GET /error - Logs error, triggers sampling, PRODUCES WARNING
GET /nested - Nested spans + slow, PRODUCES MULTIPLE WARNINGS
To reproduce:
curl http://localhost:8000/slow
Watch the console for:
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
INFO: Started server process [3139581]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO: 127.0.0.1:42244 - "GET /slow HTTP/1.1" 200 OK
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
03:12:53.329 GET /error
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
03:12:53.436 Something went wrong!
INFO: 127.0.0.1:42254 - "GET /error HTTP/1.1" 200 OK
INFO: 127.0.0.1:38208 - "GET /nested HTTP/1.1" 200 OK
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
^CINFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
When
TailSamplingProcessor.push_buffer()replays buffered spans, it was calling on_start for spans that had already ended. This caused wrapped processors likeDirectBaggageAttributesSpanProcessorto callspan.set_attributes()on ended spans, triggering OpenTelemetry warnings.This fix tracks which spans have already ended and skips the
on_startcall for them, since setting attributes on ended spans has no effect anyway and the spans are still properly exported viaon_end.