Skip to content

Conversation

@cyberksh
Copy link
Contributor

When TailSamplingProcessor.push_buffer() replays buffered spans, it was calling on_start for spans that had already ended. This caused wrapped processors like DirectBaggageAttributesSpanProcessor to call span.set_attributes() on ended spans, triggering OpenTelemetry warnings.

This fix tracks which spans have already ended and skips the on_start call for them, since setting attributes on ended spans has no effect anyway and the spans are still properly exported via on_end.

When TailSamplingProcessor.push_buffer() replays buffered spans, it was
calling on_start for spans that had already ended. This caused wrapped
processors like DirectBaggageAttributesSpanProcessor to call
span.set_attributes() on ended spans, triggering OpenTelemetry warnings.

This fix tracks which spans have already ended and skips the on_start
call for them, since setting attributes on ended spans has no effect
anyway and the spans are still properly exported via on_end.
@codecov
Copy link

codecov bot commented Jan 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@cyberksh cyberksh marked this pull request as draft January 16, 2026 21:32
@cyberksh
Copy link
Contributor Author

Reproducer:

"""
Reproducer for logfire tail sampling "Setting attribute on ended span" warning.

The warning occurs when:
1. Tail sampling buffers spans (not immediately sampled)
2. A later event triggers sampling (error, duration threshold)
3. push_buffer() calls on_start for already-ended spans

Run this script and make requests to trigger the warning.
"""

import asyncio
import logging
import time

# Configure logging BEFORE logfire to see warnings
logging.basicConfig(
    level=logging.DEBUG,
    format="%(levelname)s | %(name)s:%(funcName)s:%(lineno)d | %(message)s",
)

import logfire
from fastapi import FastAPI
import uvicorn


def tail_sampling_function(span_info: logfire.sampling.TailSamplingSpanInfo) -> float:
    """
    Tail sampling that triggers on errors or long duration.
    Most traces are buffered and discarded (rate=0.0).
    Only traces with errors or duration > 0.5s are sampled.
    """
    # Only sample if there's an error or duration > 0.5 seconds
    if span_info.level >= "error":
        return 1.0
    if span_info.duration > 0.5:
        return 1.0
    # Don't sample normal requests - they stay buffered then get discarded
    return 0.0


# Configure logfire with tail sampling
logfire.configure(
    service_name="reproducer",
    send_to_logfire=False,
    console=logfire.ConsoleOptions(),  # Enable console output to see traces
    sampling=logfire.SamplingOptions(tail=tail_sampling_function),
)

app = FastAPI()
logfire.instrument_fastapi(app)


@app.get("/fast")
async def fast_endpoint():
    """Fast endpoint - won't trigger sampling, no warning."""
    return {"status": "fast"}


@app.get("/slow")
async def slow_endpoint():
    """
    Slow endpoint - triggers sampling after child spans have ended.
    This WILL produce the warning because:
    1. Child spans are created and end quickly
    2. They get buffered
    3. After 0.6s, duration threshold is exceeded
    4. push_buffer() calls on_start for already-ended child spans
    """
    # Create some child spans that end before the sampling decision
    with logfire.span("child_span_1"):
        await asyncio.sleep(0.05)

    with logfire.span("child_span_2"):
        await asyncio.sleep(0.05)

    with logfire.span("child_span_3"):
        await asyncio.sleep(0.05)

    # Now wait to exceed duration threshold
    # This triggers sampling AFTER child spans have ended
    await asyncio.sleep(0.5)

    return {"status": "slow", "message": "Check logs for 'Setting attribute on ended span' warning"}


@app.get("/error")
async def error_endpoint():
    """
    Error endpoint - triggers sampling when error is logged.
    Similar to slow, but triggers on error level instead of duration.
    """
    with logfire.span("setup_span"):
        await asyncio.sleep(0.05)

    with logfire.span("processing_span"):
        await asyncio.sleep(0.05)

    # Log an error - this triggers sampling after child spans ended
    logfire.error("Something went wrong!")

    return {"status": "error logged"}


@app.get("/nested")
async def nested_endpoint():
    """
    Deeply nested spans - more child spans = more warnings.
    """

    async def do_work(depth: int):
        if depth <= 0:
            return
        with logfire.span(f"nested_level_{depth}"):
            await asyncio.sleep(0.02)
            await do_work(depth - 1)

    await do_work(5)

    # Trigger sampling after all nested spans ended
    await asyncio.sleep(0.5)

    return {"status": "nested complete"}


if __name__ == "__main__":
    print("""
Reproducer for "Setting attribute on ended span" warning.

Endpoints:
  GET /fast   - Fast request, no sampling, no warning
  GET /slow   - Slow request (0.6s), triggers sampling, PRODUCES WARNING
  GET /error  - Logs error, triggers sampling, PRODUCES WARNING
  GET /nested - Nested spans + slow, PRODUCES MULTIPLE WARNINGS

To reproduce:
  curl http://localhost:8000/slow

Watch the console for:
  WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
""")
    uvicorn.run(app, host="0.0.0.0", port=8000, log_level="debug")
python reproducer.py

Reproducer for "Setting attribute on ended span" warning.

Endpoints:
  GET /fast   - Fast request, no sampling, no warning
  GET /slow   - Slow request (0.6s), triggers sampling, PRODUCES WARNING
  GET /error  - Logs error, triggers sampling, PRODUCES WARNING
  GET /nested - Nested spans + slow, PRODUCES MULTIPLE WARNINGS

To reproduce:
  curl http://localhost:8000/slow

Watch the console for:
  WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.

INFO:     Started server process [3139581]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     127.0.0.1:42244 - "GET /slow HTTP/1.1" 200 OK
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
03:12:53.329 GET /error
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
03:12:53.436   Something went wrong!
INFO:     127.0.0.1:42254 - "GET /error HTTP/1.1" 200 OK
INFO:     127.0.0.1:38208 - "GET /nested HTTP/1.1" 200 OK
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
^CINFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.

@cyberksh cyberksh marked this pull request as ready for review January 16, 2026 21:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant