observability: add tokenizer event-loop-lag metric by Kangyan-Zhou · Pull Request #29527 · sgl-project/sglang

Kangyan-Zhou · 2026-06-27T19:51:04Z

Motivation

When comparing TTFT between the experimental sgl-router and the engine, the router consistently reports higher TTFT than the engine does for the same traffic. Part of that gap is delay that accrues before the engine starts its TTFT clock: the engine measures TTFT as received_time -> first token, but received_time is stamped inside the TokenizerManager process. If that process's asyncio event loop is starved (synchronous work, GIL contention, huge-prompt HF encoding), a request sits unstamped — invisible to engine-side TTFT but very visible to the client and the router.

This PR surfaces that hidden slice as a first-class metric so the router-vs-engine TTFT gap can be attributed on the engine side.

What this adds

sglang:event_loop_lag_seconds, a histogram of the tokenizer process's asyncio event-loop scheduling lag, with a sub-millisecond floor (0.0005s) through a multi-second ceiling (10s). Healthy lag is sub-millisecond; a starved loop lands at 0.1s+.
A lightweight monitor coroutine TokenizerManager.watch_event_loop_lag, started only when enable_metrics is set. Each tick asks to sleep a fixed interval and records the overrun (a healthy loop wakes on time → lag ~0; a blocked loop wakes late → the overrun is how long it was unresponsive).
The metric carries the collector's own (process-wide) labels rather than per-request labels.
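To make the bucket range concrete, here is a minimal pure-Python sketch of how lag samples would fall into histogram buckets. Only the 0.0005s floor and the 10s ceiling come from the description above; the intermediate boundaries and the helper names `bucket_counts` and `cumulative` are assumptions for illustration, not the PR's actual bucket list or API.

```python
from bisect import bisect_left
from itertools import accumulate

# Hypothetical bucket upper bounds in seconds. Only the first (0.0005) and
# last (10.0) values are taken from the PR description; the rest are guesses.
BUCKETS = [0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0, 10.0]


def bucket_counts(samples):
    """Count samples per bucket; a sample lands in the first bucket whose
    upper bound is >= the sample. The final slot collects overflow (+Inf)."""
    counts = [0] * (len(BUCKETS) + 1)
    for sample in samples:
        counts[bisect_left(BUCKETS, sample)] += 1
    return counts


def cumulative(counts):
    """Running totals, the way Prometheus-style `le` buckets are exposed."""
    return list(accumulate(counts))


if __name__ == "__main__":
    # Two healthy samples, one starved-loop sample, one multi-second stall,
    # and one stall beyond the 10 s ceiling.
    counts = bucket_counts([0.0002, 0.0004, 0.15, 2.0, 12.0])
    for bound, total in zip(BUCKETS + [float("inf")], cumulative(counts)):
        print(f"le={bound}: {total}")
```

With boundaries like these, healthy sub-millisecond samples stay in the lowest bucket, while a starved loop shows up at the 0.1s boundary and above.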

Each TokenizerManager — or each TokenizerWorker in multi-tokenizer mode — stamps received_time on the same loop it serves, so monitoring that loop directly measures the pre-received_time delay. The MultiTokenizerRouter forwarding loop does not stamp received_time, so it is intentionally not monitored.

Test

Adds test/registered/unit/observability/test_event_loop_lag.py (pure-CPU, registered to base-a-test-cpu):

metric is registered and observe_event_loop_lag routes through the collector's labels;
a synchronously-blocked loop produces a lag sample proportional to the block duration.
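The second check can be illustrated with a self-contained sketch that uses only the standard library. It is not the PR's test file: `ListCollector` and `run_blocked_loop` are hypothetical stand-ins, and the monitor is a simplified coroutine in the same spirit as watch_event_loop_lag.

```python
import asyncio
import time


class ListCollector:
    """Hypothetical stand-in for the metrics collector: keeps samples in a list."""

    def __init__(self):
        self.samples = []

    def observe_event_loop_lag(self, lag: float) -> None:
        self.samples.append(lag)


async def watch_event_loop_lag(collector, interval: float = 0.05):
    """Ask to sleep `interval` seconds each tick and record how late we wake."""
    loop = asyncio.get_running_loop()
    while True:
        scheduled_wakeup = loop.time() + interval
        await asyncio.sleep(interval)
        collector.observe_event_loop_lag(max(0.0, loop.time() - scheduled_wakeup))


async def run_blocked_loop(block_seconds: float = 0.3) -> list:
    """Run the monitor, block the loop synchronously once, return the samples."""
    collector = ListCollector()
    monitor = asyncio.create_task(watch_event_loop_lag(collector))
    await asyncio.sleep(0.12)   # let the monitor record a few healthy ticks
    time.sleep(block_seconds)   # synchronous call: the event loop cannot run
    await asyncio.sleep(0.12)   # let the monitor wake late and record the overrun
    monitor.cancel()
    try:
        await monitor
    except asyncio.CancelledError:
        pass
    return collector.samples


if __name__ == "__main__":
    samples = asyncio.run(run_blocked_loop())
    print("largest lag sample:", max(samples))
```

Because the monitor is already partway through its sleep when the block starts, the largest sample should be close to the block duration minus at most one interval, which is why a proportional bound is a reasonable thing to assert.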

Both tests pass:

test_event_loop_lag.py::TestEventLoopLagMetricWiring::test_metric_registered_and_routes_through_labels PASSED
test_event_loop_lag.py::TestWatchEventLoopLag::test_blocked_loop_records_lag PASSED

🤖 Generated with Claude Code

CI States

Latest PR Test (Base): ❌ Run #28299976283
Latest PR Test (Extra): ❌ Run #28299976219

Add `sglang:event_loop_lag_seconds`, a histogram of the tokenizer process's asyncio event-loop scheduling lag.

Each TokenizerManager (or each TokenizerWorker in multi-tokenizer mode) stamps `received_time` on the same loop it serves, so sustained lag is delay that accrues *before* `received_time` -- the slice of client-observed TTFT that engine TTFT (`received_time` -> first token) cannot see.

A lightweight monitor coroutine (`watch_event_loop_lag`), started only when metrics are enabled, asks to sleep a fixed interval and records the overrun: a healthy loop wakes on time (lag ~0), a loop blocked by synchronous work or GIL contention wakes late. This surfaces the router-vs-engine TTFT gap on the engine side.

The MultiTokenizerRouter forwarding loop does not stamp `received_time`, so it is intentionally not monitored.

Adds a pure-CPU unit test covering metric wiring and that a blocked loop records a proportional lag sample.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FnAPbVdPu5JtbmiX9tPUEn

gemini-code-assist

Code Review

This pull request introduces a new metric, sglang:event_loop_lag_seconds, to monitor the asyncio event loop scheduling lag in the tokenizer process, along with corresponding unit tests. The reviewer pointed out a potential measurement inaccuracy in the lag calculation and a process crash risk if metrics observation fails, suggesting a robust try-except block and simplified loop logic.


gemini-code-assist · 2026-06-27T19:52:20Z

+    async def watch_event_loop_lag(self, interval: float = 0.1):
+        """Sample this process's asyncio event-loop scheduling lag as a metric.
+
+        Each tick asks to sleep ``interval`` seconds. A healthy loop wakes on
+        time (lag ~0); a loop blocked by synchronous work or GIL contention wakes
+        late, and the overrun is how long it was unresponsive -- and thus unable
+        to stamp incoming requests' received_time.
+        """
+        loop = asyncio.get_running_loop()
+        next_tick = loop.time() + interval
+        while True:
+            await asyncio.sleep(interval)
+            now = loop.time()
+            lag = now - next_tick
+            self.metrics_collector.observe_event_loop_lag(lag if lag > 0.0 else 0.0)
+            next_tick = now + interval


Issue: Potential Measurement Inaccuracy and Process Crash Risk

Measurement Inaccuracy: In the current implementation, next_tick is set to now + interval, where now was sampled before self.metrics_collector.observe_event_loop_lag was called. The next await asyncio.sleep only starts after that call returns, so any time spent inside observe_event_loop_lag is counted as "scheduling lag" in the subsequent iteration. Sampling the scheduled wakeup time immediately before calling asyncio.sleep excludes this overhead.

Process Crash Risk: Since this background task is wrapped in print_exception_wrapper, any unhandled exception raised during metrics observation (e.g., if self.metrics_collector is None or Prometheus client fails) will trigger the wrapper's exception handler, which logs the error and terminates the entire server process. Wrapping the observation call in a try-except block prevents metrics failures from crashing the production server.
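The measurement concern can be reproduced in isolation. The sketch below is an assumption-laden illustration, not code from the PR: `SlowCollector` is a hypothetical collector whose observe call deliberately takes 0.1s, and the two coroutines mirror the shape of the submitted loop and of the per-tick deadline variant.

```python
import asyncio
import time


class SlowCollector:
    """Hypothetical collector whose observe call burns wall-clock time."""

    def __init__(self, cost: float):
        self.cost = cost
        self.samples = []

    def observe_event_loop_lag(self, lag: float) -> None:
        self.samples.append(lag)
        time.sleep(self.cost)  # stands in for expensive synchronous work


async def carried_deadline(collector, interval: float, ticks: int) -> None:
    """Shape of the submitted loop: the next deadline is derived from `now`,
    which was sampled before the observe call ran."""
    loop = asyncio.get_running_loop()
    next_tick = loop.time() + interval
    for _ in range(ticks):
        await asyncio.sleep(interval)
        now = loop.time()
        collector.observe_event_loop_lag(max(0.0, now - next_tick))
        next_tick = now + interval


async def fresh_deadline(collector, interval: float, ticks: int) -> None:
    """Variant that samples the deadline immediately before each sleep."""
    loop = asyncio.get_running_loop()
    for _ in range(ticks):
        scheduled_wakeup = loop.time() + interval
        await asyncio.sleep(interval)
        collector.observe_event_loop_lag(max(0.0, loop.time() - scheduled_wakeup))


def compare(cost: float = 0.1, interval: float = 0.02, ticks: int = 4):
    """Run both variants against an equally slow collector."""
    carried, fresh = SlowCollector(cost), SlowCollector(cost)
    asyncio.run(carried_deadline(carried, interval, ticks))
    asyncio.run(fresh_deadline(fresh, interval, ticks))
    return carried.samples, fresh.samples


if __name__ == "__main__":
    carried_samples, fresh_samples = compare()
    print("carried deadline:", carried_samples)
    print("fresh deadline:  ", fresh_samples)
```

On an otherwise idle loop, every sample after the first in the carried-deadline variant should include roughly the 0.1s observe cost, while the fresh-deadline variant should stay near zero. In practice a real observe call is far cheaper than 0.1s, so the size of the effect in production is an open question this sketch does not answer.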

Suggested Improvement

We can simplify the logic, eliminate the stateful next_tick variable, and make the metric collection robust against unexpected errors.

Suggested change

-    async def watch_event_loop_lag(self, interval: float = 0.1):
-        """Sample this process's asyncio event-loop scheduling lag as a metric.
-
-        Each tick asks to sleep ``interval`` seconds. A healthy loop wakes on
-        time (lag ~0); a loop blocked by synchronous work or GIL contention wakes
-        late, and the overrun is how long it was unresponsive -- and thus unable
-        to stamp incoming requests' received_time.
-        """
-        loop = asyncio.get_running_loop()
-        next_tick = loop.time() + interval
-        while True:
-            await asyncio.sleep(interval)
-            now = loop.time()
-            lag = now - next_tick
-            self.metrics_collector.observe_event_loop_lag(lag if lag > 0.0 else 0.0)
-            next_tick = now + interval
+    async def watch_event_loop_lag(self, interval: float = 0.1):
+        """Sample this process's asyncio event-loop scheduling lag as a metric.
+
+        Each tick asks to sleep ``interval`` seconds. A healthy loop wakes on
+        time (lag ~0); a loop blocked by synchronous work or GIL contention wakes
+        late, and the overrun is how long it was unresponsive -- and thus unable
+        to stamp incoming requests' received_time.
+        """
+        loop = asyncio.get_running_loop()
+        while True:
+            scheduled_wakeup = loop.time() + interval
+            await asyncio.sleep(interval)
+            lag = loop.time() - scheduled_wakeup
+            try:
+                self.metrics_collector.observe_event_loop_lag(max(0.0, lag))
+            except Exception as e:
+                logger.warning("Failed to observe event loop lag: %s", e)
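As a rough check of the guarded observation, the following standalone sketch runs a monitor against a collector that fails once. `FlakyCollector`, `guarded_monitor`, and `run_guarded` are hypothetical names introduced here, and the loop is bounded to a few ticks so the example terminates; it says nothing about how print_exception_wrapper itself behaves.

```python
import asyncio
import logging

logger = logging.getLogger("event_loop_lag_sketch")


class FlakyCollector:
    """Hypothetical collector that raises on its first observation only."""

    def __init__(self):
        self.calls = 0
        self.samples = []

    def observe_event_loop_lag(self, lag: float) -> None:
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("simulated metrics backend failure")
        self.samples.append(lag)


async def guarded_monitor(collector, interval: float = 0.01, ticks: int = 3) -> None:
    """Bounded version of the monitor loop with the observe call guarded."""
    loop = asyncio.get_running_loop()
    for _ in range(ticks):
        scheduled_wakeup = loop.time() + interval
        await asyncio.sleep(interval)
        lag = max(0.0, loop.time() - scheduled_wakeup)
        try:
            collector.observe_event_loop_lag(lag)
        except Exception as exc:
            # Log and keep sampling instead of letting the task die.
            logger.warning("Failed to observe event loop lag: %s", exc)


def run_guarded() -> FlakyCollector:
    collector = FlakyCollector()
    asyncio.run(guarded_monitor(collector))
    return collector


if __name__ == "__main__":
    result = run_guarded()
    print("observe calls:", result.calls, "recorded samples:", len(result.samples))
```

The first failure is logged and the remaining ticks are still recorded. A broad `except Exception` can also hide a persistently broken collector behind repeated warnings, so rate-limiting the log line may be worth considering.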

Kangyan-Zhou wants to merge 1 commit into sgl-project:main from Kangyan-Zhou:engine-loop-lag-metric
