Commit e52e92c
Instant ttft oversaturation (#607)
## Summary
When over-saturation detection is enabled (`--detect-saturation`), the
constraint receives TTFT data only after a request fully completes.
With large models and long contexts, no request completes within the
`minimum_duration` window (default 30s), so the constraint falls back to
the concurrent slope alone and stops the benchmark prematurely.
This PR adds time-bounded instant TTFT notifications: when
over-saturation detection is enabled, worker processes monitor for
first-token arrival during streaming and send a `"first_token_arrived"`
status update before the request completes. This gives the constraint
real TTFT data, so it can decide using both signals (TTFT and concurrent slope). Notifications are sent only
during the first `minimum_duration` seconds of the benchmark to limit
IPC overhead.
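The worker-side mechanism described above (poll for the first token, send one early update, stop after the time window) can be sketched with `asyncio`. This is a minimal illustration under assumptions, not guidellm's `WorkerProcess` code: `RequestTimings`, `monitor_first_token`, `send_update`, `run_demo`, and the polling interval are hypothetical. Only `first_token_iteration`, `instant_ttft_duration`, and the `"first_token_arrived"` status come from this PR.

```python
from __future__ import annotations

import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class RequestTimings:
    # Hypothetical stand-in for the worker's per-request timing record;
    # only the field name first_token_iteration comes from the PR.
    first_token_iteration: Optional[float] = None


async def monitor_first_token(
    request_id: str,
    timings: RequestTimings,
    send_update: Callable[[str, str], Awaitable[None]],
    benchmark_start: float,
    instant_ttft_duration: float,
    poll_interval: float = 0.005,
) -> bool:
    """Poll for the first token; send one early update if it lands in the window.

    Returns True if a "first_token_arrived" update was sent.
    """
    while timings.first_token_iteration is None:
        if time.monotonic() - benchmark_start > instant_ttft_duration:
            return False  # window closed: skip the notification (no IPC)
        await asyncio.sleep(poll_interval)
    if time.monotonic() - benchmark_start > instant_ttft_duration:
        return False  # first token arrived, but after the window
    await send_update(request_id, "first_token_arrived")
    return True


async def run_demo(ttft_delay: float, window: float) -> list[tuple[str, str]]:
    """Simulate one streaming request and collect the updates sent."""
    sent: list[tuple[str, str]] = []

    async def send_update(request_id: str, status: str) -> None:
        sent.append((request_id, status))  # stands in for the IPC queue

    start = time.monotonic()
    timings = RequestTimings()
    monitor = asyncio.create_task(
        monitor_first_token("req-1", timings, send_update, start, window)
    )
    await asyncio.sleep(ttft_delay)  # prefill: waiting for the first token
    timings.first_token_iteration = time.monotonic()
    await asyncio.sleep(ttft_delay)  # remainder of the stream
    if not monitor.done():
        monitor.cancel()  # request finished; stop polling
    try:
        await monitor
    except asyncio.CancelledError:
        pass
    return sent
```

Polling leaves the streaming loop itself untouched, at the cost of up to one `poll_interval` of delay on the notification. Checking the elapsed time before sending is what limits the extra IPC traffic to the start of the benchmark.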
## Details
- [x] Add `"first_token_arrived"` to `RequestInfo.status` literal
(`schemas/info.py`)
- [x] Add TTFT polling monitor to `WorkerProcess` — spawns an async task
per request that detects `first_token_iteration` and sends a
`"first_token_arrived"` update (`scheduler/worker.py`)
- [x] Time-bound the monitor: notifications stop after
`minimum_duration` seconds via `instant_ttft_duration`
(`scheduler/worker.py`)
- [x] Handle `"first_token_arrived"` in `WorkerGroupState` — no request
count changes, passes through to constraints
(`scheduler/worker_group.py`)
- [x] Extract `minimum_duration` from `OverSaturationConstraint` to
configure worker TTFT duration (`scheduler/worker_group.py`)
- [x] Accept TTFT from both `"first_token_arrived"` and `"completed"` in
the constraint, deduplicated by request ID
(`scheduler/constraints/saturation.py`)
- [x] Add 8 tests covering happy path, dedup, missing timings, backward
compatibility, concurrent isolation, disabled mode, reset, and
multi-request slope building
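On the constraint side, accepting TTFT from both `"first_token_arrived"` and `"completed"` while deduplicating by request ID might look like the sketch below. `RequestUpdate`, `TTFTCollector`, and `TTFT_STATUSES` are hypothetical names, not the actual `OverSaturationConstraint` API; only the two status strings and the dedup-by-request-ID behavior come from this PR. TTFT is assumed to be `first_token_iteration - request_start`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RequestUpdate:
    # Hypothetical, simplified view of a RequestInfo status update.
    request_id: str
    status: str
    request_start: Optional[float] = None
    first_token_iteration: Optional[float] = None


# Statuses that may carry a first-token timestamp.
TTFT_STATUSES = frozenset({"first_token_arrived", "completed"})


@dataclass
class TTFTCollector:
    """Keeps at most one TTFT sample per request ID."""

    ttfts: list[float] = field(default_factory=list)
    seen_ids: set[str] = field(default_factory=set)

    def update(self, info: RequestUpdate) -> bool:
        """Record TTFT for this update; return True if a new sample was added."""
        if info.status not in TTFT_STATUSES:
            return False  # other statuses carry no TTFT
        if info.request_id in self.seen_ids:
            return False  # already counted, e.g. via "first_token_arrived"
        if info.request_start is None or info.first_token_iteration is None:
            return False  # missing timings: skip without marking as seen
        self.seen_ids.add(info.request_id)
        self.ttfts.append(info.first_token_iteration - info.request_start)
        return True

    def reset(self) -> None:
        """Clear samples and dedup state, e.g. between benchmarks."""
        self.ttfts.clear()
        self.seen_ids.clear()


collector = TTFTCollector()
collector.update(RequestUpdate("a", "first_token_arrived", 0.0, 1.5))
collector.update(RequestUpdate("a", "completed", 0.0, 1.5))  # deduplicated
collector.update(RequestUpdate("b", "completed", 1.0, 1.25))  # completion-only
print(collector.ttfts)  # [1.5, 0.25]
```

Skipping an update with missing timings without marking its ID as seen lets a later `"completed"` update still supply the sample, so the completion-only path keeps working when no early notification arrives.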
## Test Plan
- Run `pytest tests/unit/scheduler/ tests/unit/schemas/ tests/unit/backends/`: 1077 passed
- Run `pre-commit run --files` on changed files — all checks pass
- Verify with `--detect-saturation` on a large model with long context
(>10k tokens) that the benchmark no longer stops prematurely
## Related Issues
- Resolves #606
---
- [x] "I certify that all code in this PR is my own, except as noted
below."
## Use of AI
- [x] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [x] Includes AI-generated tests (NOTE: AI-written tests should have a
docstring that includes `## WRITTEN BY AI ##`)

7 files changed (+327, -17 lines) under:
- `src/guidellm/backends/openai`
- `src/guidellm/scheduler`
- `src/guidellm/scheduler/constraints`
- `src/guidellm/schemas`
- `tests/unit/scheduler`