Commits (138 total; changes shown from 23 commits)
501c5ab
opt metrics structure
LJH-LBJ Jan 15, 2026
d261ba5
opt loggers
LJH-LBJ Jan 15, 2026
8538816
Merge branch 'vllm-project:main' into opt_metrics_structure
LJH-LBJ Jan 16, 2026
89493ac
opt metrics structure
LJH-LBJ Jan 16, 2026
f69ed2d
fix bug
LJH-LBJ Jan 16, 2026
e3a44db
fix bug
LJH-LBJ Jan 19, 2026
da0ad3d
fix bug
LJH-LBJ Jan 19, 2026
cb85a3d
fix bug
LJH-LBJ Jan 19, 2026
371beae
fix bug
LJH-LBJ Jan 19, 2026
2b98563
opt loggers
LJH-LBJ Jan 19, 2026
134a901
opt metrics structure
LJH-LBJ Jan 20, 2026
ee12352
opt format
LJH-LBJ Jan 21, 2026
0af170c
Merge branch 'vllm-project:main' into opt_metrics_structure
LJH-LBJ Jan 21, 2026
cf7e2c0
opt metrics structure
LJH-LBJ Jan 21, 2026
eb51d12
opt metrics structure
LJH-LBJ Jan 21, 2026
38785ed
opt metrics structure
LJH-LBJ Jan 21, 2026
aeb5fd6
opt test
LJH-LBJ Jan 21, 2026
924f747
opt loggers
LJH-LBJ Jan 21, 2026
8fd556b
Merge branch 'vllm-project:main' into opt_metrics_structure
LJH-LBJ Jan 22, 2026
a78af4d
fix bug
LJH-LBJ Jan 22, 2026
44c9635
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ Jan 22, 2026
7c08073
fix bug
LJH-LBJ Jan 22, 2026
443022b
fix bug
LJH-LBJ Jan 22, 2026
fbbce79
fix bug
LJH-LBJ Jan 22, 2026
2b2edfc
fix bug
LJH-LBJ Jan 22, 2026
a42a656
opt metrics in offline
LJH-LBJ Jan 22, 2026
e7f3fae
fix bug
LJH-LBJ Jan 22, 2026
2e87f70
fix bug
LJH-LBJ Jan 22, 2026
93b53f8
fix pre-commit
LJH-LBJ Jan 23, 2026
d584e71
fix bug
LJH-LBJ Jan 23, 2026
d359775
fix bug
LJH-LBJ Jan 23, 2026
6ebe5ee
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Jan 26, 2026
ab248bb
Merge remote-tracking branch 'origin/main' into opt_metrics_structure
LJH-LBJ Jan 26, 2026
04230c8
fix bug
LJH-LBJ Jan 26, 2026
fdfc9b5
fix bug
LJH-LBJ Jan 26, 2026
b5d154a
add audio frames
LJH-LBJ Jan 27, 2026
274a784
add audio frames
LJH-LBJ Jan 27, 2026
43a266b
add image image_num and resolution
LJH-LBJ Jan 27, 2026
e4ff53e
add image image_num and resolution
LJH-LBJ Jan 27, 2026
13a87f2
add image image_num and resolution
LJH-LBJ Jan 27, 2026
bacd480
add audio frames in offline
LJH-LBJ Jan 27, 2026
b339c38
add audio frames in offline
LJH-LBJ Jan 27, 2026
935481c
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Jan 27, 2026
2263dd1
fix pre-commit
LJH-LBJ Jan 27, 2026
f3b88b1
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ Jan 27, 2026
f0bdfaa
fix pre-commit
LJH-LBJ Jan 27, 2026
da7a271
change enable_stats to log_stats
LJH-LBJ Jan 27, 2026
bcd9ac4
fix bug
LJH-LBJ Jan 27, 2026
abf941e
fix pre-commit
LJH-LBJ Jan 27, 2026
03afeaf
delete 0 row
LJH-LBJ Jan 27, 2026
842af89
delete 0 row
LJH-LBJ Jan 27, 2026
56ecac3
fix pre-commit
LJH-LBJ Jan 27, 2026
cbdac45
delete 0 row
LJH-LBJ Jan 27, 2026
8ee59ce
delete 0 row
LJH-LBJ Jan 27, 2026
48707f0
opt
LJH-LBJ Jan 28, 2026
9d76475
fix pre-commit
LJH-LBJ Jan 28, 2026
2631578
fix bug
LJH-LBJ Jan 28, 2026
e0ce96f
fix pre-commit
LJH-LBJ Jan 28, 2026
04da676
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Jan 28, 2026
26e18b3
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Jan 28, 2026
a8bcbc0
fix pre-commit
LJH-LBJ Jan 28, 2026
114a6a3
fix pre-commit
LJH-LBJ Jan 28, 2026
9126e68
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Jan 28, 2026
0b905cf
fix bug
LJH-LBJ Jan 28, 2026
3481dcc
Merge branch 'main' into opt_metrics_structure
hsliuustc0106 Jan 28, 2026
7665b29
opt
LJH-LBJ Jan 29, 2026
c78d420
opt
LJH-LBJ Jan 29, 2026
2b37f16
opt
LJH-LBJ Jan 29, 2026
78963fb
opt
LJH-LBJ Jan 29, 2026
141d8f8
remove ut in test_async_omni
LJH-LBJ Jan 29, 2026
7c95eb9
fix pre-commit
LJH-LBJ Jan 29, 2026
6687f65
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Jan 29, 2026
68074ac
add test in pipeline.yaml
LJH-LBJ Jan 29, 2026
ef34329
fix bug
LJH-LBJ Jan 29, 2026
bff608c
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Jan 29, 2026
a59c766
fix bug
LJH-LBJ Jan 29, 2026
4918ab1
fix pre-commit
LJH-LBJ Jan 29, 2026
4976551
rerun
LJH-LBJ Jan 29, 2026
d646401
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Jan 29, 2026
5efbd55
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Jan 29, 2026
e83a338
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Jan 30, 2026
55c11c1
opt test
LJH-LBJ Jan 30, 2026
a94349b
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ Jan 30, 2026
6e63657
rerun
LJH-LBJ Jan 30, 2026
9a31bae
rerun
LJH-LBJ Jan 30, 2026
232da73
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Jan 30, 2026
13b0050
rerun
LJH-LBJ Jan 30, 2026
db0d866
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ Jan 30, 2026
21de7db
fix bug
LJH-LBJ Jan 30, 2026
dd051b2
fix pre-commit
LJH-LBJ Jan 30, 2026
dd73daf
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Jan 30, 2026
c1c48f9
Merge branch 'main' into opt_metrics_structure
hsliuustc0106 Jan 30, 2026
4e6acbe
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Jan 30, 2026
d3c6f54
rerun
LJH-LBJ Jan 30, 2026
bd6d8cd
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Jan 31, 2026
654073f
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Jan 31, 2026
c9068a7
add doc
LJH-LBJ Jan 31, 2026
6626d62
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Jan 31, 2026
fe0e4b9
add doc
LJH-LBJ Jan 31, 2026
89f3944
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ Jan 31, 2026
3b311f4
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Feb 3, 2026
4b39808
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Feb 4, 2026
3fff139
fix pre-commit
LJH-LBJ Feb 4, 2026
b9c2d46
fix pre-commit
LJH-LBJ Feb 4, 2026
0bb732e
opt
LJH-LBJ Feb 4, 2026
7c91e96
fix pre-commit
LJH-LBJ Feb 4, 2026
da335c7
opt
LJH-LBJ Feb 4, 2026
5abc397
opt
LJH-LBJ Feb 4, 2026
fb3bacf
fix pre-commit
LJH-LBJ Feb 4, 2026
51f5e0a
fix pre-commit
LJH-LBJ Feb 4, 2026
41db219
opt
LJH-LBJ Feb 4, 2026
3a95be0
fix pre-commit
LJH-LBJ Feb 4, 2026
571f297
Merge branch 'vllm-project:main' into opt_metrics_structure
LJH-LBJ Feb 5, 2026
ca2cb26
fix bug
LJH-LBJ Feb 5, 2026
48a519c
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ Feb 5, 2026
1bd59d8
fix bug
LJH-LBJ Feb 5, 2026
24f8bc8
fix bug
LJH-LBJ Feb 5, 2026
ef2d5d6
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Feb 5, 2026
23a24ee
fix pre-commit
LJH-LBJ Feb 5, 2026
dd5d7b7
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ Feb 5, 2026
41c58d4
fix bug
LJH-LBJ Feb 5, 2026
9145181
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Feb 6, 2026
f1195f8
fix ut
LJH-LBJ Feb 6, 2026
4383b01
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ Feb 6, 2026
41482ff
fix ut
LJH-LBJ Feb 6, 2026
f07d070
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Feb 6, 2026
764151d
fix ut
LJH-LBJ Feb 6, 2026
f1b41d3
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ Feb 6, 2026
382327e
fix dependencies
LJH-LBJ Feb 6, 2026
75be00c
opt stage_wall_time_ms and move metrics.md tp contributing
LJH-LBJ Feb 6, 2026
3ffa4cd
add stage's final_output_type in StageRequestStats
LJH-LBJ Feb 6, 2026
e352716
fix bug
LJH-LBJ Feb 6, 2026
7faa2e2
rerun
LJH-LBJ Feb 6, 2026
42f6f0f
update doc
LJH-LBJ Feb 6, 2026
e7c502f
opt
LJH-LBJ Feb 6, 2026
a71fa64
Merge branch 'main' into opt_metrics_structure
LJH-LBJ Feb 6, 2026
fd9d3d4
fix pre-commit
LJH-LBJ Feb 6, 2026
00e7b78
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ Feb 6, 2026
8 changes: 5 additions & 3 deletions docs/api/README.md
@@ -12,9 +12,6 @@ Main entry points for vLLM-Omni inference and serving.
- [vllm_omni.entrypoints.chat_utils.parse_chat_messages_futures][]
- [vllm_omni.entrypoints.cli.serve.OmniServeCommand][]
- [vllm_omni.entrypoints.client_request_state.ClientRequestState][]
- [vllm_omni.entrypoints.log_utils.OrchestratorMetrics][]
- [vllm_omni.entrypoints.log_utils.StageRequestMetrics][]
- [vllm_omni.entrypoints.log_utils.StageStats][]
- [vllm_omni.entrypoints.omni.Omni][]
- [vllm_omni.entrypoints.omni.OmniBase][]
- [vllm_omni.entrypoints.omni_diffusion.OmniDiffusion][]
@@ -120,3 +117,8 @@ Worker classes and model runners for distributed inference.
- [vllm_omni.worker.npu.npu_generation_model_runner.NPUGenerationModelRunner][]
- [vllm_omni.worker.npu.npu_generation_worker.NPUGenerationWorker][]
- [vllm_omni.worker.npu.npu_model_runner.OmniNPUModelRunner][]


## Metrics

- [vllm_omni.metrics.OrchestratorAggregator][]
1 change: 1 addition & 0 deletions pyproject.toml
@@ -38,6 +38,7 @@ dependencies = [
"soundfile>=0.13.1",
"cache-dit==1.2.0",
"tqdm>=4.66.0",
"prettytable>=3.9.0",
"torchsde>=0.2.6", # Required for Stable Audio scheduler
"openai-whisper>=20250625",
# "vllm==0.14.0", # TODO: fix the entrypoints overwrite problem
19 changes: 15 additions & 4 deletions tests/e2e/online_serving/test_async_omni.py
@@ -153,8 +153,19 @@ def __init__(self, request_id: str):
output_modalities, engine.output_modalities, engine.stage_list
)
summary = capture_metrics[request_ids[idx]].metrics.build_and_log_summary(final_stage_id_for_e2e)
overall = summary["overall_summary"]
assert overall["e2e_wall_time_ms"] >= 0.0

# Check that total tokens matches sum of stage tokens.
assert summary["e2e_total_tokens"] == sum(stage["tokens"] for stage in summary["stages"])
# Check that total time matches sum of stage times.
assert summary["e2e_total_time_ms"] >= sum(stage["total_time_ms"] for stage in summary["stages"])
# Check that total tokens matches sum of stage tokens for this request.
stage_entry = next(
entry for entry in summary["stage_table"] if entry["request_id"] == request_ids[idx]
)
stage_sum = sum(
(stage.get("num_tokens_in", 0) if stage.get("stage_id") == 0 else 0)
+ stage.get("num_tokens_out", 0)
for stage in stage_entry["stages"]
)
e2e_entry = next(
entry for entry in summary["e2e_table"] if entry["request_id"] == request_ids[idx]
)
assert e2e_entry["e2e_total_tokens"] == stage_sum
121 changes: 121 additions & 0 deletions tests/metrics/test_stats.py
@@ -0,0 +1,121 @@
from __future__ import annotations
from vllm_omni.metrics import OrchestratorAggregator
from vllm_omni.metrics.stats import RequestE2EStats


def _get_request_entry(table: list[dict], request_id: str) -> dict:
for entry in table:
if entry.get("request_id") == request_id:
return entry
raise AssertionError(f"request_id={request_id} not found")


def test_orchestrator_aggregator_builds_summary() -> None:
agg = OrchestratorAggregator(num_stages=2, enable_stats=False, wall_start_ts=0.0)
agg.stage_first_ts[0] = 0.0
agg.stage_last_ts[0] = 0.03
agg.stage_first_ts[1] = 0.05
agg.stage_last_ts[1] = 0.07

agg.on_forward(0, 1, "r1", size_bytes=1024, tx_ms=5.0, used_shm=False)
agg.on_stage_metrics(
0,
"r1",
{
"num_tokens_in": 3,
"num_tokens_out": 3,
"stage_gen_time_ms": 30.0,
"batch_id": 1,
"batch_size": 1,
"rx_transfer_bytes": 0,
"rx_decode_time_ms": 0.0,
},
)
agg.on_stage_metrics(
1,
"r1",
{
"num_tokens_out": 4,
"stage_gen_time_ms": 20.0,
"batch_id": 1,
"batch_size": 1,
"rx_transfer_bytes": 1024,
"rx_decode_time_ms": 5.0,
"rx_in_flight_time_ms": 2.0,
},
)
agg.on_finalize_request(1, "r1", req_start_ts=0.0)

summary = agg.build_and_log_summary(final_stage_id_to_prompt={"r1": 1})
overall = summary["overall_summary"]
assert overall["e2e_requests"] == 1

stage_entry = _get_request_entry(summary["stage_table"], "r1")
stage_ids = [row["stage_id"] for row in stage_entry["stages"]]
assert stage_ids == [0, 1]

transfer_entry = _get_request_entry(summary["trans_table"], "r1")
assert transfer_entry["transfers"][0]["edge"] == "0->1"
assert transfer_entry["transfers"][0]["size_kbytes"] == 1.0

e2e_entry = _get_request_entry(summary["e2e_table"], "r1")
assert e2e_entry["e2e_total_tokens"] == 10


def test_build_and_log_summary_e2e_only() -> None:
agg = OrchestratorAggregator(num_stages=1, enable_stats=False, wall_start_ts=0.0)
agg.e2e_events.append(
RequestE2EStats(
request_id="r",
e2e_total_ms=10.0,
e2e_total_tokens=5,
transfers_total_time_ms=0.0,
transfers_total_bytes=0,
)
)

summary = agg.build_and_log_summary(final_stage_id_to_prompt=0)
e2e_entry = _get_request_entry(summary["e2e_table"], "r")
assert e2e_entry["e2e_total_tokens"] == 5
stage_entry = _get_request_entry(summary["stage_table"], "r")
assert stage_entry["stages"] == []


def test_build_and_log_summary_multiple_requests() -> None:
agg = OrchestratorAggregator(num_stages=1, enable_stats=False, wall_start_ts=0.0)

agg.on_stage_metrics(
0,
"r1",
{
"num_tokens_in": 2,
"num_tokens_out": 4,
"batch_id": 1,
"batch_size": 1,
"stage_gen_time_ms": 10.0,
"rx_transfer_bytes": 0,
"rx_decode_time_ms": 0.0,
"rx_in_flight_time_ms": 0.0,
},
)
agg.on_finalize_request(0, "r1", req_start_ts=0.0)

agg.on_stage_metrics(
0,
"r2",
{
"num_tokens_in": 1,
"num_tokens_out": 2,
"batch_id": 2,
"batch_size": 1,
"stage_gen_time_ms": 12.0,
"rx_transfer_bytes": 0,
"rx_decode_time_ms": 0.0,
"rx_in_flight_time_ms": 0.0,
},
)
agg.on_finalize_request(0, "r2", req_start_ts=0.0)

summary = agg.build_and_log_summary(final_stage_id_to_prompt=0)
assert len(summary["stage_table"]) == 2
assert {entry["request_id"] for entry in summary["e2e_table"]} == {"r1", "r2"}
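The assertions above rely on a token-accounting invariant: a request's `e2e_total_tokens` counts input tokens once (at stage 0) plus every stage's output tokens. A standalone sketch of that accounting, with hypothetical stand-in classes rather than the actual `vllm_omni.metrics` types:

```python
from dataclasses import dataclass, field


@dataclass
class ToyStageStats:
    stage_id: int
    num_tokens_in: int = 0
    num_tokens_out: int = 0


@dataclass
class ToyRequestAccounting:
    # Hypothetical stand-in for the per-request accounting these tests
    # exercise; not the real OrchestratorAggregator implementation.
    stages: list[ToyStageStats] = field(default_factory=list)

    def e2e_total_tokens(self) -> int:
        # Input tokens are counted only at stage 0; output tokens are
        # counted at every stage, matching the test_async_omni assertion.
        return sum(
            (s.num_tokens_in if s.stage_id == 0 else 0) + s.num_tokens_out
            for s in self.stages
        )


acc = ToyRequestAccounting(
    stages=[
        ToyStageStats(0, num_tokens_in=3, num_tokens_out=3),
        ToyStageStats(1, num_tokens_out=4),
    ]
)
print(acc.e2e_total_tokens())  # → 10, matching the "r1" case above
```

This mirrors the `test_orchestrator_aggregator_builds_summary` case: 3 input tokens at stage 0, then 3 and 4 output tokens, summing to 10.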
3 changes: 2 additions & 1 deletion vllm_omni/distributed/omni_connectors/adapter.py
@@ -8,6 +8,7 @@
from typing import Any

from vllm_omni.entrypoints.stage_utils import OmniStageTaskType
from vllm_omni.metrics import OrchestratorAggregator

from .utils.logging import get_connector_logger

@@ -23,7 +24,7 @@ def try_send_via_connector(
sampling_params: Any,
original_prompt: Any,
next_stage_queue_submit_fn: Callable[[dict[str, Any]], None],
metrics: Any,
metrics: OrchestratorAggregator,
) -> bool:
"""
Attempts to send data via OmniConnector.
28 changes: 15 additions & 13 deletions vllm_omni/entrypoints/async_omni.py
@@ -16,6 +16,7 @@
from vllm.sampling_params import SamplingParams
from vllm.tokenizers import TokenizerLike
from vllm.v1.engine.exceptions import EngineDeadError
import vllm.envs as envs

# Internal imports (our code)
from vllm_omni.config import OmniModelConfig
@@ -24,16 +25,14 @@
from vllm_omni.distributed.ray_utils.utils import try_close_ray
from vllm_omni.engine.input_processor import OmniInputProcessor
from vllm_omni.entrypoints.client_request_state import ClientRequestState
from vllm_omni.entrypoints.log_utils import (
OrchestratorMetrics,
)
from vllm_omni.entrypoints.omni import OmniBase
from vllm_omni.entrypoints.omni_stage import OmniStage
from vllm_omni.entrypoints.stage_utils import SHUTDOWN_TASK, OmniStageTaskType
from vllm_omni.entrypoints.stage_utils import maybe_load_from_ipc as _load
from vllm_omni.entrypoints.utils import (
get_final_stage_id_for_e2e,
)
from vllm_omni.metrics import OrchestratorAggregator
from vllm_omni.outputs import OmniRequestOutput

logger = init_logger(__name__)
@@ -57,7 +56,6 @@ def _weak_close_cleanup_async(stage_list, stage_in_queues, ray_pg, output_handle
if output_handler is not None:
output_handler.cancel()


class AsyncOmni(OmniBase):
"""Asynchronous unified entry point supporting multi-stage pipelines for LLM and Diffusion models.

@@ -320,27 +318,27 @@ async def generate(self, *args: Any, **kwargs: dict[str, Any]) -> AsyncGenerator
)

# Metrics/aggregation helper
metrics = OrchestratorMetrics(
num_stages,
self._enable_stats,
_wall_start_ts,
metrics = OrchestratorAggregator(
num_stages=num_stages,
enable_stats=self._enable_stats,
wall_start_ts=_wall_start_ts, # will be reset at generate() time, just a placeholder here
)
# Seed stage-0 queue with all requests
logger.debug(f"[{self._name}] Seeding request into stage-0")
req_state = ClientRequestState(request_id)
req_state.metrics = metrics
self.request_states[request_id] = req_state

_req_start_ts[request_id] = time.time()
# Mark first input time for stage-0
metrics.stage_first_ts[0] = metrics.stage_first_ts[0] or time.time()

sp0: SamplingParams = sampling_params_list[0] # type: ignore[index]
task = {
"request_id": request_id,
"engine_inputs": prompt,
"sampling_params": sp0,
}
self.stage_list[0].submit(task)
_req_start_ts[request_id] = time.time()
logger.debug(f"[{self._name}] Enqueued request {request_id} to stage-0")

logger.debug(f"[{self._name}] Entering scheduling loop: stages={num_stages}")
@@ -366,6 +364,8 @@
metrics.stage_last_ts[stage_id] = max(metrics.stage_last_ts[stage_id] or 0.0, time.time())
try:
_m = asdict(result.get("metrics"))
# stage_gen_time_ms is the time of generating every chunk in this stage
metrics.accumulated_gen_time_ms[req_id] += _m.get("stage_gen_time_ms", 0.0)
if _m is not None and finished:
metrics.on_stage_metrics(stage_id, req_id, _m)
except Exception as e:
@@ -423,7 +423,11 @@
next_stage_id = stage_id + 1
if next_stage_id <= final_stage_id_for_e2e and finished:
next_stage: OmniStage = self.stage_list[next_stage_id]
# Derive inputs for the next stage, record preprocess time
_prep_t0 = time.perf_counter()
next_inputs = next_stage.process_engine_inputs(self.stage_list, prompt)
_prep_ms = (time.perf_counter() - _prep_t0) * 1000.0
metrics.record_stage_preprocess_time(next_stage_id, req_id, _prep_ms)

Review comment (P2): Avoid dropping preprocess timing before stage stats exist

The preprocess time is recorded immediately after process_engine_inputs but before the next stage has produced any metrics. record_stage_preprocess_time only updates existing stage_events entries, so at this point there is no entry for next_stage_id, causing the value to be dropped and leaving preprocess_time_ms at 0 for all requests in async multi-stage runs. To make this metric usable, buffer it until on_stage_metrics creates the stage entry or move the recording to after metrics are emitted.

sp_next: SamplingParams = sampling_params_list[next_stage_id]

# Check if we have a connector for this edge
@@ -460,11 +464,9 @@ async def generate(self, *args: Any, **kwargs: dict[str, Any]) -> AsyncGenerator
logger.debug(f"[{self._name}] Request {req_id} fully completed")

logger.debug(f"[{self._name}] All requests completed")

# Summarize and print stats
try:
summary = metrics.build_and_log_summary(final_stage_id_for_e2e)
logger.info("[Summary] %s", pformat(summary, sort_dicts=False))
metrics.build_and_log_summary(final_stage_id_for_e2e)
except Exception as e:
logger.exception(f"[{self._name}] Failed to build/log summary: {e}")
finally:
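The review comment above suggests buffering preprocess timings that arrive before a stage entry exists. One way that could look, sketched with a hypothetical class (this is not the actual OrchestratorAggregator code, only an illustration of the buffering pattern):

```python
from collections import defaultdict


class PreprocessTimeBuffer:
    """Sketch of the buffering fix suggested in the review comment:
    hold preprocess timings recorded before the stage entry exists,
    and flush them when the stage's metrics arrive."""

    def __init__(self) -> None:
        self.stage_events: dict[tuple[int, str], dict] = {}
        self._pending_ms: defaultdict[tuple[int, str], float] = defaultdict(float)

    def record_stage_preprocess_time(self, stage_id: int, req_id: str, ms: float) -> None:
        key = (stage_id, req_id)
        entry = self.stage_events.get(key)
        if entry is None:
            # No stage entry yet: buffer instead of dropping the value.
            self._pending_ms[key] += ms
        else:
            entry["preprocess_time_ms"] = entry.get("preprocess_time_ms", 0.0) + ms

    def on_stage_metrics(self, stage_id: int, req_id: str, metrics: dict) -> None:
        key = (stage_id, req_id)
        entry = self.stage_events.setdefault(key, dict(metrics))
        # Flush any preprocess time recorded before this entry existed.
        entry["preprocess_time_ms"] = (
            entry.get("preprocess_time_ms", 0.0) + self._pending_ms.pop(key, 0.0)
        )


buf = PreprocessTimeBuffer()
buf.record_stage_preprocess_time(1, "r1", 4.2)  # arrives before stage metrics
buf.on_stage_metrics(1, "r1", {"num_tokens_out": 4})
print(buf.stage_events[(1, "r1")]["preprocess_time_ms"])  # → 4.2
```

With this pattern the timing recorded right after `process_engine_inputs` survives until the downstream stage reports its metrics, rather than being silently dropped.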
4 changes: 2 additions & 2 deletions vllm_omni/entrypoints/client_request_state.py
@@ -1,6 +1,6 @@
import asyncio

from vllm_omni.entrypoints.log_utils import OrchestratorMetrics
from vllm_omni.metrics import OrchestratorAggregator


class ClientRequestState:
@@ -10,4 +10,4 @@ def __init__(self, request_id: str, queue: asyncio.Queue | None = None):
self.request_id = request_id
self.stage_id: int | None = None
self.queue = queue if queue is not None else asyncio.Queue()
self.metrics: OrchestratorMetrics | None = None
self.metrics: OrchestratorAggregator | None = None