Add timing metadata for streaming `ParallelExecution` responses by gtopper · Pull Request #613 · mlrun/storey

gtopper · 2026-02-11T12:35:23Z

Streaming responses from `ParallelExecution` were missing the `when` and `microsec` timing metadata that non-streaming responses include. This metadata is required for model monitoring in MLRun.

Changes

* Add `_StreamingResult` class to wrap streaming generators with timing info
* Set timing metadata on events before emitting streaming chunks
* Handle both in-process streaming (`_StreamingResult`) and process-based streaming (raw generators)
* `Collector` calculates total streaming duration (`microsec`) when the stream completes
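The wrapper idea in the list above can be illustrated with a minimal sketch. It is not the code from this PR: `StreamingResultSketch`, `run_streaming`, and `fake_model` are hypothetical names, and the real `_StreamingResult` may carry different fields.

```python
import datetime
from typing import Any, Callable, Iterator, Optional


class StreamingResultSketch:
    """Hypothetical stand-in for a wrapper that pairs a generator with timing info."""

    def __init__(self, generator: Iterator[Any], when: datetime.datetime) -> None:
        self.generator = generator
        self.when = when  # timestamp captured when the call started
        self.microsec: Optional[int] = None  # unknown until the stream finishes

    def __iter__(self) -> Iterator[Any]:
        return iter(self.generator)


def run_streaming(model_fn: Callable[[str], Iterator[Any]], payload: str) -> StreamingResultSketch:
    """Capture the start time, then return the wrapped (still lazy) generator."""
    started = datetime.datetime.now(tz=datetime.timezone.utc)
    return StreamingResultSketch(model_fn(payload), when=started)


def fake_model(text: str) -> Iterator[str]:
    """Toy streaming model that yields one word per chunk."""
    for word in text.split():
        yield word


result = run_streaming(fake_model, "a b c")
chunks = list(result)  # consuming the wrapper drains the underlying generator
```

Carrying `when` on the wrapper lets whoever emits the chunks stamp each event with the same start time, while `microsec` stays `None` until the stream has been fully consumed.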

Notes

* `ParallelExecution` sets `microsec` to `None` initially, since total runtime isn't known until streaming completes
* `Collector` calculates the actual `microsec` value (total elapsed time from stream start to completion)
* For process-based streaming, `when` is the timestamp at which chunks start arriving (timing from the subprocess isn't available)
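As a rough illustration of the notes above, the following sketch shows one way a collecting step could derive `microsec` once the stream ends. `CollectorSketch` and its methods are hypothetical and do not reflect the actual `storey.Collector` interface. Timestamps are passed in explicitly so the example is deterministic.

```python
import datetime
from typing import Any, Dict, List, Optional


class CollectorSketch:
    """Hypothetical collector: gathers chunks, then computes total latency."""

    def __init__(self) -> None:
        self.chunks: List[Any] = []
        self.when: Optional[datetime.datetime] = None

    def add_chunk(self, body: Any, when: datetime.datetime) -> None:
        # Keep the stream's start timestamp from the first chunk seen.
        if self.when is None:
            self.when = when
        self.chunks.append(body)

    def complete(self, finished_at: datetime.datetime) -> Dict[str, Any]:
        if self.when is None:
            raise ValueError("no chunks were collected")
        # Integer microseconds elapsed from stream start to completion.
        microsec = (finished_at - self.when) // datetime.timedelta(microseconds=1)
        return {"body": self.chunks, "when": self.when, "microsec": microsec}


start = datetime.datetime(2026, 2, 11, 12, 0, 0, tzinfo=datetime.timezone.utc)
collector = CollectorSketch()
collector.add_chunk("Hel", when=start)
collector.add_chunk("lo", when=start)
event = collector.complete(start + datetime.timedelta(seconds=1, microseconds=250))
```

Here `event["microsec"]` is 1000250, the elapsed time between the recorded `when` and the completion timestamp.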

[ML-11879](https://iguazio.atlassian.net/browse/ML-11879)

royischoss

LGTM

[ML-11879](https://iguazio.atlassian.net/browse/ML-11879)

Enables model monitoring for streaming `ModelRunnerStep`s (MRS). When a serving function has streaming enabled, the MM pipeline needs to aggregate streaming chunks into a single event before processing.

**Key changes:**

* Bumps storey to get the changes in mlrun/storey#613
* **Collector insertion** (`server.py`): When `streaming=True`, dynamically inserts a `storey.Collector` step between each MRS and the MM pipeline to aggregate streaming chunks into a single event.
* **Chunk aggregation** (`system_steps.py`): Adds aggregation logic to `MonitoringPreProcessor` that detects collected streaming chunks and merges them: concatenating string outputs, summing numeric metrics, and taking first values for other fields.
* **ProcessEndpointEvent fixes** (`stream_processing.py`): Returns event with `body=None` instead of bare `None` on validation failure (fixes downstream `AttributeError`). Removes `microsec` `is_not_none` validation since streaming latency is calculated asynchronously by `Collector`.

## Tests

* Unit tests for chunk detection and all aggregation paths
* Graph structure test verifying `Collector` insertion when streaming is enabled
* System test (`test_monitoring_with_streaming_model_runner`) verifying end-to-end: streaming model → `Collector` → MM pipeline → model endpoint creation
* `test_app_flow` passed

[ML-11879]: https://iguazio.atlassian.net/browse/ML-11879?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
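The chunk aggregation rules described above (concatenate strings, sum numeric metrics, keep the first value otherwise) could look roughly like the sketch below. This is an assumption-laden illustration, not the logic in `MonitoringPreProcessor`: `merge_chunks` and the field names `output`, `tokens`, and `model` are hypothetical.

```python
from numbers import Number
from typing import Any, Dict, List


def _is_numeric(value: Any) -> bool:
    # bool is a Number subclass in Python; treat it as a plain field, not a metric.
    return isinstance(value, Number) and not isinstance(value, bool)


def merge_chunks(chunks: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge streaming chunks into one record using the three rules above."""
    merged: Dict[str, Any] = {}
    for chunk in chunks:
        for key, value in chunk.items():
            if key not in merged:
                merged[key] = value  # first value seen for this field
            elif isinstance(value, str) and isinstance(merged[key], str):
                merged[key] += value  # concatenate string outputs
            elif _is_numeric(value) and _is_numeric(merged[key]):
                merged[key] += value  # sum numeric metrics
            # any other type: keep the first value unchanged
    return merged


merged = merge_chunks(
    [
        {"output": "Hel", "tokens": 2, "model": ["m1"]},
        {"output": "lo", "tokens": 3, "model": ["m1"]},
    ]
)
```

With these inputs, `merged` holds the concatenated `output`, the summed `tokens`, and the first `model` value.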

gtopper changed the title ~~Add timing metadata for streaming ParallelExecution responses~~ Add timing metadata for streaming `ParallelExecution` responses Feb 12, 2026

gtopper added 2 commits February 12, 2026 13:47

Implement latency metadata in Collector

375cb12

Lint

7d187cf

gtopper requested a review from royischoss February 12, 2026 08:09

gtopper marked this pull request as ready for review February 12, 2026 08:12

gtopper mentioned this pull request Feb 12, 2026

[Model Monitoring] Add support for streaming MRS mlrun/mlrun#9278

Merged

gtopper added 3 commits February 12, 2026 16:03

Remove code that handles unsupported multi-model stream result

3bacea1

Add missing unit test for StreamingError

f130d88

Merge remote-tracking branch 'mlrun/development' into ML-11879

d56797a

royischoss approved these changes Feb 12, 2026

gtopper merged commit 784b03e into mlrun:development Feb 12, 2026
5 checks passed

gtopper merged 6 commits into mlrun:development from gtopper:ML-11879
