Add timing metadata for streaming ParallelExecution responses #613

Merged
gtopper merged 6 commits into mlrun:development from gtopper:ML-11879
Feb 12, 2026

@gtopper (Collaborator) commented Feb 11, 2026

[ML-11879](https://iguazio.atlassian.net/browse/ML-11879)

Streaming responses from `ParallelExecution` were missing the `when` and `microsec` timing metadata that non-streaming responses include. This metadata is required for model monitoring in MLRun.

Changes

* Add a `_StreamingResult` class that wraps streaming generators with timing info
* Set timing metadata on events before emitting streaming chunks
* Handle both in-process streaming (`_StreamingResult`) and process-based streaming (raw generators)
* `Collector` calculates the total streaming duration (`microsec`) when the stream completes

Notes

* `ParallelExecution` initially sets `microsec` to `None`, since the total runtime isn't known until streaming completes
* `Collector` calculates the actual `microsec` value (total elapsed time from stream start to completion)
* For process-based streaming, `when` is the timestamp at which chunks start arriving (timing from the subprocess isn't available)
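
For readers unfamiliar with the pattern, here is a minimal sketch of the timing flow described above. It is not the storey implementation: `TimedStream` and `collect` are hypothetical names standing in for `_StreamingResult` and `Collector`, and the ISO-8601 format used for `when` is an assumption.

```python
import time
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List, Optional


class TimedStream:
    """Hypothetical stand-in for a wrapper such as `_StreamingResult`."""

    def __init__(self, chunks: Iterable[Any]):
        self.chunks = chunks
        # Wall-clock time at which the stream started (the `when` field).
        self.when = datetime.now(timezone.utc)
        # Monotonic reference point, used only to measure elapsed time.
        self.started = time.perf_counter()
        # Total runtime is unknown until the stream has been fully consumed.
        self.microsec: Optional[int] = None


def collect(stream: TimedStream) -> Dict[str, Any]:
    """Hypothetical collector: drain the stream, then fill in `microsec`."""
    chunks: List[Any] = list(stream.chunks)
    elapsed_seconds = time.perf_counter() - stream.started
    return {
        "chunks": chunks,
        "when": stream.when.isoformat(),  # assumed format, for illustration
        "microsec": int(elapsed_seconds * 1_000_000),
    }
```

The wrapper records only the start of the stream. The elapsed time is filled in by whatever consumes the final chunk, which mirrors the split between `ParallelExecution` and `Collector` described in the notes.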
@gtopper changed the title Add timing metadata for streaming ParallelExecution responses Add timing metadata for streaming ParallelExecution responses Feb 12, 2026
@gtopper requested a review from royischoss February 12, 2026 08:09
@gtopper marked this pull request as ready for review February 12, 2026 08:12
@royischoss (Collaborator) left a comment

LGTM

@gtopper merged commit 784b03e into mlrun:development Feb 12, 2026
5 checks passed
gtopper added a commit to mlrun/mlrun that referenced this pull request Feb 16, 2026
[ML-11879](https://iguazio.atlassian.net/browse/ML-11879)

Enables model monitoring (MM) for streaming `ModelRunnerStep`s (MRS).
When a serving function has streaming enabled, the MM pipeline needs to
aggregate streaming chunks into a single event before processing.

**Key changes:**
- Bumps storey to get the changes in
mlrun/storey#613
- **Collector insertion** (`server.py`): When `streaming=True`,
dynamically inserts a `storey.Collector` step between each MRS and the
MM pipeline to aggregate streaming chunks into a single event.
- **Chunk aggregation** (`system_steps.py`): Adds aggregation logic to
`MonitoringPreProcessor` that detects collected streaming chunks and
merges them — concatenating string outputs, summing numeric metrics, and
taking first values for other fields.
- **ProcessEndpointEvent fixes** (`stream_processing.py`): Returns event
with `body=None` instead of bare `None` on validation failure (fixes
downstream `AttributeError`). Removes `microsec` `is_not_none`
validation since streaming latency is calculated asynchronously by
`Collector`.

## Tests
* Unit tests for chunk detection and all aggregation paths
* Graph structure test verifying `Collector` insertion when streaming is
enabled
* System test (`test_monitoring_with_streaming_model_runner`) verifying
end-to-end: streaming model → `Collector` → MM pipeline → model endpoint
creation
* `test_app_flow` passed

[ML-11879]:
https://iguazio.atlassian.net/browse/ML-11879?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ