Add support for stream responses #605

Merged
gtopper merged 32 commits into mlrun:development from gtopper:ML-11875 on Jan 26, 2026

Conversation

@gtopper gtopper commented Jan 20, 2026


Copilot AI left a comment


Pull request overview

This PR adds comprehensive streaming response support to the storey data processing library, enabling steps to yield multiple chunks of data incrementally rather than returning a single result. The implementation includes new primitive types for streaming, modifications to existing flow steps, a new Collector step for aggregating streams, and extensive test coverage.

Changes:

  • Added streaming primitives (StreamChunk, StreamCompletion, StreamingError) and modified Map, MapClass, Complete, Reduce, and ParallelExecution steps to support generator functions (see the sketch after this list)
  • Introduced Collector step to aggregate streaming chunks back into single events
  • Updated AwaitableResult and AsyncAwaitableResult to return generators for streaming responses
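
To make the new streaming flow concrete, here is a minimal sketch of a graph that streams chunks from a generator-based Map and aggregates them with the new Collector step. Only the class names come from this PR's summary; the Collector constructor arguments, the shape of the aggregated event, and the surrounding flow are assumptions for illustration.

```python
from storey import Collector, Map, Reduce, SyncEmitSource, build_flow


def tokenize(sentence):
    # Generator function: with this change, each yield is emitted downstream
    # as a separate streamed chunk instead of a single return value.
    for token in sentence.split():
        yield token


controller = build_flow(
    [
        SyncEmitSource(),
        Map(tokenize),  # streams one chunk per token
        Collector(),    # aggregates chunks back into a single event (constructor args assumed)
        Reduce([], lambda acc, event: acc + [event]),
    ]
).run()

controller.emit("stream me please")
controller.terminate()
print(controller.await_termination())  # aggregated result; exact shape depends on Collector's semantics
```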

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.

Summary per file

| File | Description |
| --- | --- |
| `storey/dtypes.py` | Adds StreamChunk, StreamCompletion, and StreamingError classes for streaming support |
| `storey/flow.py` | Adds _StreamingStepMixin and updates Map, MapClass, Complete, Reduce, ParallelExecution, and Choice to handle streaming |
| `storey/sources.py` | Updates AwaitableResult and AsyncAwaitableResult to support streaming generators |
| `storey/steps/collector.py` | Implements the new Collector step for aggregating streaming chunks |
| `storey/steps/__init__.py` | Exports the new Collector step |
| `storey/__init__.py` | Exports StreamingError and Collector for the public API |
| `tests/test_streaming.py` | Comprehensive test suite covering streaming primitives, Map/MapClass streaming, Collector, Complete, error handling, graph splits, and cyclic graphs |
| `tests/test_flow.py` | Refactors cycle creation to use the cleaner `.to()` API instead of direct `_outlets` manipulation |
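
For request/response flows, the sources.py change above suggests a Complete-terminated flow can now hand back a stream. A hedged sketch, assuming `emit()` returns an `AwaitableResult` whose `await_result()` yields a generator when the upstream step streams (inferred from the summary, not confirmed by this page):

```python
from storey import Complete, Map, SyncEmitSource, build_flow


def stream_tokens(sentence):
    for token in sentence.split():
        yield token  # each yield becomes one response chunk


controller = build_flow([SyncEmitSource(), Map(stream_tokens), Complete()]).run()

awaitable = controller.emit("hello streaming world")
for chunk in awaitable.await_result():  # assumed: a generator of chunks for streaming responses
    print(chunk)

controller.terminate()
controller.await_termination()
```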



Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.




@alxtkr77 alxtkr77 left a comment


tests/test_streaming.py - Test Duplication

The 56 tests have significant structural duplication - almost every test has both a sync and async version with ~90% identical code (27 pairs).

Consider using pytest parametrization to consolidate:

```python
@pytest.fixture(params=["sync", "async"])
def flow_context(request):
    if request.param == "sync":
        return SyncEmitSource, lambda f: f()
    else:
        return AsyncEmitSource, lambda f: asyncio.run(f())


def test_collector_basic(self, flow_context):
    source_cls, run = flow_context
    # Single implementation handles both
```

This would reduce tests from 56 to ~30 while maintaining the same coverage, and make future maintenance easier.


@alxtkr77 alxtkr77 left a comment


flow.py - Duplicate _is_generator method

The _is_generator() method is defined identically in two places:

  • _StreamingStepMixin (line ~201)
  • ParallelExecutionRunnable (line ~1448)

Consider reusing the mixin method or extracting to a shared utility to avoid duplication.
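
One possible shape for that shared utility, sketched here with an assumed name and location (not part of this PR):

```python
import inspect


def _is_generator_function(fn) -> bool:
    # A single module-level helper that both _StreamingStepMixin and
    # ParallelExecutionRunnable could call instead of defining their own copy.
    # Covers both sync and async generator functions.
    return inspect.isgeneratorfunction(fn) or inspect.isasyncgenfunction(fn)
```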


gtopper commented Jan 22, 2026

> tests/test_streaming.py - Test Duplication
>
> The 56 tests have significant structural duplication - almost every test has both a sync and async version with ~90% identical code (27 pairs).
>
> Consider using pytest parametrization to consolidate:
>
> ```python
> @pytest.fixture(params=["sync", "async"])
> def flow_context(request):
>     if request.param == "sync":
>         return SyncEmitSource, lambda f: f()
>     else:
>         return AsyncEmitSource, lambda f: asyncio.run(f())
>
>
> def test_collector_basic(self, flow_context):
>     source_cls, run = flow_context
>     # Single implementation handles both
> ```
>
> This would reduce tests from 56 to ~30 while maintaining the same coverage, and make future maintenance easier.

Yeah, this is sort of true. My AI also wanted to do this. The problem is, the APIs are too divergent, and the result of parameterization isn't very good. I.e. the above code snippet doesn't cover all the conditionals needed.

@gtopper gtopper requested a review from alxtkr77 January 22, 2026 12:24
@gtopper gtopper marked this pull request as ready for review January 22, 2026 12:42
@gtopper gtopper requested a review from royischoss January 26, 2026 10:32

@royischoss royischoss left a comment


Hey, LGTM, two minor comments.

@gtopper gtopper requested a review from royischoss January 26, 2026 12:11

@royischoss royischoss left a comment


LGTM 👍

@gtopper gtopper merged commit e7ded66 into mlrun:development Jan 26, 2026
5 checks passed
gtopper added a commit to gtopper/mlrun that referenced this pull request Jan 26, 2026
Using the new functionality introduced in storey 1.11.8 / mlrun/storey#605.

[ML-11876](https://iguazio.atlassian.net/browse/ML-11876)
gtopper added a commit to gtopper/storey that referenced this pull request Jan 26, 2026
mlrun#605 broke mlrun test `serving.test_async_flow.test_model_runner_with_selector`.
gtopper added a commit that referenced this pull request Jan 26, 2026
#605 broke mlrun test `serving.test_async_flow.test_model_runner_with_selector`.
gtopper added a commit to gtopper/mlrun that referenced this pull request Jan 26, 2026
To fix `serving.test_async_flow.test_model_runner_with_selector` following breakage introduced in storey 1.11.8 / mlrun/storey#605.
gtopper added a commit to mlrun/mlrun that referenced this pull request Jan 27, 2026
To fix `serving.test_async_flow.test_model_runner_with_selector` following breakage introduced in storey 1.11.8 / mlrun/storey#605.
gtopper added a commit to gtopper/mlrun that referenced this pull request Jan 27, 2026
Using the new functionality introduced in storey 1.11.8 / mlrun/storey#605.

[ML-11876](https://iguazio.atlassian.net/browse/ML-11876)
gtopper added a commit to mlrun/mlrun that referenced this pull request Jan 29, 2026
Adds support for streaming responses in serving graphs, enabling real-time chunk-by-chunk HTTP responses (e.g., for LLM token streaming).

Key changes:

* New `set_streaming(enabled=True)` API on serving functions
* Async streaming handler that yields results as they're produced by graph steps
* Graph steps can now use generators to stream multiple chunks
* Updated nuclio handler for generator return type support

Using the new functionality introduced in storey 1.11.8 / mlrun/storey#605.

[ML-11876](https://iguazio.atlassian.net/browse/ML-11876)

Depends on nuclio/nuclio-jupyter#197.

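
As an illustration of the serving-side API described in that commit message, a hypothetical usage sketch; the function name, file, and handler are placeholders, and only `set_streaming(enabled=True)` and generator-based handlers are taken from the description above:

```python
import mlrun

# Serving function whose graph step streams chunks (e.g. LLM tokens).
serving_fn = mlrun.code_to_function("llm-server", kind="serving", filename="server.py")
graph = serving_fn.set_topology("flow")
graph.to(name="generate", handler="stream_tokens").respond()  # handler may be a generator

serving_fn.set_streaming(enabled=True)  # enable chunk-by-chunk HTTP responses (exact call site assumed)
```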
