[LEADS-415] Responses streaming support by xmican10 · Pull Request #255 · lightspeed-core/lightspeed-evaluation

xmican10 · 2026-06-12T14:34:49Z

Description

Follow-up PR that adds support for the stream=True parameter on responses requests.

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

Related Issue #
Closes #

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

New Features
- Added OpenAI-style streaming support to the /responses endpoint for real-time delivery
- Improved streaming parsing, including tool-call extraction and file-search chunk mapping into the response payload
- Enhanced token timing metrics for better generation performance tracking
Bug Fixes
- Improved handling of tool-call argument shapes and stricter validation for missing/invalid fields
Tests
- Extended unit tests to verify correct streaming vs non-streaming request behavior and end-to-end streaming parsing (including error and [DONE] edge cases)
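
The streaming versus non-streaming request behavior summarized above could look roughly like the sketch below. It assumes a `client` object that behaves like `httpx.Client` (a `stream()` context manager and a `post()` method); the function name `responses_query` and the `parse_stream` callable are hypothetical stand-ins, not the PR's actual code.

```python
from typing import Any, Callable, Iterable


def responses_query(
    client: Any,
    request: dict[str, Any],
    parse_stream: Callable[[Iterable[str]], dict[str, Any]],
) -> dict[str, Any]:
    """Dispatch on request["stream"]; `client` is assumed to be httpx.Client-like."""
    if request.get("stream"):
        # httpx.Client.stream() is a context manager, so the connection is
        # released even if parsing raises part-way through the body.
        with client.stream("POST", "/responses", json=request) as response:
            response.raise_for_status()
            return parse_stream(response.iter_lines())
    # Non-streaming path: one blocking POST, body decoded as JSON.
    response = client.post("/responses", json=request)
    response.raise_for_status()
    return response.json()
```

Passing the parser in as a callable keeps the dispatch logic easy to unit-test with a fake client, which is the kind of check the tests listed above describe.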

coderabbitai · 2026-06-12T14:34:57Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 222153c9-e025-49c1-9cf1-494fe9f662a3

📥 Commits

Reviewing files that changed from the base of the PR and between 9e9df01 and 5449b8f.

📒 Files selected for processing (4)

src/lightspeed_evaluation/core/api/client.py
src/lightspeed_evaluation/core/api/streaming_parser.py
tests/unit/core/api/test_client_responses.py
tests/unit/core/api/test_streaming_parser.py

🚧 Files skipped from review as they are similar to previous changes (3)

tests/unit/core/api/test_client_responses.py
src/lightspeed_evaluation/core/api/client.py
src/lightspeed_evaluation/core/api/streaming_parser.py

Walkthrough

The streaming parser is refactored from Pydantic-based models to dataclasses, with the monolithic event switch replaced by handler-dispatch tables (_STREAMING_EVENT_HANDLERS, _RESPONSES_EVENT_HANDLERS). A new parse_responses_streaming function and ResponsesStreamingContext dataclass handle the /responses SSE protocol. APIClient._responses_query gains a stream-gated branch using httpx.Client.stream. Unit tests cover both the streaming dispatch in APIClient and the new parser entrypoint.
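
As an illustration of the handler-dispatch pattern described in the walkthrough, here is a minimal sketch. The event names (`start`, `token`, `end`), the `StreamState` dataclass, and the handler functions are all hypothetical; the PR's real tables are `_STREAMING_EVENT_HANDLERS` and `_RESPONSES_EVENT_HANDLERS`, whose contents are not shown on this page.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class StreamState:  # hypothetical stand-in for the PR's context dataclass
    conversation_id: str = ""
    chunks: list[str] = field(default_factory=list)
    final_response: str = ""


def _on_start(state: StreamState, data: dict[str, Any]) -> None:
    state.conversation_id = data.get("conversation_id", "")


def _on_token(state: StreamState, data: dict[str, Any]) -> None:
    state.chunks.append(data.get("token", ""))


def _on_end(state: StreamState, data: dict[str, Any]) -> None:
    state.final_response = "".join(state.chunks)


# One entry per event name replaces a long if/elif chain.
EVENT_HANDLERS: dict[str, Callable[[StreamState, dict[str, Any]], None]] = {
    "start": _on_start,
    "token": _on_token,
    "end": _on_end,
}


def dispatch(state: StreamState, event: str, data: dict[str, Any]) -> None:
    handler = EVENT_HANDLERS.get(event)
    if handler is not None:  # unknown events are ignored in this sketch
        handler(state, data)
```

Adding support for a new event then means writing one small handler and registering it in the table, without touching the loop that reads the stream.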

Changes

Responses Endpoint Streaming Support

Layer / File(s)	Summary
Streaming parser: dataclass state and handler dispatch `src/lightspeed_evaluation/core/api/streaming_parser.py`	Replaces Pydantic with `dataclasses`, converts `CONTENT_EVENTS` to `frozenset`, refactors `_PerformanceTracker` and `StreamingContext` to dataclasses with token-per-second calculation excluding TTFT, replaces the centralized event switch with `_STREAMING_EVENT_HANDLERS` per-event dispatch for `/streaming` protocol, updates `_parse_tool_call` to support both `name`/`args` and legacy `tool_name`/`arguments` shapes with an optional `error` field, and splits validation into `_validate_streaming_response` and `_validate_responses_response`.
`parse_responses_streaming` and `/responses` SSE loop `src/lightspeed_evaluation/core/api/streaming_parser.py`	Adds `ResponsesStreamingContext` dataclass and `_RESPONSES_EVENT_HANDLERS` dispatch for `response.created`, `response.output_text.delta` (TTFT capture), `response.output_item.done` (MCP normalization and file-search-to-`rag_chunks` extraction), and `response.completed` (final response and token counts); implements `parse_responses_streaming` with `[DONE]`-terminated SSE loop; updates `parse_streaming_response` to use handler dispatch and raises `ValueError` on `error` events with tokens.
`APIClient` streaming dispatch for `/responses` `src/lightspeed_evaluation/core/api/client.py`	Imports `parse_responses_streaming`; adds a `stream`-gated branch in `_responses_query` that uses `httpx.Client.stream` when `responses_request.get("stream")` is truthy, calls `_handle_response_errors`, invokes `parse_responses_streaming`, and returns `APIResponse` from the parsed result; non-streaming path via `httpx.Client.post` remains unchanged.
Unit tests: client streaming dispatch and responses SSE parser `tests/unit/core/api/test_client_responses.py`, `tests/unit/core/api/test_streaming_parser.py`	Adds `test_responses_streaming_dispatches_to_parse_responses_streaming` and `test_responses_non_streaming_does_not_use_streaming_path` asserting correct dispatch in `TestResponsesEndpoint`; adds `TestNormalizeMcpItem` for JSON-string argument decoding; adds `TestParseResponsesStreaming` covering basic response parsing, usage tokens and TTFT timing, MCP tool-call extraction with argument decoding and error field capture, file-search-to-`rag_chunks` mapping with synthetic `file_search` tool call, missing final-response and `conversation_id` errors, and `[DONE]` sentinel ordering validation.
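
To make the `[DONE]`-terminated SSE loop and the TTFT handling in the table more concrete, the following is a rough sketch only. It assumes OpenAI-style payloads (a `type` field, a `delta` string on text events, and `response.usage.output_tokens` on completion); the `ResponsesState` dataclass, the `parse_sse` name, and the injectable `clock` parameter are hypothetical and not taken from the PR. It uses a plain if/elif for brevity rather than a dispatch table.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, Optional


@dataclass
class ResponsesState:  # hypothetical; not the PR's ResponsesStreamingContext
    started: float
    text: list[str] = field(default_factory=list)
    ttft: Optional[float] = None
    output_tokens: int = 0
    completed: bool = False


def parse_sse(
    lines: Iterable[str], clock: Callable[[], float] = time.perf_counter
) -> dict[str, Any]:
    state = ResponsesState(started=clock())
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # sentinel: nothing after it is processed
        event = json.loads(payload)
        kind = event.get("type")
        if kind == "response.output_text.delta":
            if state.ttft is None:
                state.ttft = clock() - state.started  # time to first token
            state.text.append(event.get("delta", ""))
        elif kind == "response.completed":
            usage = event.get("response", {}).get("usage", {})
            state.output_tokens = usage.get("output_tokens", 0)
            state.completed = True
    if not state.completed:
        raise ValueError("stream ended without a response.completed event")
    total = clock() - state.started
    generation = total - (state.ttft or 0.0)  # exclude the wait for the first token
    tps = state.output_tokens / generation if generation > 0 else None
    return {"response": "".join(state.text), "ttft": state.ttft, "tokens_per_second": tps}
```

Subtracting TTFT from the total time means the tokens-per-second figure reflects generation speed alone, not the initial latency before the first token arrives.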

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

lightspeed-core/lightspeed-evaluation#254: Introduces the /responses endpoint support in APIClient and the initial request/response mapping that this PR extends with streaming dispatch and parse_responses_streaming.

Suggested reviewers

asamal4

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main objective: adding streaming support for the responses endpoint, which aligns with the core changes across client.py, streaming_parser.py, and tests.
Docstring Coverage	✅ Passed	Docstring coverage is 98.15% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


asamal4 · 2026-06-17T00:35:24Z

@coderabbitai full review

coderabbitai · 2026-06-17T00:35:32Z

✅ Action performed

Full review finished.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/lightspeed_evaluation/core/api/streaming_parser.py (1)

356-358: 💤 Low value

Unreachable code: `arguments` can never be `None` after line 350.

Line 350 uses `or {}` as a fallback, so `arguments` will always be at least an empty dict. The condition `if arguments is None` on line 356 will never be true.

Consider removing this unreachable block:

Proposed fix

         if not tool_name:
             logger.debug("Tool call missing name/tool_name field")
             return None
 
-        if arguments is None:
-            logger.debug("Tool call missing args/arguments field for %s", tool_name)
-            return None
-
         tool_call: dict[str, Any] = {"tool_name": tool_name, "arguments": arguments}
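
For context, a small sketch of why the `None` check is dead code. It assumes the assignment on line 350 has roughly the shape shown here; the function below is a simplified illustration of handling both the `name`/`args` and legacy `tool_name`/`arguments` shapes, not the PR's actual `_parse_tool_call`.

```python
from typing import Any, Optional


def parse_tool_call(raw: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Accept either name/args or the legacy tool_name/arguments shape."""
    tool_name = raw.get("name") or raw.get("tool_name")
    # The trailing `or {}` guarantees a dict, so `arguments` is never None here.
    arguments = raw.get("args") or raw.get("arguments") or {}
    if not tool_name:
        return None
    tool_call: dict[str, Any] = {"tool_name": tool_name, "arguments": arguments}
    if raw.get("error") is not None:
        tool_call["error"] = raw["error"]  # optional field, copied only when present
    return tool_call
```

Even an explicit `"arguments": None` in the input collapses to `{}`, so a later `if arguments is None` branch cannot be reached.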

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lightspeed_evaluation/core/api/streaming_parser.py` around lines 356 -
358, Remove the unreachable `if arguments is None:` condition and its associated
debug logging block. Since the arguments variable is assigned with an `or {}`
fallback pattern earlier in the code, it will always contain at least an empty
dictionary and can never be None, making this entire conditional block dead code
that should be deleted.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unit/core/api/test_client_responses.py`:
- Line 1: Remove the module-level pylint disable comment at the top of
test_client_responses.py that suppresses protected-access and duplicate-code
warnings. Instead of disabling these checks globally, identify the specific code
locations that trigger these warnings (likely places where protected members are
accessed with underscore prefixes or where code is duplicated) and either
refactor the code to avoid the issue, or if necessary, apply targeted localized
lint suppressions directly to those specific lines. This ensures code quality
standards are maintained and warnings are addressed rather than hidden.
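
A targeted suppression of the kind suggested above might look like the following sketch. The `Client` class and the test function are hypothetical; only the inline `# pylint: disable=...` comment syntax is standard pylint.

```python
class Client:
    def __init__(self) -> None:
        self._retries = 3  # protected by convention (leading underscore)


def test_default_retries() -> None:
    client = Client()
    # Suppress the warning on this one line instead of the whole module.
    assert client._retries == 3  # pylint: disable=protected-access
```

Scoping the disable to a single line keeps `protected-access` active for the rest of the file, so new unintended accesses are still reported.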

---

Nitpick comments:
In `@src/lightspeed_evaluation/core/api/streaming_parser.py`:
- Around line 356-358: Remove the unreachable `if arguments is None:` condition
and its associated debug logging block. Since the arguments variable is assigned
with an `or {}` fallback pattern earlier in the code, it will always contain at
least an empty dictionary and can never be None, making this entire conditional
block dead code that should be deleted.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 00d2cc99-273a-4e6c-8365-056cd39844d9

📥 Commits

Reviewing files that changed from the base of the PR and between 454d871 and 5bb24c3.

📒 Files selected for processing (19)

config/system.yaml
examples/01_getting_started/basic_setup/README.md
examples/02_metrics/context_quality/README.md
examples/02_metrics/conversation_quality/README.md
examples/02_metrics/keywords_evaluation/README.md
examples/02_metrics/nlp_metrics/README.md
examples/02_metrics/response_quality/README.md
examples/02_metrics/tool_evaluation/README.md
examples/03_endpoints/responses/README.md
examples/03_endpoints/responses/eval_data.yaml
examples/03_endpoints/responses/system.yaml
src/lightspeed_evaluation/core/api/client.py
src/lightspeed_evaluation/core/api/streaming_parser.py
src/lightspeed_evaluation/core/constants.py
src/lightspeed_evaluation/core/metrics/custom/tool_eval.py
tests/unit/core/api/conftest.py
tests/unit/core/api/test_client_responses.py
tests/unit/core/api/test_streaming_parser.py
tests/unit/core/metrics/custom/test_tool_eval.py

xmican10 · 2026-06-17T07:40:43Z

I need to rebase.

xmican10 · 2026-06-17T09:18:59Z

@coderabbitai full review

asamal4

Thanks! Some minor comments, applicable in multiple places.

xmican10 force-pushed the LEADS-415-responses-streaming-support branch 3 times, most recently from e8a6dd9 to 5bb24c3 on June 12, 2026 at 14:55

coderabbitai Bot reviewed Jun 17, 2026

Comment thread tests/unit/core/api/test_client_responses.py

Responses endpoint support

131cd39

xmican10 force-pushed the LEADS-415-responses-streaming-support branch 2 times, most recently from 8c03464 to 54e60be on June 17, 2026 at 07:49

asamal4 reviewed Jun 17, 2026

Comment thread src/lightspeed_evaluation/core/api/streaming_parser.py Outdated

Comment thread src/lightspeed_evaluation/core/api/streaming_parser.py

Comment thread tests/unit/core/api/test_streaming_parser.py Outdated

xmican10 force-pushed the LEADS-415-responses-streaming-support branch from 54e60be to 9e9df01 on June 22, 2026 at 11:34

Adding streaming support for responses

5449b8f

xmican10 force-pushed the LEADS-415-responses-streaming-support branch from 9e9df01 to 5449b8f on June 22, 2026 at 12:51

asamal4 approved these changes Jun 23, 2026

asamal4 merged commit a625cab into lightspeed-core:main Jun 23, 2026
17 checks passed

asamal4 merged 2 commits into lightspeed-core:main from xmican10:LEADS-415-responses-streaming-support
