
WIP: streaming + claude reasoning fix #1073

Closed

leonardmq wants to merge 13 commits into main from leonard/kil-420-adapter-add-streaming

Conversation


@leonardmq leonardmq commented Feb 20, 2026

What does this PR do?

Changes:

  • add streaming proof of concept
  • add thinking_level on Claude models
  • add flags in LiteLLM adapter to enable thinking for Claude models (OpenRouter requires a different config than Anthropic; they conflict)
  • integration test verifying the streaming events and response completion

Checklists

  • Tests have been run locally and passed
  • New tests have been added to any work in /lib

Summary by CodeRabbit

  • New Features

    • Real-time streaming support across model adapters, with chunk callback hooks for viewing streamed responses as they arrive
    • New "thinking" model variants for the Claude family, enabling enhanced reasoning modes
  • Tests

    • Comprehensive streaming tests and fixtures validating chunk streaming, final response assembly, re-iteration, and edge cases
  • Dependencies

    • Updated underlying model client library to a newer compatible version

@leonardmq leonardmq marked this pull request as draft February 20, 2026 10:21
@gemini-code-assist

Summary of Changes

Hello @leonardmq, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the system's interaction with Claude models by enabling and improving reasoning capabilities and introducing a robust streaming mechanism. The changes allow for real-time processing of model outputs and ensure that reasoning steps are properly captured, particularly for models accessed via LiteLLM. This significantly improves the user experience for long-running model calls and provides more detailed insights into the model's thought process.

Highlights

  • Streaming Proof of Concept: Introduced a new StreamingCompletion class to provide an async iterable wrapper around litellm.acompletion for handling streaming responses, allowing chunks to be processed as they arrive and the final assembled response to be accessed after iteration.
  • Claude Model Reasoning Capabilities: Added reasoning_capable=True and thinking_level='medium' flags to various Claude models (Haiku, Sonnet, Opus) across both Anthropic and OpenRouter providers in the ml_model_list.
  • LiteLLM Adapter Enhancements: Modified the LiteLLM adapter to integrate the new streaming functionality, passing an on_chunk callback for real-time processing of streamed data. Also implemented specific logic for OpenRouter to correctly handle reasoning_effort parameters.
  • Comprehensive Integration Tests: Added new integration tests to verify the functionality of streaming events and response completion for Claude models, ensuring that reasoning content, tool calls, and final output are correctly captured and processed during streaming.
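The StreamingCompletion pattern described in the highlights can be sketched with a stubbed chunk source standing in for litellm.acompletion. The class below is an illustrative stand-in, not the PR's implementation: the real wrapper iterates a litellm stream and assembles the final response via litellm.stream_chunk_builder, as noted in the comments.

```python
import asyncio

class StreamingCompletionSketch:
    """Illustrative stand-in for the StreamingCompletion wrapper: yields
    chunks as they arrive, then exposes the assembled response after
    iteration. A plain list stubs the provider stream here."""

    def __init__(self, chunks):
        self._chunks = chunks
        self._response = None

    async def __aiter__(self):
        collected = []
        for chunk in self._chunks:  # the real wrapper does `async for` over litellm
            collected.append(chunk)
            yield chunk
        # the real wrapper builds this via litellm.stream_chunk_builder(collected)
        self._response = "".join(collected)

    @property
    def response(self):
        if self._response is None:
            raise RuntimeError("Access .response only after iteration completes")
        return self._response

async def demo():
    stream = StreamingCompletionSketch(["Hel", "lo", "!"])
    parts = [chunk async for chunk in stream]
    return parts, stream.response
```

The key design point the highlights describe: the caller consumes chunks incrementally, yet still gets a single assembled response object afterwards, guarded so that premature access fails loudly.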


Changelog
  • libs/core/kiln_ai/adapters/litellm_utils/litellm_streaming.py
    • Added StreamingCompletion class for async iteration over LiteLLM streaming responses.
  • libs/core/kiln_ai/adapters/litellm_utils/test_litellm_streaming.py
    • Added unit tests for StreamingCompletion covering chunk yielding, response handling, and argument forwarding.
  • libs/core/kiln_ai/adapters/ml_model_list.py
    • Updated Claude model definitions to include reasoning_capable=True and thinking_level='medium' for various models.
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
    • Introduced StreamCallback type for streaming functionality.
    • Modified invoke, _run_returning_run_output, and _run methods to accept an on_chunk parameter for streaming.
  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
    • Refactored acompletion_checking_response to use StreamingCompletion for handling streaming calls.
    • Removed direct litellm import, now using StreamingCompletion.
    • Implemented logic to pass on_chunk callback to the streaming process.
    • Added conditional logic to build_extra_body to include allowed_openai_params for OpenRouter when thinking_level is set.
  • libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
    • Updated MockAdapter._run and several test methods to accept **kwargs to support the new on_chunk parameter.
    • Added new test cases for TestStreamCallback to verify on_chunk forwarding and default behavior.
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
    • Updated mock litellm.acompletion calls to use patch.object on LiteLlmAdapter.acompletion_checking_response for consistency with streaming changes.
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_streaming.py
    • Added new file with extensive integration tests for LiteLLM streaming, including ChunkRenderer and ChunkRawRenderer for verifying streamed content, reasoning, and tool calls across various Claude models.
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_tools.py
    • Modified mock litellm.acompletion calls to use patch.object on LiteLlmAdapter.acompletion_checking_response.
    • Added a test to ensure the on_chunk handler functions correctly end-to-end with real models.
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
    • Updated MockAdapter._run method signature to accept **kwargs.
  • libs/core/kiln_ai/adapters/model_adapters/test_structured_output.py
    • Updated MockAdapter._run method signature to accept **kwargs.
    • Modified mock litellm.acompletion calls to use patch.object on LiteLlmAdapter.acompletion_checking_response.
  • libs/core/kiln_ai/adapters/test_prompt_adaptors.py
    • Updated mock litellm.acompletion calls to use patch.object on LiteLlmAdapter.acompletion_checking_response.
  • libs/core/kiln_ai/adapters/test_prompt_builders.py
    • Updated MockAdapter._run method signature to accept **kwargs.
  • libs/core/kiln_ai/datamodel/test_basemodel.py
    • Updated MockAdapter._run method signature to accept **kwargs.
  • libs/core/pyproject.toml
    • Updated litellm dependency to >=1.81.13.
    • Updated typer-slim dependency to >=0.21.2.
  • uv.lock
    • Updated lock file entries for annotated-doc, huggingface-hub, importlib-metadata, litellm, and tokenizers to newer versions.
    • Updated typer-slim entry to 0.21.2 and added its new dependency on annotated-doc.


coderabbitai bot commented Feb 20, 2026

Important

Review skipped

Ignore keyword(s) found in the title.

⛔ Ignored keywords (2)
  • WIP
  • Draft

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a9d99f24-84f3-4e13-9c0e-3a8f1e56aad8


Walkthrough

Adds an async StreamingCompletion wrapper and threads an on_chunk StreamCallback through adapter layers to support litellm streaming; updates litellm dependency, model metadata, and extensive tests to validate chunking, callback propagation, and final response assembly.

Changes

  • Streaming Core (libs/core/kiln_ai/adapters/litellm_utils/litellm_streaming.py): New StreamingCompletion async iterator that calls litellm.acompletion(stream=True, ...), yields ModelResponseStream chunks, collects chunks, builds the final response via litellm.stream_chunk_builder, and exposes .response after iteration.
  • Streaming Unit Tests (libs/core/kiln_ai/adapters/litellm_utils/test_litellm_streaming.py): Tests for chunk ordering, response finalization, premature .response access error, stream kwarg normalization, arg passthrough, builder invocation, re-iteration reset, and empty-stream behavior.
  • Adapter Interfaces (libs/core/kiln_ai/adapters/model_adapters/base_adapter.py): Adds a StreamCallback type alias and an on_chunk parameter to invoke, invoke_returning_run_output, _run_returning_run_output, and the abstract _run signature.
  • LiteLlmAdapter Streaming (libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py): Integrates StreamingCompletion, propagates on_chunk through _run_model_turn and _run, invokes the callback per chunk, obtains the final response from the stream wrapper, and adds OpenRouter-specific allowed-params handling.
  • MCP Adapter Streaming (libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py): Threads on_chunk through MCPAdapter public/private methods and updates imports to expose StreamCallback.
  • Tests: Adapter Mocks Updated (libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py, .../test_saving_adapter_results.py, .../test_prompt_builders.py, libs/core/kiln_ai/datamodel/test_basemodel.py): Updated test MockAdapter _run signatures to accept **kwargs and added TestStreamCallback tests to assert on_chunk propagation.
  • LiteLLM Adapter Tests Updated/Added (libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py, test_litellm_adapter_tools.py, test_structured_output.py, test_prompt_adaptors.py): Replaced direct litellm.acompletion patches with LiteLlmAdapter.acompletion_checking_response mocking and added streaming-aware assertions; introduced a new streaming test module with chunk renderers.
  • Streaming Integration Tests (libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_streaming.py): New comprehensive streaming integration tests with ChunkRendererAbstract, ChunkRenderer, ChunkRawRenderer, fixtures, and multiple streaming validation scenarios across providers.
  • Model List Updates (libs/core/kiln_ai/adapters/ml_model_list.py): Adds multiple Claude "thinking" model enum entries and KilnModel configurations with thinking_level="medium" and related provider entries.
  • Dependency (libs/core/pyproject.toml): Bumps the litellm dependency constraint from >=1.80.9 to >=1.81.13.
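The StreamCallback alias added to base_adapter.py is, per the walkthrough, an awaitable callback over stream chunks. A sketch of what such an alias typically looks like follows; the exact definition is not shown in this page, so the `Any` parameter is an assumption (the real alias is typed against litellm's ModelResponseStream).

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical shape of the alias; the real one uses ModelResponseStream
# from litellm rather than Any.
StreamCallback = Callable[[Any], Awaitable[None]]

seen: list[Any] = []

async def collect_chunk(chunk: Any) -> None:
    # A trivial callback: record each chunk as it arrives.
    seen.append(chunk)

callback: StreamCallback = collect_chunk
asyncio.run(callback("delta"))
```

Typing the callback as awaitable lets adapters `await on_chunk(chunk)` inside their streaming loops without blocking the event loop.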

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant BaseAdapter as BaseAdapter
    participant LiteLlmAdapter as LiteLlmAdapter
    participant StreamingCompletion as StreamingCompletion
    participant litellm as litellm
    participant on_chunk as on_chunk Callback

    Client->>BaseAdapter: invoke(input, on_chunk=callback)
    BaseAdapter->>LiteLlmAdapter: _run(input, on_chunk=callback)
    LiteLlmAdapter->>LiteLlmAdapter: _run_model_turn(on_chunk=callback)
    LiteLlmAdapter->>StreamingCompletion: __aiter__() / create wrapper
    StreamingCompletion->>litellm: acompletion(..., stream=True)
    litellm-->>StreamingCompletion: async iterator of chunks
    loop For each chunk
        StreamingCompletion->>StreamingCompletion: collect chunk
        StreamingCompletion-->>LiteLlmAdapter: yield chunk
        LiteLlmAdapter->>on_chunk: await callback(chunk)
        on_chunk-->>LiteLlmAdapter: callback awaited
    end
    StreamingCompletion->>litellm: stream_chunk_builder(collected_chunks)
    litellm-->>StreamingCompletion: final assembled response
    StreamingCompletion-->>LiteLlmAdapter: expose .response
    LiteLlmAdapter-->>BaseAdapter: return RunOutput with final response
    BaseAdapter-->>Client: complete TaskRun
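The loop in the diagram corresponds to roughly this adapter-side call pattern. The stub class below stands in for the StreamingCompletion wrapper; names follow the walkthrough, but this is a sketch, not the PR's code.

```python
import asyncio

class FakeStream:
    """Stub standing in for the StreamingCompletion wrapper in the diagram."""
    def __init__(self, chunks):
        self._chunks = chunks
        self.response = None

    async def __aiter__(self):
        collected = []
        for chunk in self._chunks:
            collected.append(chunk)
            yield chunk
        self.response = "".join(collected)  # stands in for stream_chunk_builder

async def run_with_streaming(stream, on_chunk=None):
    # Adapter-side loop from the diagram: forward each chunk to the
    # callback as it arrives, then read the final assembled response.
    async for chunk in stream:
        if on_chunk is not None:
            await on_chunk(chunk)
    return stream.response

received = []

async def collect(chunk):
    received.append(chunk)

final = asyncio.run(run_with_streaming(FakeStream(["a", "b", "c"]), collect))
```

Note that when `on_chunk` is None the loop still drains the stream, since the final response only becomes available after full iteration.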

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Suggested labels

codex

Suggested reviewers

  • scosman
  • sfierro

"I'm a rabbit in the code-laden glen,
Hopping bytes and streaming again,
Chunks arrive like carrots in line,
Callbacks nibble each tasty time,
Final response—hooray!—now it's mine." 🐇✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 23.47%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check (❓ Inconclusive): The title 'WIP: streaming + claude reasoning fix' is partially related to the changeset but is marked as WIP and lacks specificity about the main changes. Resolution: replace it with a more specific, non-draft title that clearly describes the main contribution, such as 'Add streaming support and Claude thinking models configuration'.

✅ Passed checks (1 passed)

  • Description check (✅ Passed): The PR description covers key changes and includes completed checklists, but lacks details on the implementation approach, and the related issues section is empty.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This PR introduces streaming support for litellm completions and fixes an issue with Claude models missing reasoning. The changes are well-implemented, with a new StreamingCompletion wrapper to handle streaming logic cleanly. The on_chunk callback is plumbed through the adapter stack correctly. The fix for Claude models by adding allowed_openai_params for OpenRouter is a good, well-commented solution for a provider-specific issue. The addition of unit tests for the new streaming utility and extensive integration tests for various models demonstrates thoroughness. I've left one minor suggestion in a test helper to make it more robust. Overall, this is a great PR.

@leonardmq leonardmq changed the title fix: claude models missing reasoning WIP: streaming + claude reasoning fix Feb 20, 2026
@leonardmq leonardmq force-pushed the leonard/kil-420-adapter-add-streaming branch from aafe5f3 to 446d4d1 Compare February 20, 2026 10:28
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (1)

292-310: ⚠️ Potential issue | 🟠 Major

Always-streaming path creates a silent usage tracking regression for non-Claude providers.

acompletion_checking_response now unconditionally routes all completions through StreamingCompletion, whereas it previously used non-streaming litellm.acompletion directly. Before this change, response._hidden_params["response_cost"] and response.usage were reliably populated for all providers. Now, usage data depends on whether each provider includes it in streaming chunks — providers that only include usage in non-streaming responses, or emit it in ways stream_chunk_builder doesn't correctly reassemble, will silently return Usage() with all None fields.

This is an intentional change to support Claude extended thinking (which requires streaming), but it affects all providers. Usage tracking is not tested with streaming responses, creating a gap between the feature change and test coverage.

Recommended actions:

  1. Either preserve the non-streaming path when on_chunk is None:
if on_chunk is not None:
    stream = StreamingCompletion(**kwargs)
    async for chunk in stream:
        await on_chunk(chunk)
    response = stream.response
else:
    import litellm
    response = await litellm.acompletion(**kwargs)
  2. Or add an explicit test verifying usage_from_response correctly populates tokens and cost from a streamed response for at least one non-Claude provider (e.g., OpenAI GPT-4).

The build_extra_body thinking_level fix (reasoning_effort + allowed_openai_params) is correct.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py` around lines
292 - 310, The change in acompletion_checking_response routes all completions
through StreamingCompletion causing usage/_hidden_params["response_cost"] to be
missing for providers that only populate usage on the non-streaming path;
restore the previous non-streaming behavior when no chunk handler is provided or
add tests to ensure streaming reconstructs usage. Update
acompletion_checking_response so that if on_chunk is None it calls
litellm.acompletion(**kwargs) and assigns that to response, otherwise use
StreamingCompletion(**kwargs) and iterate chunks; alternatively add a test that
verifies usage_from_response (or response._hidden_params["response_cost"]) is
correctly populated for a non-Claude provider (e.g., OpenAI GPT-4) when using
StreamingCompletion/stream_chunk_builder. Ensure references to
StreamingCompletion, litellm.acompletion, acompletion_checking_response, and
usage_from_response are used to locate/edit the code and tests.
🧹 Nitpick comments (2)
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (1)

7-8: litellm dependency introduced in BaseAdapter.

from litellm.types.utils import ModelResponseStream couples the abstract base class (and all its non-litellm subclasses) to the litellm package. Consider moving StreamCallback to a standalone streaming_types.py module (using Any or a protocol) so non-litellm adapters don't carry a transitive litellm dependency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/adapters/model_adapters/base_adapter.py` around lines 7 -
8, BaseAdapter currently imports ModelResponseStream from litellm which forces a
transitive litellm dependency; create a new standalone module streaming_types.py
that defines a StreamCallback type (either as typing.Any or a lightweight
typing.Protocol matching ModelResponseStream’s public API) and export any
minimal streaming types there, then update
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py to import
StreamCallback from streaming_types instead of ModelResponseStream and update
any references in BaseAdapter to use StreamCallback; leave direct litellm
imports only in adapters that actually need litellm-specific ModelResponseStream
and adjust their imports accordingly.
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_streaming.py (1)

77-92: Double getattr in render_chunk for reasoning_content.

Lines 85-87 call getattr(chunk.choices[0].delta, "reasoning_content", None) twice — once in the elif condition and once to assign text. The second call is redundant.

♻️ Proposed simplification
-            elif getattr(chunk.choices[0].delta, "reasoning_content", None) is not None:
-                text = getattr(chunk.choices[0].delta, "reasoning_content", None)
-                if text is not None:
-                    self.render_reasoning(text)
+            elif (text := getattr(chunk.choices[0].delta, "reasoning_content", None)) is not None:
+                self.render_reasoning(text)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_streaming.py`
around lines 77 - 92, In render_chunk, avoid calling
getattr(chunk.choices[0].delta, "reasoning_content", None) twice: first evaluate
and store it in a local variable (e.g., reasoning_text) then use that variable
both for the truthy check and for passing to render_reasoning; update the branch
under render_chunk where it currently has the two getattr calls so the condition
checks the stored reasoning_text and the subsequent call passes that same
variable to render_reasoning.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@libs/core/kiln_ai/adapters/ml_model_list.py`:
- Line 1348: The provider entries that set thinking_level (e.g., the
dict/constructor call containing thinking_level="medium") must also set
reasoning_capable=True so Claude providers use single-turn reasoning and extract
"thinking" output; update every provider config that currently only sets
thinking_level (lines referenced include the occurrences at
thinking_level="medium" and the other positions) to include
reasoning_capable=True. For Anthropic providers where applicable, instead add or
also set anthropic_extended_thinking=True. Locate the provider definitions in
ml_model_list.py (the entries that include thinking_level) and add the
appropriate boolean flag to those same dict/constructor calls.


@leonardmq leonardmq force-pushed the leonard/kil-420-adapter-add-streaming branch from 446d4d1 to 9642085 Compare February 20, 2026 10:32

github-actions bot commented Feb 20, 2026

📊 Coverage Report

Overall Coverage: 91%

Diff: origin/main...HEAD

  • libs/core/kiln_ai/adapters/litellm_utils/litellm_streaming.py (100%)
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (100%)
  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (33.3%): Missing lines 301-305,422
  • libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py (80.0%): Missing lines 86

Summary

  • Total: 48 lines
  • Missing: 7 lines
  • Coverage: 85%

Line-by-line

View line-by-line diff coverage

libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py

Lines 297-309

  297 
  298     async def acompletion_checking_response(
  299         self, on_chunk: StreamCallback | None = None, **kwargs
  300     ) -> Tuple[ModelResponse, Choices]:
! 301         stream = StreamingCompletion(**kwargs)
! 302         async for chunk in stream:
! 303             if on_chunk is not None:
! 304                 await on_chunk(chunk)
! 305         response = stream.response
  306 
  307         if (
  308             not isinstance(response, ModelResponse)
  309             or not response.choices

Lines 418-426

  418 
  419             # anthropic does not need allowed_openai_params, and we get an error if we pass it in
  420             # but openrouter for example does need it or throws an error
  421             if provider.name == ModelProviderName.openrouter:
! 422                 extra_body["allowed_openai_params"] = ["reasoning_effort"]
  423 
  424         if provider.require_openrouter_reasoning:
  425             # https://openrouter.ai/docs/use-cases/reasoning-tokens
  426             extra_body["reasoning"] = {

libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py

Lines 82-90

  82         input: InputType,
  83         input_source: DataSource | None = None,
  84         on_chunk: StreamCallback | None = None,
  85     ) -> TaskRun:
! 86         run_output, _ = await self.invoke_returning_run_output(
  87             input, input_source, on_chunk=on_chunk
  88         )
  89         return run_output


@leonardmq

@coderabbitai review


coderabbitai bot commented Feb 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (9)
libs/core/kiln_ai/adapters/litellm_utils/litellm_streaming.py (1)

30-36: Re-iterating the same instance silently makes a duplicate API call.

Every async for chunk in stream: invocation calls __aiter__, which resets _response/_iterated and issues a fresh litellm.acompletion. Since this is intended as a single-use wrapper (matching the pattern in acompletion_checking_response), this should either be documented in the class docstring or enforced with a guard:

def __aiter__(self):
    if self._iterated:
        raise RuntimeError("StreamingCompletion has already been iterated; create a new instance.")
    ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/adapters/litellm_utils/litellm_streaming.py` around lines
30 - 36, The wrapper silently allows re-iteration causing duplicate
litellm.acompletion calls; update the StreamingCompletion iterator logic (the
__aiter__ method) to enforce single-use by checking self._iterated and raising a
RuntimeError("StreamingCompletion has already been iterated; create a new
instance.") if True, and only call litellm.acompletion and set
self._response/_iterated on the first iteration; alternatively add this
single-use behavior note to the class docstring for clarity.
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_streaming.py (4)

76-91: render_chunk assumes chunk.choices is non-empty.

Line 77 accesses chunk.choices[0] without a guard. While litellm streaming chunks should always contain at least one choice, a defensive check would prevent a confusing IndexError if an unexpected chunk shape arrives.

Suggested guard
     async def render_chunk(self, chunk: litellm.ModelResponseStream):
+        if not chunk.choices:
+            return
         if chunk.choices[0].finish_reason is not None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_streaming.py`
around lines 76 - 91, The render_chunk function assumes chunk.choices[0] exists
and can raise IndexError for empty choices; add a defensive guard at the top of
render_chunk (in the render_chunk method) that checks if not chunk.choices (or
len(chunk.choices) == 0) and handle that case (e.g., call
self.render_unknown(chunk) or return) before referencing chunk.choices[0]; then
proceed with the existing logic for tool_calls, reasoning_content, content, and
finish_reason.

127-153: StructuredOutputMode.unknown is fragile — prefer default.

Line 139 uses StructuredOutputMode.unknown, which raises ValueError("Structured output mode is unknown.") in response_format_options(). It currently works only because the task has no output schema, so has_structured_output() short-circuits. If anyone later adds an output schema to the task fixture, these tests will fail with an opaque error. Using StructuredOutputMode.default would be safer and more representative of real usage.

Suggested fix
-                    structured_output_mode=StructuredOutputMode.unknown,
+                    structured_output_mode=StructuredOutputMode.default,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_streaming.py`
around lines 127 - 153, The test fixture uses StructuredOutputMode.unknown which
can later raise ValueError in response_format_options() if the task gains an
output schema; update the adapter_factory fixture to use
StructuredOutputMode.default instead of .unknown when constructing the
LiteLlmAdapter’s LiteLlmConfig/KilnAgentRunConfigProperties so
has_structured_output() won’t short-circuit and tests remain stable (look for
adapter_factory, LiteLlmAdapter, LiteLlmConfig, KilnAgentRunConfigProperties,
response_format_options).

97-107: Dead current_block_type field in ChunkRawRenderer.

Line 100 initializes self.current_block_type but it's never read or written afterwards. Looks like a copy-paste artifact from ChunkRenderer.

Fix
 class ChunkRawRenderer(ChunkRendererAbstract):
     def __init__(self):
         self.chunks: list[litellm.ModelResponseStream] = []
-        self.current_block_type: str | None = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_streaming.py`
around lines 97 - 107, The ChunkRawRenderer class defines an unused field
current_block_type (set in __init__) that is never read or mutated—remove this
dead field to clean up the class; update the __init__ of ChunkRawRenderer to
only initialize self.chunks and leave render_chunk and get_stream_text unchanged
(methods: ChunkRawRenderer.__init__, ChunkRawRenderer.render_chunk,
ChunkRawRenderer.get_stream_text).

156-172: Duplicated parametrize lists across 4 tests.

The same 11-entry model/provider list is copy-pasted in all four @pytest.mark.parametrize decorators. Extract it to a module-level constant to reduce maintenance burden and risk of drift.

Suggested approach
+STREAMING_TEST_MODELS = [
+    ("claude_sonnet_4_5_thinking", ModelProviderName.openrouter),
+    ("claude_sonnet_4_5_thinking", ModelProviderName.anthropic),
+    # ... all entries ...
+    ("minimax_m2_5", ModelProviderName.openrouter),
+]
+
 @pytest.mark.paid
-@pytest.mark.parametrize(
-    "model_id,provider_name",
-    [
-        ("claude_sonnet_4_5_thinking", ModelProviderName.openrouter),
-        ...
-    ],
-)
+@pytest.mark.parametrize("model_id,provider_name", STREAMING_TEST_MODELS)
 async def test_acompletion_streaming_response(...):

Also applies to: 271-287, 360-376, 389-405

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_streaming.py`
around lines 156 - 172, Extract the repeated 11-entry list used in the
pytest.mark.parametrize(...) decorators into a single module-level constant
(e.g., SUPPORTED_MODEL_PROVIDER_PAIRS) and replace each duplicated inline list
with that constant in the four tests that parametrize "model_id,provider_name";
keep the parameter names ("model_id", "provider_name") and the
pytest.mark.parametrize call but pass SUPPORTED_MODEL_PROVIDER_PAIRS as the
second argument to avoid copy-paste drift.
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_tools.py (1)

494-511: Dead litellm.acompletion patches remain in _run_model_turn tests.

In test_run_model_turn_parallel_tools, test_run_model_turn_sequential_tools, and test_run_model_turn_max_tool_calls_exceeded, the patch("litellm.acompletion", ...) is now unreachable because acompletion_checking_response (which is also patched) is what would call it. These patches are harmless but misleading dead code.

Example cleanup for test_run_model_turn_parallel_tools
     with patch.object(
         litellm_adapter, "cached_available_tools", return_value=[multiply_spy, add_spy]
     ):
-        with patch(
-            "litellm.acompletion",
-            side_effect=[mock_response, final_response],
-        ):
-            with patch.object(
-                litellm_adapter, "build_completion_kwargs", return_value={}
-            ):
-                with patch.object(
-                    litellm_adapter,
-                    "acompletion_checking_response",
-                    side_effect=[
-                        (mock_response, mock_response.choices[0]),
-                        (final_response, final_response.choices[0]),
-                    ],
-                ):
-                    result = await litellm_adapter._run_model_turn(
-                        provider, prior_messages, None, False
-                    )
+        with patch.object(
+            litellm_adapter, "build_completion_kwargs", return_value={}
+        ):
+            with patch.object(
+                litellm_adapter,
+                "acompletion_checking_response",
+                side_effect=[
+                    (mock_response, mock_response.choices[0]),
+                    (final_response, final_response.choices[0]),
+                ],
+            ):
+                result = await litellm_adapter._run_model_turn(
+                    provider, prior_messages, None, False
+                )

Also applies to: 615-633, 703-717

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_tools.py`
around lines 494 - 511, Remove the dead patch of "litellm.acompletion" from the
three tests where it is never reached because acompletion_checking_response is
patched to provide responses; specifically, edit
test_run_model_turn_parallel_tools, test_run_model_turn_sequential_tools, and
test_run_model_turn_max_tool_calls_exceeded to delete the with
patch("litellm.acompletion", ...) context managers (the
side_effect=[mock_response, final_response] blocks) and keep the existing
patches for litellm_adapter.acompletion_checking_response,
litellm_adapter.cached_available_tools, and
litellm_adapter.build_completion_kwargs so the tests remain functionally
identical but without the misleading, unreachable patch.
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (2)

96-103: on_chunk is forwarded to every tool-call iteration turn.

In _run_model_turn, the on_chunk callback fires for every inner loop iteration (tool-call turns), not just the final content turn. Consumers will receive interleaved chunks from reasoning, tool-call deltas, and final content across multiple model calls. This is probably fine for streaming UIs but worth documenting so callers know chunks aren't scoped to a single logical response.

Also applies to: 125-127, 189-191, 224-231

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py` around lines 96
- 103, The on_chunk callback passed into _run_model_turn is currently forwarded
to every inner tool-call iteration, causing streaming chunks from reasoning,
tool deltas, and final content to be interleaved; either explicitly document
this behavior in the _run_model_turn docstring (and the same places noted for
the other iterations) or change the forwarding so on_chunk is only invoked for
the final content turn (e.g., pass None or a no-op for tool-call iterations and
only pass the real on_chunk when emitting the final response); update the
comments/docstrings for the related inner-loop call sites so callers know chunks
are not scoped to a single logical response.
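
If scoping ever becomes desirable, the gating can be sketched in miniature. Everything here is illustrative (make_turn_runner, run_turn, and is_final_turn are invented names, not the adapter's real API); it only shows the shape of suppressing the callback on tool-call turns:

```python
from typing import Callable, Optional

def make_turn_runner(on_chunk: Optional[Callable[[str], None]]):
    """Return a turn runner that forwards chunks only on the final turn."""
    def run_turn(chunks: list[str], is_final_turn: bool) -> str:
        # Suppress the callback for tool-call turns so consumers never see
        # interleaved tool deltas; only the final content turn streams out.
        callback = on_chunk if is_final_turn else None
        for chunk in chunks:
            if callback is not None:
                callback(chunk)
        return "".join(chunks)
    return run_turn

received: list[str] = []
run_turn = make_turn_runner(received.append)
run_turn(["{tool", " delta}"], is_final_turn=False)  # not forwarded
run_turn(["final ", "answer"], is_final_turn=True)   # forwarded
# received now holds only the final-turn chunks
```

Whether to gate or just document the interleaving is a product decision; the sketch above merely shows that the change is mechanical either way.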

298-316: Always-streaming design is tested and working, but consider a non-streaming fallback and verify logprobs coverage.

The concern about stream_chunk_builder preserving response fields is partially validated by existing tests. test_litellm_adapter_streaming.py extensively verifies that reasoning_content and tool_calls are correctly reassembled during streaming. However, two gaps remain:

  1. Logprobs are untested in streaming: The codebase tracks supports_logprobs as a model capability, but there are no tests verifying logprobs are preserved through stream_chunk_builder reassembly. If logprobs are needed downstream during streaming, add coverage.

  2. Known upstream limitation: Some LiteLLM providers emit reasoning in streamed delta.content but do not provide structured reasoning_content during streaming—only in non-streaming responses. This is a provider-specific issue that may silently affect certain model/provider combinations.

Optional: Consider keeping a non-streaming path when on_chunk is None to avoid the streaming overhead (stream_chunk_builder reassembly) for callers that don't need live callbacks. This maintains backward compatibility for non-streaming callers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py` around lines
298 - 316, The acompletion_checking_response method currently always builds a
StreamingCompletion and reassembles chunks even when no on_chunk callback is
provided; change it to use a non-streaming path when on_chunk is None by calling
the synchronous/non-streaming completion API (or awaiting a single-shot
completion) instead of instantiating StreamingCompletion, and ensure the
returned ModelResponse/Choices still include logprobs by preserving whatever
field/flag the adapter uses (see supports_logprobs and stream_chunk_builder
reassembly logic) so streamed and non-streamed responses have equivalent
logprobs and reasoning_content/tool_calls; add tests exercising logprobs through
streaming reassembly (test_litellm_adapter_streaming.py) and a new non-streaming
case to verify behavior when on_chunk is None.
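
The optional fast path is small. The names below (fetch_completion, call_once, call_streaming) are assumptions for illustration, not part of the adapter or litellm; the sketch only shows dispatching on `on_chunk is None` so non-streaming callers skip chunk reassembly entirely:

```python
import asyncio
from typing import Awaitable, Callable, Optional

async def fetch_completion(
    call_streaming: Callable[[], Awaitable[list[str]]],
    call_once: Callable[[], Awaitable[str]],
    on_chunk: Optional[Callable[[str], None]] = None,
) -> str:
    if on_chunk is None:
        # No live consumer: one non-streaming request, no reassembly overhead.
        return await call_once()
    chunks = await call_streaming()
    for chunk in chunks:
        on_chunk(chunk)  # deliver chunks to the caller as they arrive
    return "".join(chunks)

async def demo() -> tuple[str, str, list[str]]:
    async def call_once() -> str:
        return "full response"

    async def call_streaming() -> list[str]:
        return ["full ", "response"]

    seen: list[str] = []
    plain = await fetch_completion(call_streaming, call_once)
    streamed = await fetch_completion(call_streaming, call_once, seen.append)
    return plain, streamed, seen

plain, streamed, seen = asyncio.run(demo())
```

Both paths return the same assembled text; only the streaming path invokes the callback, which is the equivalence the suggested tests would pin down.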
libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py (1)

49-78: on_chunk is accepted but never forwarded to the tool execution.

The on_chunk parameter is threaded through the call chain but silently ignored in _run (Line 77 – tool.run(...) receives no chunk callback). This is fine for interface conformance with BaseAdapter._run, but worth a brief inline comment so future readers know streaming isn't supported for MCP tools yet.

Suggested comment
+        # Note: on_chunk is accepted for interface conformance but MCP tools
+        # do not support streaming, so it is intentionally unused here.
         result = await tool.run(context=None, **tool_kwargs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py` around lines 49 -
78, The _run method currently accepts an on_chunk callback but never forwards or
documents it; update McpAdapter._run to either forward on_chunk into the tool
execution if the tool supports streaming or, if streaming isn't supported for
MCP tools yet, add a brief inline comment just above the tool.run(...) call
explaining that on_chunk is intentionally ignored (mentioning the parameters
_run and on_chunk and the call site tool.run) so future readers understand this
limitation and don't assume a bug.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@libs/core/kiln_ai/adapters/litellm_utils/litellm_streaming.py`:
- Around line 48-60: The async iterator __aiter__ in litellm_streaming.py can be
aborted leaving _iterated False and _response unset and never closing the
underlying litellm stream; wrap the streaming logic in a try/finally: create the
stream as before, iterate and yield chunks inside try, and in finally always
call stream.aclose() if stream exists, set self._response =
litellm.stream_chunk_builder(chunks) (even if empty) and self._iterated = True
so stream.response works after interruption; ensure any exceptions are re-raised
after finalization so behavior is preserved.

---

Duplicate comments:
In `@libs/core/kiln_ai/adapters/ml_model_list.py`:
- Around line 1304-1326: The Claude “Thinking” model providers set
thinking_level="medium" but lack explicit reasoning flags; search the model
adapter logic that consumes
thinking_level/reasoning_capable/anthropic_extended_thinking (look for usages of
thinking_level, reasoning_capable, anthropic_extended_thinking in the
model_adapters) and then update the KilnModel provider entries (the KilnModel
instance for ModelName.claude_4_5_haiku_thinking and the other listed Claude
"Thinking" KilnModel blocks) to include reasoning_capable=True on providers that
support single-call reasoning and anthropic_extended_thinking=True only on the
ModelProviderName.anthropic provider entries; apply the same flag changes to the
other referenced blocks (around lines 1394-1439, 1459-1479, 1606-1651,
1691-1731) so the adapter will choose the single-call reasoning path where
appropriate.

---

Nitpick comments:
In `@libs/core/kiln_ai/adapters/litellm_utils/litellm_streaming.py`:
- Around line 30-36: The wrapper silently allows re-iteration causing duplicate
litellm.acompletion calls; update the StreamingCompletion iterator logic (the
__aiter__ method) to enforce single-use by checking self._iterated and raising a
RuntimeError("StreamingCompletion has already been iterated; create a new
instance.") if True, and only call litellm.acompletion and set
self._response/_iterated on the first iteration; alternatively add this
single-use behavior note to the class docstring for clarity.

In `@libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py`:
- Around line 96-103: The on_chunk callback passed into _run_model_turn is
currently forwarded to every inner tool-call iteration, causing streaming chunks
from reasoning, tool deltas, and final content to be interleaved; either
explicitly document this behavior in the _run_model_turn docstring (and the same
places noted for the other iterations) or change the forwarding so on_chunk is
only invoked for the final content turn (e.g., pass None or a no-op for
tool-call iterations and only pass the real on_chunk when emitting the final
response); update the comments/docstrings for the related inner-loop call sites
so callers know chunks are not scoped to a single logical response.
- Around line 298-316: The acompletion_checking_response method currently always
builds a StreamingCompletion and reassembles chunks even when no on_chunk
callback is provided; change it to use a non-streaming path when on_chunk is
None by calling the synchronous/non-streaming completion API (or awaiting a
single-shot completion) instead of instantiating StreamingCompletion, and ensure
the returned ModelResponse/Choices still include logprobs by preserving whatever
field/flag the adapter uses (see supports_logprobs and stream_chunk_builder
reassembly logic) so streamed and non-streamed responses have equivalent
logprobs and reasoning_content/tool_calls; add tests exercising logprobs through
streaming reassembly (test_litellm_adapter_streaming.py) and a new non-streaming
case to verify behavior when on_chunk is None.

In `@libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py`:
- Around line 49-78: The _run method currently accepts an on_chunk callback but
never forwards or documents it; update McpAdapter._run to either forward
on_chunk into the tool execution if the tool supports streaming or, if streaming
isn't supported for MCP tools yet, add a brief inline comment just above the
tool.run(...) call explaining that on_chunk is intentionally ignored (mentioning
the parameters _run and on_chunk and the call site tool.run) so future readers
understand this limitation and don't assume a bug.

In `@libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_streaming.py`:
- Around line 76-91: The render_chunk function assumes chunk.choices[0] exists
and can raise IndexError for empty choices; add a defensive guard at the top of
render_chunk (in the render_chunk method) that checks if not chunk.choices (or
len(chunk.choices) == 0) and handle that case (e.g., call
self.render_unknown(chunk) or return) before referencing chunk.choices[0]; then
proceed with the existing logic for tool_calls, reasoning_content, content, and
finish_reason.
- Around line 127-153: The test fixture uses StructuredOutputMode.unknown which
can later raise ValueError in response_format_options() if the task gains an
output schema; update the adapter_factory fixture to use
StructuredOutputMode.default instead of .unknown when constructing the
LiteLlmAdapter’s LiteLlmConfig/KilnAgentRunConfigProperties so
has_structured_output() won’t short-circuit and tests remain stable (look for
adapter_factory, LiteLlmAdapter, LiteLlmConfig, KilnAgentRunConfigProperties,
response_format_options).
- Around line 97-107: The ChunkRawRenderer class defines an unused field
current_block_type (set in __init__) that is never read or mutated—remove this
dead field to clean up the class; update the __init__ of ChunkRawRenderer to
only initialize self.chunks and leave render_chunk and get_stream_text unchanged
(methods: ChunkRawRenderer.__init__, ChunkRawRenderer.render_chunk,
ChunkRawRenderer.get_stream_text).
- Around line 156-172: Extract the repeated 11-entry list used in the
pytest.mark.parametrize(...) decorators into a single module-level constant
(e.g., SUPPORTED_MODEL_PROVIDER_PAIRS) and replace each duplicated inline list
with that constant in the four tests that parametrize "model_id,provider_name";
keep the parameter names ("model_id", "provider_name") and the
pytest.mark.parametrize call but pass SUPPORTED_MODEL_PROVIDER_PAIRS as the
second argument to avoid copy-paste drift.

In `@libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_tools.py`:
- Around line 494-511: Remove the dead patch of "litellm.acompletion" from the
three tests where it is never reached because acompletion_checking_response is
patched to provide responses; specifically, edit
test_run_model_turn_parallel_tools, test_run_model_turn_sequential_tools, and
test_run_model_turn_max_tool_calls_exceeded to delete the with
patch("litellm.acompletion", ...) context managers (the
side_effect=[mock_response, final_response] blocks) and keep the existing
patches for litellm_adapter.acompletion_checking_response,
litellm_adapter.cached_available_tools, and
litellm_adapter.build_completion_kwargs so the tests remain functionally
identical but without the misleading, unreachable patch.

Comment on lines +48 to +60
async def __aiter__(self) -> AsyncIterator[ModelResponseStream]:
self._response = None
self._iterated = False

chunks: list[ModelResponseStream] = []
stream = await litellm.acompletion(*self._args, stream=True, **self._kwargs)

async for chunk in stream:
chunks.append(chunk)
yield chunk

self._response = litellm.stream_chunk_builder(chunks)
self._iterated = True
⚠️ Potential issue | 🟠 Major

Missing try/finally leaves response inaccessible after any interrupted iteration.

If the consumer's async for body raises (e.g., an on_chunk callback throws) or if the streaming call itself fails, Python sends GeneratorExit into the generator at the yield point. The two lines after the loop never execute, so _iterated stays False and stream.response will always raise RuntimeError — callers cannot distinguish "not yet started" from "stream failed".

Additionally, the litellm stream object (a CustomStreamWrapper) won't have .aclose() called implicitly when the generator is abandoned without exhaustion.

🐛 Proposed fix — `try/finally` for guaranteed state finalization
     async def __aiter__(self) -> AsyncIterator[ModelResponseStream]:
         self._response = None
         self._iterated = False

         chunks: list[ModelResponseStream] = []
-        stream = await litellm.acompletion(*self._args, stream=True, **self._kwargs)
-
-        async for chunk in stream:
-            chunks.append(chunk)
-            yield chunk
-
-        self._response = litellm.stream_chunk_builder(chunks)
-        self._iterated = True
+        try:
+            stream = await litellm.acompletion(*self._args, stream=True, **self._kwargs)
+            async for chunk in stream:
+                chunks.append(chunk)
+                yield chunk
+        finally:
+            self._response = litellm.stream_chunk_builder(chunks) if chunks else None
+            self._iterated = True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/adapters/litellm_utils/litellm_streaming.py` around lines
48 - 60, The async iterator __aiter__ in litellm_streaming.py can be aborted
leaving _iterated False and _response unset and never closing the underlying
litellm stream; wrap the streaming logic in a try/finally: create the stream as
before, iterate and yield chunks inside try, and in finally always call
stream.aclose() if stream exists, set self._response =
litellm.stream_chunk_builder(chunks) (even if empty) and self._iterated = True
so stream.response works after interruption; ensure any exceptions are re-raised
after finalization so behavior is preserved.

@leonardmq
Collaborator Author

Replaced with PR that handles multiturn + streaming + protocols: #1107

@leonardmq leonardmq closed this Mar 8, 2026