Refactor LLM chats: separate streaming logic and enforce strict typing #12

Open

s-alexey wants to merge 1 commit into ci from genai

Conversation

@s-alexey (Contributor) commented Jan 7, 2026

Major refactoring of the LLM interaction layer, significantly enhancing the llm.prompt method to establish it as the primary, unified entry point for all model communications. The goal is to abstract away model-specific logic, providing seamless support for structured outputs, automatic tool calling, and vision capabilities across all integrated models.

This simplifies task definitions and enhances the user experience by providing a consistent, high-level API.

  • Enhanced llm.prompt with automatic tool calling:

    • The llm.prompt method has been upgraded to manage the entire conversation turn, now including a built-in, multi-step tool-calling loop.
    • When tools are provided, prompt automatically orchestrates the interaction: it invokes the LLM, executes the requested tools, sends the results back, and repeats this cycle until a final answer is generated (a sketch of this loop follows the list). This eliminates the need for manual tool-handling logic in task definitions.
  • Automatic tool-calling emulation:

    • For models that lack native tool-calling support, a new emulation layer transparently provides this functionality by wrapping requests with structured prompts, making the feature available across all models.
  • Refactored Actor model:

    • The llms.py module has been streamlined. API-specific logic has been moved into dedicated actors/genai.py and actors/openai.py modules.
    • Experimental streaming functionality is now encapsulated in separate classes (e.g., StreamingGoogleGenAI, StreamingOpenAIResponsesAPI) to isolate it from the core API used for scheduled runs.
  • Improved vision and image support:

    • Enhanced support for multimodal inputs, particularly for the Gemini API. The framework now correctly handles image content, including captions and various data formats (URLs and base64).
  • New agentic assertion:

    • Added assert_tool_was_invoked to allow testing and evaluation of agentic behavior by verifying that a specific tool was used during a task.
  • Updated examples and tests:

    • Revised examples to demonstrate the simplified llm.prompt API for tool use.
    • Added a comprehensive suite of API integration tests (test_api_integration.py) that runs against live OpenAI, Google, and Model Proxy endpoints (when API keys are available) to ensure cross-model consistency.
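
For orientation, here is a minimal, self-contained Python sketch of the multi-step tool-calling loop described above. Every name in it (ToolCall, Message, call_model, the tools dict, max_turns) is an illustrative stand-in, not this PR's actual classes or signatures.

# Sketch only: illustrates the orchestration that llm.prompt performs when
# tools are supplied; the real implementation lives in the refactored actors.
import dataclasses
from typing import Any, Callable


@dataclasses.dataclass
class ToolCall:
    name: str
    arguments: dict[str, Any]


@dataclasses.dataclass
class Message:
    role: str                                 # "user", "assistant" or "tool"
    content: str
    tool_calls: list[ToolCall] | None = None


def prompt(
    call_model: Callable[[list[Message]], Message],
    tools: dict[str, Callable[..., str]],
    messages: list[Message],
    max_turns: int = 8,
) -> Message:
    """Invoke the model, run any requested tools, feed the results back,
    and repeat until the model returns a plain answer."""
    for _ in range(max_turns):
        reply = call_model(messages)
        messages.append(reply)
        if not reply.tool_calls:              # no tool requested -> final answer
            return reply
        for call in reply.tool_calls:         # execute each requested tool
            result = tools[call.name](**call.arguments)
            messages.append(Message(role="tool", content=result))
    raise RuntimeError("tool-calling loop exceeded max_turns")

With a loop of this shape inside the framework, task definitions only pass tools to llm.prompt; the new assert_tool_was_invoked assertion can then verify after the run that a given tool was actually used.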

@s-alexey s-alexey requested a review from dolaameng January 7, 2026 16:04
@s-alexey s-alexey added the wip Work in progress label Jan 7, 2026
@dolaameng (Contributor) left a comment

Thanks for the refactoring, which is really helpful and makes things clearer! Left some questions/comments to understand more.

We can keep iterating on it.

"""
Defines a chat agent that interacts with a Large Language Model (LLM).

Design Note:
Contributor:
nit: shall we update here too?

Contributor Author:
Good idea, I'll get back to that once the internal API is more stable.



@dataclasses.dataclass
class FunctionCall:
Contributor:
Shall we also consider thought signatures for reasoning models? Context here. Or is that already covered by LLMMessage.thinking?

Contributor Author:
I would love to have it as well, perhaps in a separate PR as this one is already overcomplicated

temperature: float | None = 0,
seed: int = 0,
tools: list[Any] | None = None,
) -> LLMMessage[str]:
Contributor:
Being explicit is nice! I wonder if we also want to keep the flexibility for other parameters, e.g., the config.

return result


class ModelProxyOpenAI(OpenAI):
Contributor:
Do we still support streaming output for Model Proxy?

Contributor Author:
We should, but I would rather wait until they fully support the responses API so we don't have to implement it twice.

@s-alexey force-pushed the genai branch 3 times, most recently from 74c099d to 9b62a2d, on January 14, 2026 16:00

answer = response.content
response._meta.update(chat=chat, schema=schema, raw_content=answer, **kwargs)
response._meta.update(
Contributor:
Do we still use _meta?

Contributor Author:
I would keep it for now, just to be safe

@develra left a comment

LGTM - as someone fairly unfamiliar with this code, it's hard for me to tell whether the test coverage is sufficient to be confident that these changes are safe. It would be good to think through what might break as a result of these changes and make sure we have test coverage for it, especially given the somewhat sensitive timing of a new launch.

{
"role": message.sender.role
if message.sender.role != "tool"
else "system", # TODO: Remove this renaming once ModelProxy supports tools

Looking at this TODO - do we know if that is on the roadmap for ModelProxy?

temperature: float | None = 0,
seed: int = 0,
tools: list[Any] | None = None,
) -> LLMMessage[str]:
Contributor:
For invoke, shall we return LLMMessage[T] for image output etc?

Contributor Author:
I don't know yet. Should we have a separate method for image generation instead?

tool_calls: list[FunctionCall] | None = None
usage: Usage | None = None

def add_chunk(self, chunk: str):
Contributor:
Curious: what are the different uses of this new method versus Message.stream?


@dataclasses.dataclass
class LLMMessage(messages.Message[T]):
content: T
Contributor:
Looking at the add_chunk method, do we essentially assume content is always a string? If so, shall we just rename it to text?

We could then add another field, e.g. image, for image output in the future.

content: T
_status: utils.Status = utils.Status.RUNNING
thinking: str | None = None
tool_calls: list[FunctionCall] | None = None
Contributor:
I think these fields would also be useful on Message, don't you think?

Contributor:
Discussed with @s-alexey that it would be great for us to test more of the existing examples to avoid backward incompatibility.

Major refactor of the LLM chat architecture to improve code organization,
maintainability, and type safety.

Key Changes:
- Split `LLMChat` subclasses into distinct Non-Streaming and Streaming
  implementations. Streaming logic (primarily for notebooks) was
  complicating the core classes; this split makes primary actors more
  concise and less error-prone.
- Moved provider-specific implementations into separate files:
  `openai.py` and `genai.py`.
- Replaced the generic `LLMResponse` with a strictly typed version,
  specifically enforcing types for `tool_usage` and `token_usage`.
- Updated `invoke` method to accept explicit arguments.
- Migrated OpenAI integration from the `completion` API to the more
  user-friendly `responses` API.

Testing:
- Added coverage for common use cases using real APIs (tests run
  conditionally if environment keys are present).
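
To make the "strictly typed" change concrete, here is a minimal sketch assuming only the field names visible in the diff above (FunctionCall, Usage, thinking, tool_calls, usage); the field types and defaults are assumptions, and the PR's real classes may differ.

# Illustrative sketch of a strictly typed response message; not the PR's code.
import dataclasses
from typing import Any


@dataclasses.dataclass
class FunctionCall:
    name: str
    arguments: dict[str, Any]       # parsed tool arguments (assumed shape)


@dataclasses.dataclass
class Usage:
    input_tokens: int = 0           # token accounting fields are assumptions
    output_tokens: int = 0


@dataclasses.dataclass
class LLMResponse:
    content: str
    thinking: str | None = None
    tool_calls: list[FunctionCall] | None = None   # typed tool usage
    usage: Usage | None = None                     # typed token usage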
