Conversation
| """ | ||
| Defines a chat agent that interacts with a Large Language Model (LLM). | ||
|
|
||
| Design Note: |
nit: shall we update here too?
Good idea, I will get back to that once the internal API is more stable.
src/kaggle_benchmarks/actors/llms.py (Outdated)
@dataclasses.dataclass
class FunctionCall:
Shall we also consider a thought signature for reasoning models? Context here. Or is it already covered by `LLMMessage.thinking`?
I would love to have it as well, perhaps in a separate PR as this one is already overcomplicated
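For reference, a minimal sketch of how a thought signature could ride along with a tool call; the field names below are purely hypothetical and not part of this PR:

```python
import dataclasses
from typing import Any


@dataclasses.dataclass
class FunctionCall:
    # Illustrative fields only; the actual definition in this PR may differ.
    name: str
    arguments: dict[str, Any]
    # Hypothetical: an opaque signature that some reasoning models attach to
    # tool calls so it can be echoed back to the provider on the next turn.
    thought_signature: str | None = None
```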
temperature: float | None = 0,
seed: int = 0,
tools: list[Any] | None = None,
) -> LLMMessage[str]:
Being explicit is nice! I wonder if we also want to keep the flexibility for other parameters, e.g., the config.
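As an illustration of one way to keep the explicit signature while leaving room for provider-specific options: in the sketch below, the `prompt` parameter name, the keyword-only marker, and `extra_config` are assumptions, not part of the PR:

```python
from typing import Any


def invoke(
    self,
    prompt: str,
    *,
    temperature: float | None = 0,
    seed: int = 0,
    tools: list[Any] | None = None,
    # Hypothetical escape hatch: provider-specific settings forwarded to the
    # underlying client without widening the explicit signature.
    extra_config: dict[str, Any] | None = None,
) -> "LLMMessage[str]":
    ...
```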
return result


class ModelProxyOpenAI(OpenAI):
Do we still support streaming output for Model Proxy?
We should, but I would rather wait until they fully support the `responses` API so we don't have to implement it twice.
Force-pushed from 74c099d to 9b62a2d.
answer = response.content
response._meta.update(chat=chat, schema=schema, raw_content=answer, **kwargs)
response._meta.update(
I would keep it for now, just to be safe
develra left a comment:
LGTM. As someone fairly unfamiliar with this code, it's hard for me to tell whether the test coverage is sufficient to be confident that these changes are safe. I think it would be good to think through what might break as a result of these changes and make sure we have test coverage for it, especially given the somewhat sensitive timing of a new launch.
{
    "role": message.sender.role
    if message.sender.role != "tool"
    else "system",  # TODO: Remove this renaming once ModelProxy supports tools
Looking at this TODO - do we know if that is on the roadmap for ModelProxy?
temperature: float | None = 0,
seed: int = 0,
tools: list[Any] | None = None,
) -> LLMMessage[str]:
For invoke, shall we return LLMMessage[T] for image output etc?
I don't know yet. Maybe we will have another method for image generation?
tool_calls: list[FunctionCall] | None = None
usage: Usage | None = None

def add_chunk(self, chunk: str):
Curious: what are the different uses of this new method versus `Message.stream`?
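For context, a plausible sketch of what `add_chunk` might do, assuming `content` accumulates streamed text; the body below is illustrative, not the PR's actual implementation:

```python
def add_chunk(self, chunk: str) -> None:
    # Illustrative only: append a streamed text chunk to the accumulated
    # content; the real method in this PR may behave differently.
    self.content = (self.content or "") + chunk
```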
@dataclasses.dataclass
class LLMMessage(messages.Message[T]):
    content: T
Looking at the add_chunk method, do we essentially assume content is always a string? If so, shall we just rename it to text or similar?
Could we add another field, such as image, for image output in the future?
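Purely as an illustration of that suggestion (none of these field names are in the PR), the message could keep text separate from future image output:

```python
import dataclasses


@dataclasses.dataclass
class LLMMessageSketch:
    # Hypothetical shape suggested in the comment above, not the PR's class.
    text: str | None = None     # textual output, appended to by add_chunk
    image: bytes | None = None  # placeholder for future image output
    thinking: str | None = None
```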
src/kaggle_benchmarks/actors/llms.py (Outdated)
content: T
_status: utils.Status = utils.Status.RUNNING
thinking: str | None = None
tool_calls: list[FunctionCall] | None = None
I think these fields would also be useful on Message, don't you think?
Discussed with @s-alexey: it would be great for us to test more existing examples to avoid backward incompatibility.
Force-pushed from 05c8fd2 to e611cf2.
Major refactor of the LLM chat architecture to improve code organization, maintainability, and type safety.

Key Changes:
- Split `LLMChat` subclasses into distinct Non-Streaming and Streaming implementations. Streaming logic (primarily for notebooks) was complicating the core classes; this split makes the primary actors more concise and less error-prone.
- Moved provider-specific implementations into separate files: `openai.py` and `genai.py`.
- Replaced the generic `LLMResponse` with a strictly typed version, specifically enforcing types for `tool_usage` and `token_usage`.
- Updated the `invoke` method to accept explicit arguments.
- Migrated the OpenAI integration from the `completion` API to the more user-friendly `responses` API.

Testing:
- Added coverage for common use cases using real APIs (tests run conditionally if environment keys are present).
Major refactoring of the LLM interaction layer, significantly enhancing the `llm.prompt` method to establish it as the primary, unified entry point for all model communications. The goal is to abstract away model-specific logic, providing seamless support for structured outputs, automatic tool calling, and vision capabilities across all integrated models. This simplifies task definitions and enhances the user experience by providing a consistent, high-level API.
Enhanced `llm.prompt` with automatic tool calling:
- Automatic tool-calling emulation.
Refactored Actor model:
- The `llms.py` module has been streamlined. API-specific logic has been moved into dedicated `actors/genai.py` and `actors/openai.py` modules.
- Streaming logic now lives in dedicated classes (`StreamingGoogleGenAI`, `StreamingOpenAIResponsesAPI`) to isolate it from the core API used for scheduled runs.

Improved vision and image support:
Enhanced support for multimodal inputs, particularly for the Gemini API. The framework now correctly handles image content, including captions and various data formats (URLs and base64).
New agentic assertion:
Added `assert_tool_was_invoked` to allow for testing and evaluation of agentic behavior by verifying that a specific tool was used during a task.

Updated Examples & Tests:
- Examples updated to use the `llm.prompt` API for tool use.
- Added integration tests (`test_api_integration.py`) that run against live OpenAI, Google, and Model Proxy endpoints (when API keys are available) to ensure cross-model consistency.
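A rough usage sketch of the tool-calling flow and the new assertion described above; the import paths, the `LLMChat` constructor, and the exact signatures are assumptions inferred from this description, not the real API:

```python
# Hypothetical sketch; module paths, constructor arguments, and signatures
# below are assumptions inferred from the PR description.
from kaggle_benchmarks.actors import llms
from kaggle_benchmarks.asserts import assert_tool_was_invoked  # assumed location


def get_weather(city: str) -> str:
    """Toy tool: return a short weather summary for a city."""
    return f"It is sunny in {city}."


llm = llms.LLMChat(model="gemini-2.5-pro")  # placeholder construction
response = llm.prompt(
    "What is the weather in Lisbon?",
    tools=[get_weather],  # tool calls are emulated if the model lacks native support
)

# New agentic assertion from this PR: verify the tool was actually invoked.
assert_tool_was_invoked(response, "get_weather")
```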