Use linguafranca Open Responses requests across ARES #97
base: main
Changes from all commits
2fb61fa
c15b4b6
b00018b
dd442ae
475d4f2
c2ba15d
d34ee47
b4216b6
ee8d913
b7c3771
2ade4ed
3f3483b
ffd22fd
134ef0f
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -103,21 +103,21 @@ Follow **Google-style imports**: always import modules, not individual classes o | |
| ```python | ||
| # Good ✅ | ||
| import ares | ||
| from ares import llms | ||
| from ares.llms import open_responses | ||
|
|
||
| request = llms.LLMRequest(messages=[...]) | ||
| request = open_responses.make_request([open_responses.user_message("Hello")]) | ||
| env = ares.make("sbv-mswea") | ||
|
|
||
| # Good for internal code ✅ | ||
| from ares.llms import request | ||
| from ares.llms import open_responses | ||
| from ares.llms import response | ||
|
|
||
| req = request.LLMRequest(messages=[...]) | ||
| resp = response.LLMResponse(data=[...], cost=0.0, usage=...) | ||
| req = open_responses.make_request([open_responses.user_message("Hello")]) | ||
| resp = response.InferenceResult(response=response.make_response("Hello!"), cost=0.0) | ||
|
Comment on lines +115 to +116
Analysis chain
Script executed:
#!/bin/bash
# Check the signature of make_response to see if tokens are required
ast-grep --pattern $'def make_response($$$) -> $_:
  $$$'
Repository: withmartian/ares
Length of output: 3038
Script executed:
#!/bin/bash
# First, let's examine the actual CONTRIBUTING.md at lines 115-116
head -120 CONTRIBUTING.md | tail -10
Repository: withmartian/ares
Length of output: 423
Script executed:
#!/bin/bash
# Search for make_response usage patterns in the codebase to see how it's typically called
rg "make_response\(" --max-count 20 -B 1 -A 1
Repository: withmartian/ares
Length of output: 3674
Consider explicitly showing token parameters in the documentation example:
resp = response.InferenceResult(
    response=response.make_response("Hello!", input_tokens=10, output_tokens=5),
    cost=0.0
)
This makes the full API signature clearer for readers learning the framework. |
||
|
|
||
| # Avoid ❌ | ||
| from ares.llms import LLMRequest, TextData | ||
| from ares.llms.request import LLMRequest | ||
| from ares.llms import OpenResponsesRequest, TextData | ||
| from ares.llms.open_responses import Request | ||
| ``` | ||
|
|
||
| **Rationale:** Makes code more readable and explicit about where objects come from. | ||
|
|
||
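A minimal sketch of the import rule from the CONTRIBUTING.md hunk above, using only the standard library so it runs without ARES installed. The `urllib.parse` module stands in for an ARES module such as `ares.llms.open_responses`; the point is that the call site names the module the function comes from.

```python
# Module-style import: bring in the module, then qualify names at the call site.
from urllib import parse

# The reader can see that urlparse comes from the `parse` module.
parts = parse.urlparse("https://github.com/withmartian/ares")
host = parts.netloc

# The discouraged style would be `from urllib.parse import urlparse`,
# which hides the origin of `urlparse` at the call site.
```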
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -15,7 +15,7 @@ ARES is an RL-first framework for training and evaluating LLM agents, especially | |
|
|
||
| It is a modern [gym](https://github.com/Farama-Foundation/Gymnasium): the environment layer powering RL research. | ||
|
|
||
| ARES treats LLMRequests as observations and LLMResponses as actions within the environment, so you can focus on training just the LLM - not the Code Agent surrounding it. The interface is entirely async, and supports scaling up to hundreds or thousands of parallel environments easily - check out [example 3](https://github.com/withmartian/ares/tree/main/examples/03_parallel_eval_with_api.py) to run this yourself. | ||
| ARES treats Open Responses requests as observations and LLMResponses as actions within the environment, so you can focus on training just the LLM - not the Code Agent surrounding it. The interface is entirely async, and supports scaling up to hundreds or thousands of parallel environments easily - check out [example 3](https://github.com/withmartian/ares/tree/main/examples/03_parallel_eval_with_api.py) to run this yourself. | ||
|
Contributor
Should give an actual class name - same comment as way above |
||
|
|
||
|
|
||
| ## Quick Start | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -11,7 +11,7 @@ It's important to understand two different concepts in ARES: | |
| The orchestration logic that uses a Container and LLM to solve tasks (e.g., MiniSWECodeAgent). This is **part of the environment** and remains fixed during training. Think of it as the scaffold that defines how an LLM interacts with code. | ||
|
|
||
| * Agent/Policy (Trained) | ||
| The component you're actually training - a function that maps ``LLMRequest → LLMResponse``. This could be a fine-tuned LLM, a prompt optimizer, or any policy that produces better responses. This is what improves through reinforcement learning. | ||
| The component you're actually training - a function that maps ``OpenResponsesRequest → InferenceResult``. This could be a fine-tuned LLM, a prompt optimizer, or any policy that produces better responses. This is what improves through reinforcement learning. | ||
|
|
||
| System Architecture | ||
| ------------------- | ||
|
|
@@ -30,13 +30,13 @@ Here's how the components fit together: | |
| | generates response | │ │ | ||
| └──────────┬─────────────┘ │ ┌────────────────────────────────┐ │ | ||
| ^ │ │ │ QueueMediatedLLMClient │ │ | ||
| | │ LLMResponse (action) │ │ │ │ | ||
| | │ InferenceResult (action) │ │ │ │ | ||
| | └──────────────────────────┼─>│ Intercepts LLM calls │ │ | ||
| | │ │ from code agent via │ │ | ||
| └─────────────────────────────────┼──│ QueueMediatedLLMClient │ │ | ||
| LLMRequest (observation) │ └──────────────────┬─────────────┘ │ | ||
| Open Responses observation │ └──────────────────┬─────────────┘ │ | ||
|
Contributor
nit: " |
||
| │ ^ │ │ | ||
| │ LLMRequest │ │ LLMResponse │ | ||
| │ Open Responses │ │ InferenceResult│ | ||
|
Contributor
Same nit as above |
||
| │ │ v │ | ||
| │ ┌──────────────└─────────────────┐ │ | ||
| │ │ CodeAgent │ │ | ||
|
|
@@ -87,7 +87,7 @@ The key abstraction is ``CodeEnvironment``, which: | |
| * **Exposes LLM requests as observations** - Intercepts calls from the code agent | ||
| * **Treats LLM responses as actions** - Your trainable agent/policy provides responses | ||
|
|
||
| Crucially, the **CodeAgent is part of the environment**, not what you're training. Your training loop optimizes an agent/policy that produces better ``LLMResponse`` outputs given ``LLMRequest`` observations. | ||
| Crucially, the **CodeAgent is part of the environment**, not what you're training. Your training loop optimizes an agent/policy that produces better ``InferenceResult`` outputs given canonical Open Responses observations. | ||
|
Contributor
"canonical" is confusing here, I would remove |
||
|
|
||
| Standard RL Loop | ||
| ~~~~~~~~~~~~~~~~ | ||
|
|
@@ -101,10 +101,10 @@ Every environment follows the standard RL pattern: | |
| timestep = await env.reset() | ||
|
|
||
| while not timestep.last(): | ||
| # timestep.observation is an LLMRequest from the code agent | ||
| # timestep.observation is an Open Responses request from the code agent | ||
| action = await your_policy(timestep.observation) | ||
|
|
||
| # action is an LLMResponse that continues the agent's execution | ||
| # action is an InferenceResult that continues the agent's execution | ||
| timestep = await env.step(action) | ||
|
|
||
| # timestep.reward contains the reward for the final step | ||
|
|
@@ -116,7 +116,7 @@ TimeStep Structure | |
| Each call to ``reset()`` or ``step()`` returns a ``TimeStep`` with: | ||
|
|
||
| * ``step_type``: One of ``"FIRST"``, ``"MID"``, or ``"LAST"`` | ||
| * ``observation``: An ``LLMRequest`` object (or ``None`` on termination) | ||
| * ``observation``: An Open Responses request object (or ``None`` on termination) | ||
| * ``reward``: A float reward for each step | ||
| * ``discount``: A float discount factor for RL algorithms | ||
|
|
||
|
|
@@ -160,7 +160,7 @@ Example structure: | |
| async def run(self, task: str) -> None: | ||
| while not self.is_done(): | ||
| # Ask LLM what to do next | ||
| request = LLMRequest(messages=[...]) | ||
| request = open_responses.make_request([open_responses.user_message(...)]) | ||
| response = await self._llm_client(request) | ||
|
|
||
| # Parse and execute commands from LLM response | ||
|
|
@@ -234,8 +234,8 @@ Which you will need to rewrite into something like: | |
| # Decide what to ask LLM next | ||
| ... | ||
| llm_response = await self.llm_client( | ||
| LLMRequest( | ||
| messages=[...], | ||
| open_responses.make_request( | ||
| [open_responses.user_message(...)], | ||
| ... # Other request params | ||
| ) | ||
| ) | ||
|
|
@@ -293,30 +293,27 @@ Core Interface | |
|
|
||
| .. code-block:: python | ||
|
|
||
| from linguafranca import types as lft | ||
|
|
||
| class LLMClient(Protocol): | ||
| async def __call__(self, request: LLMRequest) -> LLMResponse: | ||
| async def __call__(self, request: lft.OpenResponsesRequest) -> InferenceResult: | ||
| ... | ||
|
|
||
| @dataclass(frozen=True) | ||
| class LLMRequest: | ||
| messages: Iterable[ChatCompletionMessageParam] | ||
| temperature: float | None = None | ||
|
|
||
| @dataclass(frozen=True) | ||
| class LLMResponse: | ||
| chat_completion_response: ChatCompletion | ||
| class InferenceResult: | ||
| response: lft.OpenResponsesResponse | ||
| cost: float | ||
|
|
||
| This simple interface wraps OpenAI-style chat completion APIs. The ``messages`` field follows the OpenAI format with ``role`` (system/user/assistant) and ``content``. | ||
| ARES uses linguafranca's ``OpenResponsesRequest`` as the canonical request type for observations and client inputs. Edge adapters convert to Chat/Responses/Anthropic formats only when needed. | ||
|
Contributor
I would change this to a note like
|
||
|
|
||
| Why LLMClient? | ||
| ~~~~~~~~~~~~~~ | ||
|
|
||
| The ``LLMClient`` abstraction serves two purposes: | ||
|
|
||
| 1. **Observations = LLM Requests**: In the RL loop, ``timestep.observation`` is an ``LLMRequest`` containing the messages the code agent wants to send to the LLM. This is the "state" your policy observes. | ||
| 1. **Observations = Open Responses requests**: In the RL loop, ``timestep.observation`` is a canonical Open Responses request containing what the code agent wants to send to the LLM. This is the "state" your policy observes. | ||
|
Contributor
Same "canonical" comment |
||
|
|
||
| 2. **Actions = LLM Responses**: In the RL loop, the ``action`` you pass to ``env.step()`` is an ``LLMResponse`` containing the LLM's reply. This is how your policy controls the agent's behavior. | ||
| 2. **Actions = LLM Responses**: In the RL loop, the ``action`` you pass to ``env.step()`` is an ``InferenceResult`` containing the LLM's reply. This is how your policy controls the agent's behavior. | ||
|
|
||
| This framing makes it natural to think about code agent training as an RL problem: you're learning a policy that maps agent requests to helpful responses. | ||
|
|
||
|
|
||
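A minimal sketch of the loop described in the docs hunks above, where the observation is a request and the action is the policy's reply. Every type here (`FakeRequest`, `FakeResult`, `FakeTimeStep`, `CountdownEnv`) is a hypothetical stand-in, not the ARES or linguafranca API; only the `reset()` / `step()` / `last()` shape follows the diff.

```python
import asyncio
import dataclasses
from typing import Optional, Protocol


@dataclasses.dataclass(frozen=True)
class FakeRequest:
    """Stand-in for the Open Responses request used as an observation."""
    text: str


@dataclasses.dataclass(frozen=True)
class FakeResult:
    """Stand-in for InferenceResult: a reply plus its cost."""
    text: str
    cost: float = 0.0


@dataclasses.dataclass(frozen=True)
class FakeTimeStep:
    step_type: str  # "FIRST", "MID", or "LAST"
    observation: Optional[FakeRequest]
    reward: Optional[float] = None

    def last(self) -> bool:
        return self.step_type == "LAST"


class Policy(Protocol):
    async def __call__(self, request: FakeRequest) -> FakeResult: ...


class CountdownEnv:
    """Toy environment that terminates after a fixed number of steps."""

    def __init__(self, num_steps: int) -> None:
        self._remaining = num_steps

    async def reset(self) -> FakeTimeStep:
        return FakeTimeStep("FIRST", FakeRequest(f"{self._remaining} steps left"))

    async def step(self, action: FakeResult) -> FakeTimeStep:
        self._remaining -= 1
        if self._remaining <= 0:
            # Terminal step: no further observation, final reward attached.
            return FakeTimeStep("LAST", None, reward=1.0)
        return FakeTimeStep("MID", FakeRequest(f"{self._remaining} steps left"), reward=0.0)


async def echo_policy(request: FakeRequest) -> FakeResult:
    return FakeResult(text=f"ack: {request.text}")


async def run_episode(env: CountdownEnv, policy: Policy) -> tuple:
    """Runs one episode and returns (number of steps taken, final reward)."""
    timestep = await env.reset()
    steps = 0
    while not timestep.last():
        assert timestep.observation is not None
        action = await policy(timestep.observation)
        timestep = await env.step(action)
        steps += 1
    return steps, timestep.reward
```

Running `asyncio.run(run_episode(CountdownEnv(3), echo_policy))` drives the toy environment for three steps and returns the terminal reward.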
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -28,7 +28,7 @@ The ``QueueMediatedLLMClient`` implements the ``LLMClient`` protocol, but instea | |
|
|
||
| Meanwhile, the environment: | ||
|
|
||
| 1. **Watches the queue**: Extracts ``LLMRequest`` objects as they arrive | ||
| 1. **Watches the queue**: Extracts canonical Open Responses requests as they arrive | ||
|
Contributor
"canonical" |
||
| 2. **Exposes them as observations**: Returns them from ``reset()`` and ``step()`` | ||
| 3. **Provides responses**: When you call ``step(action)``, sets the Future's result | ||
|
|
||
|
|
@@ -39,12 +39,14 @@ The core implementation is simple: | |
|
|
||
| .. code-block:: python | ||
|
|
||
| from linguafranca import types as lft | ||
|
|
||
| @dataclass(frozen=True) | ||
| class QueueMediatedLLMClient(LLMClient): | ||
| q: asyncio.Queue[ValueAndFuture[LLMRequest, LLMResponse]] | ||
| q: asyncio.Queue[ValueAndFuture[lft.OpenResponsesRequest, InferenceResult]] | ||
|
|
||
| async def __call__(self, request: LLMRequest) -> LLMResponse: | ||
| future = asyncio.Future[LLMResponse]() | ||
| async def __call__(self, request: lft.OpenResponsesRequest) -> InferenceResult: | ||
| future = asyncio.Future[InferenceResult]() | ||
| await self.q.put(ValueAndFuture(value=request, future=future)) | ||
| return await future # Blocks until env provides response | ||
|
|
||
|
|
@@ -65,7 +67,7 @@ The environment side: | |
| self._llm_req_future = value_and_future.future | ||
| return TimeStep(step_type="MID", observation=value_and_future.value, ...) | ||
|
|
||
| async def step(self, action: LLMResponse) -> TimeStep: | ||
| async def step(self, action: InferenceResult) -> TimeStep: | ||
| # Unblock the code agent by providing response | ||
| self._llm_req_future.set_result(action) | ||
| return await self._get_time_step() | ||
|
|
||
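The queue-and-future handoff in the two hunks above can be sketched end to end with plain strings in place of the request and result types. `PendingCall` is a hypothetical analogue of `ValueAndFuture`, and `toy_agent` and `drive` are illustrative names; this is a sketch of the pattern under those assumptions, not the ARES implementation.

```python
import asyncio
import dataclasses


@dataclasses.dataclass(frozen=True)
class PendingCall:
    """A request paired with the future that will carry its reply."""
    value: str
    future: asyncio.Future


@dataclasses.dataclass(frozen=True)
class QueueClient:
    q: asyncio.Queue

    async def __call__(self, request: str) -> str:
        future = asyncio.get_running_loop().create_future()
        await self.q.put(PendingCall(value=request, future=future))
        return await future  # Suspends until the driver sets a result


async def toy_agent(client: QueueClient) -> list:
    """Plays the code agent: makes two LLM calls and collects the replies."""
    replies = []
    for prompt in ("list files", "run tests"):
        replies.append(await client(prompt))
    return replies


async def drive() -> tuple:
    """Plays the environment: observes each request, then supplies the action."""
    q = asyncio.Queue()
    agent_task = asyncio.create_task(toy_agent(QueueClient(q)))
    observed = []
    for _ in range(2):
        pending = await q.get()  # The observation
        observed.append(pending.value)
        pending.future.set_result(f"reply to: {pending.value}")  # The action
    return observed, await agent_task
```

The agent coroutine never learns that its replies come from the driver rather than a model, which is the property that lets the environment expose each request as an observation.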
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,10 +20,10 @@ See the main `README <https://github.com/withmartian/ares>`_ for installation in | |
| Key Features | ||
| ------------ | ||
|
|
||
| * **RL-First Design**: Built around the reinforcement learning loop with observations (LLM requests) and actions (LLM responses) | ||
| * **RL-First Design**: Built around the reinforcement learning loop with observations (Open Responses requests) and actions (LLM responses) | ||
|
Contributor
Should leave this as "LLM requests" here for explanatory purposes |
||
| * **LLM-Level Optimization**: Train the LLM within code agents, not just the agent as a whole | ||
| * **Distributed Workloads**: Support for high-volume, distributed training and evaluation | ||
| * **Mechanistic Interpretability**: Raw access to LLM requests and responses for deep analysis | ||
| * **Mechanistic Interpretability**: Raw access to canonical LLM requests and responses for deep analysis | ||
|
Contributor
"canonical" |
||
| * **Async Gym/dm_env like Spec**: Close to Gym/dm_env spec, but incorporating async methods for performance | ||
|
|
||
| Indices and tables | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -48,7 +48,9 @@ | |
|
|
||
| import ares | ||
| from ares import llms | ||
| from ares.llms import open_responses | ||
| import hydra | ||
| from linguafranca import types as lft | ||
| import omegaconf | ||
|
Comment on lines 49 to 54
Contributor
ARES imports appear before third-party imports; should we reorder to stdlib → third-party → local/ARES with blank lines and ARES last per CLAUDE.md? |
||
| import ray | ||
| import skyrl_gym | ||
|
|
@@ -91,7 +93,7 @@ def __init__(self, env_config: dict | None = None, extras: dict | None = None, * | |
| self.preset_name = extras.get("preset_name", kwargs.get("preset_name")) | ||
| if not self.preset_name: | ||
| raise ValueError("preset_name must be provided in extras or kwargs") | ||
| self.env: ares.Environment[llms.LLMResponse, llms.LLMRequest, float, float] | None = None | ||
| self.env: ares.Environment[llms.InferenceResult, lft.OpenResponsesRequest, float, float] | None = None | ||
|
Contributor
Author
Q for @rsmith49 Is this confusing? We could alias linguafranca types so people can use ARES aliases instead, but I'm not sure if that's even more confusing.
Contributor
Yeah, I agree it feels a little off. The right approach is probably to wrap If we want to do this approach long-term, I think aliasing the type within ARES makes sense for now
Contributor
Author
I agree, I ended up using an alias to the type called |
||
|
|
||
| async def init( | ||
| self, prompt: base_text_env.ConversationType | ||
|
|
@@ -104,7 +106,8 @@ async def init( | |
| await self.env.__aenter__() | ||
| ts = await self.env.reset() | ||
|
|
||
| return ts.observation.messages, {} # type: ignore | ||
| assert ts.observation is not None | ||
| return open_responses.to_chat_messages(ts.observation, strict=True), {} | ||
|
|
||
| async def step(self, action: str) -> base_text_env.BaseTextEnvStepOutput: | ||
| """Runs one environment step. | ||
|
|
@@ -119,18 +122,17 @@ async def step(self, action: str) -> base_text_env.BaseTextEnvStepOutput: | |
| """ | ||
| assert self.env is not None | ||
|
|
||
| llm_resp = llms.LLMResponse( | ||
| data=[llms.TextData(content=action)], | ||
| llm_resp = llms.InferenceResult( | ||
| response=llms.make_response(action), | ||
| cost=0.0, | ||
| usage=llms.Usage(prompt_tokens=-1, generated_tokens=-1), | ||
| ) | ||
| ts = await self.env.step(llm_resp) | ||
|
|
||
| if ts.last(): | ||
| # Hack to approximate a context manager | ||
| await self.env.__aexit__(None, None, None) | ||
|
|
||
| msgs = [] if ts.last() else ts.observation.messages | ||
| msgs = [] if ts.last() else open_responses.to_chat_messages(ts.observation, strict=True) | ||
| return base_text_env.BaseTextEnvStepOutput( | ||
| observations=msgs, | ||
| reward=ts.reward or 0.0, | ||
|
|
||
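The adapter above calls `__aenter__` and `__aexit__` by hand because the environment has to stay open across separate `init()` and `step()` calls. One alternative, sketched here with a hypothetical `ToyEnv` and `Wrapper` rather than the real SkyRL or ARES classes, is the standard library's `contextlib.AsyncExitStack`, which keeps an async context manager open until `aclose()` is called.

```python
import asyncio
import contextlib
from typing import Optional


class ToyEnv:
    """Stand-in async context manager that records its lifecycle."""

    def __init__(self) -> None:
        self.events = []

    async def __aenter__(self) -> "ToyEnv":
        self.events.append("enter")
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.events.append("exit")


class Wrapper:
    """Holds an environment open between init() and finish()."""

    def __init__(self) -> None:
        self._stack = contextlib.AsyncExitStack()
        self.env: Optional[ToyEnv] = None

    async def init(self) -> None:
        # Enters the context manager and registers its exit on the stack.
        self.env = await self._stack.enter_async_context(ToyEnv())

    async def finish(self) -> None:
        # Runs every registered exit callback, in reverse order of entry.
        await self._stack.aclose()


async def demo() -> list:
    wrapper = Wrapper()
    await wrapper.init()
    env = wrapper.env
    await wrapper.finish()
    return env.events
```

Whether this is preferable to the explicit calls in the diff depends on how the SkyRL base class manages teardown, which this page does not show.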
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -49,8 +49,10 @@ | |
| import ares | ||
| from ares import containers | ||
| from ares import llms | ||
| from ares.llms import open_responses | ||
| import chz | ||
| import frozendict | ||
| from linguafranca import types as lft | ||
| import numpy as np | ||
|
Comment on lines 49 to 56
Contributor
Should we move |
||
| import tinker | ||
| from tinker_cookbook import cli_utils | ||
|
|
@@ -109,8 +111,8 @@ class TinkerCompatibleEnv(tinker_types.Env): | |
| """Adapter wrapping ARES environments to work with Tinker's RL training loop. | ||
|
|
||
| Handles bidirectional conversion: | ||
| - ARES LLMRequest -> Tinker ModelInput (tokenized prompts) | ||
| - Tinker Action (text) -> ARES LLMResponse | ||
| - ARES Open Responses request -> Tinker ModelInput (tokenized prompts) | ||
|
Contributor
Include actual class name here |
||
| - Tinker Action (text) -> ARES InferenceResult | ||
| - ARES TimeStep -> Tinker StepResult | ||
|
|
||
| This enables using any ARES environment with Tinker's training infrastructure. | ||
|
|
@@ -121,7 +123,7 @@ class TinkerCompatibleEnv(tinker_types.Env): | |
|
|
||
| def __init__( | ||
| self, | ||
| env: ares.Environment[llms.LLMResponse, llms.LLMRequest, float, float], | ||
| env: ares.Environment[llms.InferenceResult, lft.OpenResponsesRequest, float, float], | ||
|
Contributor
Honestly looking at the change, I do kind of like |
||
| renderer: renderers.Renderer, | ||
| convo_prefix: list[renderers.Message] | None, | ||
| max_tokens: int, | ||
|
|
@@ -132,14 +134,14 @@ def __init__( | |
| self.max_tokens = max_tokens | ||
|
|
||
| def _get_tinker_observation( | ||
| self, ts: ares.TimeStep[llms.LLMRequest | None, float, float] | ||
| self, ts: ares.TimeStep[lft.OpenResponsesRequest | None, float, float] | ||
| ) -> tinker_types.Observation: | ||
| if ts.observation is None: | ||
| return tinker.ModelInput.empty() | ||
|
|
||
| messages = self.convo_prefix + [ | ||
| renderers.Message(role=message["role"], content=message["content"]) # type: ignore | ||
| for message in ts.observation.messages | ||
| for message in open_responses.to_chat_messages(ts.observation, strict=True) | ||
|
Contributor
Can we remove |
||
| ] | ||
| model_input = self.renderer.build_generation_prompt(messages) | ||
|
|
||
|
|
@@ -149,15 +151,14 @@ def _get_tinker_observation( | |
|
|
||
| return model_input | ||
|
|
||
| def _get_ares_action(self, action: tinker_types.Action) -> llms.LLMResponse: | ||
| def _get_ares_action(self, action: tinker_types.Action) -> llms.InferenceResult: | ||
| message, parse_success = self.renderer.parse_response(action) | ||
| if not parse_success: | ||
| _LOGGER.warning("Failed to parse response: %s", message) | ||
|
|
||
| return llms.LLMResponse( | ||
| data=[llms.TextData(content=_get_text_content(message))], | ||
| return llms.InferenceResult( | ||
| response=llms.make_response(_get_text_content(message)), | ||
| cost=0.0, | ||
| usage=llms.Usage(prompt_tokens=-1, generated_tokens=-1), | ||
| ) | ||
|
|
||
| @property | ||
|
|
||