Skip to content

Commit fe5e465

Browse files
recursixCube Harnessclaude
authored
feat(tool): consolidate Tool + AsyncTool into one class — RFC + code (#212)
* chore(openspec): draft tool-consolidation RFC Proposes collapsing `Tool` + `AsyncTool` (and `AbstractTool` + `AbstractAsyncTool`) into one `Tool` class that supports `@tool_action` methods of either kind on the same class. Mirrors the dual-API pattern shipped on cube-harness `MonitoredTool` (PR #386): * `execute_action(action)` — sync dispatch * `async_execute_action(action)` — async dispatch, universal call-site `async_execute_action` already exists on `AbstractTool` as a non-abstract method from cube-standard #152. This proposal makes it the canonical universal call-site and drops the class-level split that hid the dual surface. Migration: `AsyncTool` and `AbstractAsyncTool` become aliases of the unified classes for one release window. The single in-tree `AsyncTool` subclass (`AsyncBrowserTool`) gets a one-line edit; everything else keeps working unchanged. Phase 2 (collapse `Toolbox` + `AsyncToolbox`) is explicitly out of scope here — `AsyncToolbox.execute_action` is async by contract and collapsing it would break every `await tb.execute_action(action)` call site. A future RFC if cost-benefit shifts. Signed-off-by: Alexandre Lacoste <alex.lacoste.shmu@gmail.com> Signed-off-by: Cube Harness <cube-harness@example.com> * docs(openspec): make tool-consolidation proposal pedagogical Expanded the proposal with: * "What this is, in one sentence" lede explaining the gist * Concrete friction story showing why one tool can't have one async method + N sync helpers today * "Before vs after" mermaid diagram * Three concrete code cases authors care about (pure sync, pure async, mixed) with 4-line examples each * Dispatch-matrix mermaid showing how the two call sites route per the action's def keyword * Sync vs async Agent author UX section + table * Debug story with actual call-stack snippets for both paths * Efficiency story with concrete numbers from the parallelism smoke * Migration story as a "what you have / what you'll have" table The structure walks the reader through *why* (the friction), *what* (the diagram + code cases), *how* (the dispatch matrix), and *what it costs/saves* (debug + efficiency stories) before the alternatives/risks sections. Signed-off-by: Alexandre Lacoste <alex.lacoste.shmu@gmail.com> Signed-off-by: Cube Harness <cube-harness@example.com> * docs(openspec): tool-consolidation — symmetric bridge, F1 acknowledgement Pivot from "raise TypeError on sync→async" to "auto-bridge in both directions", on the back of a discussion with the harness author: * RAM overhead is noise (± a few hundred KB per active bridge, pooled threads for async→sync). * `_arun` already handles sync inners cleanly via `asyncio.to_thread` (shipped by PR #386); symmetric move is for `execute_action` to bridge async actions via thread+loop so 90/10 mixed tools "just work" from any call-site. * The 99% fast paths stay at zero overhead; bridge cost (~2-5 ms) is only paid on the rare async-from-sync call. What changed in the docs: * **Dispatch matrix** is now 2x2 with both bridges shown. The "raises TypeError" red box is gone; both call-sites work for both action kinds. * New section **"What the implementation looks like"** with the actual ~50 LOC dispatch + bridge code, including `contextvars` propagation for OTel/tracing across the bridge thread. * New subsection **"Thread-affine tools (override hook)"** documents the override pattern for sync-Playwright-style tools that need per-instance single-threaded executors. This is the layer-correct extension point for F1's class of problems. * Risks updated: bridge overhead becomes the new risk (transparent but not free); class-definition-time validation regression is softer; thread affinity stays an author's concern via the documented override hook. * New section **"Known harness-side follow-ups (out of scope)"** names F1 (sync-browser cubes on async episode) and PR #487 (the install_monitoring task.tool-clobber fix) so reviewers see the surrounding landscape and the layer separation. Companion section: cube-harness needs no API change for this RFC; F1's harness-side fix (per-episode env-executor) is orthogonal. Signed-off-by: Alexandre Lacoste <alex.lacoste.shmu@gmail.com> Signed-off-by: Cube Harness <cube-harness@example.com> * docs(openspec): tool-consolidation — also collapse Toolbox + AsyncToolbox The original draft framed `Toolbox` / `AsyncToolbox` consolidation as "Phase 2 — out of scope" on the misread that we'd be collapsing into a sync-only Toolbox and breaking every `await tb.execute_action(...)` call site. Reviewer (correctly) pushed back: the same dual-API pattern collapses Toolbox cleanly. What changed in the docs: * **The change in one diagram** — extended to three layers (AbstractTool, Tool, Toolbox). All three collapse to one class each, all carrying the same dual API. * **Toolbox section** (replaces the rejected "Phase 2" section) — shows the ~10 LOC routing on the unified Toolbox and the `AsyncToolbox(Toolbox)` deprecation shim that preserves `await tb.execute_action(...)` for legacy callers with a DeprecationWarning. * **Migration table** — adds Toolbox / AsyncToolbox migration rows. * **Alternatives** — adds the "consolidate Tool only" variant and explains why it was rejected (so the prior framing is on the record, not silently rewritten). * **deltas.md** — adds the Toolbox class snippet + AsyncToolbox shim block alongside the existing AsyncTool / AbstractAsyncTool aliases. Signed-off-by: Alexandre Lacoste <alex.lacoste.shmu@gmail.com> Signed-off-by: Cube Harness <cube-harness@example.com> * feat(tool): consolidate Tool/AsyncTool + Toolbox/AsyncToolbox Implementation of the tool-consolidation RFC. Three layers collapse to one class each carrying the dual API (`execute_action` sync + `async_execute_action` async). All four cells of the dispatch matrix work without raising — the rare-path bridges (sync→async via thread+loop, async→sync via asyncio.to_thread) keep mixed-action tools working from any call site. `src/cube/tool.py`: * `Tool.execute_action` — sync caller. Sync action: direct call. Async action: bridge via one-shot daemon thread + new event loop (~2-5 ms). `contextvars.copy_context()` propagates the caller's context (OTel spans, tracing state) into the worker. * `Tool.async_execute_action` — async caller. Async action: direct await. Sync action: `asyncio.to_thread` (pooled, parallel-safe). * `Tool._bridge_async_to_sync` helper hosts the thread+loop bridge. Doc-string covers overhead + override hook for thread-affine tools. * `_ToolActionsMixin.__init_subclass__` no longer enforces all-sync or all-async — mixing is now explicitly supported. * `AsyncTool` becomes a deprecated alias subclassing both `Tool` (for the new dual API) AND `AbstractAsyncTool` (so existing `isinstance(x, AbstractAsyncTool)` checks keep returning True). Overrides `execute_action` with async semantics so legacy `await asynctool.execute_action(a)` callers keep working. Subclassing emits a `DeprecationWarning`. * `AbstractAsyncTool` becomes a deprecated shim subclassing `AbstractTool`. Keeps abstract async `execute_action` so legacy abstract subclasses keep working. Subclassing emits a `DeprecationWarning` (skipped for our own `AsyncTool` shim at module import time to avoid noise). * `Toolbox` gets `async_execute_action` (routes to leaf's async dispatch) and `async_reset`/`async_close` (await coroutine returns from any leaf). Sync `reset`/`close` close (don't await) any legacy coroutine returns so they don't leak. * `AsyncToolbox` becomes a deprecated shim subclassing `Toolbox`, with async `execute_action`/`reset`/`close` preserved. Calling `execute_action` emits a `DeprecationWarning`. `src/cube/tools/browser.py`: * `AsyncBrowserTool(AsyncTool)` → `AsyncBrowserTool(Tool)`. One-line body change + docstring update. Subclasses with `async def` action methods keep working — `Tool.async_execute_action` dispatches them natively. `tests/test_tool.py`: * Replaced `test_async_tool_rejects_sync_action_at_class_definition` (the validation it asserted is intentionally removed) with `test_async_tool_subclass_accepts_mixed_methods` covering the new affordance + deprecation warning. * Added the 4-cell dispatch matrix: (sync caller × {sync, async} action) × (async caller × {sync, async} action). * Added `test_sync_caller_async_action_bridge_inside_running_loop` — the motivating use case (Agent._run inside Episode's asyncio.run). * Added `test_async_caller_parallel_gather_over_sync_actions` — real OS-thread parallelism via to_thread inside asyncio.gather. * Added deprecation-warning tests for `AsyncTool` subclass and `AsyncToolbox.execute_action`. * Added `test_asynctool_subclass_isinstance_of_abstract_async_tool` — backward compat for isinstance checks across the alias. `scripts/smoke/tool_consolidation.py`: * 9-check end-to-end smoke covering the 4-cell matrix, real parallelism (4×100ms sync sleeps via gather → <250ms wall-clock), bridge-from-inside-running-loop, and both deprecation shims. Test status: * `pytest tests/test_tool.py`: 52 passed, 16 deprecation warnings (all intentional — alias subclassing). * `scripts/smoke/tool_consolidation.py`: 9/9 cells pass. * `make lint`: clean. Signed-off-by: Alexandre Lacoste <alex.lacoste.shmu@gmail.com> Signed-off-by: Cube Harness <cube-harness@example.com> * feat(tool): drop AsyncTool/AsyncToolbox/AbstractAsyncTool — no deprecation window The previous commit shipped these as deprecation shims preserving the legacy async-execute_action contract for one release window. We're pre-1.0, so removing them outright is cleaner: - `Tool` already supports sync OR async `@tool_action` methods on one class. - `Toolbox.async_execute_action` is the canonical async call-site. - `AbstractTool.async_execute_action` is the canonical async base. Drops ~280 LOC of shim machinery (classes, deprecation tests, smoke cells, docstring history). Paired cube-harness PR migrates the only remaining consumer (`MonitoredTool`, `as_async`, `Agent.run` isinstance checks). Net diff: +153 / -581 over the previous shimmed commit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Alexandre Lacoste <alex.lacoste.shmu@gmail.com> --------- Signed-off-by: Alexandre Lacoste <alex.lacoste.shmu@gmail.com> Signed-off-by: Cube Harness <cube-harness@example.com> Co-authored-by: Cube Harness <cube-harness@example.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent ca29d14 commit fe5e465

7 files changed

Lines changed: 931 additions & 354 deletions

File tree

CLAUDE.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ or coordinate experiments — that lives in **cube-harness**.
1515
| Layer | Module | Spec | What it does |
1616
|-------|--------|------|--------------|
1717
| 1. Core types | `cube.core` | [core/spec.md](openspec/specs/core/spec.md) | `Action`, `Observation`, `Content`, `EnvironmentOutput`, `TypedBaseModel` |
18-
| 2. Tool | `cube.tool` | [tool/spec.md](openspec/specs/tool/spec.md) | `Tool`, `AsyncTool`, `@tool_action`, `ToolConfig`, `Toolbox` |
18+
| 2. Tool | `cube.tool` | [tool/spec.md](openspec/specs/tool/spec.md) | `Tool`, `@tool_action`, `ToolConfig`, `Toolbox` |
1919
| 3. Task | `cube.task` | [task/spec.md](openspec/specs/task/spec.md) | `Task`, `TaskMetadata`, `TaskConfig`, gym-style `reset/step/evaluate` |
2020
| 4. Benchmark | `cube.benchmark` | [benchmark/spec.md](openspec/specs/benchmark/spec.md) | `Benchmark`, `BenchmarkMetadata`, class-level registry |
2121
| 5. Testing | `cube.testing` | [testing/spec.md](openspec/specs/testing/spec.md) | `run_debug_suite`, `assert_debug_tasks_reward_one` |
Lines changed: 162 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,162 @@
1+
# Deltas: Collapse `Tool` + `AsyncTool`
2+
3+
Applies to: `openspec/specs/tool/spec.md`.
4+
5+
See proposal: `proposal.md`.
6+
7+
---
8+
9+
## MODIFIED — `openspec/specs/tool/spec.md`
10+
11+
### `AbstractTool` — dual call surface, accepts both sync and async actions
12+
13+
```python
14+
class AbstractTool(ABC):
15+
@property
16+
@abstractmethod
17+
def action_set(self) -> list[ActionSchema]: ...
18+
19+
def execute_action(self, action: Action) -> Any:
20+
"""Sync dispatch. Subclasses override."""
21+
22+
async def async_execute_action(self, action: Action) -> Any:
23+
"""Async dispatch — universal call-site.
24+
25+
Default: call sync `execute_action` directly on the current
26+
thread. Subclasses with async-native dispatch (e.g. `Tool`
27+
with async actions) override to handle both kinds without a
28+
`to_thread` hop."""
29+
return self.execute_action(action)
30+
```
31+
32+
### `Tool` (concrete base) — sync OR async `@tool_action` methods on the same class
33+
34+
```python
35+
class Tool(_ToolActionsMixin, AbstractTool):
36+
"""Concrete base for any tool. `@tool_action` methods may be sync
37+
or async, per method. Action discovery via `action_set` still works
38+
identically; dispatch routes per the method's kind and bridges the
39+
impedance mismatch when caller and method differ.
40+
"""
41+
42+
def execute_action(self, action: Action) -> Observation | StepError:
43+
"""Sync call-site. Works for both sync and async actions.
44+
45+
Sync action → direct call, no overhead.
46+
Async action → bridge via a one-shot worker thread with its own
47+
event loop (~2-5 ms). `contextvars.copy_context()` propagates
48+
the caller's tracing/OTel context into the worker.
49+
"""
50+
method = self.get_action_method(action)
51+
if inspect.iscoroutinefunction(method):
52+
return self._bridge_async_to_sync(method, action)
53+
# sync dispatch (existing body — validate args, call, wrap result)
54+
...
55+
56+
async def async_execute_action(self, action: Action) -> Observation | StepError:
57+
"""Async call-site. Works for both sync and async actions.
58+
59+
Async action → direct await, no thread hop.
60+
Sync action → `asyncio.to_thread(method)` — pooled worker, no
61+
per-call spawn, enables real OS-thread parallelism when wrapped
62+
in `asyncio.gather`.
63+
64+
Tools with thread-affinity needs (sync Playwright, etc.) override
65+
to dispatch through a per-instance single-threaded executor.
66+
"""
67+
method = self.get_action_method(action)
68+
if inspect.iscoroutinefunction(method):
69+
result = await method(**action.arguments)
70+
else:
71+
result = await asyncio.to_thread(method, **action.arguments)
72+
# wrap → Observation | StepError (existing wrap logic)
73+
...
74+
75+
def _bridge_async_to_sync(
76+
self, async_method: Callable, action: Action
77+
) -> Observation | StepError:
78+
"""Run an async method to completion from a sync call-site.
79+
80+
Implementation: spawn a daemon thread, run a new event loop
81+
inside it, block on the result. ~2-5 ms overhead per call.
82+
Caller's `contextvars` are propagated so OTel spans /
83+
logging state carry into the worker.
84+
"""
85+
ctx = contextvars.copy_context()
86+
fut: concurrent.futures.Future = concurrent.futures.Future()
87+
88+
def runner() -> None:
89+
try:
90+
fut.set_result(ctx.run(asyncio.run, async_method(**action.arguments)))
91+
except Exception as e:
92+
fut.set_exception(e)
93+
94+
threading.Thread(target=runner, daemon=True).start()
95+
# wrap fut.result() → Observation | StepError
96+
...
97+
```
98+
99+
### `Toolbox` — single container class with dual routing
100+
101+
```python
102+
class Toolbox(Tool):
103+
"""Container. Holds Tools. Routes actions by name; the leaves do
104+
the per-method sync/async dispatch.
105+
"""
106+
107+
def __init__(self, tools: list[Tool]):
108+
self.tools = tools
109+
self._action_name_to_tool = {a.name: t for t in tools for a in t.action_set}
110+
111+
@property
112+
def action_set(self) -> list[ActionSchema]:
113+
return [a for t in self.tools for a in t.action_set]
114+
115+
def execute_action(self, action: Action) -> Observation | StepError:
116+
return self._action_name_to_tool[action.name].execute_action(action)
117+
118+
async def async_execute_action(self, action: Action) -> Observation | StepError:
119+
return await self._action_name_to_tool[action.name].async_execute_action(action)
120+
```
121+
122+
### `_ToolActionsMixin.__init_subclass__` validation — relaxed
123+
124+
The previous `AsyncTool` subclass hook validated that every `@tool_action`
125+
method was async. With `Tool` accepting both kinds, that validation goes
126+
away. Authors who mix sync and async actions on one class are explicitly
127+
supported — dispatch routes per the method's own `def` keyword and bridges
128+
when caller and method differ.
129+
130+
### Invariants
131+
132+
1. Every `@tool_action`-decorated method is reachable through BOTH `Tool.execute_action` and `Tool.async_execute_action`, regardless of the method's sync/async kind.
133+
2. The fast paths (sync caller × sync action, async caller × async action) introduce no overhead beyond a normal Python method call.
134+
3. The bridge paths (sync caller × async action, async caller × sync action) introduce a thread hop:
135+
- sync → async: one-shot daemon thread + new event loop, ~2-5 ms per call.
136+
- async → sync: pooled worker via `asyncio.to_thread` — no per-call spawn cost, enables real OS-thread parallelism inside `asyncio.gather`.
137+
4. `action_set` lists all `@tool_action`-decorated methods irrespective of kind.
138+
5. Tools wrapping thread-affine resources MAY override `async_execute_action` to route through a per-instance single-threaded executor; the base class's default uses the global thread pool.
139+
140+
### Gotchas
141+
142+
- `AsyncBrowserTool` (the single in-tree `AsyncTool` subclass) flips to `Tool` with no body change. cube-harness `MonitoredTool` / `as_async` / agent isinstance checks migrate in a paired PR.
143+
- Tools that previously got an `__init_subclass__` error for mixed async/sync now pass silently. The previous structural mistake is now legal (mixing is supported); intentional misuse surfaces as a runtime bridge invocation that's slower than expected, not as an error.
144+
- A sync caller invoking a hot-path async action repeatedly pays ~2-5 ms × N for the bridge. For agents that care about per-call latency, switch to `Agent._arun` so calls go through `async_execute_action` directly.
145+
146+
---
147+
148+
## REMOVED
149+
150+
- `AbstractAsyncTool` — fold into `AbstractTool` (which now declares both `execute_action` and `async_execute_action`).
151+
- `AsyncTool` — fold into `Tool` (which now accepts sync OR async `@tool_action` methods on the same class).
152+
- `AsyncToolbox` — fold into `Toolbox` (whose `async_execute_action` is the canonical async call-site).
153+
154+
Deletion is atomic; no deprecation window. The only consumer of these symbols outside cube-standard is cube-harness, which migrates in a paired PR.
155+
156+
---
157+
158+
## ADDED — none beyond the dual surface above
159+
160+
`async_execute_action` already exists on `AbstractTool` (added by cube-standard
161+
#152). This change reuses it as the universal call-site and removes the
162+
class-level split that made the dual surface invisible.

0 commit comments

Comments
 (0)