-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Improve capabilities docs #4832
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 1 commit
e4a2b63
2df76aa
b6edd2e
ee9002c
306802c
d6f1367
0f00d20
6256d9e
a2772d2
37f63f6
583f6d0
0374cd0
7197146
4321f68
72d00b0
d9f6be5
c32d8f6
397e205
32baede
d1b8192
6b12501
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,63 @@ | ||
| # Capabilities Docs — Uncovered User Questions | ||
|
|
||
| Questions users are likely to have that the current `docs/capabilities.md` doesn't cover (or barely touches). | ||
|
|
||
| ## 1. What's the generic type parameter on `AbstractCapability[T]`? | ||
|
|
||
| Every example uses `AbstractCapability[Any]`. Users will wonder: what is `T`? How does it relate to the agent's dependency type? Can a capability declare that it needs specific dependencies, and what happens if the agent's dep type doesn't match? | ||
|
||
|
|
||
| ## 2. Can capabilities be added per-run? | ||
|
|
||
| The docs only show capabilities at `Agent(...)` construction time. Users will want to know: can I pass extra capabilities to `agent.run()` or `agent.iter()`? Or override/disable specific ones for a single run? | ||
|
||
|
|
||
| ## 3. `BuiltinTool`, `Toolset`, and `HistoryProcessor` — how to use them? | ||
|
|
||
| These three are listed in the built-in table but have zero documentation beyond the one-line row. A user reading top-to-bottom will see `PrepareTools` and `PrefixTools` get dedicated sections but these don't. | ||
|
||
|
|
||
| ## 4. `ImageGeneration` local fallback | ||
|
|
||
| `WebSearch` auto-detects DuckDuckGo, `MCP` auto-detects transport — but what does `ImageGeneration(local=...)` expect? There's no example of providing a local image generation fallback. | ||
|
||
|
|
||
| ## 5. When to use a capability vs direct agent parameters | ||
|
|
||
| The intro paragraph hints at this ('instructions and model settings are configured directly...'), but there's no decision framework. When should I use `Agent(instructions=...)` vs a capability that provides instructions? When should I register tools with `@agent.tool` vs bundling them in a capability? | ||
dsfaccini marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| ## 6. Ordering guidance for composition | ||
|
|
||
| The composition section explains *mechanics* (before fires in order, after in reverse, wrap nests) but gives no *practical guidance*. If I have a guardrail and a logger, which should come first? What's the mental model for deciding order? | ||
dsfaccini marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| ## 7. How does `for_run` interact with `get_toolset`? | ||
|
|
||
| `get_toolset` is called at agent construction, `for_run` is called per-run. If my capability returns a toolset *and* overrides `for_run`, does the toolset come from the original instance or the fresh one? This matters a lot for stateful toolsets. | ||
dsfaccini marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| ## 8. Concurrency / thread safety | ||
|
|
||
| If the same agent is used for concurrent runs (common in web servers), what happens to shared capability instances? `for_run` helps, but the docs don't mention concurrency at all. Users need to know if `for_run` is required or just recommended for concurrent usage. | ||
dsfaccini marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| ## 9. Can capabilities interact with each other? | ||
|
|
||
| No mention of inter-capability communication. If capability A needs to read state from capability B (e.g., a cost tracker that a budget limiter reads), how do they coordinate? Through dependencies? Through some other mechanism? | ||
dsfaccini marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| ## 10. Skip exceptions and hook chain behavior | ||
|
|
||
| `SkipModelRequest`, `SkipToolValidation`, and `SkipToolExecution` are mentioned one sentence each. Users will want to know: when a skip is raised from `before_*`, do other capabilities' `before_*` hooks still fire? Do `after_*` hooks fire with the skip result? What about `wrap_*`? | ||
|
||
|
|
||
| ## 11. Error hook ordering with multiple capabilities | ||
|
|
||
| When multiple capabilities define `on_*_error`, which fires first? If one recovers (returns a result), do the others still see the error? Or does recovery short-circuit? | ||
dsfaccini marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| ## 12. Message history interaction | ||
|
|
||
| If `message_history` is passed to `agent.run()`, do `before_model_request` hooks see the historical messages in `request_context.messages`? Can they modify them? | ||
dsfaccini marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| ## 13. Testing custom capabilities | ||
|
|
||
| No guidance on testing. Users building non-trivial capabilities (guardrails, approval workflows) will want patterns for: testing that hooks fire in the right order, testing error recovery, testing with `TestModel`, etc. | ||
dsfaccini marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| ## 14. Complete lifecycle flow diagram | ||
|
|
||
| The error hook section has a small `before -> wrap -> after/on_error` ASCII diagram, but there's no full picture showing how all the hooks interact across a complete run (run -> node -> model request -> tool validate -> tool execute), especially with multiple capabilities composed. | ||
|
|
||
| ## Assessment | ||
|
|
||
| Most of these are 'what happens when I combine X with Y' questions — the docs do a good job explaining each feature in isolation but leave composition, edge cases, and the full lifecycle under-documented. The biggest gaps are **#3** (undocumented builtins), **#1** (the type parameter), and **#5** (decision framework for when to use capabilities vs direct params). | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| # Capabilities — Niche Caveats Worth Documenting | ||
|
|
||
| Brief notes on edge-case behaviors discovered during bug hunting (PR #4640). | ||
| These don't warrant dedicated doc sections but should appear as caveats/notes | ||
| near the relevant feature documentation once the bugs are fixed. | ||
|
|
||
| ## 1. DynamicToolset factory errors leave ambiguous state | ||
|
|
||
| **Where to document:** `docs/toolsets.md`, near the dynamic toolset section. | ||
|
|
||
| **Caveat:** If a `per_run_step=True` factory raises mid-run, the DynamicToolset | ||
| doesn't roll back — the previous toolset stays referenced but the overall state | ||
| is ambiguous. Worse: if the factory succeeds but the new toolset's `__aenter__` | ||
| fails, the old toolset has already been exited and the new one never entered. | ||
| `self._toolset` then points to an un-initialized toolset. | ||
|
|
||
| **User guidance:** Factory functions used with `per_run_step=True` should be | ||
| defensive — catch and handle their own errors rather than letting exceptions | ||
| propagate into the toolset lifecycle. If the factory can fail, consider | ||
| returning the previous toolset as a fallback instead of raising. | ||
dsfaccini marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| ## 2. Tool retry counts persist across DynamicToolset tool swaps | ||
|
|
||
| **Where to document:** `docs/toolsets.md`, near dynamic toolsets or tool retries. | ||
|
|
||
| **Caveat:** `ToolManager` tracks tool retries by **name** across run steps. If a | ||
| `DynamicToolset` with `per_run_step=True` replaces a tool's implementation | ||
| between steps but keeps the same name, the new implementation inherits the | ||
| accumulated retry count from the old one. A fresh tool can be born with N | ||
| retries already counted against it, hitting `max_retries` sooner than expected. | ||
|
|
||
| **User guidance:** If tool implementations change between steps, use distinct | ||
| tool names to avoid inheriting retry state. Alternatively, design tools with | ||
| retry budgets that account for the possibility of inherited counts. | ||
dsfaccini marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| ## 3. History processor composition doesn't validate message consistency | ||
|
|
||
| **Where to document:** `docs/message-history.md`, near the history processors section. | ||
|
|
||
| **Caveat:** When multiple history processors are registered, they compose in | ||
| sequence — processor 2 sees processor 1's output. There's no validation that | ||
| the resulting messages are semantically consistent. A processor that removes | ||
| `ModelResponse` messages containing `ToolCallPart` but leaves the subsequent | ||
| `ModelRequest` with `ToolReturnPart` creates orphaned tool returns — the model | ||
| sees a tool result with no preceding tool call. | ||
|
|
||
| **User guidance:** When writing history processors that remove messages, ensure | ||
| tool calls and their corresponding tool returns are removed together. A safe | ||
dsfaccini marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| pattern is to track tool call IDs and remove both the call and return as a pair. | ||
dsfaccini marked this conversation as resolved.
Show resolved
Hide resolved
|
Uh oh!
There was an error while loading. Please reload this page.