feat(orchestrator): collapse per-integration delegation into one tool (#1335)#1488
feat(orchestrator): collapse per-integration delegation into one tool (#1335)#1488obchain wants to merge 8 commits into
Conversation
Replaces the N synthesised `delegate_<toolkit>` tools with a single `delegate_to_integrations_agent` that takes the toolkit slug as a required argument. The list of connected toolkit slugs is rendered inline in the tool description and exposed as a JSON schema enum on the `toolkit` parameter, so the orchestrator's function-calling schema cost is now constant in the integration dimension while the orchestrator still discovers which integrations are routable. `SkillDelegationTool::for_connected` returns `None` when no integrations are connected so the tool is omitted entirely (the orchestrator can't usefully route to an empty set), preserving the existing "no phantom tools" behaviour. Server-side toolkit validation re-canonicalises the LLM-provided slug via `sanitise_slug` before comparing it against the connected set, mirroring how slugs are produced when the description is built so quirky toolkit names (mixed case, dashes, dots) round-trip predictably. Refs tinyhumansai#1335.
Updates the orchestrator's `prompt.md` decision tree and `prompt.rs` delegation guide so they reference `delegate_to_integrations_agent` with the `toolkit` argument instead of the obsolete per-toolkit tool names. Trims the duplicated one-line "when delegating" recap that restated the decision tree, leaving the decision tree as the single source of routing rules. Tests in `prompt.rs` are updated to assert on the new format and reject the old `delegate_<toolkit>` / `spawn_subagent(...)` strings. Closes tinyhumansai#1335.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughCollapse per-integration delegation into a single ChangesCollapsed Delegation Tool Architecture
Sequence Diagram(s)sequenceDiagram
participant Orchestrator as Orchestrator Prompt/Model
participant DelegateTool as delegate_to_integrations_agent
participant IntegrationsAgent as integrations_agent
Orchestrator->>DelegateTool: call(toolkit="gmail", prompt=...)
DelegateTool->>IntegrationsAgent: dispatch_subagent(integrations_agent, toolkit="gmail", prompt)
IntegrationsAgent-->>DelegateTool: result
DelegateTool-->>Orchestrator: result (surface inline)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/openhuman/tools/impl/agent/skill_delegation.rs (1)
125-170: ⚡ Quick winAdd routing breadcrumbs on validation failures and dispatch.
This new delegation entrypoint has several hard-to-infer branches (
missing toolkit,unknown toolkit,empty prompt, dispatch tointegrations_agent) but currently emits no logs, which will make routing regressions much harder to diagnose from production traces. A few[integrations_delegate]debug/trace lines on entry, rejection, and successful dispatch would make this path much easier to support. As per coding guidelines "Add copious debug/trace logs while implementing features (not as an afterthought); use log/tracing crates at debug/trace level on RPC entry/exit, error paths, state transitions, and hard-to-infer branches".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/tools/impl/agent/skill_delegation.rs` around lines 125 - 170, In the execute method of skill_delegation.rs add trace/debug logs tagged like "[integrations_delegate]" at entry (log tool_name and raw_toolkit/raw prompt values), on each validation rejection (missing toolkit, unknown toolkit — include raw_toolkit, allowed list, and sanitised slug — and empty prompt), and around the dispatch_subagent call (log before dispatch with target "integrations_agent", tool_name, prompt and slug, and log the result or error after awaiting super::dispatch_subagent); use the project's tracing/log crate at trace or debug level and include the actual error/result details in the log messages to aid routing diagnostics.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/tools/impl/agent/skill_delegation.rs`:
- Around line 268-289: Update the test
execute_normalises_toolkit_input_before_matching to actually call tool.execute
with toolkit "Google-Calendar" (via the existing tool.execute(json!({"toolkit":
"Google-Calendar", "prompt": "x"})).await.unwrap()) instead of only asserting
sanitise_slug("Google-Calendar"); assert that the execution reaches past the
unknown-toolkit validation by checking the returned error is not the
unknown-toolkit case (e.g., ensure the error result does not include the
"unknown-toolkit" indicator or message) and keep the existing setup using
SkillDelegationTool::for_connected so the toolkit gate should pass; remove or
replace the direct sanitise_slug assertion since the test must exercise the
execute() path that uses sanitise_slug internally.
In `@src/openhuman/tools/orchestrator_tools.rs`:
- Around line 136-156: The current mapping over connected_integrations uses
sanitise_slug(&integration.toolkit) and drops the original toolkit identifier,
which can collapse distinct integrations (e.g., Slack.Bot vs Slack-Bot) into the
same slug; update the transformation in the block building connected:
Vec<(String, String)> to detect slug collisions (by building a temporary map or
set keyed by slug) and fail closed (return an Err or log+panic) when a duplicate
slug is produced, or instead preserve a stable upstream id alongside the slug
(e.g., produce Vec<(String /*slug*/, String /*description*/, String
/*raw_toolkit_id*/)> or pair slug with integration.toolkit) so downstream uses
(skill_filter) can disambiguate; locate the transform around
connected_integrations, sanitise_slug, and where connected is consumed
(skill_filter) to apply the change.
---
Nitpick comments:
In `@src/openhuman/tools/impl/agent/skill_delegation.rs`:
- Around line 125-170: In the execute method of skill_delegation.rs add
trace/debug logs tagged like "[integrations_delegate]" at entry (log tool_name
and raw_toolkit/raw prompt values), on each validation rejection (missing
toolkit, unknown toolkit — include raw_toolkit, allowed list, and sanitised slug
— and empty prompt), and around the dispatch_subagent call (log before dispatch
with target "integrations_agent", tool_name, prompt and slug, and log the result
or error after awaiting super::dispatch_subagent); use the project's tracing/log
crate at trace or debug level and include the actual error/result details in the
log messages to aid routing diagnostics.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 78f4f128-387b-4b40-ad2b-36b61657b063
📒 Files selected for processing (4)
src/openhuman/agent/agents/orchestrator/prompt.mdsrc/openhuman/agent/agents/orchestrator/prompt.rssrc/openhuman/tools/impl/agent/skill_delegation.rssrc/openhuman/tools/orchestrator_tools.rs
…rmaliser Addresses CodeRabbit review on tinyhumansai#1488. `sanitise_slug` is lossy — `Slack.Bot` and `Slack-Bot` both collapse to `slack_bot`, so the previous mapping silently shadowed one integration when discarding the raw id. `collect_orchestrator_tools` now walks the connected list with a slug-collision set, drops every duplicate after the first, and warns via `log::warn` so operators can spot the shadowed toolkit. A new `duplicate_sanitised_slug_drops_later_collisions` test locks the behaviour. `execute_normalises_toolkit_input_before_matching` previously only asserted on `sanitise_slug` directly, so a regression to raw `==` inside `execute` would have stayed green. The test now drives `execute("GMail", ...)` to confirm the unknown-toolkit branch fires and `execute("Google-Calendar", ...)` to confirm the toolkit gate is bypassed when the normalised slug matches a connected entry.
graycyrus
left a comment
There was a problem hiding this comment.
Review — collapse per-integration delegation into one tool
Overall this is a clean, well-tested structural improvement. The O(1) invariant is the right fix, slug collision handling is solid, and the test suite is comprehensive. A few things to tighten up:
3 findings (1 major, 1 major, 1 minor) — details in inline comments below.
Verified / looks good
for_connectedreturnsNoneon empty input ✓- Slug collision detection with
HashSet— first arrival wins ✓ sanitise_slugshared between tool-building and prompt rendering — cannot drift ✓- Error messages include raw LLM input for debuggability ✓
- O(1) invariant test across 1/3/7/20 integrations ✓
- No bare
.unwrap()in production code ✓ - No PII/secrets in logs ✓
- No new standalone files at
src/openhuman/root ✓
Six doc sites still described the pre-tinyhumansai#1335 world where the orchestrator synthesised one `delegate_{toolkit}` tool per connected Composio integration. Update each to describe the collapsed `delegate_to_integrations_agent` tool whose `toolkit` argument selects the route, so future readers grepping for delegation surface land on the current behaviour.
…tions
`build_description` in `SkillDelegationTool` renders ` - <slug>` lines
with no label when the upstream `ConnectedIntegration::description` is
blank, leaving the orchestrator LLM with no signal about what the
toolkit can do for newly-onboarded or under-populated integrations.
Match the phrasing the old per-toolkit fan-out used in that case
("External integration via <raw toolkit name>…") before pushing the
slug into the collapsed tool's enum, and lock the behaviour with
`empty_integration_description_falls_back_to_generic_label`.
`SkillDelegationTool::execute` had three rejection branches and a dispatch call with no `log::debug!` lines, so the only way to diagnose a routing failure was to add prints. Per the project's verbose- diagnostics rule for new/changed flows, emit one entry log with the raw toolkit and prompt size, one log per rejection branch (missing toolkit, unknown toolkit, empty prompt), and one before the `integrations_agent` dispatch — all prefixed `[skill-delegation]` so a single grep traces the path.
|
|
||
| /// Canonical tool name surfaced to the orchestrator LLM. | ||
| pub const INTEGRATIONS_DELEGATE_TOOL_NAME: &str = "delegate_to_integrations_agent"; | ||
|
|
There was a problem hiding this comment.
[major] Frontend timeline formatting degrades with the collapsed tool name.
app/src/utils/toolTimelineFormatting.ts:86 extracts the toolkit name from the tool name via entry.name.match(/^delegate_(.+)$/). With the old delegate_gmail, this correctly extracted gmail. With the new delegate_to_integrations_agent, it extracts to_integrations_agent — which does not match KNOWN_TOOLKIT_RE (L80-81), so the UI falls back to humanizeIdentifier("delegate_to_integrations_agent") = "To Integrations Agent" instead of showing the actual toolkit name like "Gmail".
The code at L55 already reads parsedArgs?.toolkit as a fallback path, but the regex match at L86 takes priority. The fix is to check parsedArgs?.toolkit before falling back to regex extraction from the tool name.
Similarly, app/src/utils/__tests__/toolTimelineFormatting.test.ts has test stubs using delegate_notion — these should add a case for the collapsed tool with a toolkit arg.
app/src/providers/ChatRuntimeProvider.tsx:302 uses startsWith('delegate_') which still matches, so that's fine.
This is not a runtime crash but a visible UX regression — users would see "To Integrations Agent" instead of the toolkit name in the tool timeline.
There was a problem hiding this comment.
Fixed in 0869b87. Toolkit arg path stays primary; unknown slugs (e.g. slack_bot) humanise to Slack Bot; missing toolkit falls back to Checking your connected app to match the spawn_subagent integrations_agent copy. New Vitest cases cover all three branches.
The collapsed delegation tool no longer carries the toolkit in its
name, so `humanizeIdentifier('delegate_to_integrations_agent')`
surfaced "To Integrations Agent" in the chat timeline whenever the
toolkit arg was missing or outside KNOWN_TOOLKIT_RE.
Use the existing toolkit arg first, fall back to a humanised toolkit
slug for unknown integrations (e.g. `slack_bot` → "Slack Bot"), and
finally render "Checking your connected app" — matching the
spawn_subagent → integrations_agent generic copy — instead of
humanising the tool name.
Covers three new Vitest cases (known toolkit, unknown toolkit, no
toolkit) and leaves the existing delegate_<toolkit> behaviour intact.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/utils/__tests__/toolTimelineFormatting.test.ts`:
- Line 91: Prettier formatting is failing in
app/src/utils/__tests__/toolTimelineFormatting.test.ts for the test case
it('formats delegate_to_integrations_agent without a toolkit arg as a generic
connected-app label', ...); fix by running Prettier --write on that file (or
your repo's formatter) to apply the project's formatting rules and re-run tests;
ensure the test block (the it(...) declaration and surrounding file) is
reformatted to match the project's Prettier config so CI passes.
In `@app/src/utils/toolTimelineFormatting.ts`:
- Around line 59-79: Prettier formatting is failing for the block that sets
title (references: provider, entry.name === 'delegate_to_integrations_agent',
parsedArgs?.toolkit, integrationActivityTitle, humanizeIdentifier); run Prettier
on this file and reformat the function/block containing that title assignment so
it matches project style rules (fix spacing/line breaks, trailing commas, and
indentation) and then commit the formatted file so CI Prettier check passes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ec2544b9-7e1d-4c46-b7b1-950eb72e01ca
📒 Files selected for processing (2)
app/src/utils/__tests__/toolTimelineFormatting.test.tsapp/src/utils/toolTimelineFormatting.ts
Summary
delegate_<toolkit>fan-out into a singledelegate_to_integrations_agenttool that takes the toolkit slug as a required argument, so the orchestrator's function-calling schema cost is now constant in the integration dimension.toolkitJSON-schema enum so the orchestrator still discovers what's routable without each toolkit being its own schema entry.prompt.mddecision tree andprompt.rsdelegation guide to point at the new tool, and trims the duplicated routing recap so the decision tree is the single source of routing rules.Problem / Context
The orchestrator is the front-line agent, so its tool schema and prompt size directly affect TTFT. Before this PR:
src/openhuman/tools/orchestrator_tools.rs::collect_orchestrator_toolsexpandedSubagentEntry::Skills("*")into one synthesisedSkillDelegationToolper connected Composio integration (gmail,github,notion, …). Each carried its own description string and its own JSON-schema entry, even though every one of them dispatched to the sameintegrations_agentwith a differentskill_filter.src/openhuman/agent/agents/orchestrator/prompt.mdandprompt.rsboth referenced the per-toolkit tool names, so the routing guidance drifted as toolkits were added.The structural problem was the fan-out: visible tool count grew linearly with the number of connected integrations even though the orchestrator only needed the routing handle ("send this to integrations, scoped to toolkit X").
Solution
1. Collapsed delegation tool —
src/openhuman/tools/impl/agent/skill_delegation.rsSkillDelegationTool::for_connected(Vec<(slug, description)>)returnsSome(tool)when at least one toolkit is connected,Noneotherwise. The tool:delegate_to_integrations_agent(exposed asINTEGRATIONS_DELEGATE_TOOL_NAME).toolkitparameter listing every connected slug, so model providers that honourenumwill reject calls to unconnected toolkits at the schema layer.sanitise_slugbefore comparing it against the connected set, so quirky names (mixed case, dashes, dots) round-trip predictably.super::dispatch_subagent("integrations_agent", ..., Some(&slug))— the same path the previous per-toolkit tool used.2. Constant tool count —
src/openhuman/tools/orchestrator_tools.rsThe
SubagentEntry::Skills(*)branch now builds a single(slug, description)vector from the connected integrations and emits at most one collapsed tool. AgentId entries (delegate_researcher,delegate_archivist, etc.) and direct tools inagent.tomlare untouched.3. Prompt updates —
src/openhuman/agent/agents/orchestrator/{prompt.md,prompt.rs}delegate_to_integrations_agentwithtoolkitset to the relevant slug.prompt.md(line 34) was trimmed; the decision tree remains the single source of routing rules.render_delegation_guideemits the compact- **gmail** (\toolkit: "gmail"`): …` format so the prompt reflects how the new tool is actually invoked.Measurement
Captured via
bash scripts/debug-agent-prompts.sh --out <dir>onupstream/mainvs this branch, with the test user's currently-signed-in workspace.visible_tool_countCaveat (called out for transparency): the measurement workspace has zero connected Composio integrations, so the structural fan-out savings don't materialise in this dump. With N>0 connected integrations the win is
N - 1fewer tool-schema entries (each ~300 bytes of description), independent of N. The structural invariant is locked by the new unit testcollapsed_delegation_tool_count_is_constant_across_integration_counts, which exercises 1, 3, 7, and 20 integrations and asserts the collapsed tool count stays at 1.Submission Checklist
SkillDelegationTool(none-when-empty, canonical name, description enumeration, schema enum, normalisation, missing/unknown toolkit, empty prompt), 8 cases forcollect_orchestrator_tools(AgentId + wildcard collapse, constant-count invariant, no-integration path, unknown subagent, empty subagents, sanitised slugs, unconnected omission), 7 prompt rendering cases.pnpm test:rustto verify.## Relateddocs/RELEASE-MANUAL-SMOKE.md) — N/A: orchestrator tool surface, no release-cut behaviour change.Closes #NNNin the## Relatedsection.Impact
O(1)in connected integrations instead ofO(N). With the issue's example of 22 named direct tools + N integrations +Knamed archetypes, the visible-tool count now sits at22 + K + 1regardless ofN.toolkit: "gmail"rather than callingdelegate_gmail. Downstreamintegrations_agentcontinues to receive the toolkit viaskill_filter, so the integrations agent's view is unchanged.Related
src/openhuman/tools/orchestrator_tools.rs,src/openhuman/tools/impl/agent/skill_delegation.rs,src/openhuman/agent/agents/orchestrator/).dispatch.rsbudget cap was explicitly deferred per the issue's "Non-goals" — not introduced preemptively.AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
Summary by CodeRabbit
Refactor
Bug Fixes
User-visible changes
Tests
Documentation