feat(core): native tool calling as canonical action dispatch where supported#7435
feat(core): native tool calling as canonical action dispatch where supported#74350xSolace wants to merge 2 commits into
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
b4b7521 to
d883c5a
Compare
Co-authored-by: wakesync <shadow@shad0w.xyz>
d883c5a to
58ecab1
Compare
Reframe: not a customer-pickable mode, foundational substrateFollowing discussion with @shawmakesmagic, this PR is being reframed. The original The valuable thing here isn't a parallel mode. It's the substrate: native tool calling as the canonical action dispatch mechanism, replacing prompt-based action-XML planning where the model supports it. Once that foundation exists, the work converges with what Shaw is building:
What this PR will becomeReworking to:
The split between this and Shaw's action-modes/contexts work becomes:
Composable, not competing. One framework decision, three orthogonal axes. Empirical receiptsShaw asked for benchmarks and token-consumption stats. Spinning up a fixed-prompt benchmark suite to compare:
Across: tokens in/out, model calls per turn, latency, tool-selection accuracy, cost per successful turn. Will post results in Production receipts (for context)The native-reasoning loop has been running in Nyx (an eliza fork in production) for ~2 weeks. Shipped 3 parallel acpx subagent deploys to Cloudflare Workers in a single session as recent receipts. Implementation has been load-tested on real traffic. The pattern works; the surface is what we're getting right. StatusHolding as draft until reworked per above. Marking |
Remove the character-level reasoning mode/provider knob and make the native reasoning dispatch path framework-selected from the configured model provider and model name. Update docs to frame native reasoning as substrate rather than a customer-pickable runtime.\n\nCo-authored-by: wakesync <shadow@shad0w.xyz>
|
Reworked this PR in What changed:
Validation:
Note: native-reasoning tests initially failed because |
Closing this PR — superseded by Wave 1
This PR's Salvageable piece: the Closing as superseded. Thanks for the redirect. — sol (acting on behalf of @0xSolace_) |
Summary
This reworks native reasoning from a customer-pickable character mode into framework-level action dispatch substrate.
Instead of asking character authors to set
reasoning.mode, core now detects whether the configured model/provider can support native tool calling and routes accordingly:@elizaos/native-reasoningreasoningblockThis aligns the PR with the cozy devs design discussion: native tool calling should be a framework capability selected from model support, not a customer-facing mode switch.
What changed
reasoning.modeandreasoning.providerfrom the character schema.CharacterReasoningConfigand related character type fields.isNativeToolCallingCapable(runtime)capability detection in core message dispatch.Capability detection v1
The current implementation is intentionally conservative:
text-davinci-*, remain on bootstrap.Relationship to action modes
This is complementary to Shaw's incoming action-modes work, including
Mode.ALWAYS_BEFORE,Mode.ALWAYS_AFTER, andMode.DURING.Once native dispatch exists, those modes can plug into the same tool registry and execution semantics instead of being compressed into the prompt planner.
Important scope note: this PR does not adopt actions-as-tools yet — that's the natural follow-up. this PR establishes the substrate; the next PR converts the action registry to emit native tool schemas.
Benchmarks
Empirical benchmarks are still required before treating this as a broad default replacement. A separate workstream should compare native dispatch against the current TOON/XML planner for token consumption, latency, tool/action selection accuracy, final response quality, and fallback/failure rates.
Validation
bun test packages/core/src/services/message.test.tsbun run --cwd packages/native-reasoning testbunx @biomejs/biome check packages/core/src/services/message.ts packages/core/src/services/message.test.ts packages/core/src/schemas/character.ts packages/core/src/types/agent.ts packages/native-reasoning/package.json