
feat(core): native tool calling as canonical action dispatch where supported#7435

Closed
0xSolace wants to merge 2 commits into elizaOS:develop from 0xSolace:feat/native-reasoning-runtime

Conversation

@0xSolace
Collaborator

@0xSolace 0xSolace commented May 6, 2026

Summary

This reworks native reasoning from a customer-pickable character mode into a framework-level action-dispatch substrate.

Instead of asking character authors to set reasoning.mode, core now detects whether the configured model/provider can support native tool calling and routes accordingly:

  • capable models/providers go through @elizaos/native-reasoning
  • unsupported or legacy models continue through the existing prompt-XML bootstrap planner unchanged
  • character schema/types no longer expose a reasoning block
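A minimal sketch of the routing decision described above. The names `ModelInfo`, `DispatchPath`, `CapabilityCheck`, and `chooseDispatchPath` are hypothetical and used only for illustration; the check this PR adds is `isNativeToolCallingCapable(runtime)`, and its real signature and internals may differ.

```typescript
// Hypothetical shapes for illustration; not the actual elizaOS runtime types.
type DispatchPath = "native" | "bootstrap";

interface ModelInfo {
  provider: string; // e.g. "anthropic", "openai"
  model: string; // e.g. "text-davinci-003"
}

// The capability predicate is injected so routing stays independent of
// how capability is actually detected.
type CapabilityCheck = (info: ModelInfo) => boolean;

function chooseDispatchPath(
  info: ModelInfo,
  isCapable: CapabilityCheck,
): DispatchPath {
  // Capable models go through the native tool-calling loop; everything
  // else stays on the existing prompt-XML bootstrap planner.
  return isCapable(info) ? "native" : "bootstrap";
}

// Assumed handler shape: one async function per dispatch path.
async function dispatchMessage(
  info: ModelInfo,
  isCapable: CapabilityCheck,
  handlers: Record<DispatchPath, (text: string) => Promise<string>>,
  text: string,
): Promise<string> {
  const path = chooseDispatchPath(info, isCapable);
  return handlers[path](text);
}
```

Character authors never see `DispatchPath` in this sketch; the choice is derived entirely from the configured model.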

This aligns the PR with the cozy devs design discussion: native tool calling should be a framework capability selected from model support, not a customer-facing mode switch.

What changed

  • Removed reasoning.mode and reasoning.provider from the character schema.
  • Removed CharacterReasoningConfig and related character type fields.
  • Added isNativeToolCallingCapable(runtime) capability detection in core message dispatch.
  • Rewrote native dispatch to pass inferred provider/model metadata into the native loop.
  • Updated message service tests to cover capability-based routing and legacy completion fallback.
  • Updated native-reasoning docs/spec/package description to frame this as substrate, not opt-in alternate runtime.

Capability detection v1

The current implementation is intentionally conservative:

  • Anthropic Claude models route native.
  • OpenAI GPT-4+/GPT-5+/o-series/Codex-class model names are treated as native-tool-capable.
  • Codex backend selection routes native.
  • Legacy OpenAI completions models, such as text-davinci-*, remain on bootstrap.
  • Local providers remain on bootstrap until a concrete backend/capability is advertised rather than guessed from provider name alone.
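The rules above could be expressed roughly as follows. This is a sketch under stated assumptions, not the code in this PR: `ModelSelection`, its `backend` field, and `isNativeCapableSketch` are hypothetical names, the real function takes the runtime rather than a plain object, and the exact model-name patterns are guesses at what "GPT-4+/GPT-5+/o-series/Codex-class" means in practice.

```typescript
// Hypothetical input shape; the real check reads this from the runtime.
interface ModelSelection {
  provider: string;
  model: string;
  backend?: string; // assumption: an explicit backend selector such as "codex"
}

function isNativeCapableSketch(sel: ModelSelection): boolean {
  const provider = sel.provider.toLowerCase();
  const model = sel.model.toLowerCase();

  // Explicit Codex backend selection routes native.
  if (sel.backend?.toLowerCase() === "codex") return true;

  // Legacy completions models such as text-davinci-* stay on bootstrap.
  if (/^text-davinci-/.test(model)) return false;

  // Anthropic Claude models route native.
  if (provider === "anthropic") return model.startsWith("claude");

  if (provider === "openai") {
    // GPT-4 and later: parse the major version out of names like "gpt-4o".
    const gpt = /^gpt-(\d+)/.exec(model);
    if (gpt) return Number(gpt[1]) >= 4;
    // o-series names such as "o1" or "o3-mini" (assumed naming pattern).
    if (/^o\d/.test(model)) return true;
    // Codex-class model names.
    if (model.includes("codex")) return true;
    return false;
  }

  // Local and unknown providers: conservative default until a concrete
  // capability is advertised.
  return false;
}
```

Defaulting to `false` keeps the behavior conservative: a model that is not recognized falls back to the bootstrap planner instead of being sent tool schemas it may not support.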

Relationship to action modes

This is complementary to Shaw's incoming action-modes work, including Mode.ALWAYS_BEFORE, Mode.ALWAYS_AFTER, and Mode.DURING.

Once native dispatch exists, those modes can plug into the same tool registry and execution semantics instead of being compressed into the prompt planner.

Important scope note: this PR does not adopt actions-as-tools yet; that's the natural follow-up. This PR establishes the substrate; the next PR converts the action registry to emit native tool schemas.

Benchmarks

Empirical benchmarks are still required before treating this as a broad default replacement. A separate workstream should compare native dispatch against the current TOON/XML planner for token consumption, latency, tool/action selection accuracy, final response quality, and fallback/failure rates.

Validation

  • bun test packages/core/src/services/message.test.ts
  • bun run --cwd packages/native-reasoning test
  • bunx @biomejs/biome check packages/core/src/services/message.ts packages/core/src/services/message.test.ts packages/core/src/schemas/character.ts packages/core/src/types/agent.ts packages/native-reasoning/package.json

@coderabbitai
Contributor

coderabbitai Bot commented May 6, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@0xSolace 0xSolace force-pushed the feat/native-reasoning-runtime branch from b4b7521 to d883c5a Compare May 6, 2026 09:48
@0xSolace 0xSolace changed the base branch from main to develop May 6, 2026 09:51
Co-authored-by: wakesync <shadow@shad0w.xyz>
@0xSolace 0xSolace force-pushed the feat/native-reasoning-runtime branch from d883c5a to 58ecab1 Compare May 6, 2026 09:53
@0xSolace
Collaborator Author

0xSolace commented May 7, 2026

Reframe: not a customer-pickable mode, foundational substrate

Following discussion with @shawmakesmagic, this PR is being reframed.

The original reasoning.mode: 'bootstrap' | 'native' customer switch is wrong for elizaOS positioning. Customers shouldn't be picking cognitive architecture — the framework makes that decision and backs it with empirical results.

The valuable thing here isn't a parallel mode. It's the substrate: native tool calling as the canonical action dispatch mechanism, replacing prompt-based action-XML planning where the model supports it. Once that foundation exists, the work converges with what Shaw is building:

  • Mode.ALWAYS_BEFORE/ALWAYS_AFTER/ALWAYS_DURING actions plug into the same tool registry
  • Contexts (lazy provider/action loading by inferred intent) compose on top
  • TOON compression becomes redundant for tool-call models — the structured action call IS the structured representation
  • Smaller models (30B class) with strong tool calling get sharper action selection without scaffolding

What this PR will become

Reworking to:

  1. Remove the customer-facing reasoning.mode switch
  2. Detect native-tool-calling capability from the model provider, route accordingly internally
  3. Surface the native-reasoning loop as the path for tool-capable models, with the existing XML-planner path preserved as fallback for models without function calling
  4. Keep the PipelineHooks integration (it's already model-agnostic)
  5. Drop or hide the reasoning.provider config — derive from existing model provider

The split between this and Shaw's action-modes/contexts work becomes:

  • This PR = how actions get dispatched (native tool calling vs prompt parsing)
  • Action-modes = when actions fire in the lifecycle (BEFORE/AFTER/DURING)
  • Contexts = which actions/providers are loaded for a given intent

Composable, not competing. One framework decision, three orthogonal axes.

Empirical receipts

Shaw asked for benchmarks and token-consumption stats. Spinning up a fixed-prompt benchmark suite to compare:

  • Bootstrap (current main, action-XML + planner)
  • Native tool calling (this PR, reworked)

Across: tokens in/out, model calls per turn, latency, tool-selection accuracy, cost per successful turn. Will post results in #🪼-milady.
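To make the comparison metrics concrete, here is a rough sketch of how per-turn records could be aggregated. The `TurnRecord` and `BenchmarkSummary` shapes and the `summarize` function are hypothetical; the actual benchmark suite has not been posted, so treat the field names and the definition of "cost per successful turn" (total spend divided by successful turns) as assumptions.

```typescript
// Hypothetical per-turn record; field names are illustrative only.
interface TurnRecord {
  tokensIn: number;
  tokensOut: number;
  modelCalls: number;
  latencyMs: number;
  expectedTool: string | null; // null means no tool should be selected
  selectedTool: string | null;
  succeeded: boolean;
  costUsd: number;
}

interface BenchmarkSummary {
  turns: number;
  avgTokensIn: number;
  avgTokensOut: number;
  avgModelCalls: number;
  avgLatencyMs: number;
  toolSelectionAccuracy: number; // fraction of turns with the expected tool
  costPerSuccessfulTurn: number; // Infinity when no turn succeeded
}

function summarize(records: TurnRecord[]): BenchmarkSummary {
  if (records.length === 0) {
    throw new Error("summarize requires at least one turn record");
  }
  const n = records.length;
  const total = (pick: (r: TurnRecord) => number): number =>
    records.reduce((acc, r) => acc + pick(r), 0);
  const successes = records.filter((r) => r.succeeded).length;
  const correct = records.filter(
    (r) => r.selectedTool === r.expectedTool,
  ).length;

  return {
    turns: n,
    avgTokensIn: total((r) => r.tokensIn) / n,
    avgTokensOut: total((r) => r.tokensOut) / n,
    avgModelCalls: total((r) => r.modelCalls) / n,
    avgLatencyMs: total((r) => r.latencyMs) / n,
    toolSelectionAccuracy: correct / n,
    // Failed turns still cost money, so their spend stays in the numerator.
    costPerSuccessfulTurn:
      successes > 0
        ? total((r) => r.costUsd) / successes
        : Number.POSITIVE_INFINITY,
  };
}
```

Running the same fixed prompts through both paths and calling `summarize` once per path would give two directly comparable summaries.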

Production receipts (for context)

The native-reasoning loop has been running in Nyx (an eliza fork in production) for ~2 weeks. As a recent example, it shipped 3 parallel acpx subagent deploys to Cloudflare Workers in a single session. The implementation has been load-tested on real traffic. The pattern works; the surface is what we're getting right.

Status

Holding as draft until reworked per above. Marking status/proposal so it's clear this is a design vehicle, not a merge candidate as-filed.

Remove the character-level reasoning mode/provider knob and make the native reasoning dispatch path framework-selected from the configured model provider and model name. Update docs to frame native reasoning as substrate rather than a customer-pickable runtime.

Co-authored-by: wakesync <shadow@shad0w.xyz>
@0xSolace 0xSolace changed the title from "feat(core): add @elizaos/native-reasoning as opt-in alternate runtime" to "feat(core): native tool calling as canonical action dispatch where supported" May 7, 2026
@0xSolace
Collaborator Author

0xSolace commented May 7, 2026

Reworked this PR in 9fdbedb2 to remove the customer-facing reasoning.mode / reasoning.provider switch and make native reasoning framework-selected from model capability instead.

What changed:

  • character schema/types no longer expose a reasoning block
  • DefaultMessageService now uses isNativeToolCallingCapable(runtime) before routing to the native loop
  • legacy OpenAI completions stay on the bootstrap planner
  • native-reasoning docs/spec now frame this as substrate for canonical tool dispatch, not an opt-in alternate runtime
  • action-modes compatibility is called out in the PR body, with actions-as-tools scoped as the next PR

Validation:

  • bun test packages/core/src/services/message.test.ts passed, 8 pass / 1 skip
  • bun run --cwd packages/native-reasoning test passed, 9 files / 111 tests
  • bunx @biomejs/biome check packages/core/src/services/message.ts packages/core/src/services/message.test.ts packages/core/src/schemas/character.ts packages/core/src/types/agent.ts packages/native-reasoning/package.json passed

Note: native-reasoning tests initially failed because @anthropic-ai/sdk was not linked in the local workspace. After bun install refreshed workspace symlinks, the same test command passed.

@0xSolace
Collaborator Author

0xSolace commented May 7, 2026

Closing this PR — superseded by Wave 1

@shawmakesmagic shipped the V5 native-tool-calling architecture in commits 0e8487ab07, 7691ba4d6d, fb34e96e48, 8bc9242c47 ("wave-1 test stack overhaul" + follow-ups). What's there does substantially more than this PR proposed and is the right shape:

  • actions/to-tool.ts (actionToTool, actionToJsonSchema, strict tool definitions with ^[A-Z_][A-Z0-9_]*$ naming) — actions become native function-calling tools at the framework level
  • runtime/context-registry.ts + runtime/default-contexts.ts — 26 first-party contexts as a frozen taxonomy (general/memory/knowledge/web/code/email/calendar/wallet/...) with role gates, sensitivity tiers, cache scopes, byte-identical registration for prompt-cache stability
  • runtime/planner-loop.ts + runtime/sub-planner.ts + runtime/execute-planned-tool-call.ts — multi-step planner with sub-actions
  • services/message.ts runV5MessageRuntimeStage1 — single Stage 1 call returns ignored | stopped | final_reply | planning_needed{contexts}; replaces the 3-call shouldRespond → action-pick → content-gen pipeline, runs in parallel with existing pipeline hooks for graceful coexistence
  • Cloud-side: warm pool service + 4 cron routes + migration 0107_warm_pool_columns.sql, native tool pass-through in cloud/apps/api/v1/chat/completions/route.ts (__nativeToolingTestHooks)
  • Cerebras + expanded mockoon test stack
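For readers unfamiliar with the actions-as-tools idea in the first bullet, a rough sketch of converting an action into a function-calling tool definition follows. Only the name pattern `^[A-Z_][A-Z0-9_]*$` comes from this thread; `ActionSketch`, `ToolDefinitionSketch`, `actionToToolSketch`, and the `strict` flag layout are hypothetical, and the real `actionToTool` in `actions/to-tool.ts` may look quite different.

```typescript
// Name pattern quoted in the list above.
const TOOL_NAME_PATTERN = /^[A-Z_][A-Z0-9_]*$/;

type ParamType = "string" | "number" | "boolean";

// Hypothetical action shape; not the elizaOS Action type.
interface ActionParamSketch {
  type: ParamType;
  description?: string;
  required?: boolean;
}

interface ActionSketch {
  name: string;
  description: string;
  parameters: Record<string, ActionParamSketch>;
}

// Assumed JSON-Schema-style tool definition.
interface ToolDefinitionSketch {
  name: string;
  description: string;
  strict: true;
  parameters: {
    type: "object";
    properties: Record<string, { type: ParamType; description?: string }>;
    required: string[];
    additionalProperties: false;
  };
}

function actionToToolSketch(action: ActionSketch): ToolDefinitionSketch {
  // Reject names the pattern does not allow, e.g. camelCase.
  if (!TOOL_NAME_PATTERN.test(action.name)) {
    throw new Error(`Invalid tool name: ${action.name}`);
  }
  const properties: ToolDefinitionSketch["parameters"]["properties"] = {};
  const required: string[] = [];
  for (const [key, param] of Object.entries(action.parameters)) {
    properties[key] =
      param.description === undefined
        ? { type: param.type }
        : { type: param.type, description: param.description };
    if (param.required) required.push(key);
  }
  return {
    name: action.name,
    description: action.description,
    strict: true,
    parameters: {
      type: "object",
      properties,
      required,
      additionalProperties: false,
    },
  };
}
```

Validating the name up front means a malformed action fails at registration time in this sketch, not when a provider rejects the tool list mid-request.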

This PR's @elizaos/native-reasoning package was substantially redundant with Wave 1, at lower abstraction (no contexts, no sub-planner, no warm-pool integration, no role-gated context filtering, no cache-stable context hash).
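The Stage 1 result shape listed above (`ignored | stopped | final_reply | planning_needed{contexts}`) maps naturally onto a discriminated union. The sketch below is illustrative only: the `kind` and `text` field names and the `nextStep` helper are assumptions, and only the four variant names and the `contexts` payload come from this thread.

```typescript
// Hypothetical modelling of the four Stage 1 outcomes.
type Stage1Result =
  | { kind: "ignored" }
  | { kind: "stopped" }
  | { kind: "final_reply"; text: string }
  | { kind: "planning_needed"; contexts: string[] };

// Decide what the runtime does next after the single Stage 1 call.
function nextStep(result: Stage1Result): string {
  switch (result.kind) {
    case "ignored":
      return "no response";
    case "stopped":
      return "halt";
    case "final_reply":
      return `reply: ${result.text}`;
    case "planning_needed":
      // Only this branch continues into the planner, scoped to the
      // contexts Stage 1 asked for.
      return `plan with contexts: ${result.contexts.join(", ")}`;
    default: {
      // Compile-time exhaustiveness check: adding a variant without
      // handling it makes this assignment a type error.
      const unreachable: never = result;
      return unreachable;
    }
  }
}
```

A single call returning one of four tagged outcomes is what lets it stand in for the separate shouldRespond, action-pick, and content-gen calls.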

Salvageable piece: the CodexBackend (chatgpt-prolite OAuth-tokens-from-~/.codex/auth.json) is genuinely useful IP not duplicated upstream. Will propose that as a smaller standalone follow-up, scoped to a model provider plugin rather than a full reasoning runtime.

Closing as superseded. Thanks for the redirect.

-- sol (acting on behalf of @0xSolace)
