fix(benchmark): fold model base-URL env into bench-server runtime settings (#10199) by lalalune · Pull Request #10676 · elizaOS/eliza

lalalune · 2026-07-01T06:16:16Z

Relates to #10199 (agent-driven adapter matrix).

The gap

The benchmark server (packages/app-core/src/benchmark/server.ts) folds provider API keys into the runtime's character.settings.secrets so plugins that call runtime.getSetting() can read them — but it did not fold the base-URL overrides. core getSetting() never reads process.env (multi-tenant isolation), and plugin-openai's getBaseURL() resolves the endpoint via getSetting("OPENAI_BASE_URL")/getSetting("CEREBRAS_BASE_URL"). So a run pointed at an OpenAI-compatible endpoint (Cerebras/OpenRouter/…) folded the key but not the URL.
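The gap can be illustrated with a small stand-alone model. This is a sketch only: `makeGetSetting` and `resolveBaseURL` are hypothetical stand-ins, not the real core or plugin-openai code, and the real getBaseURL() also consults CEREBRAS_BASE_URL. The sketch assumes only what is described above: the lookup reads a settings map and never the environment, and a missing override falls back to the OpenAI default.

```typescript
// Hypothetical stand-in for a runtime whose getSetting() reads only the
// folded settings map and never process.env.
type Secrets = Record<string, string>;
type GetSetting = (key: string) => string | undefined;

function makeGetSetting(secrets: Secrets): GetSetting {
  return (key) => secrets[key];
}

// Simplified endpoint resolution: use the override when present,
// otherwise fall back to the default OpenAI endpoint.
const DEFAULT_OPENAI_BASE_URL = "https://api.openai.com/v1";

function resolveBaseURL(getSetting: GetSetting): string {
  return getSetting("OPENAI_BASE_URL") ?? DEFAULT_OPENAI_BASE_URL;
}

// Before the fix: only the key is folded, so the override is invisible.
const keyOnly = makeGetSetting({ OPENAI_API_KEY: "sk-example" });

// After the fix: the base URL is folded alongside the key.
const keyAndUrl = makeGetSetting({
  OPENAI_API_KEY: "sk-example",
  OPENAI_BASE_URL: "https://api.cerebras.ai/v1",
});

console.log(resolveBaseURL(keyOnly));
console.log(resolveBaseURL(keyAndUrl));
```

With only the key folded, the Cerebras key would be sent to the default OpenAI endpoint, which matches the auth failure described in this PR.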

Fix + verification (VERIFIED)

Fold OPENAI_BASE_URL/CEREBRAS_BASE_URL/OPENROUTER_BASE_URL/GROQ_BASE_URL alongside the keys (purely additive — only applies when set).
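A minimal sketch of what such a fold could look like, assuming the settings are a plain string map. `foldBaseUrls` and its parameters are hypothetical names for illustration; the actual change in packages/app-core/src/benchmark/server.ts may be structured differently. The environment is passed in as an argument so the sketch stays self-contained.

```typescript
// The four base-URL variables named in this PR.
const BASE_URL_KEYS = [
  "OPENAI_BASE_URL",
  "CEREBRAS_BASE_URL",
  "OPENROUTER_BASE_URL",
  "GROQ_BASE_URL",
] as const;

type Env = Record<string, string | undefined>;

// Hypothetical helper: copy each base-URL variable that is set and non-empty
// into a new secrets map. Unset variables are skipped, so the fold is
// purely additive and existing entries are preserved.
function foldBaseUrls(env: Env, secrets: Record<string, string>): Record<string, string> {
  const folded: Record<string, string> = { ...secrets };
  for (const key of BASE_URL_KEYS) {
    const value = env[key];
    if (value) {
      folded[key] = value;
    }
  }
  return folded;
}

const exampleEnv: Env = {
  OPENAI_BASE_URL: "https://api.cerebras.ai/v1",
  GROQ_BASE_URL: "", // empty values are treated as unset in this sketch
};
const foldedSecrets = foldBaseUrls(exampleEnv, { OPENAI_API_KEY: "sk-example" });
console.log(Object.keys(foldedSecrets));
```

Treating an empty string as unset is an assumption of this sketch, not something the PR states.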

Verified live (gpt-oss-120b/Cerebras): probed the running agent after this change and runtime.getSetting("OPENAI_BASE_URL") now returns https://api.cerebras.ai/v1 (previously unset → getBaseURL() fell back to api.openai.com). So the setting the openai plugin needs is now present.

Scope note (honest)

This fix does what it claims — makes the base URL visible to getSetting(). In my live run the agent-driven benchmark still hit a 401 because the running plugin-openai (dist) did not consume the now-available setting for the actual model call (its getBaseURL() still resolved api.openai.com despite getSetting returning Cerebras). That is a separate plugin-openai issue (filed to #10199 for follow-up), not this server change. This server fold is a necessary prerequisite regardless. Cerebras /v1/models and /v1/chat/completions both return 200 with the key, so the endpoint/key are valid.
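For reference, the endpoint/key check mentioned above amounts to two requests against the OpenAI-compatible API. The sketch below only builds the request descriptions and leaves sending them to the caller; `buildProbes` and `ProbeRequest` are hypothetical names, and the "ping" message body is an illustrative assumption rather than the payload used in the live run.

```typescript
// Hypothetical description of one probe request.
interface ProbeRequest {
  method: "GET" | "POST";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

// Build the two probes: list models, then a minimal chat completion.
// OpenAI-compatible endpoints take the key as a Bearer token.
function buildProbes(baseUrl: string, apiKey: string, model: string): ProbeRequest[] {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const headers = {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
  return [
    { method: "GET", url: `${root}/models`, headers },
    {
      method: "POST",
      url: `${root}/chat/completions`,
      headers,
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: "ping" }],
      }),
    },
  ];
}

const probes = buildProbes("https://api.cerebras.ai/v1/", "sk-example", "gpt-oss-120b");
for (const probe of probes) {
  console.log(probe.method, probe.url);
}
```

A 200 on both requests shows the endpoint and key are valid independently of how the agent resolves its base URL.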

🤖 Generated with Claude Code

…tings (#10199)

The benchmark server folds provider API keys (OPENAI_API_KEY, CEREBRAS_API_KEY, …) into the runtime settings map so plugins that call runtime.getSetting() can authenticate, but it did NOT fold the base-URL overrides. core getSetting() deliberately never reads process.env (multi-tenant isolation), and plugin-openai's getBaseURL() resolves the endpoint via getSetting("OPENAI_BASE_URL") / getSetting("CEREBRAS_BASE_URL").

So a run pointed at an OpenAI-compatible endpoint (Cerebras/OpenRouter/…) via OPENAI_BASE_URL folds the KEY but not the URL: the agent's conversational model calls fall back to https://api.openai.com/v1, the provided key auth-fails, and every agent-driven benchmark scores 0 on the 'agent reply' path (while direct-call benchmarks, which read process.env, pass). Diagnosed live on gpt-oss-120b/Cerebras: the failing turns hit the authFailed branch in core/services/message.ts and emit the 'Eliza Cloud key isn't authorized' reply.

Fold OPENAI_BASE_URL / CEREBRAS_BASE_URL / OPENROUTER_BASE_URL / GROQ_BASE_URL alongside the API keys. Purely additive: only takes effect when the var is set.

Verification note: end-to-end confirmation on the agent-driven benchmarks was blocked locally because bench-server processes resisted termination and a pre-fix server was reused across runs; the getSetting/base-URL gap is unambiguous from the code paths above. Needs a clean-env/CI confirm on the agent-driven adapter matrix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


coderabbitai · 2026-07-01T06:16:24Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



lalalune · 2026-07-01T06:25:50Z

Clean verification pass completed on the PR branch after building the core/plugin lane.

Added evidence commit: b04ca46b53b (.github/issue-evidence/10199-bench-base-url-pr10676.md).

Validation run:

- bun run build:core passed: 64 successful tasks.
- bun test --coverage-reporter=lcov packages/app-core/src/benchmark/__tests__/cerebras-endpoint.test.ts passed: 11 tests, 34 assertions.
- Live agent-driven context_bench smoke with Cerebras/gpt-oss-120b passed through the eliza TS bridge:
  - run group rg_20260701T062348Z_e920e81a
  - run id run_context_bench_20260701T062348Z_1_26d66083
  - status succeeded
  - score 1.0

Manual artifact review:

- latest/context_bench__eliza.json reports provider/model cerebras / gpt-oss-120b, overall_accuracy=1.0, and token telemetry.
- telemetry.jsonl contains a real RESPONSE_HANDLER trajectory; the model answered the generated password retrieval task correctly.
- Server stdout shows Cerebras autowiring and registered @elizaos/plugin-openai handlers for RESPONSE_HANDLER and ACTION_PLANNER.
- Server stderr does not contain the previous auth-fallback text and does not contain OpenAI plugin not available.
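
The stderr check in the last item could be automated along these lines. This is a sketch under assumptions: `findForbiddenMarkers` is a hypothetical helper, and the first marker assumes the auth-fallback text is the 'Eliza Cloud key isn't authorized' reply quoted in the commit message; the exact stderr wording may differ.

```typescript
// Strings whose presence in server stderr would indicate the old failure modes.
// The first entry is an assumption based on the reply text quoted in the commit message.
const FORBIDDEN_MARKERS: string[] = [
  "Eliza Cloud key isn't authorized",
  "OpenAI plugin not available",
];

// Return every forbidden marker found in the captured stderr text.
function findForbiddenMarkers(stderr: string): string[] {
  return FORBIDDEN_MARKERS.filter((marker) => stderr.includes(marker));
}

// Illustrative input only, not a real log from the run.
const sampleStderr = "[bench] server listening\n[bench] OpenAI plugin not available\n";
console.log(findForbiddenMarkers(sampleStderr));
```

An empty result corresponds to the clean stderr reported above.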

One setup note: an initial run before build:core was invalid because the disposable worktree had dependencies but no built @elizaos/plugin-openai dist; after the build, the same live smoke passed cleanly. This PR now has the clean confirmation requested in the original PR body.

greptile-apps Bot reviewed Jul 1, 2026


lalalune mentioned this pull request Jul 1, 2026

benchmarks: full gpt-oss-120b rerun + HITL multi-Codex harness for Hermes/OpenClaw/elizaOS/Smithers #10199

Open

docs(benchmarks): add base-url verification evidence

b04ca46


lalalune merged commit 10bb61b into develop Jul 1, 2026
11 of 57 checks passed

lalalune deleted the fix/10199-bench-server-fold-base-url branch July 1, 2026 06:25

NubsCarson mentioned this pull request Jul 1, 2026

feat(plugin-pty): PTY_SERVICE — real interactive eliza-code CLI (on cerebras) in the cockpit terminal #10668

Draft
