fix(benchmark): fold model base-URL env into bench-server runtime settings (#10199) #10676

Merged
lalalune merged 2 commits into develop from fix/10199-bench-server-fold-base-url
Jul 1, 2026

@lalalune lalalune commented Jul 1, 2026

Member

Relates to #10199 (agent-driven adapter matrix).

The gap

The benchmark server (packages/app-core/src/benchmark/server.ts) folds provider API keys into the runtime's character.settings.secrets so plugins that call runtime.getSetting() can read them — but it did not fold the base-URL overrides. core getSetting() never reads process.env (multi-tenant isolation), and plugin-openai's getBaseURL() resolves the endpoint via getSetting("OPENAI_BASE_URL")/getSetting("CEREBRAS_BASE_URL"). So a run pointed at an OpenAI-compatible endpoint (Cerebras/OpenRouter/…) folded the key but not the URL.
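
The gap can be illustrated with a minimal sketch. `SketchRuntime` and `resolveBaseURL` are hypothetical stand-ins, not the actual core or plugin-openai code. The only behavior taken from the description above is that `getSetting()` reads from the runtime's settings and never from `process.env`, and that the endpoint falls back to the OpenAI default when the setting is absent. For brevity the sketch consults only `OPENAI_BASE_URL`.

```typescript
// Hypothetical stand-in for the runtime: settings come only from the
// character's secrets map, never from process.env (multi-tenant isolation).
type Secrets = Record<string, string>;

class SketchRuntime {
  private readonly secrets: Secrets;

  constructor(secrets: Secrets) {
    this.secrets = secrets;
  }

  getSetting(key: string): string | undefined {
    return this.secrets[key];
  }
}

const DEFAULT_OPENAI_BASE_URL = "https://api.openai.com/v1";

// Hypothetical stand-in for the plugin's endpoint resolution:
// use the folded setting if present, otherwise the OpenAI default.
function resolveBaseURL(runtime: SketchRuntime): string {
  return runtime.getSetting("OPENAI_BASE_URL") ?? DEFAULT_OPENAI_BASE_URL;
}

// Key folded, URL not folded: the Cerebras key would be sent to api.openai.com.
const before = new SketchRuntime({ OPENAI_API_KEY: "example-key" });

// Key and URL both folded: the call targets the intended endpoint.
const after = new SketchRuntime({
  OPENAI_API_KEY: "example-key",
  OPENAI_BASE_URL: "https://api.cerebras.ai/v1",
});

console.log(resolveBaseURL(before)); // https://api.openai.com/v1
console.log(resolveBaseURL(after)); // https://api.cerebras.ai/v1
```

Even if `OPENAI_BASE_URL` is exported in the shell that launches the server, the `before` case still resolves to the default, because nothing in this path consults the process environment.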

Fix + verification (VERIFIED)

Fold OPENAI_BASE_URL/CEREBRAS_BASE_URL/OPENROUTER_BASE_URL/GROQ_BASE_URL alongside the keys (purely additive — only applies when set).
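
A rough sketch of what "fold alongside the keys" means, under stated assumptions: `foldBaseUrls` and its parameter shapes are hypothetical, not the code in server.ts, and treating an empty string as unset is an assumption of this sketch. Only the four variable names come from this PR.

```typescript
// The four base-URL overrides named in this PR.
const BASE_URL_KEYS = [
  "OPENAI_BASE_URL",
  "CEREBRAS_BASE_URL",
  "OPENROUTER_BASE_URL",
  "GROQ_BASE_URL",
] as const;

type Env = Record<string, string | undefined>;

// Hypothetical helper: copy any base-URL override that is set in the
// environment into the secrets map that getSetting() reads from.
// Purely additive: keys that are unset (or empty, by assumption) are skipped,
// and existing entries such as API keys are left untouched.
function foldBaseUrls(env: Env, secrets: Record<string, string>): Record<string, string> {
  const folded: Record<string, string> = { ...secrets };
  for (const key of BASE_URL_KEYS) {
    const value = env[key];
    if (value) {
      folded[key] = value;
    }
  }
  return folded;
}

const folded = foldBaseUrls(
  { OPENAI_BASE_URL: "https://api.cerebras.ai/v1" },
  { OPENAI_API_KEY: "example-key" },
);
console.log(Object.keys(folded)); // [ 'OPENAI_API_KEY', 'OPENAI_BASE_URL' ]
```

With no override set, the function returns a copy of the input secrets unchanged, which is the "only applies when set" property.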

Verified live (gpt-oss-120b/Cerebras): probed the running agent after this change and runtime.getSetting("OPENAI_BASE_URL") now returns https://api.cerebras.ai/v1 (previously unset → getBaseURL() fell back to api.openai.com). So the setting the openai plugin needs is now present.

Scope note (honest)

This fix does what it claims — makes the base URL visible to getSetting(). In my live run the agent-driven benchmark still hit a 401 because the running plugin-openai (dist) did not consume the now-available setting for the actual model call (its getBaseURL() still resolved api.openai.com despite getSetting returning Cerebras). That is a separate plugin-openai issue (filed to #10199 for follow-up), not this server change. This server fold is a necessary prerequisite regardless. Cerebras /v1/models and /v1/chat/completions both return 200 with the key, so the endpoint/key are valid.
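
To separate "the endpoint and key are valid" from "the plugin did not use the setting", the endpoint can be probed directly, outside the agent. The sketch below only builds the request; `buildModelsProbe` is a hypothetical helper, and the `Authorization: Bearer` header is assumed from the usual OpenAI-compatible convention rather than taken from this PR.

```typescript
interface ProbeRequest {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical helper: build a GET request for the models listing of an
// OpenAI-compatible endpoint. The base URL is expected to already include
// the /v1 prefix, e.g. https://api.cerebras.ai/v1.
function buildModelsProbe(baseUrl: string, apiKey: string): ProbeRequest {
  const trimmed = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${trimmed}/models`,
    headers: { Authorization: `Bearer ${apiKey}` }, // assumed auth scheme
  };
}

const probe = buildModelsProbe("https://api.cerebras.ai/v1/", "example-key");
console.log(probe.url); // https://api.cerebras.ai/v1/models

// Sending it is left to the caller, for example:
//   const res = await fetch(probe.url, { headers: probe.headers });
//   console.log(res.status);
```

A 200 from this probe combined with a 401 from the agent path points at endpoint resolution inside the plugin rather than at the key or the server fold.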

🤖 Generated with Claude Code

…tings (#10199)

The benchmark server folds provider API keys (OPENAI_API_KEY, CEREBRAS_API_KEY,
…) into the runtime settings map so plugins that call runtime.getSetting() can
authenticate — but it did NOT fold the base-URL overrides. core getSetting()
deliberately never reads process.env (multi-tenant isolation), and
plugin-openai's getBaseURL() resolves the endpoint via
getSetting("OPENAI_BASE_URL") / getSetting("CEREBRAS_BASE_URL"). So a run
pointed at an OpenAI-compatible endpoint (Cerebras/OpenRouter/…) via
OPENAI_BASE_URL folds the KEY but not the URL: the agent's conversational model
calls fall back to https://api.openai.com/v1, the provided key auth-fails, and
every agent-driven benchmark scores 0 on the 'agent reply' path (while
direct-call benchmarks, which read process.env, pass). Diagnosed live on
gpt-oss-120b/Cerebras: the failing turns hit the authFailed branch in
core/services/message.ts and emit the 'Eliza Cloud key isn't authorized' reply.

Fold OPENAI_BASE_URL / CEREBRAS_BASE_URL / OPENROUTER_BASE_URL / GROQ_BASE_URL
alongside the API keys. Purely additive — only takes effect when the var is set.

Verification note: end-to-end confirmation on the agent-driven benchmarks was
blocked locally because bench-server processes resisted termination and a
pre-fix server was reused across runs; the getSetting/base-URL gap is
unambiguous from the code paths above. Needs a clean-env/CI confirm on the
agent-driven adapter matrix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

lalalune commented Jul 1, 2026

Member Author

Clean verification pass completed on the PR branch after building the core/plugin lane.

Added evidence commit: b04ca46b53b (.github/issue-evidence/10199-bench-base-url-pr10676.md).

Validation run:

  • bun run build:core passed: 64 successful tasks.
  • bun test --coverage-reporter=lcov packages/app-core/src/benchmark/__tests__/cerebras-endpoint.test.ts passed: 11 tests, 34 assertions.
  • Live agent-driven context_bench smoke with Cerebras/gpt-oss-120b passed through the eliza TS bridge:
    • run group rg_20260701T062348Z_e920e81a
    • run id run_context_bench_20260701T062348Z_1_26d66083
    • status succeeded
    • score 1.0

Manual artifact review:

  • latest/context_bench__eliza.json reports provider/model cerebras / gpt-oss-120b, overall_accuracy=1.0, and token telemetry.
  • telemetry.jsonl contains a real RESPONSE_HANDLER trajectory; the model answered the generated password retrieval task correctly.
  • Server stdout shows Cerebras autowiring and registered @elizaos/plugin-openai handlers for RESPONSE_HANDLER and ACTION_PLANNER.
  • Server stderr does not contain the previous auth-fallback text and does not contain OpenAI plugin not available.

One setup note: an initial run before build:core was invalid because the disposable worktree had dependencies but no built @elizaos/plugin-openai dist; after the build, the same live smoke passed cleanly. This PR now has the clean confirmation requested in the original PR body.

@lalalune lalalune merged commit 10bb61b into develop Jul 1, 2026
11 of 57 checks passed
@lalalune lalalune deleted the fix/10199-bench-server-fold-base-url branch July 1, 2026 06:25