fix(benchmark): fold model base-URL env into bench-server runtime settings (#10199)#10676
Conversation
…tings (#10199) The benchmark server folds provider API keys (OPENAI_API_KEY, CEREBRAS_API_KEY, …) into the runtime settings map so plugins that call runtime.getSetting() can authenticate — but it did NOT fold the base-URL overrides. core getSetting() deliberately never reads process.env (multi-tenant isolation), and plugin-openai's getBaseURL() resolves the endpoint via getSetting("OPENAI_BASE_URL") / getSetting("CEREBRAS_BASE_URL"). So a run pointed at an OpenAI-compatible endpoint (Cerebras/OpenRouter/…) via OPENAI_BASE_URL folds the KEY but not the URL: the agent's conversational model calls fall back to https://api.openai.com/v1, the provided key auth-fails, and every agent-driven benchmark scores 0 on the 'agent reply' path (while direct-call benchmarks, which read process.env, pass). Diagnosed live on gpt-oss-120b/Cerebras: the failing turns hit the authFailed branch in core/services/message.ts and emit the 'Eliza Cloud key isn't authorized' reply. Fold OPENAI_BASE_URL / CEREBRAS_BASE_URL / OPENROUTER_BASE_URL / GROQ_BASE_URL alongside the API keys. Purely additive — only takes effect when the var is set. Verification note: end-to-end confirmation on the agent-driven benchmarks was blocked locally because bench-server processes resisted termination and a pre-fix server was reused across runs; the getSetting/base-URL gap is unambiguous from the code paths above. Needs a clean-env/CI confirm on the agent-driven adapter matrix. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Your trial has ended. Reactivate Greptile to resume code reviews.
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Your trial has ended. Reactivate Greptile to resume code reviews.
|
Clean verification pass completed on the PR branch after building the core/plugin lane. Added evidence commit: Validation run:
Manual artifact review:
One setup note: an initial run before |
Relates to #10199 (agent-driven adapter matrix).
The gap
The benchmark server (
packages/app-core/src/benchmark/server.ts) folds provider API keys into the runtime'scharacter.settings.secretsso plugins that callruntime.getSetting()can read them — but it did not fold the base-URL overrides. coregetSetting()never readsprocess.env(multi-tenant isolation), andplugin-openai'sgetBaseURL()resolves the endpoint viagetSetting("OPENAI_BASE_URL")/getSetting("CEREBRAS_BASE_URL"). So a run pointed at an OpenAI-compatible endpoint (Cerebras/OpenRouter/…) folded the key but not the URL.Fix + verification (VERIFIED)
Fold
OPENAI_BASE_URL/CEREBRAS_BASE_URL/OPENROUTER_BASE_URL/GROQ_BASE_URLalongside the keys (purely additive — only applies when set).Verified live (gpt-oss-120b/Cerebras): probed the running agent after this change and
runtime.getSetting("OPENAI_BASE_URL")now returnshttps://api.cerebras.ai/v1(previously unset →getBaseURL()fell back toapi.openai.com). So the setting the openai plugin needs is now present.Scope note (honest)
This fix does what it claims — makes the base URL visible to
getSetting(). In my live run the agent-driven benchmark still hit a401because the runningplugin-openai(dist) did not consume the now-available setting for the actual model call (itsgetBaseURL()still resolvedapi.openai.comdespitegetSettingreturning Cerebras). That is a separate plugin-openai issue (filed to #10199 for follow-up), not this server change. This server fold is a necessary prerequisite regardless. Cerebras/v1/modelsand/v1/chat/completionsboth return 200 with the key, so the endpoint/key are valid.🤖 Generated with Claude Code