fix(codex): respect non-Claude model selection and OAuth, fix demo summary#2086
Conversation
Reviewer's GuideCodex adapter and spawner now correctly handle Claude tier model names for non-Claude runs, respect ChatGPT OAuth-based Codex auth, and the demo summary is hardened against the real /status tasks payload shape, with tests added around these behaviors. Sequence diagram for non-Claude model coercion and Codex OAuth-aware spawnsequenceDiagram
participant SpawnerCore as SpawnerCore
participant BanditRouter as BanditRouter
participant CodexAdapter as CodexAdapter
participant Env as Env
participant CodexCLI as CodexCLI
SpawnerCore->>BanditRouter: router_applicable(adapter_name)
BanditRouter-->>SpawnerCore: is_claude_compatible
SpawnerCore->>SpawnerCore: _coerce_model_for_non_claude_adapter(model_config, adapter_name, adapter_default_model)
SpawnerCore->>CodexAdapter: spawn(model_config)
CodexAdapter->>Env: _has_codex_auth()
Env-->>CodexAdapter: OPENAI_API_KEY or ~/.codex/auth.json
CodexAdapter->>CodexAdapter: _codex_model(model_config.model)
CodexAdapter->>CodexCLI: codex exec -m model
CodexCLI-->>CodexAdapter: session result
CodexAdapter-->>SpawnerCore: spawned session info
File-Level Changes
Assessment against linked issues
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Sonar insights (advisory, no merge-block)Snapshot of
Run This comment is a soft signal. The Sonar scan runs on push to |
📝 WalkthroughWalkthroughThe PR updates Codex model selection to detect OAuth auth, remap Claude-tier model names for non-Claude adapters, and coerce spawned models to adapter defaults when needed. It also makes demo summary parsing accept multiple ChangesCodex model selection
Demo summary parsing
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
…mmary (#2075) Three defects reported in #2075 made the Codex adapter unusable with a ChatGPT OAuth login. 1. Model routing handed Claude tier names to Codex. The batch/heuristic selector emits opus/sonnet/haiku with no adapter awareness, so a high-stakes role (manager/architect/security) produced `codex exec -m opus`, which Codex rejects. The spawner now substitutes the adapter's default model for an unpinned Claude tier name when the run-level adapter is non-Claude, so the model recorded for the run matches what actually runs. The Claude path is unchanged. CodexAdapter also maps any residual tier name to its default as a last-resort net. 2. Spurious OPENAI_API_KEY warning. The adapter warned on every spawn when the env var was absent, even with a valid ChatGPT OAuth session. It now detects ~/.codex/auth.json (written by `codex login`) and only warns when neither an API key nor an OAuth session is present. 3. `bernstein demo --real` crash. _print_demo_summary read the /status `tasks` field as a list, but the endpoint returns {"count", "items"}. Iterating the dict yielded its string keys and raised AttributeError on `.get`. It now unwraps the items list and keeps only dict rows. Fixes #2075
Review-bot acknowledgement summary
All must-address findings are resolved or acknowledged. |
|
bernstein doctor observe for PR #2086 ( sonar -- OK (project bernstein)
code-scanning -- FAIL (39 open alert(s))
Skipped backends (credentials not configured)
See docs/observability/unified-doctor.md for backend setup notes. |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The Claude tier model set is now duplicated in both
bernstein.adapters.codexandspawner_warm_pool; consider centralizing_CLAUDE_TIER_MODELSto avoid divergence if tier names change. - The Codex default model string
"gpt-5.4"is hardcoded in the adapter and tests; it may be safer to expose this as a single configurable constant or setting so changes to the default don’t require code edits in multiple places.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The Claude tier model set is now duplicated in both `bernstein.adapters.codex` and `spawner_warm_pool`; consider centralizing `_CLAUDE_TIER_MODELS` to avoid divergence if tier names change.
- The Codex default model string `"gpt-5.4"` is hardcoded in the adapter and tests; it may be safer to expose this as a single configurable constant or setting so changes to the default don’t require code edits in multiple places.
## Individual Comments
### Comment 1
<location path="src/bernstein/core/agents/spawner_warm_pool.py" line_range="93-97" />
<code_context>
+ return model_config
+ if not adapter_default_model:
+ return model_config
+ return ModelConfig(
+ model=adapter_default_model,
+ effort=model_config.effort,
+ max_tokens=model_config.max_tokens,
+ is_batch=model_config.is_batch,
+ )
+
</code_context>
<issue_to_address>
**suggestion:** Consider constructing the coerced ModelConfig in a way that preserves future fields automatically.
Reconstructing `ModelConfig` with a hardcoded subset of fields will drop any new or optional attributes added later. If `ModelConfig` is a dataclass, consider using `dataclasses.replace(model_config, model=adapter_default_model)` (or an equivalent pattern) so only `model` changes and other fields are preserved automatically.
Suggested implementation:
```python
if not adapter_default_model:
return model_config
# Preserve all existing and future fields on ModelConfig, only overriding `model`
return dataclasses.replace(model_config, model=adapter_default_model)
```
To support `dataclasses.replace`, ensure this file imports `dataclasses` (or `replace` directly) near the top, for example:
- `import dataclasses`
or
- `from dataclasses import replace` and then use `replace(model_config, model=adapter_default_model)` instead of `dataclasses.replace(...)`.
If `ModelConfig` is not a dataclass but a Pydantic model or similar, replace the final line with an appropriate copy/update method, e.g.:
- `return model_config.model_copy(update={"model": adapter_default_model})`
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| return ModelConfig( | ||
| model=adapter_default_model, | ||
| effort=model_config.effort, | ||
| max_tokens=model_config.max_tokens, | ||
| is_batch=model_config.is_batch, |
There was a problem hiding this comment.
suggestion: Consider constructing the coerced ModelConfig in a way that preserves future fields automatically.
Reconstructing ModelConfig with a hardcoded subset of fields will drop any new or optional attributes added later. If ModelConfig is a dataclass, consider using dataclasses.replace(model_config, model=adapter_default_model) (or an equivalent pattern) so only model changes and other fields are preserved automatically.
Suggested implementation:
if not adapter_default_model:
return model_config
# Preserve all existing and future fields on ModelConfig, only overriding `model`
return dataclasses.replace(model_config, model=adapter_default_model)To support dataclasses.replace, ensure this file imports dataclasses (or replace directly) near the top, for example:
import dataclasses
orfrom dataclasses import replaceand then usereplace(model_config, model=adapter_default_model)instead ofdataclasses.replace(...).
If ModelConfig is not a dataclass but a Pydantic model or similar, replace the final line with an appropriate copy/update method, e.g.:
return model_config.model_copy(update={"model": adapter_default_model})
b507986 to
bed1d64
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/bernstein/cli/run_confirm.py`:
- Around line 459-464: The cost extraction in run_confirm.py’s payload handling
only reads the top-level total_cost_usd alias, so update the summary-building
logic to also fall back to summary.cost_usd and costs.spent_usd when computing
total_cost. Keep the task parsing flow intact, but adjust the total_cost
assignment path in the same block so the demo summary reflects the actual
/status spend even when the alias is missing.
In `@src/bernstein/core/agents/spawner_core.py`:
- Around line 1807-1812: The fallback in spawner_core.py only runs when
provider_name is None, so non-Claude providers like codex keep the Claude model
selection recorded in session.model_config and the initial trace. Update the
guard around _coerce_model_for_non_claude_adapter in the agent spawner flow so
it also applies when _resolve_routing() selects a non-Claude provider_name,
ensuring the recorded model matches the provider’s actual default before tracing
or persisting selection.
- Around line 1810-1811: The caching path is missing the adapter fallback model
because CachingAdapter does not expose default_model, so spawner_core’s
getattr(self._adapter, "default_model", None) returns None. Update
CachingAdapter in caching_adapter.py to proxy default_model from self._inner (or
otherwise expose it), so _coerce_model_for_non_claude_adapter can read the real
adapter’s fallback model when caching is enabled.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 663ec9a0-8eb3-4860-9998-0ec05f8d994b
📒 Files selected for processing (7)
src/bernstein/adapters/codex.pysrc/bernstein/cli/run_confirm.pysrc/bernstein/core/agents/spawner_core.pysrc/bernstein/core/agents/spawner_warm_pool.pytests/unit/test_adapter_codex.pytests/unit/test_cli_demo.pytests/unit/test_model_coercion.py
| raw_tasks = payload.get("tasks", []) | ||
| if isinstance(raw_tasks, dict): | ||
| raw_tasks = raw_tasks.get("items", []) | ||
| if isinstance(raw_tasks, list): | ||
| tasks_data = [t for t in raw_tasks if isinstance(t, dict)] | ||
| total_cost = float(payload.get("total_cost_usd", 0.0) or 0.0) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Look for where total_cost_usd is set on the /status payload and how status.py derives spend.
rg -nP -C3 '\btotal_cost_usd\b' src/bernstein/core/routes/
rg -nP -C3 'cost_usd|spent_usd' src/bernstein/core/routes/status_dashboard.pyRepository: sipyourdrink-ltd/bernstein
Length of output: 15089
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '820,860p' src/bernstein/core/routes/status_dashboard.py
printf '\n---\n'
sed -n '1,260p' src/bernstein/cli/status.py
printf '\n---\n'
sed -n '440,490p' src/bernstein/cli/run_confirm.pyRepository: sipyourdrink-ltd/bernstein
Length of output: 12828
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '820,860p' src/bernstein/core/routes/status_dashboard.py
printf '\n---\n'
sed -n '1,260p' src/bernstein/cli/status.py
printf '\n---\n'
sed -n '440,490p' src/bernstein/cli/run_confirm.pyRepository: sipyourdrink-ltd/bernstein
Length of output: 12828
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '820,860p' src/bernstein/core/routes/status_dashboard.py
printf '\n---\n'
sed -n '1,260p' src/bernstein/cli/status.py
printf '\n---\n'
sed -n '440,490p' src/bernstein/cli/run_confirm.pyRepository: sipyourdrink-ltd/bernstein
Length of output: 12828
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "def _extract_spent_cost|_extract_spent_cost\(" src/bernstein/cli/status.py src/bernstein/cli/run_confirm.py
printf '\n---\n'
sed -n '260,360p' src/bernstein/cli/status.py
printf '\n---\n'
sed -n '360,460p' src/bernstein/cli/status.pyRepository: sipyourdrink-ltd/bernstein
Length of output: 8313
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "def _extract_spent_cost|_extract_spent_cost\(" src/bernstein/cli/status.py src/bernstein/cli/run_confirm.py
printf '\n---\n'
sed -n '260,360p' src/bernstein/cli/status.py
printf '\n---\n'
sed -n '360,460p' src/bernstein/cli/status.pyRepository: sipyourdrink-ltd/bernstein
Length of output: 8313
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n -C4 'live_costs|spent_usd|total_cost_usd|summary"\] =|payload\["costs"\]' src/bernstein/core/routes/status_dashboard.pyRepository: sipyourdrink-ltd/bernstein
Length of output: 3905
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n -C4 'live_costs|spent_usd|total_cost_usd|summary"\] =|payload\["costs"\]' src/bernstein/core/routes/status_dashboard.pyRepository: sipyourdrink-ltd/bernstein
Length of output: 3905
Read the /status cost from summary.cost_usd or costs.spent_usd
src/bernstein/cli/run_confirm.py:459-464 only checks payload["total_cost_usd"], but /status exposes spend under summary.cost_usd and costs.spent_usd. That leaves the demo summary at $0.0000 when the top-level alias is absent.
Suggested fix
- total_cost = float(payload.get("total_cost_usd", 0.0) or 0.0)
+ total_cost = float(payload.get("total_cost_usd", 0.0) or 0.0)
+ if total_cost <= 0.0:
+ summary = payload.get("summary", {})
+ if isinstance(summary, dict):
+ total_cost = float(summary.get("cost_usd", 0.0) or 0.0)
+ if total_cost <= 0.0:
+ costs = payload.get("costs", {})
+ if isinstance(costs, dict):
+ total_cost = float(costs.get("spent_usd", 0.0) or 0.0)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| raw_tasks = payload.get("tasks", []) | |
| if isinstance(raw_tasks, dict): | |
| raw_tasks = raw_tasks.get("items", []) | |
| if isinstance(raw_tasks, list): | |
| tasks_data = [t for t in raw_tasks if isinstance(t, dict)] | |
| total_cost = float(payload.get("total_cost_usd", 0.0) or 0.0) | |
| raw_tasks = payload.get("tasks", []) | |
| if isinstance(raw_tasks, dict): | |
| raw_tasks = raw_tasks.get("items", []) | |
| if isinstance(raw_tasks, list): | |
| tasks_data = [t for t in raw_tasks if isinstance(t, dict)] | |
| total_cost = float(payload.get("total_cost_usd", 0.0) or 0.0) | |
| if total_cost <= 0.0: | |
| summary = payload.get("summary", {}) | |
| if isinstance(summary, dict): | |
| total_cost = float(summary.get("cost_usd", 0.0) or 0.0) | |
| if total_cost <= 0.0: | |
| costs = payload.get("costs", {}) | |
| if isinstance(costs, dict): | |
| total_cost = float(costs.get("spent_usd", 0.0) or 0.0) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/bernstein/cli/run_confirm.py` around lines 459 - 464, The cost extraction
in run_confirm.py’s payload handling only reads the top-level total_cost_usd
alias, so update the summary-building logic to also fall back to
summary.cost_usd and costs.spent_usd when computing total_cost. Keep the task
parsing flow intact, but adjust the total_cost assignment path in the same block
so the demo summary reflects the actual /status spend even when the alias is
missing.
| adapter_name=self._adapter.name(), | ||
| adapter_default_model=getattr(self._adapter, "default_model", None), |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
ast-grep outline src/bernstein/adapters/caching_adapter.py
rg -nP 'default_model|__getattr__' src/bernstein/adapters/caching_adapter.pyRepository: sipyourdrink-ltd/bernstein
Length of output: 438
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo '== caching_adapter outline =='
ast-grep outline src/bernstein/adapters/caching_adapter.py
echo
echo '== caching_adapter relevant lines =='
sed -n '1,220p' src/bernstein/adapters/caching_adapter.py
echo
echo '== spawner_core around the referenced lines =='
sed -n '1790,1825p' src/bernstein/core/agents/spawner_core.py
echo
echo '== search for default_model handling around adapters =='
rg -n 'default_model|__getattr__|CachingAdapter' src/bernstein/adapters src/bernstein/core/agents/spawner_core.pyRepository: sipyourdrink-ltd/bernstein
Length of output: 11721
CachingAdapter needs to expose default_model. At src/bernstein/core/agents/spawner_core.py:1810-1811, getattr(self._adapter, "default_model", None) stays None when caching is enabled, because src/bernstein/adapters/caching_adapter.py only forwards name(), is_alive(), kill(), and detect_tier() and keeps the real adapter on self._inner. That leaves _coerce_model_for_non_claude_adapter without the adapter fallback model. Read it from self._inner or proxy default_model on CachingAdapter.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/bernstein/core/agents/spawner_core.py` around lines 1810 - 1811, The
caching path is missing the adapter fallback model because CachingAdapter does
not expose default_model, so spawner_core’s getattr(self._adapter,
"default_model", None) returns None. Update CachingAdapter in caching_adapter.py
to proxy default_model from self._inner (or otherwise expose it), so
_coerce_model_for_non_claude_adapter can read the real adapter’s fallback model
when caching is enabled.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/unit/test_cli_demo.py`:
- Around line 53-57: The assertions in the demo CLI test are too broad and can
pass even when the “Bugs fixed” summary row is incorrect. Update the checks
around the rendered output in test_cli_demo to assert the full row pattern
directly from the relevant formatter output, using the existing buf.getvalue()
content and the summary row text produced by the CLI, so the test verifies the
exact “1 / 3” style row instead of matching isolated substrings.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b3e4cf9d-3b67-4cb6-b208-768d41683887
📒 Files selected for processing (7)
src/bernstein/adapters/codex.pysrc/bernstein/cli/run_confirm.pysrc/bernstein/core/agents/spawner_core.pysrc/bernstein/core/agents/spawner_warm_pool.pytests/unit/test_adapter_codex.pytests/unit/test_cli_demo.pytests/unit/test_model_coercion.py
| out = buf.getvalue() | ||
| # 1 done out of 3 total, rendered without raising. | ||
| assert "1" in out | ||
| assert "/ 3" in out | ||
| assert "$0.5000" in out |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Strengthen summary assertions to avoid false positives.
Lines 55-56 can pass even if the “Bugs fixed” row is wrong. Assert the row pattern directly.
Suggested diff
+import re
@@
- assert "1" in out
- assert "/ 3" in out
+ assert re.search(r"Bugs fixed\s+1\s*/\s*3", out)
assert "$0.5000" in out📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| out = buf.getvalue() | |
| # 1 done out of 3 total, rendered without raising. | |
| assert "1" in out | |
| assert "/ 3" in out | |
| assert "$0.5000" in out | |
| import re | |
| ... | |
| out = buf.getvalue() | |
| # 1 done out of 3 total, rendered without raising. | |
| assert re.search(r"Bugs fixed\s+1\s*/\s*3", out) | |
| assert "$0.5000" in out |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/unit/test_cli_demo.py` around lines 53 - 57, The assertions in the demo
CLI test are too broad and can pass even when the “Bugs fixed” summary row is
incorrect. Update the checks around the rendered output in test_cli_demo to
assert the full row pattern directly from the relevant formatter output, using
the existing buf.getvalue() content and the summary row text produced by the
CLI, so the test verifies the exact “1 / 3” style row instead of matching
isolated substrings.
Summary
Fixes the three defects in #2075 that made the Codex adapter unusable with a ChatGPT OAuth login.
1. Model routing handed Claude tier names to Codex
The batch/heuristic selector emits
opus/sonnet/haikuwith no adapter awareness, so a high-stakes role (manager/architect/security) producedcodex exec -m opus, which Codex rejects. The spawner now substitutes the adapter's default model for an unpinned Claude tier name when the run-level adapter is non-Claude, so the model recorded for the run matches what actually runs. The Claude path is byte-identical (gated on a non-Claude adapter).CodexAdapteralso maps any residual tier name to its default as a last-resort net.2. Spurious
OPENAI_API_KEYwarning despite OAuthThe adapter warned on every spawn when
OPENAI_API_KEYwas absent, even with a valid ChatGPT OAuth session. It now detects~/.codex/auth.json(written bycodex login) and only warns when neither an API key nor an OAuth session is present.3.
bernstein demo --realcrash_print_demo_summaryread the/statustasksfield as a list, but the endpoint returns{"count", "items"}. Iterating the dict yielded its string keys and raisedAttributeError: 'str' object has no attribute 'get'. It now unwraps the items list and keeps only dict rows.Tests
tests/unit/test_model_coercion.pyfor the adapter-aware model coercion (Claude path unchanged).tests/unit/test_adapter_codex.py: OAuth session suppresses the warning; a Claude tier name maps to the Codex default in argv.tests/unit/test_cli_demo.py: the demo summary handles the real/statusshape without crashing (reproduces the reported AttributeError before the fix).ruff checkandruff formatclean.Fixes #2075
Summary by Sourcery
Ensure Codex adapter works correctly with ChatGPT OAuth and non-Claude model selections, and prevent the demo status summary from crashing on the current API response shape.
Bug Fixes:
Enhancements:
Tests:
Summary by CodeRabbit
New Features
Bug Fixes
Testing