chore: promote staging to staging-promote/66ccafb9-24321596712 (2026-04-13 03:02 UTC) by ironclaw-ci[bot] · Pull Request #2384 · nearai/ironclaw

ironclaw-ci · 2026-04-13T03:02:05Z

Auto-promotion from staging CI

Batch range: a53eac5c2dec6b6cd5c08189086093fde64aa9cb..4529f009d4e239e06590f083662ac8804bcfaf22
Promotion branch: staging-promote/4529f009-24323637408
Base: staging-promote/66ccafb9-24321596712
Triggered by: Staging CI batch at 2026-04-13 03:02 UTC

Commits in this batch (13):

a7401ec fix(gateway): scope chat approvals to the active thread (#2267)
4032f6d Fix paired Telegram owner scope routine visibility (#2258)
7d66d83 fix(engine): always append ActionResult for every tool call (#2322)
764e586 feat(engine): LLM council via per-call model override in CodeAct (#2320)
70862ed feat(config): default CLI_MODE to TUI instead of REPL (#2329)
207c4d4 ci: build docker image in release process (#2321)
cd9b60c fix: re-apply Telegram UTF-16 splitting and DB MIGRATION label (#2304)
88b87c0 feat: user-facing temperature setting (#2275)
fdb0a13 chore: sync staging and main (#2337)
3cb77fe fix: resolve cargo-deny failures (wildcard deps + rand advisory) (#2370)
ed2d6dc fix web chat refresh active thread (#2330)
66ccafb chore(engine): update monty to v0.0.11 (#2364)
4529f00 fix(engine): track consecutive action errors in orchestrator Tier 0 path (#2325) (#2340)

Current commits in this promotion (1)

Current base: staging-promote/66ccafb9-24321596712
Current head: staging-promote/4529f009-24323637408
Current range: origin/staging-promote/66ccafb9-24321596712..origin/staging-promote/4529f009-24323637408

4529f00 fix(engine): track consecutive action errors in orchestrator Tier 0 path (#2325) (#2340)

Auto-updated by staging promotion metadata workflow

Waiting for gates:

Tests: pending
E2E: pending
Claude Code review: pending (will post comments on this PR)

Auto-created by staging-ci workflow

…ath (#2325) (#2340)

The Python orchestrator had no error counting for structured action calls (Tier 0), allowing threads to loop indefinitely on failing tool calls and complete "successfully" even when every tool call failed.

This adds a consecutive_action_errors counter that increments when all actions in a batch fail, resets when any succeeds, injects a nudge at the threshold, and transitions to failed at threshold + 2. Also prefixes error outputs with [ACTION FAILED] for visibility and persists the counter in checkpoints.

Closes #2325
Related: #2279, #2240

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
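The counter behaviour described in the commit message can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from default.py: the class name, the method name, the returned status strings, and the default threshold of 3 are all assumptions made for the example. Only consecutive_action_errors, max_consecutive_errors, the + 2 offset, and the [ACTION FAILED] prefix come from the PR text.

```python
from dataclasses import dataclass

# Prefix named in the commit message; how it is applied here is an assumption.
ACTION_FAILED_PREFIX = "[ACTION FAILED] "


@dataclass
class ActionErrorTracker:
    """Hypothetical stand-in for the orchestrator's Tier 0 error state."""

    max_consecutive_errors: int = 3  # placeholder value, not taken from the PR
    consecutive_action_errors: int = 0

    def record_batch(self, outcomes: list[bool]) -> str:
        """Update the counter for one batch; outcomes[i] is True on success.

        Returns "continue", "nudge", or "failed" (illustrative labels).
        """
        if not outcomes:
            # No actions ran, so there is nothing to count.
            return "continue"
        if any(outcomes):
            # Any success resets the streak.
            self.consecutive_action_errors = 0
            return "continue"
        # Every action in the batch failed.
        self.consecutive_action_errors += 1
        if self.consecutive_action_errors >= self.max_consecutive_errors + 2:
            return "failed"
        if self.consecutive_action_errors >= self.max_consecutive_errors:
            return "nudge"
        return "continue"


def format_error(output: str) -> str:
    """Mark a failed action's output so the failure is visible in context."""
    return ACTION_FAILED_PREFIX + output
```

With a threshold of 2, four all-fail batches in a row would yield continue, nudge, nudge, failed, and a single partially successful batch in between would restart the count.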

claude · 2026-04-13T03:04:54Z

Code review

Found 12 issues across security, architecture, bugs, and performance:

CRITICAL (1)

[CRITICAL:88] Race condition: consecutive_action_errors counter not atomic — If orchestrator ever runs multiple concurrent Monty VMs with shared state (e.g., parallel sub-agents loading state from DB), the consecutive_action_errors counter will have lost-update bugs. Checkpoint persistence makes inconsistencies durable. Current single-VM-per-thread architecture is safe, but this becomes a data corruption hazard if concurrency is added.

https://github.com/anthropics/ironclaw/blob/3f727c41afda86b7a1c74f354cefb27d9d192dce/crates/ironclaw_engine/orchestrator/default.py#L473-L474

HIGH (3)

[HIGH:92] String concatenation for error messages instead of templates — Error messages at lines 874 and 880-885 use inline string concatenation instead of template files. CLAUDE.md requires multi-line prompt strings to live in crates/ironclaw_engine/prompts/*.md and be loaded via include_str!(). This violates the stated architecture and makes localization/A/B testing difficult.

https://github.com/anthropics/ironclaw/blob/3f727c41afda86b7a1c74f354cefb27d9d192dce/crates/ironclaw_engine/orchestrator/default.py#L874-L885

[HIGH:88] Magic number fragmentation: max_consecutive_errors + 2 threshold lacks documentation — Failing at >= max_consecutive_errors + 2 while nudging at >= max_consecutive_errors introduces unexplained logic. The + 2 offset is unmotivated, unparameterized, and inconsistent with Tier 1 error handling. This diverges from the design goal of "Consistency with Rust loop" stated in issue #2325 (Orchestrator: add action error counting to Python execution loop).

https://github.com/anthropics/ironclaw/blob/3f727c41afda86b7a1c74f354cefb27d9d192dce/crates/ironclaw_engine/orchestrator/default.py#L869-L876
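One way to address the undocumented offset is to give it a name and derive the failure threshold in a single place. The sketch below is only an illustration under assumptions: ActionErrorPolicy, failure_grace_batches, and the default values are hypothetical and do not appear in the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionErrorPolicy:
    """Hypothetical policy object; field names and defaults are assumptions."""

    # All-fail batches tolerated before the nudge is injected.
    max_consecutive_errors: int = 3
    # Additional all-fail batches tolerated after the nudge before failing.
    # This replaces the bare "+ 2" with a named, documented parameter.
    failure_grace_batches: int = 2

    @property
    def failure_threshold(self) -> int:
        return self.max_consecutive_errors + self.failure_grace_batches

    def should_nudge(self, errors: int) -> bool:
        return self.max_consecutive_errors <= errors < self.failure_threshold

    def should_fail(self, errors: int) -> bool:
        return errors >= self.failure_threshold
```

Keeping both thresholds on one object would also make it easier to compare the Tier 0 behaviour against the Tier 1 and Rust loop settings the review mentions.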

[HIGH:78] Missing timeout on Python VM execution — run_python_final() (refactored test helper) has memory limits (500_000 allocations) but no wall-clock timeout. If orchestrator Python code ever includes user-influenced loops, it can hang indefinitely. Current code is safe only because orchestrator has no dynamic loops.

https://github.com/anthropics/ironclaw/blob/3f727c41afda86b7a1c74f354cefb27d9d192dce/crates/ironclaw_engine/src/executor/orchestrator.rs#L2481-L2483

MEDIUM (7)

[MEDIUM:85] Unbounded string concatenation in hot path — Nudge message constructed via string concatenation at every failed batch crossing threshold. Repeated append_message() calls accumulate context weight unnecessarily.

https://github.com/anthropics/ironclaw/blob/3f727c41afda86b7a1c74f354cefb27d9d192dce/crates/ironclaw_engine/orchestrator/default.py#L880-L885

[MEDIUM:78] Test extraction refactoring introduces untyped Monty bridge — run_python_final() loses type safety by returning MontyObject and forcing callers to pattern-match with panics. Better approach: generic eval_python_value<T: From<MontyObject>> with a proper conversion trait.

https://github.com/anthropics/ironclaw/blob/3f727c41afda86b7a1c74f354cefb27d9d192dce/crates/ironclaw_engine/src/executor/orchestrator.rs#L2475-L2516

[MEDIUM:75] Test refactoring introduces code duplication — eval_python_bool() and eval_python_int() both duplicate the helper-extraction logic (searching for "\ndef run_loop("). This should be extracted into an extract_helpers() function.

https://github.com/anthropics/ironclaw/blob/3f727c41afda86b7a1c74f354cefb27d9d192dce/crates/ironclaw_engine/src/executor/orchestrator.rs#L2518-L2525
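The shared extraction step could look something like the sketch below. It is written in Python rather than Rust purely to illustrate the slicing logic; the real helper would live in the Rust test module, and the exact behaviour of the existing tests (for example, what they do when the marker is missing) is not shown in this review, so raising an error here is an assumption.

```python
# Marker the review says both helpers search for.
RUN_LOOP_MARKER = "\ndef run_loop("


def extract_helpers(source: str) -> str:
    """Return everything in the orchestrator source before run_loop().

    Raising on a missing marker is an assumption made for this sketch.
    """
    index = source.find(RUN_LOOP_MARKER)
    if index == -1:
        raise ValueError("run_loop definition not found in orchestrator source")
    return source[:index]
```

Both eval_python_bool() and eval_python_int() could then call this one function instead of repeating the search.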

[MEDIUM:72] DRY violation: checkpoint persistence repeated 7 times — __save_checkpoint__() dict is copy-pasted at lines 615, 635, 671, 769, 790, 807, 890. Future counter additions require hunting down all sites. Should extract _save_action_error_checkpoint() helper.

https://github.com/anthropics/ironclaw/blob/3f727c41afda86b7a1c74f354cefb27d9d192dce/crates/ironclaw_engine/orchestrator/default.py#L612-L617
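A helper along these lines would centralize the checkpoint payload so that a new counter only has to be added once. This is a sketch under assumptions: the field list is a hypothetical subset, and the save callable stands in for __save_checkpoint__(), whose real signature is not shown in this review.

```python
# Hypothetical subset of checkpoint fields; the real payload may contain more.
CHECKPOINT_FIELDS = ("step", "consecutive_action_errors")


def build_checkpoint(state: dict) -> dict:
    """Assemble the checkpoint payload in exactly one place."""
    return {name: state.get(name, 0) for name in CHECKPOINT_FIELDS}


def save_checkpoint(save, state: dict) -> dict:
    """Build the payload and hand it to the host-provided save callable.

    `save` is a placeholder for __save_checkpoint__(); passing it in keeps
    this sketch runnable outside the orchestrator.
    """
    payload = build_checkpoint(state)
    save(payload)
    return payload
```

Each of the seven call sites would then reduce to a single save_checkpoint(...) call, and adding a field means editing CHECKPOINT_FIELDS only.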

[MEDIUM:72] Repeated JSON parsing in checkpoint serialization — monty_to_json() called twice per checkpoint; then .as_u64() chained 4 times. Low practical impact (checkpoints are infrequent), but indicates opportunity for optimization.

https://github.com/anthropics/ironclaw/blob/3f727c41afda86b7a1c74f354cefb27d9d192dce/crates/ironclaw_engine/src/executor/orchestrator.rs#L1741-L1744

[MEDIUM:70] Batch error/success counting assumes parallel results alignment — No assertion verifies len(results) == len(executable_calls). If Rust batch handler ever reorders results (e.g., by completion time), errors will pair with wrong calls silently.

https://github.com/anthropics/ironclaw/blob/3f727c41afda86b7a1c74f354cefb27d9d192dce/crates/ironclaw_engine/orchestrator/default.py#L734-L757
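The alignment check could be sketched as below. A length check alone catches dropped or extra results but not reordering, so the example also compares identifiers when they are present. The "id" and "call_id" fields are hypothetical; the review does not say whether results carry such an identifier.

```python
def pair_calls_with_results(executable_calls: list[dict], results: list[dict]) -> list[tuple]:
    """Pair each call with its result, failing loudly on misalignment."""
    if len(results) != len(executable_calls):
        raise ValueError(
            f"expected {len(executable_calls)} results, got {len(results)}"
        )
    pairs = []
    for call, result in zip(executable_calls, results):
        # Hypothetical fields: only checked when the result carries an id.
        call_id = result.get("call_id")
        if call_id is not None and call_id != call.get("id"):
            raise ValueError(
                f"result for {call_id!r} is paired with call {call.get('id')!r}"
            )
        pairs.append((call, result))
    return pairs
```

If the batch handler ever returned results in completion order, the identifier check would surface the mismatch instead of silently attributing an error to the wrong call.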

[MEDIUM:68] Partial batch success logic lacks explicit documentation — Reset on ANY success, increment only on ALL-fail design is sound but not explained. Code comment should justify why.

https://github.com/anthropics/ironclaw/blob/3f727c41afda86b7a1c74f354cefb27d9d192dce/crates/ironclaw_engine/orchestrator/default.py#L864-L867

LOW (1)

[LOW:52] Docstring synchronization debt — Old docstring at line 2470 describes bool-only behavior; new one describes generic behavior. Orphaned first line should be removed.

https://github.com/anthropics/ironclaw/blob/3f727c41afda86b7a1c74f354cefb27d9d192dce/crates/ironclaw_engine/src/executor/orchestrator.rs#L2470-L2474

Recommendations (by priority)

IMMEDIATE (production safety):

Add timeout wrapper to run_python_final() to prevent infinite-loop hangs
Add assertion for len(results) == len(executable_calls) to catch result alignment bugs

BEFORE MERGE (architectural consistency):

Move error messages to crates/ironclaw_engine/prompts/error_messages.md
Document or parameterize the + 2 threshold offset
Extract _save_action_error_checkpoint() helper to reduce DRY violations

NICE-TO-HAVE (cleanup):

Fix docstring at line 2470
Extract extract_helpers() to reduce test code duplication
Add generic eval_python_value<T> test helper for type safety

Overall: Feature is sound and tests are comprehensive. These are maintainability and clarity improvements, not correctness blockers. Most LOW/MEDIUM issues are maintenance debt rather than production bugs.

ironclaw-ci bot added the staging-promotion label Apr 13, 2026

github-actions bot added the labels size: L (200-499 changed lines), risk: low (changes to docs, tests, or low-risk modules), and contributor: core (20+ merged PRs) Apr 13, 2026
