Fix evolution limits were not enforced #392

Open
Juanpe Bolívar (arximboldi) wants to merge 6 commits into develop from jp/fix-max-iterations

@arximboldi (Contributor)

Summary

I was running some experiments with the eval suite when I noticed Nixmac had gotten stuck in a loop, burning tokens like there's no tomorrow. That happened because, although run_evals.py sets max iterations, the check enforcing that limit had been removed in #325, which landed in develop unreviewed.

This PR reverts that commit and also makes sure that the alternative suggested in #325 (max_token_budget), which wasn't being enforced either, is now enforced properly.

I'd kindly ask coopmoney and Scott McMaster (@scottmcmaster) to review carefully in case I've made any wrong assumptions here.

Test Plan

  • No test plan needed

Docs

  • Docs updated (companion PR in darkmatter/nixmac-web: #___)
  • No docs update needed

Adds the TokenBudget variant to EvolutionLimitKind (with matching
attempts_label/prompt/stop_summary arms and a small format_token_count
helper) so the next commit can wire up the loop guard.
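A minimal sketch of what the new variant and helper could look like. Only the names EvolutionLimitKind, TokenBudget, attempts_label, stop_summary and format_token_count come from the commit message; the other variants are taken from the guards listed later (NoProgress, MaxIterations, BuildAttempts), while every signature, label string and the k/M formatting scheme are assumptions for illustration. The prompt arm is omitted.

```rust
/// Hypothetical sketch of the limit kinds; the real enum may carry data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EvolutionLimitKind {
    NoProgress,
    MaxIterations,
    BuildAttempts,
    TokenBudget,
}

/// Assumed helper: renders a token count compactly ("950", "12.3k", "2.5M").
/// Truncates to one decimal place and drops a trailing ".0".
pub fn format_token_count(tokens: u64) -> String {
    fn scaled(tokens: u64, unit: u64, suffix: &str) -> String {
        let tenths = tokens / (unit / 10);
        if tenths % 10 == 0 {
            format!("{}{}", tenths / 10, suffix)
        } else {
            format!("{}.{}{}", tenths / 10, tenths % 10, suffix)
        }
    }
    if tokens >= 1_000_000 {
        scaled(tokens, 1_000_000, "M")
    } else if tokens >= 1_000 {
        scaled(tokens, 1_000, "k")
    } else {
        tokens.to_string()
    }
}

impl EvolutionLimitKind {
    /// Hypothetical unit label used when reporting progress toward the limit.
    pub fn attempts_label(self) -> &'static str {
        match self {
            EvolutionLimitKind::NoProgress => "iterations without progress",
            EvolutionLimitKind::MaxIterations => "iterations",
            EvolutionLimitKind::BuildAttempts => "build attempts",
            EvolutionLimitKind::TokenBudget => "tokens",
        }
    }

    /// Hypothetical one-line summary shown when the guard stops the run.
    pub fn stop_summary(self, limit: u64) -> String {
        match self {
            EvolutionLimitKind::TokenBudget => format!(
                "Stopped after reaching the {} token budget",
                format_token_count(limit)
            ),
            other => format!("Stopped after {} {}", limit, other.attempts_label()),
        }
    }
}
```

Integer arithmetic keeps the formatting deterministic; a float-based version would print 999_999 as "1000.0k".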

Also consolidates the max_token_budget read: the evolve loop now
destructures it from EvolutionLimits::load alongside
max_build_attempts and (post-#325-revert) max_iterations, instead of
calling store::get_max_token_budget separately. To keep UI writes via
store::set_max_token_budget routed through the same source of truth,
the store getter/setter are made Slice-aware (mirroring the existing
get_max_iterations / get_max_build_attempts pattern), with a fallback
to the legacy tauri-plugin-store path when the Slice isn't registered.
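The Slice-aware getter/setter with a legacy fallback could be sketched roughly as below. LimitsSlice, LegacyStore, Store and the "max_token_budget" key are hypothetical stand-ins: the real Slice type and the tauri-plugin-store API are not shown here, so the legacy path is modeled with a plain HashMap.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical key used by the legacy store.
const MAX_TOKEN_BUDGET_KEY: &str = "max_token_budget";

/// Hypothetical stand-in for the registered limits Slice.
#[derive(Debug, Default)]
pub struct LimitsSlice {
    pub max_token_budget: Option<u64>,
}

/// Hypothetical stand-in for the legacy tauri-plugin-store file (modeled as a map).
#[derive(Debug, Default)]
pub struct LegacyStore {
    pub values: HashMap<String, u64>,
}

pub struct Store {
    /// `None` when the Slice has not been registered.
    pub slice: Option<Arc<RwLock<LimitsSlice>>>,
    pub legacy: LegacyStore,
}

impl Store {
    /// Prefer the Slice; fall back to the legacy store when it is absent.
    pub fn get_max_token_budget(&self) -> Option<u64> {
        match &self.slice {
            Some(slice) => slice.read().unwrap().max_token_budget,
            None => self.legacy.values.get(MAX_TOKEN_BUDGET_KEY).copied(),
        }
    }

    /// Writes go to the same place reads come from.
    pub fn set_max_token_budget(&mut self, value: u64) {
        match &self.slice {
            Some(slice) => slice.write().unwrap().max_token_budget = Some(value),
            None => {
                self.legacy
                    .values
                    .insert(MAX_TOKEN_BUDGET_KEY.to_string(), value);
            }
        }
    }
}
```

Routing reads and writes through the same branch is what keeps the evolve loop and the UI on one source of truth; the legacy branch only matters when no Slice is registered.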

No behavior change in the loop yet; enforcement lands in the next
commit.

Adds the missing comparison: after each API response, if total_tokens
has reached max_token_budget, ask the user whether to continue
(interactive) or stop (non-interactive). On continue, extend the
budget by the original amount, mirroring the BuildAttempts UX. On
stop, hand off to finish_after_limit_stop.
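The comparison and the extend-on-continue behavior can be sketched as a small guard. TokenBudgetGuard, LimitDecision, GuardOutcome and the check signature are hypothetical; the user prompt is abstracted as a closure, and the interactive flag stands in for however the real loop tells interactive runs apart from non-interactive ones such as the eval harness.

```rust
/// Hypothetical answer from the interactive "keep going?" prompt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LimitDecision {
    Continue,
    Stop,
}

/// What the loop should do after the check.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GuardOutcome {
    KeepGoing,
    /// The caller hands off to finish_after_limit_stop.
    StopAtLimit,
}

/// Hypothetical guard: remembers the original budget so that each
/// "continue" extends the current budget by that same amount.
pub struct TokenBudgetGuard {
    original_budget: u64,
    current_budget: u64,
}

impl TokenBudgetGuard {
    pub fn new(max_token_budget: u64) -> Self {
        Self { original_budget: max_token_budget, current_budget: max_token_budget }
    }

    pub fn current_budget(&self) -> u64 {
        self.current_budget
    }

    /// Call after each API response with the running `total_tokens`.
    /// `ask` is only consulted in interactive mode; non-interactive runs stop.
    pub fn check<F>(&mut self, total_tokens: u64, interactive: bool, mut ask: F) -> GuardOutcome
    where
        F: FnMut(u64, u64) -> LimitDecision,
    {
        if total_tokens < self.current_budget {
            return GuardOutcome::KeepGoing;
        }
        if !interactive {
            return GuardOutcome::StopAtLimit;
        }
        match ask(total_tokens, self.current_budget) {
            LimitDecision::Continue => {
                // Mirror the BuildAttempts UX: grant another full budget.
                self.current_budget = self.current_budget.saturating_add(self.original_budget);
                GuardOutcome::KeepGoing
            }
            LimitDecision::Stop => GuardOutcome::StopAtLimit,
        }
    }
}
```

Extending by the original amount, rather than by whatever was already spent, keeps each "continue" a predictable, fixed-size grant.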

PR #325 removed max_iterations on the premise that max_token_budget
already enforced a session bound. It didn't — the value was loaded,
logged, and emitted to the UI progress bar, but never compared
against total_tokens to terminate the loop. This closes that gap.

Providers that don't return usage (Ollama, some CLI providers)
sidestep this guard entirely; the restored MaxIterations check
covers those.
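To illustrate why the restored iteration cap still matters, here is a toy, non-interactive simulation (not the real loop) in which `None` models a response without usage data. The function first_limit_hit and the Stop enum are hypothetical, and the order of the two checks is arbitrary here; the point is that a provider reporting no usage never advances total_tokens, so only max_iterations can end the run.

```rust
/// Which guard ended the simulated run (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stop {
    MaxIterations,
    TokenBudget,
}

/// Simulates a non-interactive run over per-response usage reports.
/// `None` means the provider returned no usage, so `total_tokens` does not grow.
/// Returns the 1-based iteration at which a guard fired, or `None` if the
/// simulated responses ran out first.
pub fn first_limit_hit(
    usages: &[Option<u64>],
    max_iterations: u32,
    max_token_budget: u64,
) -> Option<(u32, Stop)> {
    let mut total_tokens: u64 = 0;
    for (i, usage) in usages.iter().copied().enumerate() {
        let iteration = i as u32 + 1;
        if let Some(tokens) = usage {
            total_tokens += tokens;
            if total_tokens >= max_token_budget {
                return Some((iteration, Stop::TokenBudget));
            }
        }
        if iteration >= max_iterations {
            return Some((iteration, Stop::MaxIterations));
        }
    }
    None
}
```

With 30,000 tokens per response and a 100,000 budget, the token guard fires on the fourth response; with no usage at all, only the iteration cap ever fires.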

…eached

Adds a new EvolutionState variant for runs that finish_after_limit_stop
terminates. Before this, hitting any safety guard
(NoProgress / MaxIterations / BuildAttempts / TokenBudget) ended the run
as Conversational or Generated depending on whether edits had been made,
making it impossible for downstream consumers (notably the eval harness)
to tell "the agent decided it was done" from "we cut it off".

finish_after_limit_stop now sets state to LimitReached unconditionally.
TypeScript binding regenerated via specta.

Eval harness scoring will need a companion update on the nixmac-web
side to grade LimitReached separately.
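A rough sketch of the state change, restricted to the three states mentioned here. EvolutionOutcome, finish_naturally and every field name are hypothetical, and the finish_after_limit_stop signature is assumed; the real EvolutionState presumably also derives specta's `Type` for the TypeScript binding, which is omitted to keep the sketch dependency-free.

```rust
/// Hypothetical mirror of the terminal states named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EvolutionState {
    Conversational,
    Generated,
    LimitReached,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EvolutionLimitKind {
    NoProgress,
    MaxIterations,
    BuildAttempts,
    TokenBudget,
}

/// Hypothetical result shape; field names are illustrative.
#[derive(Debug)]
pub struct EvolutionOutcome {
    pub state: EvolutionState,
    pub edits_made: bool,
    pub stopped_by: Option<EvolutionLimitKind>,
}

/// The agent decided it was done: the state still depends on whether edits exist.
pub fn finish_naturally(edits_made: bool) -> EvolutionOutcome {
    let state = if edits_made {
        EvolutionState::Generated
    } else {
        EvolutionState::Conversational
    };
    EvolutionOutcome { state, edits_made, stopped_by: None }
}

/// A guard cut the run off: always LimitReached, with or without edits.
pub fn finish_after_limit_stop(edits_made: bool, kind: EvolutionLimitKind) -> EvolutionOutcome {
    EvolutionOutcome { state: EvolutionState::LimitReached, edits_made, stopped_by: Some(kind) }
}
```

Keeping edits_made alongside the state means a consumer such as the eval harness can still see whether a cut-off run produced partial changes.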

Two cosmetic gaps left behind by the structural LimitReached work:

- cli.rs printed "Evolution completed successfully" on any non-
  conversational success, including runs the loop cut off. Now
  matches against the state and prints a stopped-for-safety message
  in the LimitReached arm.

- use-evolve.ts toasted ✓ "Evolution complete" with the success
  variant for every non-error result. Now branches on the new
  state: a ⏸ "Evolution stopped (safety limit reached)" toast.info
  fires for LimitReached so the user can tell their run was cut
  off without reading the telemetry. The partial change map is
  still mirrored — limit-reached runs can contain useful edits the
  user may want to review or follow up on.

Adds a use-evolve.test.ts case for the new path.
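The cli.rs change amounts to branching on the state instead of on "not conversational". The sketch below is hypothetical: completion_message is an invented name, the LimitReached wording is borrowed from the new toast, and the conversational text is a placeholder, since the PR does not say what that path prints.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EvolutionState {
    Conversational,
    Generated,
    LimitReached,
}

/// Hypothetical version of the cli.rs completion message, one arm per state.
/// Before the fix, every non-conversational state printed the success line.
pub fn completion_message(state: EvolutionState) -> &'static str {
    match state {
        // Placeholder wording; the PR does not describe the conversational path.
        EvolutionState::Conversational => "Conversation finished (no changes made)",
        EvolutionState::Generated => "Evolution completed successfully",
        // Assumed wording, borrowed from the new use-evolve.ts toast.
        EvolutionState::LimitReached => "Evolution stopped (safety limit reached)",
    }
}
```

Because the match is exhaustive, the compiler will flag any future EvolutionState variant that the CLI forgets to handle.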

github-actions Bot commented Jun 11, 2026 (Contributor)
Messages
📖 No docs update needed — acknowledged.

📋 PR Overview

Lines changed: 479 (+441 / -38)
Files: 0 added, 22 modified, 0 deleted
Draft / WIP: no
Has Test Plan: no
No Test Plan Needed: yes
New UI components: no
New Storybook stories: no
New Rust modules: no
New TS source files: no
New tests: no
package.json touched: no
Cargo.toml touched: no
Infra / CI touched: no

🔬 Coverage

apps/native/coverage/coverage-summary.json: Lines 25.6%, Statements 25.8%, Functions 24.3%, Branches 18.4%

Generated by 🚫 dangerJS against 0e8b9fc

github-actions Bot commented Jun 11, 2026 (Contributor)

🎨 Storybook preview

Updated for 0e8b9fc


🧭 Story changes

Compared to develop (snapshot diff at story level):

✏️ Changed stories (2)


❌ Failed snapshots (3)

These stories' HTML snapshots changed. Current renderings (run bun run test:update-snapshots and commit if intended):

Settings/AutoConfigField › Controls

Settings/AutoTuningSection › Evolution Settings

Widget/Steps/SetupStep › Default Config Required

@czxtm (Member)

/oc fix conflicts

# Conflicts:
#	apps/native/src-tauri/src/evolve/mod.rs
#	apps/native/src/components/widget/settings/auto-config-field.stories.tsx
#	apps/native/src/components/widget/settings/auto-tuning-section.stories.tsx

darkmatter Bot commented Jun 12, 2026 (Contributor)

Conflicts fixed. develop merged into jp/fix-max-iterations.

Changes:

  • apps/native/src-tauri/src/evolve/mod.rs: Resolved state load to keep max_iterations & max_token_budget.
  • apps/native/src/components/widget/settings/auto-config-field.stories.tsx: Updated schema format. Added both fields.
  • apps/native/src/components/widget/settings/auto-tuning-section.stories.tsx: Added both fields to UI & default values.

Merge commit created. Ready.

New session - 2026-06-12T22:24:06.375Z
opencode session  |  github run

cooper (czxtm) enabled auto-merge June 12, 2026 22:55