Fix evolution limits were not enforced #392
Open
Juanpe Bolívar (arximboldi) wants to merge 6 commits into
Adds the TokenBudget variant to EvolutionLimitKind (with matching attempts_label/prompt/stop_summary arms and a small format_token_count helper) so the next commit can wire up the loop guard.

Also consolidates the max_token_budget read: the evolve loop now destructures it from EvolutionLimits::load alongside max_build_attempts and (post-#325-revert) max_iterations, instead of calling store::get_max_token_budget separately. To keep UI writes via store::set_max_token_budget routed through the same source of truth, the store getter/setter are made Slice-aware (mirroring the existing get_max_iterations / get_max_build_attempts pattern), with a fallback to the legacy tauri-plugin-store path when the Slice isn't registered.

No behavior change in the loop yet; enforcement lands in the next commit.
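To make the shape of this commit concrete, here is a minimal sketch of a limit-kind enum with a TokenBudget arm and a token-count formatter. Only the names EvolutionLimitKind, TokenBudget, stop_summary, and format_token_count come from the commit message; the signatures, the other variants listed, the message wording, and the k/M formatting rule are assumptions for illustration, not the code in this PR.

```rust
/// Hypothetical mirror of the limit kinds named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EvolutionLimitKind {
    MaxIterations,
    BuildAttempts,
    TokenBudget,
}

/// Assumed formatting rule: plain digits below 1,000, otherwise one
/// decimal place with a `k` or `M` suffix.
pub fn format_token_count(tokens: u64) -> String {
    if tokens >= 1_000_000 {
        format!("{:.1}M", tokens as f64 / 1_000_000.0)
    } else if tokens >= 1_000 {
        format!("{:.1}k", tokens as f64 / 1_000.0)
    } else {
        tokens.to_string()
    }
}

impl EvolutionLimitKind {
    /// Assumed signature and wording; the exhaustive match forces every
    /// new variant (such as TokenBudget) to get its own arm.
    pub fn stop_summary(self, limit: u64) -> String {
        match self {
            Self::MaxIterations => format!("Stopped after {} iterations.", limit),
            Self::BuildAttempts => format!("Stopped after {} build attempts.", limit),
            Self::TokenBudget => {
                format!("Stopped after using {} tokens.", format_token_count(limit))
            }
        }
    }
}
```

Because the match has no wildcard arm, adding a variant without a matching summary fails to compile, which is presumably why the variant and its arms land together in one commit.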
Adds the missing comparison: after each API response, if total_tokens has reached max_token_budget, ask the user whether to continue (interactive) or stop (non-interactive). On continue, extend the budget by the original amount, mirroring the BuildAttempts UX. On stop, hand off to finish_after_limit_stop.

PR #325 removed max_iterations on the premise that max_token_budget already enforced a session bound. It didn't: the value was loaded, logged, and emitted to the UI progress bar, but never compared against total_tokens to terminate the loop. This closes that gap.

Providers that don't return usage (Ollama, some CLI providers) sidestep this guard entirely; the restored MaxIterations check covers those.
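The guard described in this commit message can be sketched roughly as follows. GuardOutcome, check_token_budget, and the ask_user callback are hypothetical names invented for this example, and modelling "provider returned no usage" as None is an assumption; the real loop in evolve/mod.rs may be structured quite differently.

```rust
/// Hypothetical result of one budget check; not a type from the PR.
#[derive(Debug, PartialEq, Eq)]
pub enum GuardOutcome {
    /// Usage is below the budget, or the provider reported no usage.
    WithinBudget,
    /// The user chose to continue; the budget grew by the original amount.
    Extended { new_budget: u64 },
    /// The run should hand off to the limit-stop path.
    Stop,
}

/// Runs after each API response. `total_tokens` is `None` when the
/// provider returns no usage data (an assumption of this sketch).
pub fn check_token_budget(
    total_tokens: Option<u64>,
    budget: u64,
    original_budget: u64,
    interactive: bool,
    ask_user: impl FnOnce() -> bool,
) -> GuardOutcome {
    let used = match total_tokens {
        Some(used) => used,
        // No usage reported: this guard cannot fire, so an iteration
        // cap has to bound the run instead.
        None => return GuardOutcome::WithinBudget,
    };
    if used < budget {
        return GuardOutcome::WithinBudget;
    }
    // Budget reached: only interactive runs get to ask for more.
    if interactive && ask_user() {
        GuardOutcome::Extended {
            new_budget: budget.saturating_add(original_budget),
        }
    } else {
        GuardOutcome::Stop
    }
}
```

Extending by the original amount rather than doubling the current budget keeps each prompt worth the same number of tokens, so a run that started with a 1,000-token budget is asked again at 2,000, 3,000, and so on.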
…eached

Adds a new EvolutionState variant for runs that finish_after_limit_stop terminates. Before this, hitting any safety guard (NoProgress / MaxIterations / BuildAttempts / TokenBudget) ended the run as Conversational or Generated depending on whether edits had been made, making it impossible for downstream consumers (notably the eval harness) to tell "the agent decided it was done" from "we cut it off".

finish_after_limit_stop now sets state to LimitReached unconditionally.

TypeScript binding regenerated via specta. Eval harness scoring will need a companion update on the nixmac-web side to grade LimitReached separately.
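The before-and-after behaviour described here might look like the sketch below. The variant names follow the commit message, but the enum is reduced to three variants and final_state is a hypothetical helper invented for this example, not a function in the codebase.

```rust
/// Reduced, assumed version of the state enum; the real one may have
/// more variants and derive additional traits (for example for specta).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EvolutionState {
    Conversational,
    Generated,
    LimitReached,
}

/// Hypothetical helper showing how the final state is chosen after the
/// change: a limit stop wins regardless of whether edits were made.
pub fn final_state(stopped_by_limit: bool, made_edits: bool) -> EvolutionState {
    if stopped_by_limit {
        // Previously this case fell through to the branches below,
        // which made a cut-off run look like a normal finish.
        return EvolutionState::LimitReached;
    }
    if made_edits {
        EvolutionState::Generated
    } else {
        EvolutionState::Conversational
    }
}
```

With a dedicated variant, a consumer such as an eval harness can match on the state alone instead of inferring a cut-off from telemetry.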
Two cosmetic gaps left behind by the structural LimitReached work:

- cli.rs printed "Evolution completed successfully" on any non-conversational success, including runs the loop cut off. Now matches against the state and prints a stopped-for-safety message in the LimitReached arm.
- use-evolve.ts toasted ✓ "Evolution complete" with the success variant for every non-error result. Now branches on the new state: a ⏸ "Evolution stopped (safety limit reached)" toast.info fires for LimitReached so the user can tell their run was cut off without reading the telemetry.

The partial change map is still mirrored: limit-reached runs can contain useful edits the user may want to review or follow up on. Adds a use-evolve.test.ts case for the new path.
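For the CLI half of this change, a rough sketch of state-based messaging is shown below. cli_status_line is a hypothetical function, the LimitReached wording is borrowed from the toast text rather than from cli.rs, and returning no line for conversational results is an assumption; only the "Evolution completed successfully" string is quoted from the commit message.

```rust
/// Reduced, assumed version of the state enum used by the CLI.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EvolutionState {
    Conversational,
    Generated,
    LimitReached,
}

/// Hypothetical helper: picks the status line for a finished run.
pub fn cli_status_line(state: EvolutionState) -> Option<&'static str> {
    match state {
        // Assumption: conversational results print no completion banner.
        EvolutionState::Conversational => None,
        EvolutionState::Generated => Some("Evolution completed successfully"),
        // Assumed wording, modelled on the toast text in use-evolve.ts.
        EvolutionState::LimitReached => Some("Evolution stopped (safety limit reached)"),
    }
}
```

Matching on the state rather than on "success or error" means the cut-off case can no longer be reported with the success banner by accident.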
Contributor
🎨 Storybook preview
Updated for 0e8b9fc

🧭 Story changes
✏️ Changed stories (2)
❌ Failed snapshots (3)
These stories' HTML snapshots changed:
- Settings/AutoConfigField › Controls
- Settings/AutoTuningSection › Evolution Settings
- Widget/Steps/SetupStep › Default Config Required
b758a30 to 522b7f9
Member
/oc fix conflicts
# Conflicts:
#	apps/native/src-tauri/src/evolve/mod.rs
#	apps/native/src/components/widget/settings/auto-config-field.stories.tsx
#	apps/native/src/components/widget/settings/auto-tuning-section.stories.tsx
Contributor
Conflicts fixed. Changes:
Merge commit created. Ready.
cooper (czxtm)
approved these changes
Jun 12, 2026
Summary
I was running some experiments with the eval suite and at some point I noticed Nixmac had gotten stuck in a loop, burning tokens like there's no tomorrow. That's because, while `run_evals.py` tries to set the max iterations, this check had been removed in #325, which had landed in `develop` unreviewed.

This reverts that commit, and also makes sure that the alternative suggested in #325 (`max_token_budget`) is enforced properly, which it also wasn't.

I'd kindly ask coopmoney and Scott McMaster (@scottmcmaster) to review carefully in case I made some wrong assumptions here.
Test Plan
Docs