Skip to content

Pin runtime selection per task stage#126

Merged
lee-to merged 2 commits into
mainfrom
fix/stage-runtime-selection-pin
May 14, 2026
Merged

Pin runtime selection per task stage#126
lee-to merged 2 commits into
mainfrom
fix/stage-runtime-selection-pin

Conversation

@lee-to

@lee-to lee-to commented May 13, 2026

Copy link
Copy Markdown
Owner

Summary

  • persist a stage-scoped active runtime selection on tasks so retries after crashes keep using the runtime that started the current status
  • clear the pinned runtime when the task advances to a new stage or when a human transition changes status, while preserving it for retry_from_blocked
  • add schema migration, data helpers, and regression coverage for runtime pin reuse and malformed payload handling

Validation

  • npm run ai:validate

@lee-to lee-to requested a review from ichinya May 13, 2026 17:56
@lee-to

lee-to commented May 13, 2026

Copy link
Copy Markdown
Owner Author

Code review

This PR persists a stage-scoped runtime selection on tasks and reuses it for same-status retries, with an additive SQLite migration and focused regression coverage. The implementation is coherent, but it changes the agent coordinator/runtime execution hot path and is currently always on, which violates the required risk-gating policy for non-trivial behavioral changes.

Must fix

  1. Gate runtime pinning behind an off-by-default feature flag — packages/agent/src/subagentQuery.ts:547 and packages/agent/src/subagentQuery.ts:592 — the new pin lookup and save path now runs for every subagent execution. The PR-review policy requires changes to coordinator/runtime hot paths to be guarded by an environment-driven flag that defaults to false. Add a central boolean env flag such as AIF_STAGE_RUNTIME_PIN_ENABLED in packages/shared/src/env.ts, document it in .env.example and docs/configuration.md, and make the off path preserve existing behavior by skipping both getTaskActiveRuntimeSelection and saveTaskActiveRuntimeSelection. The coordinator/API clear calls can remain harmless, but the new runtime-selection behavior should not be reachable while the flag is false.

Should fix

  1. Sync architecture docs for the new task persistence fields — docs/architecture.md:303 — the architecture doc lists the persisted tasks runtime fields but this PR adds active_runtime_status and active_runtime_selection_json without updating the documented data model. Add these fields to the task table description and briefly state that they are stage-scoped retry pins, distinct from session_id and runtime_limit_snapshot_json.

Nits

  • None.

Context gates

  • Architecture: pass — DB access from agent and api still goes through @aif/data, matching the documented module boundary.
  • Rules: error — risky hot-path behavior is not feature-flagged off by default.
  • Roadmap: warn — no roadmap linkage found, but this is a narrow reliability fix rather than a roadmap-sized feature.
  • CHECKLIST compliance (touched packages: agent, api, data, shared): pass with one caveat — package checklist items are mostly covered by repository boundaries, migration, and tests. I did not see evidence that npm run ai:validate completed in CI yet.
  • Docs sync: warn — schema/data-model docs need the update noted above.
  • PR size (455 changed lines / 11 files / 1 concern): warn — over the soft line threshold, but the change is cohesive and test-heavy, so I do not recommend splitting it.
  • Risk gating (feature flag): error — add AIF_STAGE_RUNTIME_PIN_ENABLED=false and keep existing runtime resolution as the default path.

Verdict: REQUEST_CHANGES until the off-by-default feature flag is added. Current checks show Build, ESLint, MCP Unit, npm audit, mcpunit, and ai-review passing, with Tests still in progress at review time.

@ichinya

ichinya commented May 14, 2026

Copy link
Copy Markdown
Collaborator

Code review

PR #126 adds a task-level, stage-scoped runtime-selection pin so same-status retries keep the runtime/profile/model/options that started the stage. Current head: 315eb70. The schema/data-layer pieces are mostly coherent, and CI now reports success, but this is still not safe to merge because the new coordinator/runtime hot-path behavior remains always on.

Response pass

  • Addressed: CI is no longer pending; Build, Lint, Tests, MCP Unit, Security Audit, and AI Review are completed successfully for the current head.
  • Still unresolved: the previous blocker remains. Runtime pin lookup/save still executes unconditionally in packages/agent/src/subagentQuery.ts; I do not see an off-by-default env flag in packages/shared/src/env.ts, .env.example, docs/configuration.md, or turbo.json.
  • New blocker found: none.

What looks good

  • The DB migration is additive: active_runtime_status and active_runtime_selection_json are nullable task columns, so existing rows can migrate without destructive changes.
  • The data-layer helper validates malformed persisted selection JSON and ignores bad/mismatched payloads rather than crashing the agent path.
  • The API/coordinator clear points mostly match the intended lifecycle: clear on normal human transitions and successful stage advancement, preserve across retry_from_blocked.
  • toTaskResponse() intentionally strips the new internal persistence fields from the public task response, avoiding accidental API exposure.
  • Regression coverage exists for persistence/clear behavior, malformed payload handling, and same-status pinned runtime reuse.

Must fix

  1. [P1] Runtime pinning is still always-on — packages/agent/src/subagentQuery.ts, packages/shared/src/env.ts, .env.example, docs/configuration.md, turbo.json

    The PR changes the runtime-resolution hot path by reading getTaskActiveRuntimeSelection() and writing saveTaskActiveRuntimeSelection() for every task-stage subagent execution. That means the new behavior is active immediately after deploy for all projects and all task stages.

    Broken scenario: an operator deploys this and a task enters implementing/review; the runtime/profile/model/options become pinned without any rollout switch. If this introduces a production issue, there is no global kill switch to return to the previous resolution behavior. Per-task status transitions eventually clear the pin, but they do not provide a safe deploy-time rollback.

    Why it matters: this sits in the coordinator/runtime execution path. It can override later runtime-profile changes during same-status retries, and it changes recovery behavior for blocked/stale tasks. That is useful, but it needs explicit rollout control.

    Expected fix:

    • Add a central boolean flag, for example AIF_STAGE_RUNTIME_PIN_ENABLED=false by default, or use an AGENT_* name if you want Turbo passthrough to be covered by the existing wildcard.
    • Parse it in packages/shared/src/env.ts.
    • Document it in .env.example and docs/configuration.md.
    • If the flag uses an AIF_* name, add it to turbo.json globalPassThroughEnv.
    • In resolveExecutionContext(), when the flag is false, preserve old behavior exactly: do not call getTaskActiveRuntimeSelection() and do not call saveTaskActiveRuntimeSelection().
    • The coordinator/API clear calls may remain harmless, but no runtime-selection reuse/persistence should be reachable while disabled.
    • Add regression tests for disabled, enabled, and invalid/missing flag values. Disabled-path test should assert that effective profile resolution is used and both active-runtime-selection helpers are not called.

Must fix / merge-order requirement

None.

Should fix

  1. [P2] Document the new task persistence fields — docs/architecture.md / data model docs

    The PR adds tasks.active_runtime_status and tasks.active_runtime_selection_json, but the architecture/data-model docs still describe the persisted task runtime fields without this new stage-scoped retry pin. Add a short note that these fields are internal agent retry state, distinct from session_id and runtime_limit_snapshot_json, and are cleared on stage/human transitions except retry_from_blocked.

  2. [P2] Add explicit lifecycle tests for clearing behavior

    Current tests cover persistence and reuse, but I do not see focused coverage for the API/coordinator clearing contract. Add tests for:

    • human transition clears the pin;
    • retry_from_blocked preserves it;
    • successful stage advancement clears it;
    • same-status retry keeps it.

Nits

None.

Context gates

  • Architecture: pass — agent and api continue to go through @aif/data; no direct DB access was introduced.
  • Package CHECKLIST: warn — package boundaries look correct and CI passed, but the feature-flag/risk-gating gap is still unresolved for a hot-path agent change.
  • Docs / skill contract sync: warn — configuration docs and architecture/data-model docs are missing the new rollout flag and task persistence fields.
  • API/web/client sync: pass — no public API response field was added; toTaskResponse() omits the new internal DB fields.
  • Security surfaces: pass — no new access-control or external input surface identified; persisted selection omits raw API key values and rehydrates via env var.
  • Feature flag / risk gating: error — new runtime-selection behavior is always enabled.
  • Env / Turbo / process plumbing: error — no env flag exists; if an AIF_* flag is added it must be passed through Turbo.
  • Managed assets / ownership: n/a — no managed asset install/update/remove path touched.
  • Migration / backward compatibility: pass — migration is additive and nullable; malformed stored payloads are tolerated.
  • Dependency / lockfile / production install: n/a — no dependency or lockfile changes.
  • PR size / decomposition: warn — 455 changed lines / 11 files, but it is one cohesive concern and does not need splitting.
  • Stack / merge order / base freshness: pass — branch is one commit ahead of main and not behind; no stacked dependency found.
  • CI / audit: pass — current head has successful Build, Lint, Tests, MCP Unit, Security Audit, and AI Review.

Verification

  • Checked: PR metadata, current head SHA, diff, prior review comment, changed files, package CHECKLIST files, env/config docs, Turbo env passthrough, migration/data helpers, API/coordinator clear points, and CI workflow run status. Review rubric source:
  • Ran locally: not run; I did not have a local checkout in this review context.
  • CI: success for Build, Lint, Tests, MCP Unit, Security Audit, and AI Review on 315eb70.
  • Not verified: runtime behavior under an actual crashed agent process; only code/test/CI inspection was performed.

Verdict

Verdict: REQUEST_CHANGES until runtime pinning is behind an off-by-default env flag with docs, Turbo/env plumbing, and disabled/enabled regression coverage.

@ichinya ichinya left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: REQUEST_CHANGES until runtime pinning is behind an off-by-default env flag with docs, Turbo/env plumbing, and disabled/enabled regression coverage.

@lee-to

lee-to commented May 14, 2026

Copy link
Copy Markdown
Owner Author

Addressed the review feedback in 49037f1.

Changes made:

  • Added the off-by-default AIF_STAGE_RUNTIME_PIN_ENABLED=false feature flag.
  • Gated stage runtime pin lookup and persistence behind that flag, preserving the previous runtime resolution path while disabled.
  • Added Turbo env passthrough, .env.example, and configuration/architecture docs updates.
  • Added regression coverage for disabled/enabled behavior, invalid flag values, human transition clearing, retry_from_blocked preservation, and successful stage clearing.

Validation:

  • npm run ai:validate passed end-to-end with exit code 0.

@ichinya

ichinya commented May 14, 2026

Copy link
Copy Markdown
Collaborator

Code review

Follow-up review for PR #126 at current head 49037f1. The prior blocker was about always-on runtime pinning in the agent/runtime hot path. The new commit adds the off-by-default flag, gates lookup/persistence behind it, updates Turbo/env/docs plumbing, and adds focused tests. I do not see remaining code blockers.

Response pass

  • Addressed: AIF_STAGE_RUNTIME_PIN_ENABLED now exists in shared env parsing with default false, and invalid values are rejected.
  • Addressed: resolveExecutionContext() now skips both getTaskActiveRuntimeSelection() and saveTaskActiveRuntimeSelection() when the flag is disabled.
  • Addressed: .env.example, docs/configuration.md, docs/architecture.md, and turbo.json were updated.
  • Addressed: regression coverage now includes disabled/enabled runtime pin behavior, invalid flag value, human transition clearing, retry_from_blocked preservation, and successful stage clearing.
  • Still unresolved: none.
  • New blocker found: none.

What looks good

  • The off path preserves the previous behavior: when AIF_STAGE_RUNTIME_PIN_ENABLED=false, runtime resolution still goes through resolveEffectiveRuntimeProfile() and the pin helpers are not called.
  • The opt-in path is narrow: it reuses a pin only for the same task status and same profile mode, then falls back to normal resolution otherwise.
  • Turbo passthrough is covered both by config and by a regression test that checks documented AIF_*_ENABLED flags.
  • Architecture docs now explicitly describe active_runtime_status / active_runtime_selection_json as internal stage-scoped retry pins, distinct from session_id and runtime-limit snapshots.
  • Lifecycle coverage is materially better: API tests cover human transition clear and retry_from_blocked preservation; coordinator tests cover successful stage clearing.

Must fix

I do not have meaningful code blockers anymore.

Must fix / merge-order requirement

None.

Should fix

None.

Nits

None.

Context gates

  • Architecture: pass — agent and api still use @aif/data; no direct DB boundary violation found.
  • Package CHECKLIST: pass — touched packages have boundary/tests/docs concerns covered.
  • Docs / skill contract sync: pass — .env.example, configuration docs, and architecture data-model docs are now in sync.
  • API/web/client sync: pass — no public task API contract was expanded; internal runtime pin fields are still stripped from task responses.
  • Security surfaces: pass — no new external access-control surface; persisted runtime selection stores env-var names, not raw secret values.
  • Feature flag / risk gating: pass — risky runtime pin behavior is off by default and opt-in.
  • Env / Turbo / process plumbing: pass — shared env parsing and Turbo passthrough include the new flag.
  • Managed assets / ownership: n/a — no managed asset install/update/remove path touched.
  • Migration / backward compatibility: pass — migration remains additive/nullable; malformed pin payloads are ignored.
  • Dependency / lockfile / production install: n/a — no dependency or lockfile changes.
  • PR size / decomposition: warn — 543 additions / 21 files is above a soft review threshold, but the changes are cohesive and mostly tests/docs/plumbing around one feature.
  • Stack / merge order / base freshness: pass — branch is 2 commits ahead of main, 0 behind, and mergeable.
  • CI / audit: warn — visible current-head Actions returned Build, Lint, Tests, MCP Unit, and Security Audit as successful; I did not see a separate AI Review workflow run for 49037f1 in the fetched workflow list, so merge should still follow branch protection.

Verification

  • Checked: PR metadata/head, prior review comments, diff against the previous reviewed head, full PR patch, env parsing, Turbo passthrough, docs updates, API/coordinator lifecycle tests, current workflow runs, and base freshness.
  • Ran locally: not run; review was source/CI inspection only.
  • CI: Build, Lint, Tests, MCP Unit, and Security Audit are green for 49037f1; author also reports npm run ai:validate passed.
  • Not verified: real crashed-agent recovery behavior in a live runtime process.
  • Review rubric source:

Verdict

Verdict: APPROVE / no meaningful blockers.

@lee-to lee-to merged commit 5610cb6 into main May 14, 2026
6 checks passed
@ichinya ichinya deleted the fix/stage-runtime-selection-pin branch May 14, 2026 09:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants