Status: In progress
Owner: sjarmak
Last updated: 2025-09-07
- Phase 0 M0.1–M0.5: Completed. EnvKind + gating, schema migration (agent_mode/toolbox_path), runtime_env + wiring, fake Amp CLI shim + test, CI workflow; Rust build fixed.
- Dev-only UI: environment switching is handled elsewhere, and the gear/preferences menu was removed. The Agent Mode dropdown and Toolbox Path remain wired; `session_create` persists both.
- M1.1 Agent Mode end-to-end: Completed. Agent mode selection flows through UI → IPC → Rust orchestrator → child env.
- M1.2 Toolbox Resolver: Completed. Feature-flagged symlink fan-in with Windows copy fallback, security limits, comprehensive unit/integration tests.
- M1.3 Toolbox UI surfaces: Completed. File picker component, header display, full IPC + persistence, comprehensive component tests.
- Immediate next steps: complete M1.4 (metrics export updates), then the remaining Phase 1 tasks.
- Zero regression: the production path must remain unchanged until new features are explicitly enabled; new features are dev-only by default.
- Strong tests first: every new module lands with unit + integration coverage; E2E added for user-visible flows.
- Gating: Agent Mode is allowed in both Production and Local; Toolbox features remain flag-gated. Ensure parity and no regressions when flags are off.
- Observability: structured logs and metrics for env composition, flags, and selected modes.
- Execution protocol: the coding agent must change [ ] to [X] on completion of each task.
- Phase 0 — Foundations & Safety Nets
  - M0.1 Feature flags & environment plumbing
  - M0.2 Schema & migrations (additive, reversible)
  - M0.3 Runtime spawn wrapper skeleton
  - M0.4 Fake Amp CLI shim + test harness
  - M0.5 CI hardening and coverage gates
- Phase 1 — Agent Modes + Toolbox Resolver
  - M1.1 Agent Mode end-to-end (dev-only UI + wiring)
  - M1.2 Toolbox Resolver (symlink fan-in) + env composition
  - M1.3 Toolbox UI surfaces
  - M1.4 Metrics + export updates
  - M1.5 CI regression gates and E2E
  - M1.6 Docs + rollback hooks
  - M1.7 Large TUI Terminal View (pty integration + toggle)
M0.1 Feature flags & environment plumbing
Objective
- Introduce `EnvKind` and feature flags; ensure Prod coerces selections to defaults.
Tasks
- Define `EnvKind { Local, Prod, CI }` in the backend (Rust) and expose it via IPC to desktop-ui.
- Add runtime feature switches `agent_modes` and `toolboxes` (default off in Prod).
- Desktop UI: derive `devMode` from `EnvKind` and gate controls accordingly.
Acceptance criteria
- With `EnvKind=Prod`, any selected mode is coerced to `default` and the toolbox is ignored.
- With `EnvKind=Local`, selections flow through IPC (no actual spawn yet).
Tests
- Unit (TS): reducer/UI gating tests.
- Unit (Rust): `env_kind` propagation.
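The M0.1 gating rules can be sketched in Rust as follows. This is a minimal sketch under stated assumptions: flags default on in Local and CI (the plan only fixes the Prod default), unknown env-kind strings fail safe to Prod, and `FeatureFlags`, `parse_env_kind`, and `coerce_selection` are hypothetical names. It encodes the M0.1 rule as written in this section; the gating note at the top of the plan later allows Agent Mode in Prod as well.

```rust
/// Deployment environment, mirroring `EnvKind { Local, Prod, CI }` from the task list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnvKind {
    Local,
    Prod,
    CI,
}

/// Runtime feature switches named after the plan's `agent_modes` / `toolboxes` flags.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FeatureFlags {
    pub agent_modes: bool,
    pub toolboxes: bool,
}

impl FeatureFlags {
    /// Assumed defaults: everything off in Prod; on in Local and CI so tests exercise the new paths.
    pub fn defaults_for(kind: EnvKind) -> Self {
        let on = kind != EnvKind::Prod;
        FeatureFlags { agent_modes: on, toolboxes: on }
    }
}

/// Parses an env-kind string (the variable that carries it is not specified in the plan).
/// Unknown or missing values fall back to Prod, so a misconfiguration fails safe.
pub fn parse_env_kind(raw: Option<&str>) -> EnvKind {
    match raw.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        Some("local") => EnvKind::Local,
        Some("ci") => EnvKind::CI,
        _ => EnvKind::Prod,
    }
}

/// M0.1 acceptance rule: with a flag off, the mode becomes "default" and the toolbox is ignored.
pub fn coerce_selection(kind: EnvKind, mode: Option<&str>, toolbox: Option<&str>) -> (String, Option<String>) {
    let flags = FeatureFlags::defaults_for(kind);
    let mode = match mode {
        Some(m) if flags.agent_modes && !m.is_empty() => m.to_string(),
        _ => "default".to_string(),
    };
    let toolbox = if flags.toolboxes { toolbox.map(str::to_string) } else { None };
    (mode, toolbox)
}
```

Keeping coercion in a single backend function means the UI gating is a convenience only; even a UI bug cannot push a non-default mode through in Prod.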
M0.2 Schema & migrations (additive, reversible)
Objective
- Add additive, nullable columns to support agent mode and toolbox path.
Schema (SQLite)
```sql
-- up
ALTER TABLE runs ADD COLUMN agent_mode TEXT NULL;
ALTER TABLE runs ADD COLUMN toolbox_path TEXT NULL;
-- Optional future tables (scoped for Phase 2):
-- CREATE TABLE toolboxes (id INTEGER PRIMARY KEY, project_id INTEGER, name TEXT, path TEXT, created_at TEXT);
-- CREATE TABLE toolbox_sets (id INTEGER PRIMARY KEY, project_id INTEGER, name TEXT);
-- CREATE TABLE toolbox_set_items (set_id INTEGER, toolbox_id INTEGER, "order" INTEGER);
-- CREATE TABLE mcps (id INTEGER PRIMARY KEY, project_id INTEGER, name TEXT, endpoint TEXT, auth_ref TEXT, config_json TEXT);
-- down (rollback)
-- NOTE: ALTER TABLE ... DROP COLUMN is only available from SQLite 3.35.0 (with restrictions);
-- for older SQLite versions, or for portability, roll back via a table-rebuild (copy-table) script.
```
Backfill
- Script to set `agent_mode='default'` where NULL; leave `toolbox_path` NULL.
Tasks
- Update the Rust data models for `runs` (`agent_mode`, `toolbox_path`).
- Add a migration-runner step invoked before app start in dev/CI.
Acceptance criteria
- Legacy DB opens without errors; new columns readable/writable.
Tests
- Unit (Rust): loading DB pre-migration then migrating succeeds; rollback script verified on a copy.
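A migration runner that applies only what is missing could look like the following sketch. It assumes the runner records the applied schema version in SQLite's `PRAGMA user_version` and that each migration is a plain SQL string; `Migration`, `pending`, and `plan_script` are hypothetical names, and executing the script through a SQLite driver is left out.

```rust
/// One additive migration. `version` is compared with the schema version stored in the DB
/// (assumed here to be SQLite's `PRAGMA user_version`; the plan does not specify this).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Migration {
    pub version: u32,
    pub name: &'static str,
    pub up_sql: &'static str,
}

/// Migrations newer than the DB's current version, in ascending version order.
pub fn pending(migrations: &[Migration], current_version: u32) -> Vec<&Migration> {
    let mut todo: Vec<&Migration> = migrations
        .iter()
        .filter(|m| m.version > current_version)
        .collect();
    todo.sort_by_key(|m| m.version);
    todo
}

/// Emits one transaction per migration and bumps `user_version` inside it, so a failed
/// step leaves both the schema and the recorded version at the previous state.
pub fn plan_script(migrations: &[Migration], current_version: u32) -> String {
    let mut script = String::new();
    for m in pending(migrations, current_version) {
        script.push_str(&format!("-- {}\nBEGIN;\n{}\n", m.name, m.up_sql));
        script.push_str(&format!("PRAGMA user_version = {};\nCOMMIT;\n", m.version));
    }
    script
}
```

Running the planned script against a copy of a legacy DB is one way to cover the "loading DB pre-migration then migrating succeeds" test without touching a developer's real database.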
M0.3 Runtime spawn wrapper skeleton
Objective
- Implement typed env composition without enabling new UI yet.
Modules
- `runtime_env.rs`
  - Input: `EnvKind`, `Variant { agentMode?: string, toolboxPath?: string }`
  - Output: `EnvMap` (`AMP_TOOLBOX`, PATH additions, `AMP_EXPERIMENTAL_AGENT_MODE` when Local)
- `agent_mode.rs`
  - Coercion: if `EnvKind != Local`, return `default`.
Tasks
- Implement `compose_runtime_env()`; log the composed env as structured JSON.
- Wire desktop-ui → IPC → Rust orchestrator (no UI controls yet).
Acceptance criteria
- Given the same inputs, the function returns the same env map; Prod coercion is validated.
Tests
- Unit (Rust): table-driven tests for combinations of EnvKind and Variant.
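A deterministic `compose_runtime_env()` could be sketched as below. Assumptions: the env map is a `BTreeMap` (sorted keys keep the structured JSON log stable), the resolver's bin dir and the parent PATH are passed in, the toolbox is dropped in Prod per M0.1, and the PATH separator is the Unix `:`. The signature is illustrative, not the module's actual API.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnvKind { Local, Prod, CI }

/// Mirrors the plan's `Variant { agentMode?, toolboxPath? }`.
#[derive(Debug, Default, Clone)]
pub struct Variant {
    pub agent_mode: Option<String>,
    pub toolbox_path: Option<String>,
}

/// Sorted keys: identical inputs always produce identically ordered output.
pub type EnvMap = BTreeMap<String, String>;

/// agent_mode.rs rule from M0.3: anything other than Local is coerced to "default".
pub fn coerce_agent_mode(kind: EnvKind, requested: Option<&str>) -> String {
    match (kind, requested) {
        (EnvKind::Local, Some(m)) if !m.is_empty() => m.to_string(),
        _ => "default".to_string(),
    }
}

/// Illustrative signature: `toolbox_bin` comes from the resolver, `base_path` from the parent env.
pub fn compose_runtime_env(kind: EnvKind, variant: &Variant, toolbox_bin: Option<&str>, base_path: &str) -> EnvMap {
    let mut env = EnvMap::new();
    // Toolbox settings are ignored in Prod (M0.1).
    if kind != EnvKind::Prod {
        if let Some(tb) = &variant.toolbox_path {
            env.insert("AMP_TOOLBOX".to_string(), tb.clone());
        }
        if let Some(bin) = toolbox_bin {
            // Prepend so toolbox binaries shadow same-named tools; ':' assumes a Unix PATH.
            env.insert("PATH".to_string(), format!("{bin}:{base_path}"));
        }
    }
    // Only Local can yield a non-default mode, so this is effectively Local-only.
    let mode = coerce_agent_mode(kind, variant.agent_mode.as_deref());
    if mode != "default" {
        env.insert("AMP_EXPERIMENTAL_AGENT_MODE".to_string(), mode);
    }
    env
}
```

Passing `base_path` in, instead of reading the process environment inside the function, keeps it pure, which is what makes the table-driven tests above straightforward.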
M0.4 Fake Amp CLI shim + test harness
Objective
- Provide a stable, non-interactive target for integration tests asserting argv/env wiring and preventing the real TUI from launching.
Why
- The real Amp CLI defaults to launching the interactive TUI (`src/main.ts` → `src/tui/app.ts`), which is unsuitable for CI and deterministic tests. The shim echoes argv/env as JSON so we can verify agent mode flags, `AMP_TOOLBOX`, and PATH composition without rendering the TUI.
Files
- `tools/fake-amp-cli.mjs`
Example implementation
```js
#!/usr/bin/env node
// Fake Amp CLI: prints argv and the relevant env vars as one JSON line, then exits 0.
import process from 'node:process';

const out = {
  argv: process.argv.slice(2), // drop the node binary and the script path
  env: {
    AMP_TOOLBOX: process.env.AMP_TOOLBOX || null,
    AMP_EXPERIMENTAL_AGENT_MODE: process.env.AMP_EXPERIMENTAL_AGENT_MODE || null,
    PATH: process.env.PATH || null,
  },
};

console.log(JSON.stringify(out));
```
Commands
```bash
pnpm exec node tools/fake-amp-cli.mjs --agent-mode geppetto:main
```
Tasks
- The orchestrator accepts an `AMP_CLI_BIN` override so tests can point it at the shim.
- An integration test spawns a Variant and asserts on the shim's output.
Tests
- Vitest integration: `spawn_with_modes.int.test.mjs` asserts argv/env.
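The `AMP_CLI_BIN` override might be resolved as in this sketch. It assumes the default binary is `amp` found on PATH and that `.mjs`/`.js` targets should be launched through `node` (Windows does not honor shebang lines); `resolve_cli_bin` and `build_cli_command` are hypothetical names, and the `--agent-mode` argument simply mirrors the example shim command above.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// An explicit, non-blank `AMP_CLI_BIN` value wins (tests point it at the shim);
/// otherwise fall back to `amp` resolved from PATH (assumed default).
pub fn resolve_cli_bin(amp_cli_bin: Option<&str>) -> PathBuf {
    match amp_cli_bin.map(str::trim) {
        Some(p) if !p.is_empty() => PathBuf::from(p),
        _ => PathBuf::from("amp"),
    }
}

/// JS entrypoints run through `node`; anything else is executed directly.
pub fn build_cli_command(bin: &Path, agent_mode: Option<&str>) -> Command {
    let is_script = matches!(bin.extension().and_then(|e| e.to_str()), Some("mjs" | "js"));
    let mut cmd = if is_script {
        let mut node = Command::new("node");
        node.arg(bin);
        node
    } else {
        Command::new(bin)
    };
    if let Some(mode) = agent_mode {
        // Same flag as the example shim invocation.
        cmd.args(["--agent-mode", mode]);
    }
    cmd
}
```

In the orchestrator the override would come from something like `std::env::var("AMP_CLI_BIN").ok()`; passing it in as a parameter lets tests exercise both branches without mutating the process environment.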
M0.5 CI hardening and coverage gates
Objective
- Ensure all builds and tests run in CI with coverage; publish shim logs on failure.
Tasks
- CI job runs: `pnpm build && pnpm typecheck && pnpm test`.
- Collect coverage: `pnpm test -- --coverage` for TS; Rust currently runs only `cargo build`.
- Upload artifacts: the `logs/` directory, if present.
Acceptance criteria
- CI green on main and PRs; coverage for new modules ≥ 90%.
- DB migration applied and the reversal strategy documented.
- `runtime_env` composes the env with correct Prod coercion.
- Fake CLI harness validated locally and in CI.
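One way to enforce the ≥ 90% gate on new modules is a small checker over an LCOV tracefile, sketched below. It assumes the test runner is configured to emit LCOV (records made of `SF:`, `LF:`, `LH:`, and `end_of_record` lines) and that "new modules" can be identified by path prefixes; `lcov_line_coverage`, `coverage_failures`, and the prefixes used are hypothetical.

```rust
/// Per-file line coverage (percent) parsed from an LCOV tracefile.
pub fn lcov_line_coverage(lcov: &str) -> Vec<(String, f64)> {
    let mut out = Vec::new();
    let (mut file, mut found, mut hit) = (String::new(), 0u64, 0u64);
    for line in lcov.lines().map(str::trim) {
        if let Some(f) = line.strip_prefix("SF:") {
            file = f.to_string();
            found = 0;
            hit = 0;
        } else if let Some(n) = line.strip_prefix("LF:") {
            found = n.parse().unwrap_or(0);
        } else if let Some(n) = line.strip_prefix("LH:") {
            hit = n.parse().unwrap_or(0);
        } else if line == "end_of_record" {
            // A file with no instrumented lines counts as fully covered.
            let pct = if found == 0 { 100.0 } else { hit as f64 * 100.0 / found as f64 };
            out.push((std::mem::take(&mut file), pct));
        }
    }
    out
}

/// Files under any of `prefixes` whose line coverage is below `min_pct` (the plan's gate is 90).
pub fn coverage_failures(lcov: &str, prefixes: &[&str], min_pct: f64) -> Vec<String> {
    lcov_line_coverage(lcov)
        .into_iter()
        .filter(|(f, pct)| prefixes.iter().any(|p| f.starts_with(p)) && *pct < min_pct)
        .map(|(f, _)| f)
        .collect()
}
```

Scoping the gate to new modules by prefix keeps legacy files from blocking CI while still holding new code to the threshold.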
M1.1 Agent Mode end-to-end (dev-only UI + wiring)
Objective
- Let developers select an Agent Mode in the UI (Local only), pass it to the child process, and record it in metrics.
Tasks
- Extend the Preset & Variant schemas to include `agentMode` (persisted to the DB where applicable).
- UI: add a dropdown, visible in both Prod and Local.
- Session creation persists `agent_mode` and `toolbox_path` to `chat_sessions`.
- DB: add `agent_mode` and `toolbox_path` to `chat_sessions` (migration 003).
- Orchestrator: set `AMP_EXPERIMENTAL_AGENT_MODE=<mode>` when a mode is selected.
- Metrics: include `agent_mode` in exports/results.
Acceptance criteria
- Both Prod and Local propagate the selected agent mode to the child process env.
- The UI shows a `Mode: <value>` chip in the header.
Tests
- Unit (TS): UI state round-trip (set/get Agent Mode).
- Integration (Rust): env composition matrix; command selection.
- E2E: WebView selects mode, run summary shows chip, metrics contain mode (deferred to end of Phase 1).
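On the orchestrator side, the `AMP_EXPERIMENTAL_AGENT_MODE` wiring could look like this sketch built on `std::process::Command`. It assumes that `default` (or an empty selection) means "leave the variable unset", and it explicitly removes any inherited value so the parent shell cannot leak a mode into the child; `apply_agent_mode` and `configured_env` are hypothetical helpers. The returned mode is what would be recorded as `agent_mode` in metrics.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Sets `AMP_EXPERIMENTAL_AGENT_MODE` for a non-default selection (Prod and Local alike, per M1.1).
/// Returns the effective mode for metrics, or None when the default mode applies.
pub fn apply_agent_mode(cmd: &mut Command, mode: Option<&str>) -> Option<String> {
    match mode.map(str::trim) {
        Some(m) if !m.is_empty() && m != "default" => {
            cmd.env("AMP_EXPERIMENTAL_AGENT_MODE", m);
            Some(m.to_string())
        }
        _ => {
            // Do not let a value inherited from the parent environment reach the child.
            cmd.env_remove("AMP_EXPERIMENTAL_AGENT_MODE");
            None
        }
    }
}

/// Reads back an explicitly configured variable; useful in integration tests.
pub fn configured_env<'a>(cmd: &'a Command, key: &str) -> Option<&'a OsStr> {
    cmd.get_envs()
        .find(|(k, _)| *k == OsStr::new(key))
        .and_then(|(_, v)| v)
}
```

`Command::get_envs` reports only explicit changes (removals appear with a `None` value), so a test can assert on the wiring without spawning anything.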
M1.2 Toolbox Resolver (symlink fan-in) + env composition
Objective
- Allow selecting a toolbox path; fan its contents into a runtime dir via symlinks and expose it through `AMP_TOOLBOX` and PATH.
Tasks
- Implement `toolbox_resolver.rs`:
  - Input: `Vec<PathBuf>`; output: `{resolved_dir, bin_dir}`.
  - Create `~/.amp-orchestra/runtime_toolboxes/<run>/<variant>/` and symlink all roots into it (ordered; last write wins).
  - Windows: if symlinks are unavailable, copy files (with a warning).
  - Security: canonicalize paths, forbid traversal, limit file count/size.
- Orchestrator: set `AMP_TOOLBOX` and prepend `bin_dir` to PATH.
- Cleanup policy: remove the temp dir on completion unless `keep_artifacts` is set.
Acceptance criteria
- The shim sees `AMP_TOOLBOX`; a dummy tool inside the toolbox is executable via PATH.
- The resolver is deterministic and secure, and handles duplicate names.
Tests
- Unit (Rust): resolver behavior, error handling, path validation.
- Integration: create temporary toolboxes; assert discovery/execution via shim.
- E2E: create simple bash tool, invoke through agent chat; confirm output.
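The ordered, last-write-wins fan-in and the traversal check can be separated from filesystem work, as in this sketch. It assumes each root has already been canonicalized (for example with `std::fs::canonicalize`) and walked into a list of relative file paths; creating the links (for example with `std::os::unix::fs::symlink`) or copies on Windows is left out. `plan_fan_in` and `ResolveError` are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::path::{Component, Path, PathBuf};

#[derive(Debug, PartialEq, Eq)]
pub enum ResolveError {
    /// Entry is absolute or contains `..`, so it could escape the runtime dir.
    UnsafeEntry(PathBuf),
    /// More files than the configured limit.
    TooManyFiles { limit: usize },
}

/// Accepts only plain relative paths (normal components, optionally `.`).
fn is_safe_relative(rel: &Path) -> bool {
    !rel.as_os_str().is_empty()
        && rel.components().all(|c| matches!(c, Component::Normal(_) | Component::CurDir))
}

/// `roots` is ordered; each entry is (canonical root, relative file paths found under it).
/// On a name clash the later root replaces the earlier one ("last write wins").
/// Returns target path (relative to the runtime dir) -> source file.
pub fn plan_fan_in(
    roots: &[(PathBuf, Vec<PathBuf>)],
    max_files: usize,
) -> Result<BTreeMap<PathBuf, PathBuf>, ResolveError> {
    let mut plan = BTreeMap::new();
    for (root, entries) in roots {
        for rel in entries {
            if !is_safe_relative(rel) {
                return Err(ResolveError::UnsafeEntry(rel.clone()));
            }
            plan.insert(rel.clone(), root.join(rel));
            if plan.len() > max_files {
                return Err(ResolveError::TooManyFiles { limit: max_files });
            }
        }
    }
    Ok(plan)
}
```

Returning a `BTreeMap` makes the plan's order independent of directory-listing order, which supports the "resolver deterministic" criterion, and lets the unit tests cover duplicates and traversal without creating any files.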
M1.3 Toolbox UI surfaces
Tasks
- Add a file picker for Toolbox Path (visible in dev and prod, but ignored by the prod runtime until the Phase 2 gating decision).
- Show the selected toolbox on the Variant card.
- Persist `toolbox_path` to the DB and through IPC.
Tests
- Unit (TS): state store save/restore.
- Component tests: UI interaction and data flow.
M1.4 Metrics + export updates
Tasks
- Add `agent_mode` to exported metrics/results.
- Add `toolbox_path`, `tools_available_count`, and optional `tools_used[]`.
- Update the JSONL/CSV/HTML exporters to display per-mode comparisons.
Tests
- Unit/typecheck pass with the `agent_mode` addition.
- Unit: the exporter includes the remaining fields and renders the HTML sections.
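A sketch of the CSV exporter and the per-mode comparison follows. It assumes a flat row with the new fields, a `;`-joined `tools_used` column, and a hypothetical `success` field used to compare modes; `RunMetrics`, `to_csv_row`, and `per_mode_summary` are illustrative names, not the exporter's actual types. Quoting follows RFC 4180 (fields containing commas, quotes, or line breaks are quoted, with inner quotes doubled).

```rust
use std::collections::BTreeMap;

/// One exported result row (only the fields relevant to M1.4 are shown).
#[derive(Debug, Clone)]
pub struct RunMetrics {
    pub agent_mode: String,
    pub toolbox_path: Option<String>,
    pub tools_available_count: u32,
    pub tools_used: Vec<String>,
    /// Hypothetical outcome field, used only for the per-mode comparison.
    pub success: bool,
}

pub const CSV_HEADER: &str = "agent_mode,toolbox_path,tools_available_count,tools_used";

/// RFC 4180 quoting: wrap in quotes when needed and double any embedded quotes.
fn csv_field(s: &str) -> String {
    if s.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

pub fn to_csv_row(m: &RunMetrics) -> String {
    [
        csv_field(&m.agent_mode),
        csv_field(m.toolbox_path.as_deref().unwrap_or("")),
        m.tools_available_count.to_string(),
        csv_field(&m.tools_used.join(";")), // list flattened into one column (assumed format)
    ]
    .join(",")
}

/// Per-mode comparison: agent_mode -> (runs, successes), sorted by mode for stable output.
pub fn per_mode_summary(rows: &[RunMetrics]) -> BTreeMap<String, (u32, u32)> {
    let mut out: BTreeMap<String, (u32, u32)> = BTreeMap::new();
    for r in rows {
        let entry = out.entry(r.agent_mode.clone()).or_default();
        entry.0 += 1;
        if r.success {
            entry.1 += 1;
        }
    }
    out
}
```

The same `per_mode_summary` map could feed both the HTML comparison section and a JSONL summary line, so the three exporters agree on the numbers.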
M1.5 CI regression gates and E2E
Tasks
- Add Playwright- or WebView-based E2E to CI (`tools/tests/*`).
- Add a rule: changes to spawn/env composition modules require accompanying tests.
- Parity check: Prod vs Local basic flows (no coercion of agent mode).
Acceptance criteria
- E2E stable (headless), with video/screenshot on failure.
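The "spawn/env composition changes require tests" rule could be a small check over the list of changed files (for example, the output of `git diff --name-only` against the base branch). This sketch is a coarse, assumed heuristic: the guarded file names come from this plan, `untested_guarded_changes` is a hypothetical name, and the test-file patterns (`/tests/`, `_test.rs`, `.test.`) are guesses that will miss inline `#[cfg(test)]` modules.

```rust
/// Spawn/env composition modules named in this plan (the guarded set is an assumption).
const GUARDED: &[&str] = &["runtime_env.rs", "agent_mode.rs", "toolbox_resolver.rs"];

/// Coarse, assumed test-file patterns; inline `#[cfg(test)]` modules are not detected.
fn is_test_file(path: &str) -> bool {
    path.contains("/tests/") || path.ends_with("_test.rs") || path.contains(".test.")
}

/// Guarded files changed in a diff that touches no test file at all.
pub fn untested_guarded_changes<'a>(changed: &[&'a str]) -> Vec<&'a str> {
    if changed.iter().any(|p| is_test_file(p)) {
        return Vec::new();
    }
    changed
        .iter()
        .copied()
        .filter(|p| GUARDED.iter().any(|g| p.ends_with(g)))
        .collect()
}
```

A non-empty result would fail the CI step and list the offending files, which keeps the rule cheap to run on every PR.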
M1.6 Docs + rollback hooks
Tasks
- Update AGENTS.md/SETUP.md: how to choose modes/toolboxes; fake CLI for testing.
- Provide `ROLLBACK_phase1.sql`, which recreates the pre-Phase-0 schema (copy-table pattern).
- Provide a release-notes template covering risks and manual steps.
M1.7 Large TUI Terminal View (pty integration + toggle)
Objective
- Provide a reliable, full-screen terminal interface that runs the Amp TUI inside the app, with a clean toggle between Chat and TUI modes, preserving state.
- Support two execution profiles: Development (local CLI) and Production (system amp), obeying EnvKind gating and existing env composition (agent_mode, toolbox).
Architecture & design
Backend (Tauri / Rust)
- Implement a `TuiSession` manager using portable-pty for a real PTY. Each session stores:
  - the master PTY handle (for resize), the child process handle, a writer, and a reader thread that emits bytes.
- IPC commands:
  - `cmd_start_tui(profile, variant_id?, cwd?, cols, rows, env?) -> session_id`
  - `cmd_write_stdin(session_id, utf8_chunk)`
  - `cmd_resize(session_id, cols, rows)`
  - `cmd_kill(session_id)`
- Resolve the entrypoint by profile, with gating:
  - `EnvKind != Local` → always use the system `amp`.
  - `EnvKind == Local` → the Dev profile uses the local CLI; the Prod profile uses the system `amp`.
- Spawn with `compose_runtime_env()` so `agent_mode`/toolbox propagate identically to chat runs.
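Entrypoint resolution for the TUI can be written as an exhaustive match over (`EnvKind`, profile), as in the sketch below. `Profile` and `Entry` are hypothetical names (`ProdBin` follows the gating task in this section); listing every `EnvKind` instead of using a catch-all means that adding a new environment kind fails to compile until its gating is decided.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnvKind { Local, Prod, CI }

/// Profile requested by the UI: Development (local CLI) or Production (system `amp`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Profile { Dev, Prod }

/// Which executable the backend will actually spawn.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Entry { DevCli, ProdBin }

/// Backend-side gating: only Local may honor the Dev profile; Prod and CI always get the system binary.
pub fn resolve_entry(kind: EnvKind, requested: Profile) -> Entry {
    match (kind, requested) {
        (EnvKind::Local, Profile::Dev) => Entry::DevCli,
        (EnvKind::Local, Profile::Prod) => Entry::ProdBin,
        // No catch-all on EnvKind: a new variant will not compile until its gating is decided.
        (EnvKind::Prod | EnvKind::CI, _) => Entry::ProdBin,
    }
}
```

Because the backend applies this regardless of what the UI sends, hiding the profile selector in Prod is cosmetic rather than the actual safeguard.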
Frontend (React)
- Add a `TerminalProvider` that owns the session lifecycle, so unmounting the view does not kill the PTY.
- `TerminalView` uses xterm.js + the fit addon; it subscribes to Tauri events for PTY data and forwards keystrokes via IPC.
- Add tabs or a top-level toggle: Chat ↔ Terminal. Preserve scrollback and focus when switching.
- Implement window and container resize → fit addon → cmd_resize.
Tasks
- Backend: add portable-pty; implement TuiSession manager and IPC commands; integrate compose_runtime_env(SpawnKind::Tui).
- Frontend: install xterm + fit addon; implement TerminalProvider and TerminalView; add Chat/Terminal tabs with last-tab persistence.
- Gating: hide profile selector when EnvKind!=Local; backend coerces to ProdBin regardless of requested profile.
- Deferred: optional Dev override for local CLI path (post-Phase 1).
- UX: toolbar actions (Ctrl-C, Clear, Copy/Paste); show active profile and toolbox/mode chips.
- Metrics: tui_launch, tui_exit_code, tui_duration_ms; include agent_mode/toolbox in context.
Acceptance criteria
- TUI renders correctly; resizing works; switching tabs preserves state and process.
- Local: Dev profile runs local CLI; Prod profile runs system amp. Prod/CI: always system amp.
- Env composition matches chat runs; Local observes agent_mode/toolbox; Prod coerces to defaults.
- Clean exit on window close or kill; no orphaned PTYs.
Tests
- Unit (Rust): resolve entry (EnvKind×Profile), stdin/out piping, resize, exit.
- Integration (shim): PTY launch prints marker; bytes visible in UI buffer; send "q" to exit.
- E2E (WebView/Playwright): open Terminal, assert first ANSI frame; toggle Chat ↔ Terminal retains buffer; Prod build hides profile selector.
Risks & mitigations
- Windows PTY quirks → use portable-pty winpty backend; add Windows CI smoke test.
- Resource leaks → Drop impl on TuiSession; ensure cmd_kill on window close.
- Performance → set scrollback=10k; lazy render when hidden; throttle resize events.
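For "throttle resize events", one option is to coalesce resizes before they reach the PTY, as sketched here. The plan does not say whether throttling happens in the frontend or the backend; this assumes a backend-side coalescer driven by caller-supplied millisecond timestamps (so it is deterministic under test), with a timer calling `poll` to flush the trailing size. `ResizeThrottle` is a hypothetical name.

```rust
/// Coalesces PTY resize requests: at most one resize per `interval_ms`, the latest size wins,
/// and resizes to the size already sent are dropped.
pub struct ResizeThrottle {
    interval_ms: u64,
    last_sent_at: Option<u64>,
    last_sent: Option<(u16, u16)>,
    pending: Option<(u16, u16)>,
}

impl ResizeThrottle {
    pub fn new(interval_ms: u64) -> Self {
        ResizeThrottle { interval_ms, last_sent_at: None, last_sent: None, pending: None }
    }

    /// Records a new (cols, rows); returns Some if it should be forwarded to the PTY now.
    pub fn on_resize(&mut self, now_ms: u64, cols: u16, rows: u16) -> Option<(u16, u16)> {
        self.pending = Some((cols, rows));
        self.poll(now_ms)
    }

    /// Called from a timer tick to flush the trailing size once the interval has elapsed.
    pub fn poll(&mut self, now_ms: u64) -> Option<(u16, u16)> {
        let size = self.pending?;
        let ready = match self.last_sent_at {
            None => true,
            Some(t) => now_ms.saturating_sub(t) >= self.interval_ms,
        };
        if !ready {
            return None; // keep `pending` for a later poll
        }
        self.pending = None;
        if self.last_sent == Some(size) {
            return None; // same size as last time: nothing to do
        }
        self.last_sent_at = Some(now_ms);
        self.last_sent = Some(size);
        Some(size)
    }
}
```

Keeping only the latest pending size means a burst of window-drag events costs at most one PTY resize per interval, while the final size is never lost.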
Rollback
- Feature flag enable_terminal_view=false hides the tab and prevents TUI codepaths from executing. No DB changes required.
- Agent Mode selectable and propagated in both Prod and Local; Toolbox path behind flag.
- Env/flags verified by unit/integration; E2E at end reflects UI + metrics.
- CI gates in place; coverage ≥ 90% on new code.
- Rollback scripts/docs present.
Unit (Rust)
- `agent_mode.rs`: coercion rules.
- `runtime_env.rs`: env composition matrix.
- `toolbox_resolver.rs`: symlink/copy logic, security checks.
Unit (TS/Vitest)
- UI gating and reducers; preset/variant persistence.
Integration (TS + Node shim)
- Spawn variants against fake CLI; assert argv/env; toolbox tool discovery.
E2E (WebView/Playwright)
- Agent Mode dropdown, Toolbox picker, metrics exposure, production coercion.
- TUI large terminal view toggle + first-frame render sanity; production coercion verified.
Coverage
```bash
pnpm test -- --coverage
# For the Rust backend (if applicable):
# cargo test --all --locked
```
- Schema incompatibility → additive nullable columns; transactional migrations; documented rollback.
- Env leakage to Prod → strict `EnvKind` checks; unit tests; a CI validation step builds Prod and inspects its behavior.
- Symlink resolver security → canonicalization, traversal denial, file count/size limits; Windows falls back to copying.
- E2E flakiness → Headless, retry once, record artifacts.
- Feature flags: disable `agent_modes` and `toolboxes` at runtime; hide UI controls.
- Database: apply the `down` migration script, which rebuilds the table without the new columns (copy-table strategy).
- Release notes include manual rollback steps and a verification checklist.
Example Preset JSON
```json
{
  "models": ["gpt-5"],
  "alloy": "standard",
  "agentMode": "geppetto:main",
  "tools": ["git", "shell", "browser"],
  "toolboxPath": "~/amp_tools/my_project_tools",
  "limits": { "max_turns": 30, "max_tokens": 300000 }
}
```
Matrix with agentMode axis (YAML)
```yaml
variants:
  agentMode: ["default", "geppetto:main", "claudetto:main"]
  seed: [11, 42]
limits:
  max_parallel: 4
```
Spawn env (pseudo-Rust)
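```rust
use std::{env, iter, path::Path, process::Command};
// Builds the child command; parameter names follow the original sketch.
fn build_amp_command(amp_path: &Path, resolved_toolbox_dir: &Path, resolved_toolbox_bin: &Path,
                     is_local: bool, agent_mode_str: &str, mcp_cfg_path: Option<&Path>) -> Command {
    let mut cmd = Command::new(amp_path);
    cmd.env("AMP_TOOLBOX", resolved_toolbox_dir);
    // Prepend the toolbox bin dir with the platform separator (':' on Unix, ';' on Windows).
    let current_path = env::var_os("PATH").unwrap_or_default();
    let entries = iter::once(resolved_toolbox_bin.to_path_buf()).chain(env::split_paths(&current_path));
    cmd.env("PATH", env::join_paths(entries).expect("PATH entry contains a separator"));
    if is_local { cmd.env("AMP_EXPERIMENTAL_AGENT_MODE", agent_mode_str); } // Local-only gating, as in Phase 0
    if let Some(cfg) = mcp_cfg_path { cmd.env("AMP_MCP_CONFIG", cfg); }
    cmd
}
```
Tool template (bash)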
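```bash
#!/usr/bin/env bash
set -euo pipefail
# Read the tool's JSON arguments from stdin (unused in this template).
args=$(cat)
# Emit a JSON result on stdout.
printf '{"ok":true,"message":"hello"}'
```
- Phase 0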
  - M0.1 Feature flags & environment plumbing
  - M0.2 Schema & migrations
  - M0.3 Runtime spawn wrapper skeleton
  - M0.4 Fake Amp CLI shim + test harness
  - M0.5 CI hardening and coverage gates
- Phase 1
  - M1.1 Agent Mode end-to-end
  - M1.2 Toolbox Resolver + env composition
  - M1.3 Toolbox UI surfaces
  - M1.4 Metrics + export updates (agent_mode done)
  - M1.5 CI regression gates and E2E (deferred)
  - M1.6 Docs + rollback hooks
  - M1.7 Large TUI Terminal View (pty integration + toggle)