
Commit edaa1bd

BTForIT, claude, and njbrake authored
feat(cli): aoe session restart --all (#910)
* feat(cli): add --all flag scaffold to restart subcommand

  Introduces RestartArgs with an optional identifier, an --all switch
  (mutually exclusive with identifier), and a --parallel concurrency cap
  (default 3). The orchestrator is stubbed; only arg parsing is wired up.
  Per njbrake's spec in #855.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(cli): sequential restart-all orchestrator

  Implements `aoe session restart --all`:

  - pick_targets_for_restart_all() skips Deleting/Creating sessions
    (mid-transition; restarting them races their existing state machine).
  - The orchestrator loads sessions, mutates the Vec<Instance> in memory,
    and saves once at the end. No per-instance persistence inside the
    loop — this matters when we add concurrency next.
  - Per-instance failures are collected and surfaced; the exit code
    reflects the failure count.

  The concurrency cap (`--parallel`) is accepted but currently unused;
  the sequential loop is a known-good baseline before adding the
  semaphore-bounded parallel path.

  Tests cover the filter (Deleting/Creating excluded, all other states
  included) and arg parsing (already in place).

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(cli): bound restart-all concurrency with semaphore + JoinSet

  Replaces the sequential loop with a parallel pool gated by
  tokio::sync::Semaphore (default 3, configurable via --parallel). Each
  target is cloned into its worker; the worker calls restart_with_size on
  a blocking thread (since restart_with_size spawns processes and sleeps),
  and the mutated copy is written back to instances[idx] when the worker
  returns. The shared Vec<Instance> is never touched by workers, only by
  the dispatcher loop, and sessions.json is still saved exactly once after
  every worker has joined. This is what njbrake's #855 spec called out:
  don't persist per-instance inside a parallel pool, or reads and writes
  will race.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(cli): regenerate reference for restart --all

  Auto-generated by `cargo xtask gen-docs` after adding the --all and
  --parallel flags to `aoe session restart`. CI enforces that this stays
  in sync with the clap help.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(cli): rehydrate source_profile before session restart

  Instance::source_profile is #[serde(skip_serializing)], so
  storage-loaded instances always come back with it blank. Every other
  load site (start_session in this file, the TUI dashboard, the web
  server) repopulates it from the active profile right after
  Storage::load_with_groups so that start_with_size_opts can resolve
  sandbox.environment and on_launch hooks from the correct profile.

  The single-session restart path was missed when PR #812 fixed the
  analogous web bug, and the new restart_all_sessions inherited that
  miss. Without this, `aoe -p foo session restart [<id>|--all]` against a
  profile that is not the user's globally configured default falls back
  to the default profile's sandbox env vars and on-launch hooks (cf.
  build_docker_env_args -> resolved_sandbox_config -> effective_profile,
  plus Instance::effective_profile in start_with_size_opts).

  The test_build_docker_env_args_uses_passed_profile_not_global_default
  regression test in src/session/environment.rs documents the same class
  of bug from the web flow ("personal profile's GH_TOKEN was ignored when
  launching from the web app"). Fixes the latent miss in restart_session
  and the new bug in restart_all_sessions in one place.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: njbrake <nathan@mozilla.ai>
Co-authored-by: Nathan Brake (Mozilla.ai) <33383515+njbrake@users.noreply.github.com>
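The pattern the last two code commits describe — clone each target, mutate the copy in a worker, write it back by index, persist exactly once at the end — can be sketched with nothing but the standard library. This is an illustrative stand-in, not the project's code: `Instance` and `restart` here are hypothetical toys, and chunked `std::thread` workers only approximate the tokio semaphore's concurrency cap.

```rust
use std::thread;

// Hypothetical stand-ins for the real types: `Instance` is a toy, and
// `restart` plays the role of Instance::restart_with_size.
#[derive(Clone, Debug)]
struct Instance {
    id: String,
    restarted: bool,
}

fn restart(inst: &mut Instance) -> Result<(), String> {
    inst.restarted = true; // the real call respawns tmux/docker processes
    Ok(())
}

// Bounded restart-all: clone each target, mutate the clone in a worker,
// write it back by index, and leave persistence for one step at the end.
fn restart_all(mut instances: Vec<Instance>, parallel: usize) -> (Vec<Instance>, usize) {
    // Workers only ever see their own copy; the shared Vec stays here.
    let targets: Vec<(usize, Instance)> =
        instances.iter().cloned().enumerate().collect();

    let mut failed = 0;
    // Chunking caps in-flight workers at `parallel`; the real code uses a
    // tokio Semaphore, which refills permits without a per-chunk barrier.
    for chunk in targets.chunks(parallel.max(1)) {
        let handles: Vec<_> = chunk
            .to_vec()
            .into_iter()
            .map(|(idx, mut inst)| {
                thread::spawn(move || {
                    let res = restart(&mut inst);
                    (idx, inst, res)
                })
            })
            .collect();
        for handle in handles {
            let (idx, inst, res) = handle.join().expect("worker panicked");
            instances[idx] = inst; // write the mutated copy back by index
            if res.is_err() {
                failed += 1;
            }
        }
    }
    // The real orchestrator saves sessions.json exactly once, right here.
    (instances, failed)
}

fn main() {
    let sessions: Vec<Instance> = (0..7)
        .map(|n| Instance { id: format!("s{n}"), restarted: false })
        .collect();
    let (sessions, failed) = restart_all(sessions, 3);
    assert!(sessions.iter().all(|i| i.restarted));
    println!("restarted {}/{} sessions", sessions.len() - failed, sessions.len());
}
```

The chunked loop is deliberately weaker than the semaphore version (a slow worker stalls its whole chunk), which is one reason the real implementation reaches for `tokio::sync::Semaphore` plus `JoinSet` instead.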
1 parent f4a1aa5 commit edaa1bd

2 files changed

Lines changed: 261 additions & 7 deletions


docs/cli/reference.md

Lines changed: 11 additions & 4 deletions
@@ -209,7 +209,7 @@ Manage session lifecycle (start, stop, attach, etc.)
 
 * `start` — Start a session's tmux process
 * `stop` — Stop session process
-* `restart` — Restart session
+* `restart` — Restart session (or all sessions with `--all`)
 * `attach` — Attach to session interactively
 * `show` — Show session details
 * `rename` — Rename a session
@@ -245,13 +245,20 @@ Stop session process
 
 ## `aoe session restart`
 
-Restart session
+Restart session (or all sessions with `--all`)
 
-**Usage:** `aoe session restart <IDENTIFIER>`
+**Usage:** `aoe session restart [OPTIONS] [IDENTIFIER]`
 
 ###### **Arguments:**
 
-* `<IDENTIFIER>` — Session ID or title
+* `<IDENTIFIER>` — Session ID or title (required unless `--all` is passed)
+
+###### **Options:**
+
+* `--all` — Restart every session in the active profile. Useful after `aoe update`, after editing `sandbox.environment`, after a Docker hiccup, or after changing a hook. Mutually exclusive with `identifier`
+* `--parallel <PARALLEL>` — Concurrency cap for `--all`. Restarting many sandboxed sessions in parallel pressures dockerd, so the default is intentionally modest. Ignored when `--all` is not set
+
+  Default value: `3`
 
 
 
src/cli/session.rs

Lines changed: 250 additions & 3 deletions
@@ -14,8 +14,8 @@ pub enum SessionCommands {
     /// Stop session process
     Stop(SessionIdArgs),
 
-    /// Restart session
-    Restart(SessionIdArgs),
+    /// Restart session (or all sessions with `--all`)
+    Restart(RestartArgs),
 
     /// Attach to session interactively
     Attach(SessionIdArgs),
@@ -42,6 +42,25 @@ pub struct SessionIdArgs {
     identifier: String,
 }
 
+#[derive(Args)]
+pub struct RestartArgs {
+    /// Session ID or title (required unless `--all` is passed)
+    pub identifier: Option<String>,
+
+    /// Restart every session in the active profile. Useful after
+    /// `aoe update`, after editing `sandbox.environment`, after a
+    /// Docker hiccup, or after changing a hook. Mutually exclusive
+    /// with `identifier`.
+    #[arg(long, conflicts_with = "identifier")]
+    pub all: bool,
+
+    /// Concurrency cap for `--all`. Restarting many sandboxed
+    /// sessions in parallel pressures dockerd, so the default is
+    /// intentionally modest. Ignored when `--all` is not set.
+    #[arg(long, default_value_t = 3)]
+    pub parallel: usize,
+}
+
 #[derive(Args)]
 pub struct RenameArgs {
     /// Session ID or title (optional, auto-detects in tmux)
@@ -131,7 +150,7 @@ pub async fn run(profile: &str, command: SessionCommands) -> Result<()> {
     match command {
         SessionCommands::Start(args) => start_session(profile, args).await,
         SessionCommands::Stop(args) => stop_session(profile, args).await,
-        SessionCommands::Restart(args) => restart_session(profile, args).await,
+        SessionCommands::Restart(args) => restart_session_dispatch(profile, args).await,
         SessionCommands::Attach(args) => attach_session(profile, args).await,
         SessionCommands::Show(args) => show_session(profile, args).await,
         SessionCommands::Capture(args) => capture_session(profile, args).await,
@@ -205,6 +224,123 @@ async fn stop_session(profile: &str, args: SessionIdArgs) -> Result<()> {
     Ok(())
 }
 
+async fn restart_session_dispatch(profile: &str, args: RestartArgs) -> Result<()> {
+    if args.all {
+        return restart_all_sessions(profile, args.parallel).await;
+    }
+    let identifier = args
+        .identifier
+        .ok_or_else(|| anyhow::anyhow!("session identifier required (or pass --all)"))?;
+    restart_session(profile, SessionIdArgs { identifier }).await
+}
+
+async fn restart_all_sessions(profile: &str, parallel: usize) -> Result<()> {
+    let storage = Storage::new(profile)?;
+    let (mut instances, groups) = storage.load_with_groups()?;
+
+    let target_ids = pick_targets_for_restart_all(&instances);
+    if target_ids.is_empty() {
+        println!("No sessions to restart in profile '{}'.", profile);
+        return Ok(());
+    }
+
+    let total = target_ids.len();
+    let size = crate::terminal::get_size();
+    let parallel = parallel.max(1);
+
+    // Clone each target into its worker; we'll write the (mutated) copy back
+    // by index after the worker returns. Workers never touch the shared Vec.
+    // `source_profile` is runtime-only (skip_serializing) so storage-loaded
+    // instances always come back blank; rehydrate it from the storage profile
+    // so start-time config resolution honors the right profile's overrides
+    // (sandbox.environment, on_launch hooks, etc.).
+    let mut targets: Vec<(usize, crate::session::Instance)> = Vec::with_capacity(total);
+    for id in &target_ids {
+        if let Some(idx) = instances.iter().position(|i| &i.id == id) {
+            let mut clone = instances[idx].clone();
+            clone.source_profile = profile.to_string();
+            targets.push((idx, clone));
+        }
+    }
+
+    let semaphore = std::sync::Arc::new(tokio::sync::Semaphore::new(parallel));
+    let mut join_set: tokio::task::JoinSet<(
+        usize,
+        String,
+        Option<crate::session::Instance>,
+        Result<()>,
+    )> = tokio::task::JoinSet::new();
+
+    for (idx, mut inst) in targets {
+        let permit_sem = semaphore.clone();
+        join_set.spawn(async move {
+            let _permit = permit_sem
+                .acquire_owned()
+                .await
+                .expect("semaphore not closed");
+            let title = inst.title.clone();
+            let res = tokio::task::spawn_blocking(move || {
+                let result = inst.restart_with_size(size);
+                (inst, result)
+            })
+            .await;
+            match res {
+                Ok((inst, result)) => (idx, title, Some(inst), result),
+                Err(join_err) => (
+                    idx,
+                    title,
+                    None,
+                    Err(anyhow::anyhow!("worker panicked: {}", join_err)),
+                ),
+            }
+        });
+    }
+
+    let mut succeeded: Vec<String> = Vec::new();
+    let mut failed: Vec<(String, String)> = Vec::new();
+
+    while let Some(joined) = join_set.join_next().await {
+        let (idx, title, inst_opt, result) =
+            joined.expect("JoinSet shouldn't panic on join itself");
+        if let Some(inst) = inst_opt {
+            instances[idx] = inst;
+        }
+        match result {
+            Ok(()) => succeeded.push(title),
+            Err(e) => failed.push((title, e.to_string())),
+        }
+    }
+
+    let group_tree = GroupTree::new_with_groups(&instances, &groups);
+    storage.save_with_groups(&instances, &group_tree)?;
+
+    println!("✓ Restarted {}/{} sessions:", succeeded.len(), total);
+    for title in &succeeded {
+        println!("  · {}", title);
+    }
+    if !failed.is_empty() {
+        println!("✗ {} failed:", failed.len());
+        for (title, err) in &failed {
+            println!("  · {}: {}", title, err);
+        }
+        bail!("{} session(s) failed to restart", failed.len());
+    }
+
+    Ok(())
+}
+
+/// Sessions in `Deleting` or `Creating` are mid-transition; restarting them
+/// would race the deletion/boot path. Everything else is fair game; agents
+/// have their own resume-or-restart logic on the next start.
+fn pick_targets_for_restart_all(instances: &[crate::session::Instance]) -> Vec<String> {
+    use crate::session::Status;
+    instances
+        .iter()
+        .filter(|i| !matches!(i.status, Status::Deleting | Status::Creating))
+        .map(|i| i.id.clone())
+        .collect()
+}
+
 async fn restart_session(profile: &str, args: SessionIdArgs) -> Result<()> {
     let storage = Storage::new(profile)?;
     let (mut instances, groups) = storage.load_with_groups()?;
@@ -218,6 +354,10 @@ async fn restart_session(profile: &str, args: SessionIdArgs) -> Result<()> {
         })
         .ok_or_else(|| anyhow::anyhow!("Session not found: {}", args.identifier))?;
 
+    // `source_profile` is runtime-only (skip_serializing) so storage-loaded
+    // instances always come back blank; rehydrate it from the storage profile
+    // so restart-time config resolution honors the right profile's overrides.
+    instances[idx].source_profile = profile.to_string();
     instances[idx].restart_with_size(crate::terminal::get_size())?;
     let title = instances[idx].title.clone();
 
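To make the serde detail behind this rehydration fix concrete, here is a std-only sketch of the round trip. `save` and `load` are hypothetical stand-ins for `Storage` (no real serde involved); the point is only that a field skipped at serialization time deserializes to its default, so every load site has to repopulate it.

```rust
// Toy model of the bug: `source_profile` is runtime-only in the real
// Instance (#[serde(skip_serializing)]), so it never reaches sessions.json.
// `save`/`load` below are hypothetical stand-ins for Storage, not real APIs.
#[derive(Clone, Debug)]
struct Instance {
    id: String,
    source_profile: String, // dropped on save, blank after load
}

fn save(inst: &Instance) -> String {
    inst.id.clone() // the persisted form keeps the id, drops the profile
}

fn load(saved: &str) -> Instance {
    Instance {
        id: saved.to_string(),
        source_profile: String::new(), // the skipped field comes back empty
    }
}

// The one-line fix this commit applies on both restart paths.
fn rehydrate(mut inst: Instance, profile: &str) -> Instance {
    inst.source_profile = profile.to_string();
    inst
}

fn main() {
    let live = Instance { id: "claude-3".into(), source_profile: "work".into() };
    let reloaded = load(&save(&live));
    // Without rehydration, downstream config resolution sees no profile and
    // falls back to the global default (wrong sandbox env, wrong hooks).
    assert_eq!(reloaded.source_profile, "");
    let fixed = rehydrate(reloaded, "work");
    assert_eq!(fixed.source_profile, "work");
    println!("profile after rehydrate: {}", fixed.source_profile);
}
```

This is why the fix lands once per load site rather than once globally: the blank field is reintroduced every time instances come back from storage.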
@@ -540,3 +680,110 @@ async fn set_session_id(profile: &str, args: SetSessionIdArgs) -> Result<()> {
     }
     Ok(())
 }
+
+#[cfg(test)]
+mod restart_args_tests {
+    use super::SessionCommands;
+    use clap::Parser;
+
+    #[derive(Parser)]
+    struct Cli {
+        #[command(subcommand)]
+        cmd: SessionCommands,
+    }
+
+    #[test]
+    fn restart_with_identifier_still_parses() {
+        let cli = Cli::try_parse_from(["aoe", "restart", "claude-3"])
+            .expect("identifier-only must parse");
+        match cli.cmd {
+            SessionCommands::Restart(args) => {
+                assert!(!args.all);
+                assert_eq!(args.identifier.as_deref(), Some("claude-3"));
+                assert_eq!(args.parallel, 3);
+            }
+            _ => panic!("wrong subcommand"),
+        }
+    }
+
+    #[test]
+    fn restart_all_alone_parses() {
+        let cli = Cli::try_parse_from(["aoe", "restart", "--all"]).expect("--all alone must parse");
+        match cli.cmd {
+            SessionCommands::Restart(args) => {
+                assert!(args.all);
+                assert!(args.identifier.is_none());
+                assert_eq!(args.parallel, 3);
+            }
+            _ => panic!("wrong subcommand"),
+        }
+    }
+
+    #[test]
+    fn restart_all_with_parallel_parses() {
+        let cli = Cli::try_parse_from(["aoe", "restart", "--all", "--parallel", "5"])
+            .expect("--all --parallel must parse");
+        match cli.cmd {
+            SessionCommands::Restart(args) => {
+                assert!(args.all);
+                assert_eq!(args.parallel, 5);
+            }
+            _ => panic!("wrong subcommand"),
+        }
+    }
+
+    #[test]
+    fn restart_identifier_and_all_conflicts() {
+        let result = Cli::try_parse_from(["aoe", "restart", "claude-3", "--all"]);
+        assert!(
+            result.is_err(),
+            "passing both identifier and --all should error"
+        );
+    }
+}
+
+#[cfg(test)]
+mod target_filter_tests {
+    use super::pick_targets_for_restart_all;
+    use crate::session::{Instance, Status};
+
+    fn instance_with_status(id: &str, status: Status) -> Instance {
+        let mut inst = Instance::new(id, "/tmp");
+        inst.id = id.to_string();
+        inst.status = status;
+        inst
+    }
+
+    #[test]
+    fn skips_deleting_and_creating() {
+        let instances = vec![
+            instance_with_status("running", Status::Running),
+            instance_with_status("idle", Status::Idle),
+            instance_with_status("stopped", Status::Stopped),
+            instance_with_status("error", Status::Error),
+            instance_with_status("waiting", Status::Waiting),
+            instance_with_status("starting", Status::Starting),
+            instance_with_status("unknown", Status::Unknown),
+            instance_with_status("deleting", Status::Deleting),
+            instance_with_status("creating", Status::Creating),
+        ];
+        let mut picked = pick_targets_for_restart_all(&instances);
+        picked.sort();
+        let mut expected = vec![
+            "error".to_string(),
+            "idle".to_string(),
+            "running".to_string(),
+            "starting".to_string(),
+            "stopped".to_string(),
+            "unknown".to_string(),
+            "waiting".to_string(),
+        ];
+        expected.sort();
+        assert_eq!(picked, expected);
+    }
+
+    #[test]
+    fn empty_input_yields_empty_targets() {
+        assert!(pick_targets_for_restart_all(&[]).is_empty());
+    }
+}
