Skip to content

Commit f47e41a

Browse files
fix: scope-intercept signal handling for long-lived workloads (#309)
## Why `scope-intercept` is sometimes used as a shebang on long-running scripts to enable self-healing known-error fix-and-retry. We had a report that Ctrl+C stopped working in that configuration: services started under overmind (nginx, etc.) kept running, the overmind socket was orphaned, and the next start failed with an "overmind previously crashed" prompt. The terminal sends SIGINT to every member of the foreground process group — both `scope-intercept` and the wrapped `env -S bash`. With no signal handler, `scope-intercept` followed the default and aborted immediately. Its captured stdout/stderr pipes closed, and the child died with SIGPIPE before its own SIGINT trap (and overmind's shutdown logic) could complete. `scope-intercept` was originally designed for short-lived setup scripts, but using it as a shebang for long-running interactive processes is a strong fit *if* it cooperates with the child's lifecycle. This PR makes it cooperate. ## What changed **`src/bin/scope-intercept.rs`** — At the top of `run_command`, install Tokio Unix signal listeners for `SIGINT`, `SIGTERM`, and `SIGHUP`. Their only job is to consume the default-terminate behavior so `scope-intercept` stays alive while the child runs its cleanup. No manual forwarding is needed — the OS already delivered the signal to the child via the shared process group. When the interrupt flag is set and the child exited non-zero, skip the known-error analysis, retry, and bug-report flows. The user asked to quit, not to be quizzed. `OutputCapture::capture_output` and the `scope doctor` path are deliberately not touched — this fix is scoped to the intercept binary. **`Cargo.toml`** — Adds `libc` as a dev-dep so the new test can call `setsid`/`killpg`. ## Tests **`tests/scope_intercept.rs::test_intercept_forwards_sigint_and_waits_for_child_cleanup`**: - Writes a `trap.sh` that traps SIGINT, writes `cleanup.done`, and sleeps. - Spawns `scope-intercept -- bash trap.sh` in its own session/process group via `pre_exec(setsid)`. - Waits for a `ready` marker, then sends SIGINT to the *group* via `killpg` — this is what a terminal does on Ctrl+C. - Asserts: the child's trap ran (`cleanup.done` exists with the expected contents), `scope-intercept` exited within 10s, and the exit code is 130. Verified the test fails on `main` (without the fix, exit status is `unix_wait_status(2)` — `scope-intercept` killed mid-flight by SIGINT) and passes with the fix. All 8 existing intercept tests still pass; full suite (177 tests) green. ## Test plan - [x] `cargo test --test scope_intercept` — 8/8 pass - [x] `cargo test` — 177/177 pass - [x] `cargo fmt --check` clean --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent d8a78fc commit f47e41a

4 files changed

Lines changed: 145 additions & 0 deletions

File tree

Cargo.lock

Lines changed: 14 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

Cargo.toml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -80,8 +80,10 @@ which = { version = "8.0", features = ["regex"] }
8080
assert_cmd = "2.2.1"
8181
assert_fs = "1.1.3"
8282
escargot = "0.5.15"
83+
nix = { version = "0.29", features = ["signal", "process"] }
8384
predicates = "3.1.4"
8485
tempfile = "3.27"
86+
wait-timeout = "0.2"
8587

8688
[build-dependencies]
8789
anyhow = "1.0.102"

src/bin/scope-intercept.rs

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,9 @@ use human_panic::setup_panic;
66
use std::env;
77
use std::io::Cursor;
88
use std::sync::Arc;
9+
use std::sync::atomic::{AtomicBool, Ordering};
910
use tokio::io::BufReader;
11+
use tokio::signal::unix::{SignalKind, signal};
1012
use tracing::{Level, enabled, error, info, warn};
1113

1214
/// A wrapper CLI that can be used to capture output from a program, check if there are known errors
@@ -76,6 +78,26 @@ async fn run_command(opts: Cli) -> anyhow::Result<i32> {
7678
let current_dir = std::env::current_dir()?;
7779
let path = env::var("PATH").unwrap_or_default();
7880

81+
// SIGINT/SIGTERM/SIGHUP are delivered to every member of our foreground
82+
// process group, including the child, so its own cleanup traps run. We
83+
// just need to keep ourselves alive long enough for the child to finish
84+
// shutting down — otherwise our captured stdout/stderr pipes close and
85+
// the child dies with SIGPIPE before its trap can complete.
86+
let interrupted = Arc::new(AtomicBool::new(false));
87+
for kind in [
88+
SignalKind::interrupt(),
89+
SignalKind::terminate(),
90+
SignalKind::hangup(),
91+
] {
92+
let mut stream = signal(kind)?;
93+
let interrupted = interrupted.clone();
94+
tokio::spawn(async move {
95+
while stream.recv().await.is_some() {
96+
interrupted.store(true, Ordering::SeqCst);
97+
}
98+
});
99+
}
100+
79101
let capture = OutputCapture::capture_output(CaptureOpts {
80102
working_dir: &current_dir,
81103
args: &command,
@@ -93,6 +115,13 @@ async fn run_command(opts: Cli) -> anyhow::Result<i32> {
93115
return Ok(exit_code);
94116
}
95117

118+
// The user explicitly asked to quit — don't surprise them with
119+
// known-error prompts, retry, or bug-report offers. Just propagate the
120+
// child's exit code (typically 130 for SIGINT, 143 for SIGTERM).
121+
if interrupted.load(Ordering::SeqCst) {
122+
return Ok(exit_code);
123+
}
124+
96125
error!(target: "user", "Command failed, checking for a known error");
97126
let found_config = opts.config_options.load_config().await.unwrap_or_else(|e| {
98127
error!(target: "user", "Unable to load configs from disk: {:?}", e);

tests/scope_intercept.rs

Lines changed: 100 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,12 @@
11
use assert_fs::fixture::{FileWriteStr, PathChild};
2+
use nix::sys::signal::{Signal, killpg};
3+
use nix::unistd::{Pid, setsid};
24
use predicates::boolean::PredicateBooleanExt;
35
use predicates::prelude::predicate;
6+
use std::os::unix::process::CommandExt;
7+
use std::process::Command as StdCommand;
8+
use std::time::{Duration, Instant};
9+
use wait_timeout::ChildExt;
410

511
#[allow(dead_code)]
612
mod common;
@@ -158,3 +164,97 @@ fn test_intercept_no_known_errors_match() {
158164

159165
helper.clean_work_dir();
160166
}
167+
168+
// When scope-intercept wraps a long-running interactive process (e.g.
169+
// used as a shebang on a server start script), Ctrl+C must let the
170+
// child run its own SIGINT trap and exit cleanly. Before this fix,
171+
// scope-intercept terminated immediately on SIGINT, closing the
172+
// captured stdout/stderr pipes and killing the child with SIGPIPE
173+
// before its trap could complete.
174+
#[test]
175+
fn test_intercept_forwards_sigint_and_waits_for_child_cleanup() {
176+
let helper = ScopeTestHelper::new(
177+
"test_intercept_forwards_sigint_and_waits_for_child_cleanup",
178+
"empty",
179+
);
180+
181+
let work_dir = helper.work_dir.path().to_path_buf();
182+
let cleanup_marker = work_dir.join("cleanup.done");
183+
let ready_marker = work_dir.join("ready");
184+
185+
helper
186+
.work_dir
187+
.child("trap.sh")
188+
.write_str(
189+
"#!/bin/bash\n\
190+
trap 'echo trapped > cleanup.done; exit 130' INT\n\
191+
touch ready\n\
192+
# sleep in short increments so the trap fires promptly\n\
193+
for _ in $(seq 1 30); do sleep 1; done\n",
194+
)
195+
.unwrap();
196+
197+
let intercept_bin = assert_cmd::cargo::cargo_bin("scope-intercept");
198+
let mut command = StdCommand::new(intercept_bin);
199+
command
200+
.current_dir(&work_dir)
201+
.env("NO_COLOR", "1")
202+
.args(["--", "bash", "trap.sh"]);
203+
// Put scope-intercept in its own session/process group so killpg below
204+
// hits both it and the wrapped bash, mirroring how a terminal delivers
205+
// Ctrl+C to the foreground process group.
206+
//
207+
// SAFETY: pre_exec runs the closure between fork and exec, so it must
208+
// call only async-signal-safe functions. setsid(2) is on the POSIX
209+
// async-signal-safe list and touches no shared state in the parent.
210+
unsafe {
211+
command.pre_exec(|| setsid().map(|_| ()).map_err(std::io::Error::from));
212+
}
213+
let mut child = command.spawn().expect("failed to spawn scope-intercept");
214+
215+
// Poll for the ready marker. There's no good cross-process "file
216+
// appeared" primitive without bringing in inotify/kqueue, so we
217+
// sleep between checks to avoid burning a core.
218+
let deadline = Instant::now() + Duration::from_secs(5);
219+
while !ready_marker.exists() {
220+
if Instant::now() > deadline {
221+
child.kill().ok();
222+
panic!(
223+
"trap.sh never wrote ready marker; scope-intercept may not be running the script"
224+
);
225+
}
226+
std::thread::sleep(Duration::from_millis(50));
227+
}
228+
229+
let pgid = Pid::from_raw(child.id() as i32);
230+
killpg(pgid, Signal::SIGINT).expect("killpg(pgid, SIGINT) failed");
231+
232+
// If scope-intercept doesn't shut down within 10s, the bug is back:
233+
// it's hanging instead of letting the child run its trap and
234+
// propagating the exit.
235+
let status = match child
236+
.wait_timeout(Duration::from_secs(10))
237+
.expect("wait_timeout failed")
238+
{
239+
Some(status) => status,
240+
None => {
241+
child.kill().ok();
242+
panic!("scope-intercept did not exit within 10s after SIGINT");
243+
}
244+
};
245+
246+
assert!(
247+
cleanup_marker.exists(),
248+
"child SIGINT trap did not run — cleanup.done was never created"
249+
);
250+
let body = std::fs::read_to_string(&cleanup_marker).unwrap();
251+
assert_eq!(body.trim(), "trapped");
252+
253+
assert_eq!(
254+
status.code(),
255+
Some(130),
256+
"scope-intercept should propagate the child's exit code (130 for SIGINT), got {status:?}"
257+
);
258+
259+
helper.clean_work_dir();
260+
}

0 commit comments

Comments
 (0)