Skip to content

Commit 82d341d

Browse files
ymcrcatclaude
andauthored
fix(sandbox): try Docker socket before CLI binary check (#2467)
* fix(sandbox): try Docker socket before CLI binary check The sandbox detection checked `which docker` first and returned NotInstalled if the CLI binary was absent — even when the Docker daemon was reachable via a bind-mounted socket. This broke container-in-container deployments (e.g., Nomad shards with /var/run/docker.sock mounted) where bollard can talk to the daemon but no CLI is installed in the slim image. Reorder check_docker() to try connect_docker() (bollard socket ping) first. If the daemon responds, return Available immediately. The CLI check is now only used as a fallback for error-message quality when the socket connection fails. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(sandbox): skip slow daemon ping when Docker is clearly absent check_docker() called connect_docker() before checking whether Docker was even present, causing a 120s bollard timeout on hosts with an unreachable DOCKER_HOST and no Docker installation. Add a fast-path that checks for the docker binary, DOCKER_HOST env var, and socket files on disk before attempting the daemon ping. This preserves DinD support (bind-mounted socket, no CLI binary) while avoiding the latency regression for non-Docker hosts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test(sandbox): add regression tests for check_docker fast-path Extract should_skip_daemon_ping() predicate from check_docker() and add unit tests covering all combinations: skip when no binary, no DOCKER_HOST, and no socket (the bug scenario); no skip when any of the three signals is present (DinD socket, DOCKER_HOST, CLI binary). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent d63601e commit 82d341d

2 files changed

Lines changed: 114 additions & 10 deletions

File tree

src/sandbox/container.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -683,7 +683,7 @@ pub async fn connect_docker() -> Result<Docker> {
683683
}
684684

685685
#[cfg(unix)]
686-
fn unix_socket_candidates() -> Vec<PathBuf> {
686+
pub(crate) fn unix_socket_candidates() -> Vec<PathBuf> {
687687
unix_socket_candidates_from_env(
688688
std::env::var_os("HOME").map(PathBuf::from),
689689
std::env::var_os("XDG_RUNTIME_DIR").map(PathBuf::from),

src/sandbox/detect.rs

Lines changed: 113 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,13 @@
11
//! Proactive Docker detection with platform-specific guidance.
22
//!
3-
//! Checks whether Docker is both installed (binary on PATH) and running
4-
//! (daemon responding to ping), and provides platform-appropriate
5-
//! installation or startup instructions when it is not.
3+
//! First performs cheap filesystem checks (socket existence, `DOCKER_HOST`,
4+
//! PATH lookup) to fast-exit on hosts where Docker is clearly absent —
5+
//! avoiding bollard's 120 s daemon-ping timeout. Then attempts a direct
6+
//! socket connection via bollard (covers container-in-container deployments
7+
//! where the socket is bind-mounted but the CLI is absent), and falls back
8+
//! to a `which docker` PATH check for error-message quality. Provides
9+
//! platform-appropriate installation or startup instructions when Docker
10+
//! is not available.
611
//!
712
//! # Detection Limitations
813
//!
@@ -102,28 +107,48 @@ pub struct DockerDetection {
102107

103108
/// Check whether Docker is installed and running.
104109
///
105-
/// 1. Checks if `docker` binary exists on PATH
106-
/// 2. If found, tries to connect and ping the Docker daemon via `connect_docker()`
107-
/// 3. Returns `Available`, `NotInstalled`, or `NotRunning`
110+
/// 1. Fast path: if no Docker socket exists on the filesystem, `DOCKER_HOST`
111+
/// is unset, *and* the `docker` CLI binary is absent, returns `NotInstalled`
112+
/// immediately — without a daemon ping. This avoids a 120 s bollard timeout
113+
/// on hosts where Docker is clearly not present (see #P2).
114+
/// 2. Tries to connect and ping the Docker daemon directly via
115+
/// `connect_docker()` (bollard). This covers container-in-container (DinD)
116+
/// deployments where the socket is bind-mounted but the CLI binary is not
117+
/// installed inside the container.
118+
/// 3. If the daemon is unreachable and the binary is also missing, returns
119+
/// `NotInstalled`; otherwise `NotRunning`.
108120
pub async fn check_docker() -> DockerDetection {
109121
let platform = Platform::current();
122+
let binary_found = docker_binary_exists();
123+
let docker_host_set = std::env::var_os("DOCKER_HOST").is_some();
124+
let socket_found = any_docker_socket_exists();
110125

111-
// Step 1: Check if docker binary is on PATH
112-
if !docker_binary_exists() {
126+
// Fast path: no CLI binary, no DOCKER_HOST, and no socket file on disk.
127+
// Skip the daemon ping that would block on an unreachable host.
128+
if should_skip_daemon_ping(binary_found, docker_host_set, socket_found) {
113129
return DockerDetection {
114130
status: DockerStatus::NotInstalled,
115131
platform,
116132
};
117133
}
118134

119-
// Step 2: Try to connect to the daemon
135+
// Authoritative check: try to connect and ping the daemon via bollard.
136+
// Covers DinD (socket bind-mounted, no CLI binary) and standard installs.
120137
if crate::sandbox::connect_docker().await.is_ok() {
121138
return DockerDetection {
122139
status: DockerStatus::Available,
123140
platform,
124141
};
125142
}
126143

144+
// Daemon unreachable. Distinguish "not installed" from "not running".
145+
if !binary_found {
146+
return DockerDetection {
147+
status: DockerStatus::NotInstalled,
148+
platform,
149+
};
150+
}
151+
127152
// Windows fallback: if the named pipe probe fails but docker CLI can still
128153
// reach the daemon/server, treat Docker as available.
129154
#[cfg(windows)]
@@ -162,6 +187,38 @@ fn docker_binary_exists() -> bool {
162187
}
163188
}
164189

190+
/// Check if any well-known Docker socket file exists on disk.
191+
///
192+
/// This is a cheap filesystem probe (no daemon ping) used to decide whether
193+
/// it is worth attempting the slower `connect_docker()` call. Covers the
194+
/// default `/var/run/docker.sock` plus the user-space candidates that
195+
/// `connect_docker()` already tries.
196+
fn any_docker_socket_exists() -> bool {
197+
#[cfg(unix)]
198+
{
199+
use std::path::PathBuf;
200+
201+
// Default socket that bollard's connect_with_local_defaults() checks.
202+
if PathBuf::from("/var/run/docker.sock").exists() {
203+
return true;
204+
}
205+
206+
// Same user-space candidates that connect_docker() iterates.
207+
crate::sandbox::container::unix_socket_candidates()
208+
.iter()
209+
.any(|sock| sock.exists())
210+
}
211+
212+
#[cfg(windows)]
213+
{
214+
// On Windows, bollard probes the named pipe `//./pipe/docker_engine`.
215+
// `Path::exists()` doesn't work for named pipes, so we conservatively
216+
// return true — the binary-exists check is the primary fast-path on
217+
// Windows.
218+
true
219+
}
220+
}
221+
165222
#[cfg(windows)]
166223
fn docker_cli_daemon_reachable() -> bool {
167224
let stdout = std::process::Stdio::null();
@@ -188,6 +245,12 @@ fn docker_cli_daemon_reachable() -> bool {
188245
.is_ok_and(|s| s.success())
189246
}
190247

248+
/// Returns `true` when we can confidently say Docker is not installed without
249+
/// performing a daemon ping. All three signals must be absent.
250+
fn should_skip_daemon_ping(binary_found: bool, docker_host_set: bool, socket_found: bool) -> bool {
251+
!binary_found && !docker_host_set && !socket_found
252+
}
253+
191254
#[cfg(test)]
192255
mod tests {
193256
use super::*;
@@ -232,4 +295,45 @@ mod tests {
232295
DockerStatus::Disabled => panic!("check_docker should never return Disabled"),
233296
}
234297
}
298+
299+
// --- Regression tests for the fast-path that skips the daemon ping ---
300+
301+
#[test]
302+
fn skip_ping_when_no_binary_no_host_no_socket() {
303+
// The bug: connect_docker() was called unconditionally, blocking for
304+
// 120 s on hosts with an unreachable DOCKER_HOST and no Docker.
305+
assert!(should_skip_daemon_ping(false, false, false));
306+
}
307+
308+
#[test]
309+
fn no_skip_when_docker_host_set() {
310+
// DOCKER_HOST points somewhere — must attempt the ping even without
311+
// a binary (could be a remote Docker host).
312+
assert!(!should_skip_daemon_ping(false, true, false));
313+
}
314+
315+
#[test]
316+
fn no_skip_when_socket_exists() {
317+
// Socket on disk (DinD bind-mount) — must ping even without the CLI.
318+
assert!(!should_skip_daemon_ping(false, false, true));
319+
}
320+
321+
#[test]
322+
fn no_skip_when_binary_found() {
323+
// CLI binary present — Docker may be installed but daemon stopped.
324+
assert!(!should_skip_daemon_ping(true, false, false));
325+
}
326+
327+
#[test]
328+
fn no_skip_when_all_signals_present() {
329+
assert!(!should_skip_daemon_ping(true, true, true));
330+
}
331+
332+
#[cfg(unix)]
333+
#[test]
334+
fn any_docker_socket_nonexistent_path_returns_false() {
335+
// Sanity: a path that doesn't exist should not count as a socket.
336+
use std::path::PathBuf;
337+
assert!(!PathBuf::from("/tmp/definitely-not-a-docker-socket.sock").exists());
338+
}
235339
}

0 commit comments

Comments
 (0)