feat: remove Docker/Podman dependency with native runtime support #926
nglmercer wants to merge 2 commits into
Add a native process-based runtime that runs HelixDB without containers, enabling bare-metal execution. This is the foundation for future in-memory database features that need direct process control.

Changes:
- Add `Native` variant to `ContainerRuntime` enum in config.rs
- Create `Runtime` enum dispatching between container and native runtimes
- Implement `NativeManager` with PID-based process lifecycle management
- Add `volumes_dir`/`instance_volume` helpers to `ProjectContext`
- Update all commands (run, stop, restart, status, logs, prune, delete) to use `Runtime::for_project()` instead of `LocalRuntime::new()`
- Fix dashboard.rs match exhaustiveness for new Native variant
- Fix all clippy warnings in sdks (collapsible_if, derivable_impls, should_implement_trait)
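For readers who want a picture of the dispatch described in this commit message, here is a minimal sketch. It is not the PR's code: only the names `ContainerRuntime`, `Runtime`, `LocalRuntime`, `NativeManager`, and `for_project` come from the description. The struct fields, the `describe_start` method, and the choice to pass the configured value directly to `for_project` are assumptions made to keep the example self-contained.

```rust
/// Mirrors the config enum named in the PR description.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ContainerRuntime {
    Docker,
    Podman,
    Native,
}

/// Hypothetical stand-in for the existing container-backed runtime.
pub struct LocalRuntime {
    pub binary: &'static str,
}

/// Hypothetical stand-in for the new PID-file-based manager.
pub struct NativeManager;

/// Single entry point that commands use instead of building `LocalRuntime` directly.
pub enum Runtime {
    Container(LocalRuntime),
    Native(NativeManager),
}

impl Runtime {
    /// Assumption: the real function takes a project context; this sketch
    /// takes the configured runtime value directly.
    pub fn for_project(configured: ContainerRuntime) -> Self {
        match configured {
            ContainerRuntime::Docker => Runtime::Container(LocalRuntime { binary: "docker" }),
            ContainerRuntime::Podman => Runtime::Container(LocalRuntime { binary: "podman" }),
            ContainerRuntime::Native => Runtime::Native(NativeManager),
        }
    }

    /// Hypothetical operation: describes what a start would do for each variant.
    pub fn describe_start(&self, instance: &str) -> String {
        match self {
            Runtime::Container(rt) => format!("{} run {instance}", rt.binary),
            Runtime::Native(_) => format!("spawn native binary for {instance}"),
        }
    }
}
```

Because each command matches on one enum, adding a variant forces every call site to handle it at compile time, which is the same exhaustiveness check that surfaced the dashboard.rs fix mentioned above.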
| } | ||
| signal = tokio::signal::ctrl_c() => { | ||
| signal?; | ||
| crate::output::info("Stopping foreground local Helix instance"); | ||
| match tokio::time::timeout(Duration::from_secs(10), &mut wait).await { | ||
| Ok(Ok(_)) => {} | ||
| Ok(Err(e)) => return Err(eyre!("Failed to wait for process to stop: {e}")), | ||
| Err(_) => return Err(eyre!("Timed out waiting for process to stop")), | ||
| } | ||
| } |
Native foreground: process left running after Ctrl-C timeout
When the user presses Ctrl-C, the process receives SIGINT along with the CLI. If it doesn't exit within 10 seconds, the `Err(_)` arm returns an error, but no signal is sent to the child and the process keeps running. Compare this to the container runtime's `start_foreground`, which explicitly calls `remove_container` (a force-stop) before waiting. Without an equivalent `stop_process(instance_name)` call here, a slow-to-exit native instance becomes an orphan that the CLI reports as failed while it continues consuming resources.
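One possible shape for the fix is sketched below. The PR's code waits with `tokio::time::timeout`; this sketch swaps in a plain polling loop over `std::process::Child` so that it stays dependency-free. The names `wait_or_kill` and `StopOutcome` are hypothetical, and the 50 ms poll interval is an arbitrary choice.

```rust
use std::io;
use std::process::Child;
use std::thread;
use std::time::{Duration, Instant};

/// How the child ended up exiting.
#[derive(Debug, PartialEq)]
pub enum StopOutcome {
    ExitedOnItsOwn,
    ForceKilled,
}

/// Waits up to `grace` for `child` to exit. If it is still running after
/// the grace period, kills it and reaps it so nothing is left behind.
pub fn wait_or_kill(child: &mut Child, grace: Duration) -> io::Result<StopOutcome> {
    let deadline = Instant::now() + grace;
    loop {
        // try_wait returns Ok(Some(_)) once the child has exited.
        if child.try_wait()?.is_some() {
            return Ok(StopOutcome::ExitedOnItsOwn);
        }
        if Instant::now() >= deadline {
            break;
        }
        thread::sleep(Duration::from_millis(50));
    }
    // Grace period elapsed: force-stop (SIGKILL on Unix), then reap.
    child.kill()?;
    child.wait()?;
    Ok(StopOutcome::ForceKilled)
}
```

In the Ctrl-C arm, the timeout branch would call something like this (or the existing `stop_process`) before returning the error, so the CLI never reports failure while the instance keeps running.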
| let child = cmd | ||
| .spawn() | ||
| .map_err(|e| eyre!("Failed to start native process: {e}"))?; | ||
|
|
||
| self.save_pid(instance_name, child.id())?; | ||
|
|
||
| wait_ready(port)?; | ||
| Ok(()) |
Orphan process if `save_pid` fails after spawn
`cmd.spawn()` returns a `Child`, and then `save_pid` writes the PID to disk. If `save_pid` fails (e.g., a filesystem permission error on the workspace directory), the function returns an error, but the spawned process is already running and detached (Rust's `std::process::Child` does not kill on drop). The PID file is missing, so neither `helix stop` nor `helix status` will find the instance; the native process leaks with no CLI-level way to stop it. The child should be explicitly killed if `save_pid` returns an error.
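A minimal sketch of the suggested cleanup, assuming `save_pid` can be modeled as a fallible callback that receives the PID. The helper name `spawn_tracked` and the use of `io::Result` (the PR uses `eyre`) are assumptions for illustration.

```rust
use std::io;
use std::process::{Child, Command};

/// Spawns `cmd`, then records its PID via `save_pid`.
/// If recording fails, the child is killed and reaped before the error
/// is returned, so no untracked process is left running.
pub fn spawn_tracked<F>(cmd: &mut Command, save_pid: F) -> io::Result<Child>
where
    F: FnOnce(u32) -> io::Result<()>,
{
    let mut child = cmd.spawn()?;
    if let Err(err) = save_pid(child.id()) {
        // Best-effort cleanup; the original error is what the caller needs to see.
        let _ = child.kill();
        let _ = child.wait();
        return Err(err);
    }
    Ok(child)
}
```

Cleanup errors are deliberately ignored here so that the caller sees the `save_pid` failure, which is the actionable one.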
| ContainerRuntime::Podman => "host.containers.internal".to_string(), | ||
| ContainerRuntime::Native => "localhost".to_string(), | ||
| } |
Dashboard container unreachable on macOS/Windows with native runtime
The comment above says "The dashboard is always a container (Docker/Podman), never native", yet `"localhost"` is returned for the `Native` arm. Inside a Docker or Podman container on macOS and Windows, `localhost` resolves to the container itself, not the host. So when `container_runtime = "native"`, the HelixDB process is running directly on the host, but the dashboard container would try to reach it at `localhost` and fail to connect. On Linux with host networking this works, but on macOS/Windows users would see connection errors from the dashboard. Consider returning `"host.docker.internal"` (or the equivalent per the outer runtime responsible for the dashboard) instead, or explicitly blocking `helix dashboard` when running the native runtime.
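The second option, blocking explicitly, could look roughly like this. It is a sketch under assumptions: the function name `host_for_dashboard`, its `Result` return type, and the error text are hypothetical. The two hostnames are the ones mentioned in this thread.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ContainerRuntime {
    Docker,
    Podman,
    Native,
}

/// Hostname a dashboard container should use to reach a HelixDB instance
/// running on the host. `dashboard_runtime` is the engine that runs the
/// dashboard container itself.
pub fn host_for_dashboard(dashboard_runtime: ContainerRuntime) -> Result<&'static str, String> {
    match dashboard_runtime {
        ContainerRuntime::Docker => Ok("host.docker.internal"),
        ContainerRuntime::Podman => Ok("host.containers.internal"),
        // Fail loudly instead of returning "localhost", which would point
        // back at the container on macOS and Windows.
        ContainerRuntime::Native => Err(
            "the dashboard runs in a container; choose docker or podman to host it".to_string(),
        ),
    }
}
```

Returning an error makes the unsupported combination visible at the call site rather than surfacing later as a connection failure inside the dashboard.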
| fn is_process_running(pid: u32) -> bool { | ||
| #[cfg(target_os = "linux")] | ||
| { | ||
| PathBuf::from(format!("/proc/{pid}")).exists() | ||
| } | ||
| #[cfg(not(target_os = "linux"))] | ||
| { | ||
| Command::new("kill") | ||
| .args(["-0", &pid.to_string()]) | ||
| .output() | ||
| .map(|o| o.status.success()) | ||
| .unwrap_or(false) | ||
| } | ||
| } |
PID reuse risk may kill an unrelated process
`is_process_running` and then `kill -TERM {pid}` operate on a numeric PID without verifying that the process belongs to this Helix instance. On Linux, the existence of the `/proc/{pid}` directory only confirms that a process with that PID is alive; on other platforms `kill -0` has the same limitation. On long-running machines (or after system restarts where the PID file is stale), the recycled PID could point to an unrelated daemon, which `stop_process` would then SIGTERM/SIGKILL. Storing an additional process attribute (e.g., start time or a process-group ID) and comparing it before sending signals would make this safe.
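To illustrate the start-time check, here is a Linux-only sketch. It reads `starttime` (field 22 of `/proc/<pid>/stat`, in clock ticks since boot) and compares it with a stored value. The `PidRecord` struct and both function names are hypothetical, and other platforms would need a different source for the start time.

```rust
use std::fs;

/// Extracts `starttime` (field 22) from the contents of a Linux
/// `/proc/<pid>/stat` file.
pub fn parse_start_time(stat: &str) -> Option<u64> {
    // Field 2 (comm) is parenthesized and may itself contain spaces or ')',
    // so split after the last ')'.
    let after_comm = &stat[stat.rfind(')')? + 1..];
    // The remainder begins at field 3, so field 22 sits at index 19.
    after_comm.split_whitespace().nth(19)?.parse().ok()
}

/// Hypothetical PID-file contents: the PID plus the start time seen at spawn.
pub struct PidRecord {
    pub pid: u32,
    pub start_time: u64,
}

/// True only if a process with this PID exists and its start time matches
/// the recorded one. Returns false where `/proc` is unavailable.
pub fn is_same_process(record: &PidRecord) -> bool {
    fs::read_to_string(format!("/proc/{}/stat", record.pid))
        .ok()
        .and_then(|stat| parse_start_time(&stat))
        .map_or(false, |start| start == record.start_time)
}
```

A stop routine would call `is_same_process` before sending any signal and treat a mismatch as a stale PID file to be removed rather than a process to be killed.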
- Kill child process if save_pid fails after spawn (prevents orphan)
- Send SIGKILL via PID on foreground Ctrl-C timeout (prevents runaway process)
- Store process start time in PID file and verify before sending signals (prevents killing a recycled PID belonging to an unrelated process)
- Add comment clarifying dashboard Native arm is dead code
will review
Summary
Adds a native process-based runtime that runs HelixDB without Docker or Podman, enabling bare-metal execution. This is the foundation for future in-memory database features that need direct process control.
Changes
- `ContainerRuntime::Native`: new variant in config.rs with `is_native()`, `binary()` = `"native"`, `label()` = `"Native"`
- `Runtime` enum: dispatches between `Container(LocalRuntime)` and `Native(NativeManager)` via `Runtime::for_project()`
- `NativeManager`: process lifecycle via PID files, SIGTERM/SIGKILL, log file output, TCP readiness probe
- `ProjectContext`: added `volumes_dir()` and `instance_volume()` helpers
- All commands: now go through `Runtime::for_project()`

Usage
Set `container_runtime = "native"` in helix.toml. Then `helix run dev` runs the binary directly instead of in a container.

Backward Compatibility
Docker and Podman remain supported, with Docker still the default. `container_runtime = "docker"` works exactly as before.

Greptile Summary
This PR introduces a `ContainerRuntime::Native` variant that bypasses Docker/Podman entirely, spawning the HelixDB binary as a plain OS process managed via PID files, SIGTERM/SIGKILL, and a log file. All CLI commands are updated to dispatch through a new `Runtime` enum routing to either `LocalRuntime` or the new `NativeManager`. The SDK changes are unrelated Clippy fixes.

- `NativeManager` manages process lifecycle: spawns the native binary with `HELIX_PORT`/`HELIX_DATA_DIR`, writes a PID file, probes TCP readiness, and stops via SIGTERM → SIGKILL.
- `Runtime` enum dispatches all CLI operations; all command files now use `Runtime::for_project()` instead of constructing `LocalRuntime` directly.
- `NativeManager`: if `save_pid` fails after a successful `spawn`, the process leaks with no CLI-visible handle; and in `start_foreground`, if the process doesn't exit within 10 seconds of Ctrl-C, the CLI returns an error but sends no SIGKILL, leaving the process running.

Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[helix run / stop / restart / status / logs / prune] --> B[Runtime::for_project]
    B --> C{container_runtime in helix.toml}
    C -- docker / podman --> D[LocalRuntime]
    C -- native --> E[NativeManager]
    D --> F[docker/podman run / rm / ps / logs]
    E --> G[spawn binary HELIX_PORT + HELIX_DATA_DIR]
    G --> H[save_pid to .helix/name/helix.pid]
    H -->|save_pid fails| I[process leaks no PID file]
    H -->|success| J[TCP readiness probe :port]
    J -->|ready| K[instance running]
    K --> L{helix stop}
    L --> M[read_pid SIGTERM wait 10s SIGKILL]
    M --> N[remove PID file]
    E --> O[start_foreground: inherit stdio]
    O --> P{Ctrl-C}
    P --> Q[wait 10s for exit]
    Q -->|timeout| R[returns error process still running]
    Q -->|exited| S[done]

Reviews (1): Last reviewed commit: "feat: remove Docker/Podman dependency wi..."
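The "TCP readiness probe" step in the flowchart could be implemented along these lines. This is a sketch rather than the PR's `wait_ready`: the name `probe_ready`, the explicit `timeout` parameter, the `bool` return, and the retry intervals are all assumptions.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Polls 127.0.0.1:`port` until a TCP connection succeeds or `timeout` elapses.
/// Returns true as soon as something accepts the connection.
pub fn probe_ready(port: u16, timeout: Duration) -> bool {
    let addr = SocketAddr::from(([127, 0, 0, 1], port));
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        // A bounded connect keeps a filtered port from stalling the loop.
        if TcpStream::connect_timeout(&addr, Duration::from_millis(200)).is_ok() {
            return true;
        }
        thread::sleep(Duration::from_millis(100));
    }
    false
}
```

A successful connect only shows that something is listening on the port. It does not confirm that the listener is the HelixDB instance that was just spawned, so a stricter check would need an application-level request.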