feat: remove Docker/Podman dependency with native runtime support #926

Open
nglmercer wants to merge 2 commits into HelixDB:main from nglmercer:feat/native-runtime

Conversation

nglmercer commented May 31, 2026

Summary

Adds a native process-based runtime that runs HelixDB without Docker or Podman, enabling bare-metal execution. This is the foundation for future in-memory database features that need direct process control.

Changes

  • ContainerRuntime::Native — new variant in config.rs with is_native(), binary()="native", label()="Native"
  • Runtime enum — dispatches between Container(LocalRuntime) and Native(NativeManager) via Runtime::for_project()
  • NativeManager — process lifecycle via PID files, SIGTERM/SIGKILL, log file output, TCP readiness probe
  • ProjectContext — added volumes_dir() and instance_volume() helpers
  • All commands updated — run, stop, restart, status, logs, prune, delete use Runtime::for_project()
  • Clippy fixes — collapsible_if, derivable_impls, should_implement_trait in sdks
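
The dispatch described in the list above could look roughly like the following. This is a minimal sketch, not the PR's code: `LocalRuntime` and `NativeManager` are empty stand-ins, `for_runtime` is a hypothetical simplification of `Runtime::for_project()` (which presumably takes a project context), and the `binary()`/`label()` values for Docker and Podman are assumptions; only the `Native` values are stated in the description.

```rust
/// Runtime selected by `container_runtime` in helix.toml.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ContainerRuntime {
    #[default]
    Docker, // the description says Docker stays the default
    Podman,
    Native,
}

impl ContainerRuntime {
    pub fn is_native(self) -> bool {
        matches!(self, Self::Native)
    }

    /// "native" is from the PR description; the other two are assumed.
    pub fn binary(self) -> &'static str {
        match self {
            Self::Docker => "docker",
            Self::Podman => "podman",
            Self::Native => "native",
        }
    }

    /// "Native" is from the PR description; the other two are assumed.
    pub fn label(self) -> &'static str {
        match self {
            Self::Docker => "Docker",
            Self::Podman => "Podman",
            Self::Native => "Native",
        }
    }
}

/// Stand-in for the existing container-backed runtime.
pub struct LocalRuntime {
    pub runtime: ContainerRuntime,
}

/// Stand-in for the new process-backed runtime.
pub struct NativeManager;

/// Single entry point that commands use instead of building `LocalRuntime` directly.
pub enum Runtime {
    Container(LocalRuntime),
    Native(NativeManager),
}

impl Runtime {
    /// Hypothetical constructor: picks the backend from the configured runtime.
    pub fn for_runtime(runtime: ContainerRuntime) -> Self {
        if runtime.is_native() {
            Runtime::Native(NativeManager)
        } else {
            Runtime::Container(LocalRuntime { runtime })
        }
    }
}
```

With an enum like this, each command matches once on `Runtime` rather than checking the configured runtime at every call site.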

Usage

# helix.toml
[project]
name = "myproject"
container_runtime = "native"

Then helix run dev runs the binary directly instead of in a container.

Backward Compatibility

The Docker and Podman runtimes are unchanged, and Docker remains the default: container_runtime = "docker" works exactly as before.

Greptile Summary

This PR introduces a ContainerRuntime::Native variant that bypasses Docker/Podman entirely, spawning the HelixDB binary as a plain OS process managed via PID files, SIGTERM/SIGKILL, and a log file. All CLI commands are updated to dispatch through a new Runtime enum routing to either LocalRuntime or the new NativeManager. The SDK changes are unrelated Clippy fixes.

  • NativeManager manages process lifecycle: spawns the native binary with HELIX_PORT/HELIX_DATA_DIR, writes a PID file, probes TCP readiness, and stops via SIGTERM → SIGKILL.
  • Runtime enum dispatches all CLI operations; all command files now use Runtime::for_project() instead of constructing LocalRuntime directly.
  • Two correctness gaps in NativeManager: if save_pid fails after a successful spawn, the process leaks with no CLI-visible handle; and in start_foreground, if the process doesn't exit within 10 seconds of Ctrl-C, the CLI returns an error but sends no SIGKILL, leaving the process running.
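
For readers unfamiliar with the TCP readiness probe mentioned above, here is a minimal sketch of the idea using only the standard library. The signature is an assumption: the PR calls `wait_ready(port)`, while this version takes an explicit `timeout` for illustration, and the polling intervals are arbitrary.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Polls 127.0.0.1:`port` until a TCP connection succeeds or `timeout` elapses.
/// Hypothetical signature; the PR's `wait_ready` takes only the port.
pub fn wait_ready(port: u16, timeout: Duration) -> Result<(), String> {
    let addr = SocketAddr::from(([127, 0, 0, 1], port));
    let deadline = Instant::now() + timeout;
    loop {
        // A successful connect means something is listening on the port.
        if TcpStream::connect_timeout(&addr, Duration::from_millis(200)).is_ok() {
            return Ok(());
        }
        if Instant::now() >= deadline {
            return Err(format!("port {port} not ready after {timeout:?}"));
        }
        // Back off briefly before the next attempt.
        thread::sleep(Duration::from_millis(100));
    }
}
```

A probe like this only shows that the port accepts connections, not that the process behind it is the expected HelixDB instance.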

Important Files Changed

| Filename | Overview |
| --- | --- |
| helix-cli/src/local_runtime.rs | Adds NativeManager for process-based runtime; two correctness gaps: orphan process if save_pid fails after spawn, and no SIGKILL fallback in start_foreground on Ctrl-C timeout. |
| helix-cli/src/config.rs | Adds ContainerRuntime::Native variant with is_native(), binary(), label(); clean backward-compatible change with Docker as default. |
| helix-cli/src/project.rs | Adds volumes_dir() and instance_volume() helpers to ProjectContext; straightforward path helpers. |
| helix-cli/src/commands/dashboard.rs | Adds Native arm returning localhost for dashboard host resolution; will fail on macOS/Windows where container localhost does not equal host. |
| helix-cli/src/commands/logs/mod.rs | Switches to Runtime::for_project; error message still references docker/podman when native runtime is in use. |
| sdks/rust/src/dsl.rs | Clippy fixes: derives Default for Order and EmitBehavior instead of manual impls, adds allow(clippy::should_implement_trait) on arithmetic methods. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[helix run / stop / restart / status / logs / prune] --> B[Runtime::for_project]
    B --> C{container_runtime in helix.toml}
    C -- docker / podman --> D[LocalRuntime]
    C -- native --> E[NativeManager]
    D --> F[docker/podman run / rm / ps / logs]
    E --> G[spawn binary HELIX_PORT + HELIX_DATA_DIR]
    G --> H[save_pid to .helix/name/helix.pid]
    H -->|save_pid fails| I[process leaks no PID file]
    H -->|success| J[TCP readiness probe :port]
    J -->|ready| K[instance running]
    K --> L{helix stop}
    L --> M[read_pid SIGTERM wait 10s SIGKILL]
    M --> N[remove PID file]
    E --> O[start_foreground: inherit stdio]
    O --> P{Ctrl-C}
    P --> Q[wait 10s for exit]
    Q -->|timeout| R[returns error process still running]
    Q -->|exited| S[done]

Reviews (1): Last reviewed commit: "feat: remove Docker/Podman dependency wi..."

Greptile also left 4 inline comments on this PR.

Add a native process-based runtime that runs HelixDB without containers,
enabling bare-metal execution. This is the foundation for future
in-memory database features that need direct process control.

Changes:
- Add `Native` variant to `ContainerRuntime` enum in config.rs
- Create `Runtime` enum dispatching between container and native runtimes
- Implement `NativeManager` with PID-based process lifecycle management
- Add `volumes_dir`/`instance_volume` helpers to `ProjectContext`
- Update all commands (run, stop, restart, status, logs, prune, delete)
  to use `Runtime::for_project()` instead of `LocalRuntime::new()`
- Fix dashboard.rs match exhaustiveness for new Native variant
- Fix all clippy warnings in sdks (collapsible_if, derivable_impls,
  should_implement_trait)
Comment on lines +759 to +768
    }
    signal = tokio::signal::ctrl_c() => {
        signal?;
        crate::output::info("Stopping foreground local Helix instance");
        match tokio::time::timeout(Duration::from_secs(10), &mut wait).await {
            Ok(Ok(_)) => {}
            Ok(Err(e)) => return Err(eyre!("Failed to wait for process to stop: {e}")),
            Err(_) => return Err(eyre!("Timed out waiting for process to stop")),
        }
    }

P1 Native foreground: process left running after Ctrl-C timeout

When the user presses Ctrl-C, the process receives SIGINT along with the CLI. If it doesn't exit within 10 seconds, the Err(_) arm returns an error — but no signal is sent to the child and the process keeps running. Compare this to the container runtime's start_foreground, which explicitly calls remove_container (a force-stop) before waiting. Without an equivalent stop_process(instance_name) call here, a slow-to-exit native instance becomes an orphan that the CLI reports as failed while it continues consuming resources.
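
One way to close this gap is to force-kill the child when the grace period expires. The sketch below is not the PR's code: it swaps the tokio `timeout` for blocking `std` polling so it stays self-contained, and `wait_or_kill` is a hypothetical helper name.

```rust
use std::io;
use std::process::{Child, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Waits up to `grace` for `child` to exit on its own, then force-kills it.
/// Returns the exit status and whether a force-kill was needed.
/// Hypothetical helper; the PR uses tokio rather than blocking polling.
pub fn wait_or_kill(child: &mut Child, grace: Duration) -> io::Result<(ExitStatus, bool)> {
    let deadline = Instant::now() + grace;
    loop {
        // try_wait returns Some(status) once the child has exited.
        if let Some(status) = child.try_wait()? {
            return Ok((status, false));
        }
        if Instant::now() >= deadline {
            // Child::kill sends SIGKILL on Unix; wait() then reaps the process.
            child.kill()?;
            return Ok((child.wait()?, true));
        }
        thread::sleep(Duration::from_millis(50));
    }
}
```

Returning the `bool` lets the caller report a forced stop to the user instead of a bare timeout error.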

Comment thread helix-cli/src/local_runtime.rs Outdated
Comment on lines +703 to +710
let child = cmd
    .spawn()
    .map_err(|e| eyre!("Failed to start native process: {e}"))?;

self.save_pid(instance_name, child.id())?;

wait_ready(port)?;
Ok(())

P1 Orphan process if save_pid fails after spawn

cmd.spawn() returns a Child, and then save_pid writes the PID to disk. If save_pid fails (e.g., a filesystem permission error on the workspace directory), the function returns an error — but the spawned process is already running and detached (Rust's std::process::Child does not kill on drop). The PID file is missing, so neither helix stop nor helix status will find the instance; the native process leaks with no CLI-level way to stop it. The child should be explicitly killed if save_pid returns an error.
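
The suggested fix can be sketched as follows, assuming a simplified `save_pid` that writes the PID to a given path. `spawn_tracked` and both signatures are hypothetical; the PR's `save_pid` takes an instance name and lives on `NativeManager`.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::{Child, Command};

/// Simplified stand-in for the PR's `save_pid`: writes the PID as text.
fn save_pid(pid_file: &Path, pid: u32) -> io::Result<()> {
    fs::write(pid_file, pid.to_string())
}

/// Spawns `cmd` and records its PID; if recording fails, the child is
/// killed and reaped so it cannot outlive the failed call.
pub fn spawn_tracked(mut cmd: Command, pid_file: &Path) -> io::Result<Child> {
    let mut child = cmd.spawn()?;
    if let Err(err) = save_pid(pid_file, child.id()) {
        // Best-effort cleanup: ignore errors here and surface the original one.
        let _ = child.kill();
        let _ = child.wait();
        return Err(err);
    }
    Ok(child)
}
```

The cleanup is explicit because dropping a `std::process::Child` neither kills nor waits for the process.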

Comment on lines 47 to 49
ContainerRuntime::Podman => "host.containers.internal".to_string(),
ContainerRuntime::Native => "localhost".to_string(),
}

P2 Dashboard container unreachable on macOS/Windows with native runtime

The comment above says "The dashboard is always a container (Docker/Podman), never native", yet "localhost" is returned for the Native arm. Inside a Docker or Podman container on macOS and Windows, localhost resolves to the container itself, not the host. So when container_runtime = "native", the HelixDB process is running directly on the host, but the dashboard container would try to reach it at localhost and fail to connect. On Linux with host networking this works, but on macOS/Windows users would see connection errors from the dashboard. Consider returning "host.docker.internal" (or the equivalent per the outer runtime responsible for the dashboard) instead, or explicitly blocking helix dashboard when running the native runtime.

Comment on lines +615 to 628
+fn is_process_running(pid: u32) -> bool {
+    #[cfg(target_os = "linux")]
+    {
+        PathBuf::from(format!("/proc/{pid}")).exists()
+    }
+    #[cfg(not(target_os = "linux"))]
+    {
+        Command::new("kill")
+            .args(["-0", &pid.to_string()])
             .output()
-            .map_err(|e| eyre!("Failed to initialize local MinIO bucket: {e}"))?;
-
-        if output.status.success() {
-            return Ok(());
-        }
-
-        last_stderr = String::from_utf8_lossy(&output.stderr).trim().to_string();
-        thread::sleep(Duration::from_millis(500));
+            .map(|o| o.status.success())
+            .unwrap_or(false)
     }
-
-    Err(eyre!(
-        "Timed out initializing local MinIO bucket {LOCAL_S3_BUCKET}:\n{last_stderr}"
-    ))
 }

P2 PID reuse risk may kill an unrelated process

is_process_running and then kill -TERM {pid} operate on a numeric PID without verifying that the process belongs to this Helix instance. On Linux the /proc/{pid} directory existing only confirms that a process with that PID is alive; on other platforms kill -0 has the same limitation. On long-running machines (or after system restarts where the PID file is stale), the recycled PID could point to an unrelated daemon, which stop_process would then SIGTERM/SIGKILL. Storing an additional process attribute (e.g., start-time or a process-group ID) and comparing it before sending signals would make this safe.
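
A start-time check along these lines might look like the sketch below. Everything here is an assumption for illustration: the `PidRecord` type, the two-field PID file format, and the helper names are hypothetical. The one external fact relied on is that on Linux the `starttime` value is field 22 of `/proc/<pid>/stat`.

```rust
/// Hypothetical PID-file contents: the PID plus the process start time,
/// so a recycled PID can be told apart from the original process.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PidRecord {
    pub pid: u32,
    pub start_time: u64,
}

impl PidRecord {
    /// Serialises as "<pid> <start_time>" (an assumed format).
    pub fn to_line(&self) -> String {
        format!("{} {}", self.pid, self.start_time)
    }

    pub fn parse(s: &str) -> Option<Self> {
        let mut parts = s.split_whitespace();
        let pid = parts.next()?.parse().ok()?;
        let start_time = parts.next()?.parse().ok()?;
        Some(Self { pid, start_time })
    }
}

/// Extracts `starttime` (field 22) from the contents of /proc/<pid>/stat.
pub fn parse_start_time(stat: &str) -> Option<u64> {
    // Field 2 (comm) is parenthesised and may contain spaces or parentheses,
    // so skip past the last ')' before splitting on whitespace.
    let rest = &stat[stat.rfind(')')? + 1..];
    // `rest` begins at field 3, so field 22 sits at index 19.
    rest.split_whitespace().nth(19)?.parse().ok()
}

/// True only if the live process has the start time recorded at spawn.
pub fn is_same_process(record: &PidRecord, live_start_time: Option<u64>) -> bool {
    live_start_time == Some(record.start_time)
}
```

A caller would read `/proc/{pid}/stat`, pass it through `parse_start_time`, and send signals only when `is_same_process` returns true; other platforms would need a different source for the start time.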

- Kill child process if save_pid fails after spawn (prevents orphan)
- Send SIGKILL via PID on foreground Ctrl-C timeout (prevents runaway process)
- Store process start time in PID file and verify before sending signals
  (prevents killing a recycled PID belonging to an unrelated process)
- Add comment clarifying dashboard Native arm is dead code

xav-db commented Jun 2, 2026

Member

will review
