Add idle container support (scale-to-zero) by martijnenco · Pull Request #197 · basecamp/kamal-proxy

martijnenco · 2026-03-12T09:36:48Z

Add automatic idle container management to kamal-proxy: containers are stopped after a configurable period of inactivity and transparently restarted when new requests arrive.

What changed
New deploy command flags:

--idle-timeout -- duration of inactivity before the container is stopped (e.g. 300s). Defaults to 0 (disabled).
--idle-wake-timeout -- maximum time to hold an incoming request while the container is starting back up. Defaults to 30s.

New components:

DockerClient (docker_client.go) -- lightweight HTTP client that communicates with the Docker daemon over a Unix socket (/var/run/docker.sock) to stop and start containers. No external Docker SDK dependency.
IdleController (idle_controller.go) -- per-service state machine (Active -> Sleeping -> Waking -> Active) that tracks request activity, puts containers to sleep after the idle timeout, and wakes them on demand.

Integration into existing code:

Service now hosts an IdleController alongside the existing PauseController. Every proxied request updates the idle timer. Health check requests (/up) do not count as activity and do not trigger a wake.
Router.ListActiveServices reports idle state (sleeping, waking) so operators can see which services are asleep via kamal-proxy list.
The run command accepts a --docker-socket flag (defaulting to /var/run/docker.sock) to configure the socket path.
State is persisted in the existing state file so idle services survive proxy restarts.

Why
Review apps and staging environments often sit idle for hours or days, consuming memory for nothing. This feature lets kamal-proxy automatically reclaim those resources and bring containers back in seconds when someone visits the app -- similar to how Fly.io Machines, Railway, and Render handle scale-to-zero. The implementation keeps the proxy as the single point of control (no sidecar containers) and reuses the existing health check infrastructure for wake-up verification.

Dependent to this PR for kamal: basecamp/kamal#1800

Copilot

Pull request overview

Adds scale-to-zero (“idle container”) support to kamal-proxy by introducing a Docker socket client and an idle state machine that can stop containers after inactivity and wake them on demand, integrating the new behavior into service request handling, router reporting, CLI flags, and persisted state.

Changes:

Introduce DockerClient (Unix-socket HTTP client to Docker) and IdleController (Active/Sleeping/Waking state machine).
Integrate idle tracking/wake behavior into Service and expose idle state via Router.ListActiveServices.
Add CLI/config plumbing for Docker socket path and idle deploy flags; add unit tests for Docker/idle components.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
internal/server/testing.go	Update test server helper to pass Docker socket path into `NewRouter`.
internal/server/service_test.go	Update service creation helper for new `NewService(..., docker)` signature.
internal/server/service.go	Add idle options + integrate `IdleController` into request path, lifecycle, and JSON persistence.
internal/server/router_test.go	Update router tests for new `NewRouter(statePath, dockerSocketPath)` signature.
internal/server/router.go	Add router-level `DockerClient`, wire it into services and restored state, and expose idle state in listing.
internal/server/idle_controller_test.go	New tests covering idle->sleep and wake flow via a fake Docker API over a Unix socket.
internal/server/idle_controller.go	New idle state machine that stops/starts containers and waits for health on wake.
internal/server/docker_client_test.go	New tests validating Docker stop/start calls over Unix socket.
internal/server/docker_client.go	New minimal Docker HTTP client over a Unix socket.
internal/server/config.go	Add default Docker socket path constant and config field.
internal/cmd/run.go	Add `--docker-socket` flag and pass it into `NewRouter`.
internal/cmd/deploy.go	Add `--idle-timeout` / `--idle-wake-timeout` deploy flags.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-12T09:44:40Z

internal/server/service.go

+	IdleTimeout                 time.Duration `json:"idle_timeout"`
+	IdleWakeTimeout             time.Duration `json:"idle_wake_timeout"`


The new idle fields have no defaults/validation applied in ServiceOptions.Normalize(). If IdleTimeout > 0 but IdleWakeTimeout <= 0 (e.g., from older persisted state or a caller that doesn’t set the flag), wake requests can time out immediately. Consider normalizing IdleWakeTimeout to DefaultIdleWakeTimeout when idle is enabled (and rejecting negative durations).

Copilot · 2026-03-12T09:44:40Z

internal/server/service.go

+			s.idleController.docker = s.docker
+			s.idleController.lb = s.active
+			s.idleController.serviceName = s.name
+			s.idleController.IdleTimeout = s.options.IdleTimeout
+			s.idleController.WakeTimeout = s.options.IdleWakeTimeout


initialize() updates IdleController internals (docker, lb, serviceName, timeouts) via direct assignment without taking the idle controller’s lock. Since the idle controller goroutine/request path can read these concurrently, this introduces data races. Prefer an IdleController.UpdateConfig(...) method that holds the lock and updates all mutable fields atomically.

Copilot · 2026-03-12T09:44:40Z

internal/server/idle_controller_test.go

+package server
+
+import (
+	"context"


The context import is unused in this test file, which will cause a compile failure. Remove the import (or use it if intended).

Suggested change

"context"

Copilot · 2026-03-12T09:44:41Z

internal/server/idle_controller.go

+	ticker := time.NewTicker(10 * time.Second)
+	defer ticker.Stop()
+
+	for {
+		select {
+		case <-ic.closeChan:
+			return
+		case <-ticker.C:


run() uses a fixed time.NewTicker(10 * time.Second), so IdleTimeout values shorter than 10s can’t be honored (and longer ones can overshoot by up to the tick interval). Consider using a resettable time.Timer based on lastRequestAt/IdleTimeout, or deriving the tick interval from IdleTimeout with a sane min/max.

Suggested change

ticker := time.NewTicker(10 * time.Second)

defer ticker.Stop()

for {

select {

case <-ic.closeChan:

return

case <-ticker.C:

for {

ic.lock.Lock()

disabled := ic.disabled

idleTimeout := ic.IdleTimeout

lastRequestAt := ic.lastRequestAt

ic.lock.Unlock()

// If disabled or no idle timeout is configured, poll infrequently.

if disabled || idleTimeout <= 0 {

select {

case <-ic.closeChan:

return

case <-time.After(time.Second):

continue

}

}

// Compute remaining time until we should consider the service idle.

remaining := idleTimeout - time.Since(lastRequestAt)

// If we've already exceeded the idle timeout, check immediately.

if remaining <= 0 {

select {

case <-ic.closeChan:

return

default:

}

ic.checkIdle()

continue

}

timer := time.NewTimer(remaining)

select {

case <-ic.closeChan:

if !timer.Stop() {

<-timer.C

}

return

case <-timer.C:

Copilot · 2026-03-12T09:44:41Z

internal/server/idle_controller.go

+
+	if ic.State != IdleStateActive {
+		ic.State = IdleStateActive
+		if ic.wakeChan != nil {
+			close(ic.wakeChan)
+			ic.wakeChan = nil
+		}


UpdateContainers closes ic.wakeChan when moving back to active. The wake goroutine also closes ic.wakeChan, so this can cause a double-close panic under concurrent wake + deploy/update. Consider ensuring only one code path ever closes the channel (or guard channel ownership so it’s closed exactly once).

Suggested change

if ic.State != IdleStateActive {

ic.State = IdleStateActive

if ic.wakeChan != nil {

close(ic.wakeChan)

ic.wakeChan = nil

}

if ic.State != IdleStateActive {

ic.State = IdleStateActive

Copilot · 2026-03-12T09:44:41Z

internal/server/idle_controller.go

+		close(ic.wakeChan)
+		ic.wakeChan = nil
+		ic.lock.Unlock()


The wake goroutine unconditionally does close(ic.wakeChan). If wakeChan is nil (e.g., restored sleeping state) or already closed/cleared by another path (e.g., UpdateContainers), this will panic. Capture/clear the channel under lock and close it only if non-nil and still owned by this wake operation.

Suggested change

close(ic.wakeChan)

ic.wakeChan = nil

ic.lock.Unlock()

wakeChan := ic.wakeChan

ic.wakeChan = nil

ic.lock.Unlock()

if wakeChan != nil {

close(wakeChan)

}

Copilot · 2026-03-12T09:44:41Z

internal/server/router.go

 	r.withWriteLock(func() error {
 		r.services = NewServiceMap()


RestoreLastSavedState calls r.withWriteLock(...) without checking/returning its error. If the callback fails, the error will be dropped and the method will still return nil. Capture the returned error and propagate it.

Copilot · 2026-03-12T09:44:42Z

internal/server/service.go

 		s.active = lb
+		if s.idleController != nil {
+			s.idleController.lb = lb
+			s.idleController.UpdateContainers(lb.WriteTargets().Names())
+		}


UpdateLoadBalancer mutates idleController.lb directly without taking the idle controller’s lock. The idle controller reads lb concurrently during wake (ic.lb.WaitUntilHealthy(...)), so this is a data race under concurrent deploy + traffic (and can trip -race). Add a setter on IdleController that updates lb under its lock, or update it inside IdleController.UpdateContainers(...) while holding the lock.

Copilot · 2026-03-12T09:44:42Z

internal/server/router.go

 		for _, service := range services {
+			service.docker = r.dockerClient
+			service.initialize(service.options, service.targetOptions)
 			r.services.Set(service)


RestoreLastSavedState calls service.initialize(...) but ignores the returned error. Since initialize returns an error, this won’t compile, and even if it did, failed initialization (e.g., cert/middleware errors) would be silently swallowed. Capture the error and return it from the withWriteLock callback so state restoration fails loudly when a service can’t be re-initialized.

Copilot · 2026-03-12T09:44:42Z

internal/server/idle_controller.go

+
+	ic.lastRequestAt = time.Now()
+	ic.closeChan = make(chan bool)
+


UnmarshalJSON restarts the controller goroutine but doesn’t initialize wakeChan when restoring into a sleeping/waking state. That can lead to wake() eventually closing a nil channel (panic) and/or WaitIfSleeping() not actually waiting. Initialize wakeChan appropriately on restore (or normalize restored state back to active).

Suggested change

switch ic.State {

case IdleStateSleeping, IdleStateWaking:

ic.wakeChan = make(chan bool)

default:

// For IdleStateActive, wakeChan can remain nil until needed.

}

Add support for review apps or scale-to-zero configurations

bfb01a0

Copilot AI review requested due to automatic review settings March 12, 2026 09:36

Copilot started reviewing on behalf of martijnenco March 12, 2026 09:37 View session

Copilot AI reviewed Mar 12, 2026

View reviewed changes

martijnenco mentioned this pull request Mar 12, 2026

Add idle container configuration for kamal-proxy basecamp/kamal#1800

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add idle container support (scale-to-zero)#197

Add idle container support (scale-to-zero)#197
martijnenco wants to merge 1 commit intobasecamp:mainfrom
martijnenco:feature/add-support-for-review-apps

martijnenco commented Mar 12, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		IdleTimeout time.Duration `json:"idle_timeout"`
		IdleWakeTimeout time.Duration `json:"idle_wake_timeout"`

-	ticker := time.NewTicker(10 * time.Second)
-	defer ticker.Stop()
-	for {
-		select {
-		case <-ic.closeChan:
-			return
-		case <-ticker.C:
+	for {
+		ic.lock.Lock()
+		disabled := ic.disabled
+		idleTimeout := ic.IdleTimeout
+		lastRequestAt := ic.lastRequestAt
+		ic.lock.Unlock()
+		// If disabled or no idle timeout is configured, poll infrequently.
+		if disabled || idleTimeout <= 0 {
+			select {
+			case <-ic.closeChan:
+				return
+			case <-time.After(time.Second):
+				continue
+			}
+		}
+		// Compute remaining time until we should consider the service idle.
+		remaining := idleTimeout - time.Since(lastRequestAt)
+		// If we've already exceeded the idle timeout, check immediately.
+		if remaining <= 0 {
+			select {
+			case <-ic.closeChan:
+				return
+			default:
+			}
+			ic.checkIdle()
+			continue
+		}
+		timer := time.NewTimer(remaining)
+		select {
+		case <-ic.closeChan:
+			if !timer.Stop() {
+				<-timer.C
+			}
+			return
+		case <-timer.C:

+	switch ic.State {
+	case IdleStateSleeping, IdleStateWaking:
+		ic.wakeChan = make(chan bool)
+	default:
+		// For IdleStateActive, wakeChan can remain nil until needed.
+	}

Conversation

martijnenco commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

martijnenco commented Mar 12, 2026 •

edited

Loading