Bug Description
Loki 2.9.2 in a single-node configuration does not act on SIGTERM during the first ~40 seconds after startup. Shutdown begins only once the ingester lifecycler completes its startup phase.
When Loki runs under a process supervisor or orchestrator with its own shutdown timeout (e.g. 10-30 seconds for systemd, 30 seconds by default for Kubernetes), a restart during the startup window can escalate to SIGKILL. That may cause problems in the multi-node ring case (stale entries, referenced in #13262 and grafana/dskit#881).
The process exits ~40s after launch regardless of when SIGTERM arrived within that window.
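The timing above can be sketched as simple arithmetic. This is only an illustration: the function names are hypothetical, the 40-second window is the approximate value observed in this report, and the time the shutdown itself takes is ignored.

```python
def remaining_wait(startup_window_s, sigterm_at_s):
    """Seconds between SIGTERM and the start of shutdown, assuming shutdown
    is deferred until the startup window has elapsed."""
    return max(0.0, startup_window_s - sigterm_at_s)


def escalates_to_sigkill(startup_window_s, sigterm_at_s, supervisor_timeout_s):
    """True if the supervisor's stop timeout expires before shutdown begins."""
    return remaining_wait(startup_window_s, sigterm_at_s) > supervisor_timeout_s


# SIGTERM 5 s after launch, 30 s grace period: 35 s of waiting, so SIGKILL.
print(escalates_to_sigkill(40, 5, 30))   # True
# SIGTERM 20 s after launch, 30 s grace period: 20 s of waiting, no SIGKILL.
print(escalates_to_sigkill(40, 20, 30))  # False
```

With a 10-second supervisor timeout, any SIGTERM sent in the first 30 seconds after launch would escalate under these assumptions.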
To Reproduce
- Start loki.
- Execute kill -TERM <PID>.
- Wait ~40 seconds for loki to exit.
- Start loki again, wait 20 seconds, run kill -TERM <PID>, and note that it takes another 20 seconds to exit.
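The steps above could be automated with a small harness such as the following sketch, which measures how long a process takes to exit after SIGTERM. The function name `measure_sigterm_latency` and its parameters are hypothetical and not part of Loki or dskit.

```python
import signal
import subprocess
import time


def measure_sigterm_latency(cmd, delay_before_term, wait_timeout=120.0):
    """Start cmd, send SIGTERM after delay_before_term seconds, and return
    the number of seconds the process took to exit after the signal."""
    proc = subprocess.Popen(cmd)
    try:
        time.sleep(delay_before_term)
        sent_at = time.monotonic()
        proc.send_signal(signal.SIGTERM)
        # Raises subprocess.TimeoutExpired if the process is still alive
        # after wait_timeout seconds.
        proc.wait(timeout=wait_timeout)
        return time.monotonic() - sent_at
    finally:
        # Do not leave a stray process behind if the wait timed out.
        if proc.poll() is None:
            proc.kill()
            proc.wait()
```

For example, `measure_sigterm_latency(["loki", "-config.file=loki.yaml"], delay_before_term=20)` (the config path is a placeholder) would be expected to return roughly 20 if the behavior described in this report is present.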
Expected behavior
Loki process respects SIGTERM and exits immediately or in a timely fashion.
Environment:
- Loki: loki, version 2.9.2 (branch: HEAD, revision: 25), prebuilt linux/amd64 binary from the v2.9.2 release.
- OS: Debian 13.4.0, kernel 6.x, x86_64.
- Also verified on FreeBSD 15.0 with the sysutils/alloy port of loki 2.9.2, built from the same upstream tag with go1.24.13. Behavior is identical on both platforms.
Likely root cause
This behavior is consistent with the dskit services.BasicService state machine, which permits only STARTING -> RUNNING or STARTING -> FAILED transitions. StopAsync() on a service in STARTING is deferred until the service completes STARTING.
A maintainer comment on grafana/dskit#151 confirms this is the intended design:
we need to wait until services' Starting function finishes (successfully or not). So listener would either react on running or failed transitions... (Btw, these are the only available transitions from starting state)
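The deferral described above can be illustrated with a toy model. This is not dskit code: the class `ToyService` and its methods are hypothetical, written in Python only to show the transition rule, and they model just the STARTING case from this report.

```python
from enum import Enum


class State(Enum):
    NEW = "New"
    STARTING = "Starting"
    RUNNING = "Running"
    STOPPING = "Stopping"
    FAILED = "Failed"


class ToyService:
    """Toy model of a service whose only exits from STARTING are
    RUNNING and FAILED (an illustration, not dskit's BasicService)."""

    def __init__(self):
        self.state = State.NEW
        self.stop_requested = False
        self.history = [State.NEW]

    def _transition(self, new_state):
        self.state = new_state
        self.history.append(new_state)

    def start(self):
        self._transition(State.STARTING)

    def stop_async(self):
        # During STARTING the request is only recorded, not acted on.
        self.stop_requested = True
        if self.state is State.RUNNING:
            self._transition(State.STOPPING)

    def finish_starting(self, ok=True):
        # Called when the startup phase ends, successfully or not.
        if self.state is not State.STARTING:
            raise RuntimeError("service is not in STARTING")
        if not ok:
            self._transition(State.FAILED)
            return
        self._transition(State.RUNNING)
        if self.stop_requested:
            # The deferred stop takes effect only now.
            self._transition(State.STOPPING)


svc = ToyService()
svc.start()
svc.stop_async()               # SIGTERM arrives during startup
print(svc.state)               # State.STARTING
svc.finish_starting()          # startup phase completes
print(svc.state)               # State.STOPPING
```

In this model the stop request never shortens the startup phase, which matches the observation that the process exits ~40s after launch regardless of when SIGTERM arrived.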
Requested change (preferred)
Allow STARTING -> STOPPING transitions in dskit's BasicService. SIGTERM ought to work in a reasonable time frame.
Requested change (fallback)
- Log a clear message when SIGTERM arrives during STARTING, with the expected remaining seconds until shutdown.
- Document max startup duration in loki/dskit.
Related issues
- moduleService wrapper dskit#151 -- state-machine design confirmation (maintainer comment quoted above)
- (invalid service state: Stopping, expected: Running)