SIGTERM during startup delayed shutdown #21500

@codeedog

Bug Description
Loki 2.9.2 in a single-node configuration does not act on SIGTERM until roughly 40 seconds after startup. Shutdown begins only once the ingester lifecycler completes its startup phase.

When Loki runs under a process supervisor or orchestrator with its own shutdown timeout (e.g. 10-30 seconds for systemd, 30 seconds by default for Kubernetes), a restart during the startup window escalates to SIGKILL. That may cause problems in the multi-node ring case (stale entries, referenced in #13262 and grafana/dskit#881).

The process exits ~40s after launch regardless of when SIGTERM arrived within that window.

To Reproduce

  1. Start loki
  2. Execute kill -TERM <PID>
  3. Wait ~40 seconds for loki to exit.
  4. Start again, wait 20 seconds, execute kill -TERM <PID>, and note that it takes another 20 seconds to exit.

Expected behavior

Loki process respects SIGTERM and exits immediately or in a timely fashion.

Environment:

  • Loki: loki, version 2.9.2 (branch: HEAD, revision: 25), prebuilt linux/amd64 binary from the v2.9.2 release.
  • OS: Debian 13.4.0, kernel 6.x, x86_64.
  • Also verified on FreeBSD 15.0 with the sysutils/alloy port of loki 2.9.2 built from the same upstream tag with go1.24.13. Behavior is identical on both platforms.

Likely root cause

This behavior is consistent with the dskit services.BasicService state machine, which permits only STARTING -> RUNNING or STARTING -> FAILED transitions. StopAsync() on a service in STARTING is deferred until the service completes STARTING.

A maintainer comment on grafana/dskit#151 confirms this is the intended design:

we need to wait until services' Starting function finishes (successfully or not). So listener would either react on running or failed transitions... (Btw, these are the only available transitions from starting state)

Requested change (preferred)

Allow STARTING -> STOPPING transitions in dskit's BasicService. SIGTERM ought to work in a reasonable time frame.

Requested change (fallback)

  1. Log a clear message when SIGTERM arrives during STARTING with expected remaining seconds until shutdown.
  2. Document max startup duration in loki/dskit.

Related issues
