Working Group Proposal - Graceful Shutdown #53206
Replies: 6 comments 8 replies
@ozangunalp @dmlloyd @mkouba could you have a look at the proposal?
Would we expect that in most cases the optimum shutdown order would be the reverse of the optimum startup order? I think we don't have any mechanism for chaining startup events, only build-time events (through build items), but I wonder whether some harmonisation across the three might make sense? That could give sensible defaults (maybe), and also a coherent mental model. For example, build-order dependencies are done through types, and the proposed shutdown order is done through strings. And unless I'm forgetting something, the startup order doesn't exist. Well, except for an ad hoc API I put in for dev services.
The word quiescence reminds me of previous discussions... While it describes well the state we want to put the application in, I always think of it as a two-way road, where it is possible to come back up from a quiescent state. Here, we want to talk more about a "pre-shutdown" phase, where the only option is to shut down the application. We can keep the word in the description, but during implementation and docs, I think pre-shutdown is easier to understand. Also, here are even more existing issues: #38119, #41233, #52389, #51327, #50475, #44894
If I understand it correctly, all of this can be currently implemented with
Could you be more specific?
Working group officially started: WG - Graceful Shutdown (view)
Hey folks, something I spotted while doing work on At the end of
default Future<Void> shutdown() {
    return shutdown(30, TimeUnit.SECONDS);
}
Objective
Make Quarkus shutdown truly graceful by providing two complementary capabilities: (1) a mechanism for extensions to block incoming traffic and drain in-flight work to reach a quiescent state, integrated with readiness health checks, and (2) a well-ordered, phased shutdown API that gives extension developers clear control over when and how their shutdown logic executes.
The Problem
No Quiescence Mechanism
When a Quarkus application receives a shutdown signal, there is no standard way for extensions to transition gracefully from "serving" to "draining" to "stopped." The core issue is the lack of a quiescence protocol, a coordinated mechanism where:
Today, each extension that wants to participate in graceful shutdown must implement this pattern ad hoc. Most don't. The result is interrupted HTTP requests, half-processed messages, aborted scheduled jobs, and data loss during rolling deployments.
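To make the "serving" to "draining" to "stopped" transition concrete, here is a minimal sketch of the pattern an extension has to hand-roll today. `DrainGate` and its methods are hypothetical names for illustration only; nothing here is an existing Quarkus API, and a real implementation would also need a timeout.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical drain gate: an illustration of the pattern, not a Quarkus API. */
final class DrainGate {
    enum State { SERVING, DRAINING, STOPPED }

    private final Object lock = new Object();
    private final CompletableFuture<Void> drained = new CompletableFuture<>();
    private State state = State.SERVING;
    private int inFlight = 0;

    /** Tries to admit one unit of work; refused once draining has started. */
    boolean tryEnter() {
        synchronized (lock) {
            if (state != State.SERVING) {
                return false;
            }
            inFlight++;
            return true;
        }
    }

    /** Marks one previously admitted unit of work as finished. */
    void exit() {
        boolean complete;
        synchronized (lock) {
            inFlight--;
            complete = state == State.DRAINING && inFlight == 0;
            if (complete) {
                state = State.STOPPED;
            }
        }
        if (complete) {
            drained.complete(null);
        }
    }

    /** Stops admitting work; the future completes once in-flight work reaches zero. */
    CompletableFuture<Void> drain() {
        boolean complete = false;
        synchronized (lock) {
            if (state == State.SERVING) {
                state = inFlight == 0 ? State.STOPPED : State.DRAINING;
                complete = state == State.STOPPED;
            }
        }
        if (complete) {
            drained.complete(null);
        }
        return drained;
    }

    State state() {
        synchronized (lock) {
            return state;
        }
    }
}
```

A request handler would call `tryEnter()` before doing work and `exit()` in a `finally` block; the shutdown path would call `drain()` and wait on the returned future.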
The readiness health check (/q/health/ready, SmallRye Health, Kubernetes readiness probes) is the natural signal to orchestrators that an instance should stop receiving traffic, but the shutdown sequence does not integrate with it in a first-class way. Extensions cannot easily declare "I am draining, mark me as not ready" and have that reflected in the health check automatically.
Confusing and Under-Documented Shutdown APIs
Quarkus currently has three separate shutdown API layers with unclear boundaries:
Primordial API (ShutdownContext): addShutdownTask(Runnable) (reverse order) and addLastShutdownTask(Runnable), used via ShutdownContextBuildItem at build time. The addLastShutdownTask() name is universally disliked (at least we reached a consensus).
Graceful Shutdown API (ShutdownListener): preShutdown(ShutdownNotification) and shutdown(ShutdownNotification) with done() callback, used via ShutdownListenerBuildItem. Also exposes @ShutdownDelayInitiated which maps to preShutdown.
CDI layer: @Shutdown (fires ShutdownEvent before ArC shutdown), @BeforeDestroyed(ApplicationScoped.class) (fired when ArC shuts down), @PreDestroy on beans/interceptors.
There is no documentation explaining when to use which API. The actual shutdown sequence is only discoverable by reading the source code:
Graceful shutdown phase: ShutdownListener#preShutdown() → delay wait (if quarkus.shutdown.delay-enabled=true, duration = quarkus.shutdown.delay) → ShutdownListener#shutdown() → timeout wait
Shutdown tasks phase: ShutdownEvent fired → ArC shutdown (@BeforeDestroyed(ApplicationScoped.class), @PreDestroy)
Last shutdown tasks phase: Tasks registered via addLastShutdownTask()
No Cross-Extension Ordering
There is no way to express that the shutdown logic from extension A should execute before or after the logic from extension B. Extensions shut down in an undefined order, which causes real issues; for example, a scheduler extension may try to complete a job that requires a datasource that has already been closed.
No Debug/Tracing Tools
There is no way to log, trace, or inspect the shutdown sequence at runtime. When shutdown hangs or misbehaves, developers have no tooling to identify which task is blocking or how tasks are ordered.
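As a rough idea of what such tooling could record, the sketch below wraps named tasks, runs them in reverse registration order (mirroring the addShutdownTask behaviour described above), and captures the duration and any failure of each. `TracedShutdown` and `Entry` are hypothetical names, not part of Quarkus; a real tool would more likely hook into the existing shutdown context and emit log lines.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tracing wrapper; names are illustrative and not part of Quarkus. */
final class TracedShutdown {
    /** One trace entry: task name, elapsed nanoseconds, and the failure if any. */
    record Entry(String name, long nanos, Throwable failure) {}

    private record Task(String name, Runnable body) {}

    private final List<Task> tasks = new ArrayList<>();

    void register(String name, Runnable body) {
        tasks.add(new Task(name, body));
    }

    /** Runs tasks in reverse registration order and records how long each one took. */
    List<Entry> run() {
        List<Entry> trace = new ArrayList<>();
        for (int i = tasks.size() - 1; i >= 0; i--) {
            Task task = tasks.get(i);
            long start = System.nanoTime();
            Throwable failure = null;
            try {
                task.body().run();
            } catch (RuntimeException e) {
                failure = e; // keep going so one failing task does not hide the rest
            }
            trace.add(new Entry(task.name(), System.nanoTime() - start, failure));
        }
        return trace;
    }
}
```

Printing the returned entries after shutdown would show the actual order and point at the slow or failing task. Detecting a task that never returns would additionally require running each task with a timeout.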
Limited Extension Coverage
Many extensions do not participate in graceful shutdown at all. For example, quarkus-scheduler does not drain running or scheduled jobs before shutdown, which can cause work to be interrupted.
The Proposed Solution
The work is organized around two complementary parts.
Part 1: Pre-Shutdown Phase (Traffic Draining)
Design and implement a standard protocol for extensions to participate in graceful traffic draining, allowing the application to reach a quiescent state before any ordering-sensitive teardown begins. This phase is independent of the dependency-ordered teardown in Part 2 — it runs first and is about stopping new work, not tearing down resources.
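One possible shape for this phase, sketched in plain Java: readiness flips to DOWN first, then every participant is asked to drain and signals completion through a callback, with a bounded wait. `PreShutdownCoordinator` and `Participant` are hypothetical stand-ins that only mimic the idea of a done() callback; they are not the ShutdownListener API.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical coordinator; it mimics a done() callback but is not the Quarkus API. */
final class PreShutdownCoordinator {
    /** A participant stops accepting new work, drains, then calls done.run(). */
    interface Participant {
        void preShutdown(Runnable done);
    }

    private final AtomicBoolean ready = new AtomicBoolean(true);

    /** What a readiness check would report. */
    boolean isReady() {
        return ready.get();
    }

    /**
     * Flips readiness to DOWN, notifies every participant, and waits until all
     * have signalled completion or the timeout elapses.
     * Returns true only if every participant finished in time.
     */
    boolean preShutdown(List<Participant> participants, long timeout, TimeUnit unit) {
        ready.set(false);
        CountDownLatch latch = new CountDownLatch(participants.size());
        for (Participant p : participants) {
            AtomicBoolean called = new AtomicBoolean();
            p.preShutdown(() -> {
                if (called.compareAndSet(false, true)) { // ignore duplicate done() calls
                    latch.countDown();
                }
            });
        }
        try {
            return latch.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because the wait is bounded, a participant that never calls its callback delays shutdown by at most the timeout instead of hanging it.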
Traffic blocking and readiness integration:
/q/health/ready) immediately reports DOWN, signaling to Kubernetes / load balancers to stop routing new traffic
Extension participation model:
ShutdownListener#preShutdown() with a done() callback
Key integrations:
Part 2: Dependency-Ordered Shutdown (and Startup) API
The correct shutdown order is the reverse of the correct startup order. Today, Quarkus has no explicit startup ordering API (only build-time ordering via build items), and the shutdown ordering is ad hoc. This part of the work normalizes both.
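Assuming dependencies are declared as "component depends on these components", a startup order can be derived with a topological sort and the shutdown order obtained by reversing it. The sketch below uses Kahn's algorithm; `LifecycleOrder` and the component names are hypothetical, and the proposal does not prescribe this representation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper deriving lifecycle order from declared dependencies. */
final class LifecycleOrder {
    /** Kahn's algorithm: a dependency starts before the components that need it. */
    static List<String> startupOrder(Map<String, Set<String>> dependsOn) {
        Map<String, Integer> pending = new LinkedHashMap<>();   // unmet dependency count
        Map<String, List<String>> dependents = new HashMap<>(); // reverse edges
        for (var entry : dependsOn.entrySet()) {
            pending.putIfAbsent(entry.getKey(), 0);
            for (String dep : entry.getValue()) {
                pending.putIfAbsent(dep, 0);
                pending.merge(entry.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        Deque<String> readyQueue = new ArrayDeque<>();
        pending.forEach((name, count) -> {
            if (count == 0) {
                readyQueue.add(name);
            }
        });
        List<String> order = new ArrayList<>();
        while (!readyQueue.isEmpty()) {
            String next = readyQueue.remove();
            order.add(next);
            for (String dependent : dependents.getOrDefault(next, List.of())) {
                if (pending.merge(dependent, -1, Integer::sum) == 0) {
                    readyQueue.add(dependent);
                }
            }
        }
        if (order.size() != pending.size()) {
            throw new IllegalStateException("dependency cycle detected");
        }
        return order;
    }

    /** Shutdown is the startup order reversed: dependents stop before their dependencies. */
    static List<String> shutdownOrder(Map<String, Set<String>> dependsOn) {
        List<String> order = new ArrayList<>(startupOrder(dependsOn));
        Collections.reverse(order);
        return order;
    }
}
```

With a scheduler that depends on a datasource, the scheduler would stop first and the datasource last, which avoids the "job needs a closed datasource" failure described earlier. Components with no ordering relation between them may come out in any relative order.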
Dependency-oriented ordering:
Clear API guidance:
Concurrent shutdown investigation:
Shutdown tracing and debugging:
API naming improvements:
addLastShutdownTask()) within the Quarkus 4 breaking-change window
Definition of Done
Part 1: Pre-Shutdown
DOWN when shutdown is initiated
quarkus-scheduler, gRPC, and WebSocket
Part 2: Dependency-Ordered Shutdown
Scope of Work
In Scope
Out of Scope
ShutdownContext and ShutdownListener APIs
Organizing the Work
Communication
#wg-graceful-shutdown
Timeline
Existing Issues