
docs(proxy/prod): clarify Gunicorn vs Uvicorn choice, worker recycling, and hitless restarts#259

Draft
yassin-berriai wants to merge 1 commit into main from litellm_fix/LIT-3440-gunicorn-uvicorn-hitless-restart-docs



@yassin-berriai
Contributor

Relevant issues

BerriAI/litellm#29282 (Claude Code users hitting output_config drop-params issue)
https://linear.app/litellm-ai/issue/LIT-2458 (probe recommendations)
https://linear.app/litellm-ai/issue/LIT-3307 (graceful shutdown feature)

Linear ticket

Resolves LIT-3440

Pre-Submission checklist

  • My PR's scope is as isolated as possible; it only solves 1 specific problem

Type

📖 Documentation

Changes

Expands section 3 of docs/proxy/prod.md ("On Kubernetes — Use 1 Uvicorn Worker per Pod") to cover three previously underdocumented topics:

When to use Gunicorn vs. Uvicorn. A decision table clarifies the three main scenarios: Kubernetes with HPA (1 Uvicorn worker per pod), non-Kubernetes hosts (Gunicorn with multiple workers for automatic worker respawn), and deployments that need worker recycling (Gunicorn is more stable here).
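A minimal Python sketch of the decision table as a function, for readers who prefer the rules spelled out. The helper name choose_server and the precedence it uses (the Kubernetes/HPA row wins even when a deployment also needs worker recycling, since the test matrix below treats Uvicorn recycling on 1 worker per pod as safe) are assumptions for illustration, not part of LiteLLM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerChoice:
    server: str   # "uvicorn" or "gunicorn"
    workers: str  # worker-count guidance from the decision table
    reason: str   # which row of the table applied


def choose_server(on_kubernetes_with_hpa: bool, needs_worker_recycling: bool) -> ServerChoice:
    """Hypothetical helper encoding the Gunicorn vs. Uvicorn decision table.

    Assumption: the Kubernetes/HPA row takes precedence over the
    worker-recycling row; the PR does not state an explicit ordering.
    """
    if on_kubernetes_with_hpa:
        # Kubernetes with HPA: scale by adding pods, keep 1 Uvicorn worker each.
        return ServerChoice("uvicorn", "1 per pod", "Kubernetes with HPA")
    if needs_worker_recycling:
        # Recycling workers is described as more stable under Gunicorn.
        return ServerChoice("gunicorn", "multiple", "worker recycling")
    # Non-Kubernetes host: Gunicorn respawns crashed workers automatically.
    return ServerChoice("gunicorn", "multiple", "automatic worker respawn")
```

For example, choose_server(False, True).server is "gunicorn", while any Kubernetes/HPA deployment maps to "uvicorn" with 1 worker per pod.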

How to configure max requests before restart. Shows both the CLI flag (--max_requests_before_restart) and the env var (MAX_REQUESTS_BEFORE_RESTART), with Uvicorn and Gunicorn examples side by side. Adds a tip explaining Gunicorn's max_requests_jitter option for multi-worker hosts, which keeps all workers from recycling at the same time.
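To show why the jitter tip matters, here is a small Python simulation of the per-worker limits. Gunicorn gives each worker a limit of max_requests plus a random integer between 0 and max_requests_jitter (inclusive), so evenly loaded workers reach their limits at different request counts. This only models Gunicorn's behavior; the function name recycle_thresholds and the jitter value of 1000 are hypothetical, and the sketch does not claim how the proxy forwards --max_requests_before_restart to Gunicorn.

```python
import random


def recycle_thresholds(workers: int, max_requests: int, jitter: int, seed: int = 0) -> list[int]:
    """Per-worker request limits, modeled on how Gunicorn applies jitter.

    Each worker gets max_requests + randint(0, jitter). With jitter == 0,
    every worker shares the same limit and, under even load, recycles
    at roughly the same moment.
    """
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    return [max_requests + rng.randint(0, jitter) for _ in range(workers)]


if __name__ == "__main__":
    print("no jitter:  ", recycle_thresholds(4, 10000, 0))
    print("jitter 1000:", recycle_thresholds(4, 10000, 1000))
```

With no jitter all four workers get exactly 10000; with a jitter of 1000 each limit lands somewhere in [10000, 11000], which spreads the restarts out.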

Hitless restarts on Kubernetes. New subsection covering the three components that must work together for zero-downtime restarts: rolling update strategy (maxSurge: 1, maxUnavailable: 0), recommended probe settings (startup, readiness, liveness with timeout values from LIT-2458), and the preStop sleep hook with terminationGracePeriodSeconds, including an annotated shutdown sequence diagram.
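A minimal Python sketch that assembles the three hitless-restart components into one Deployment manifest and prints it as JSON, which kubectl apply -f also accepts. The strategy, probe thresholds, preStop sleep and grace period come from the test matrix; everything else is an assumption: the image placeholder, container port 4000, the /health/liveliness and /health/readiness probe paths, and that the image ships a sleep binary. The probe timeout values from LIT-2458 are not reproduced here, so timeoutSeconds is left at the Kubernetes default.

```python
import json


def litellm_deployment(replicas: int = 2) -> dict:
    """Hypothetical Deployment combining rolling update, probes and preStop.

    Assumed, not taken from the PR: image placeholder, port 4000, and the
    health-check paths. timeoutSeconds is intentionally omitted.
    """
    port = 4000  # assumed proxy port

    def probe(path: str, period: int, failures: int) -> dict:
        # HTTP probe; Kubernetes acts after `failures` consecutive failures.
        return {
            "httpGet": {"path": path, "port": port},
            "periodSeconds": period,
            "failureThreshold": failures,
        }

    container = {
        "name": "litellm",
        "image": "REPLACE_ME/litellm:tag",  # placeholder image reference
        "ports": [{"containerPort": port}],
        "startupProbe": probe("/health/liveliness", 5, 120),   # up to 600 s
        "readinessProbe": probe("/health/readiness", 15, 4),   # ~60 s
        "livenessProbe": probe("/health/liveliness", 35, 3),   # ~105 s
        # Keep serving while endpoint slices stop routing to this pod.
        "lifecycle": {"preStop": {"exec": {"command": ["sleep", "15"]}}},
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "litellm"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": "litellm"}},
            "strategy": {
                "type": "RollingUpdate",
                # Bring up one new pod before any old pod is taken down.
                "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0},
            },
            "template": {
                "metadata": {"labels": {"app": "litellm"}},
                "spec": {
                    # Includes the 15 s preStop sleep, leaving ~45 s to drain.
                    "terminationGracePeriodSeconds": 60,
                    "containers": [container],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(litellm_deployment(), indent=2))
```

Usage, with a hypothetical file name: save the script, run it with output redirected to litellm-deployment.json, then kubectl apply -f litellm-deployment.json. Building the manifest in code keeps the grace period and preStop sleep next to each other, so the drain budget is easy to review.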

Behavioral test matrix

| Config | Expected behavior |
|---|---|
| maxSurge: 1, maxUnavailable: 0 | New pod becomes Ready before the old pod receives SIGTERM |
| preStop: sleep 15 + terminationGracePeriodSeconds: 60 | 15 s for endpoint slice propagation; the remaining 45 s for in-flight request drain |
| startupProbe failureThreshold: 120, periodSeconds: 5 | Up to 10 min (120 × 5 s) of startup allowed before the pod is killed |
| readinessProbe failureThreshold: 4, periodSeconds: 15 | Pod removed from rotation after ~60 s of DB/cache unavailability |
| livenessProbe failureThreshold: 3, periodSeconds: 35 | Pod killed only after ~105 s of consecutive liveness failures |
| --max_requests_before_restart 10000 (Gunicorn) | Workers recycled via Gunicorn max_requests; staggered rather than simultaneous when max_requests_jitter is set |
| --max_requests_before_restart 10000 (Uvicorn) | Single worker recycled; no thundering-herd risk with 1 worker per pod |
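The timing figures in the matrix follow from simple arithmetic, sketched below in Python. The function names are hypothetical. probe_failure_window ignores initialDelaySeconds and timeoutSeconds, so it is only an approximation (hence the "~" in the table); drain_budget relies on the Kubernetes rule that the preStop hook runs inside terminationGracePeriodSeconds, so the sleep is subtracted before SIGTERM reaches the container.

```python
def probe_failure_window(failure_threshold: int, period_seconds: int) -> int:
    """Approximate seconds of consecutive probe failures before Kubernetes acts."""
    return failure_threshold * period_seconds


def drain_budget(grace_period_seconds: int, prestop_sleep_seconds: int) -> int:
    """Seconds left for in-flight requests once the preStop sleep finishes."""
    if prestop_sleep_seconds >= grace_period_seconds:
        raise ValueError("preStop sleep consumes the whole grace period")
    return grace_period_seconds - prestop_sleep_seconds


if __name__ == "__main__":
    # (failureThreshold, periodSeconds) pairs from the matrix above.
    probes = {
        "startupProbe": (120, 5),
        "readinessProbe": (4, 15),
        "livenessProbe": (3, 35),
    }
    for name, (threshold, period) in probes.items():
        print(f"{name}: ~{probe_failure_window(threshold, period)} s")
    print(f"drain window: {drain_budget(60, 15)} s")
```

Running it prints ~600 s, ~60 s and ~105 s for the three probes and a 45 s drain window, matching the matrix rows.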

https://claude.ai/code/session_011QYab7MJL5uDvkxEr1RJ2j


Generated by Claude Code

…g, and hitless restart config

Resolves LIT-3440

Expands section 3 of the production best practices page to cover three
previously underdocumented topics:

- Decision table for when to choose Gunicorn vs. Uvicorn (Kubernetes HPA
  vs. non-Kubernetes, worker recycling stability)
- How to configure --max_requests_before_restart (CLI and ENV) with an
  explanation of Gunicorn's max_requests_jitter for multi-worker hosts
- Full hitless-restart recipe: rolling update strategy (maxSurge/
  maxUnavailable), recommended probe settings (startup, readiness,
  liveness with timeouts drawn from LIT-2458), and preStop hook with
  terminationGracePeriodSeconds with an annotated shutdown sequence diagram

https://claude.ai/code/session_011QYab7MJL5uDvkxEr1RJ2j

vercel Bot commented May 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| litellm | Ready | Preview, Comment | May 29, 2026 8:04pm |


Contributor Author

@greptileai



