docs(proxy/prod): clarify Gunicorn vs Uvicorn choice, worker recycling, and hitless restarts by yassin-berriai · Pull Request #259 · BerriAI/litellm-docs

yassin-berriai · 2026-05-29T20:02:52Z

Relevant issues

BerriAI/litellm#29282 (Claude Code users hitting output_config drop-params issue)
https://linear.app/litellm-ai/issue/LIT-2458 (probe recommendations)
https://linear.app/litellm-ai/issue/LIT-3307 (graceful shutdown feature)

Linear ticket

Resolves LIT-3440

Pre-Submission checklist

My PR's scope is as isolated as possible; it only solves 1 specific problem

Type

📖 Documentation

Changes

Expands section 3 of docs/proxy/prod.md ("On Kubernetes — Use 1 Uvicorn Worker per Pod") to cover three previously underdocumented topics:

When to use Gunicorn vs. Uvicorn. A decision table clarifies the three main scenarios: Kubernetes with HPA (1 Uvicorn worker per pod), non-Kubernetes hosts (Gunicorn with multiple workers for automatic worker respawn), and deployments that need worker recycling (Gunicorn is more stable here).

How to configure max requests before restart. Shows both the CLI flag (--max_requests_before_restart) and the env var (MAX_REQUESTS_BEFORE_RESTART), with Uvicorn and Gunicorn examples side by side. Adds a tip explaining Gunicorn's max_requests_jitter option for multi-worker hosts (prevents all workers recycling simultaneously).
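As a rough illustration of why jitter helps on a multi-worker host: Gunicorn adds a random offset between 0 and `max_requests_jitter` to each worker's `max_requests` limit, so workers reach their restart threshold at different request counts. The sketch below only simulates that arithmetic; the jitter value of 500, the worker count of 4, and the helper name `restart_thresholds` are illustrative assumptions, not values taken from this PR.

```python
import random


def restart_thresholds(max_requests, jitter, workers, seed=0):
    """Simulate per-worker restart thresholds: max_requests + randint(0, jitter).

    Hypothetical helper for illustration only; it mirrors the documented
    Gunicorn behavior but does not call Gunicorn itself.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same spread
    return [max_requests + rng.randint(0, jitter) for _ in range(workers)]


# Assumed example values: 4 workers, limit 10000, jitter 500.
thresholds = restart_thresholds(max_requests=10000, jitter=500, workers=4)
print(thresholds)

# With jitter 0 every worker shares the same threshold and recycles together.
print(restart_thresholds(max_requests=10000, jitter=0, workers=4))
```

With a single Uvicorn worker per pod there is only one threshold, so the spread is irrelevant there.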

Hitless restarts on Kubernetes. New subsection covering the three components that must work together for zero-downtime restarts: rolling update strategy (maxSurge: 1, maxUnavailable: 0), recommended probe settings (startup, readiness, liveness with timeout values from LIT-2458), and the preStop sleep hook with terminationGracePeriodSeconds, including an annotated shutdown sequence diagram.
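The relationship between the preStop sleep and `terminationGracePeriodSeconds` can be sketched as simple arithmetic: Kubernetes starts the grace-period countdown when termination begins, so time spent in the preStop hook comes out of the same budget. The helper name `drain_budget` is hypothetical; the 15 s and 60 s inputs are the values listed in the behavioral test matrix.

```python
def drain_budget(termination_grace_s, pre_stop_sleep_s):
    """Seconds left for in-flight requests after the preStop sleep finishes.

    Hypothetical helper: the preStop hook runs inside the termination grace
    period, so the drain window is whatever remains after the sleep.
    """
    if pre_stop_sleep_s >= termination_grace_s:
        raise ValueError("preStop sleep consumes the whole grace period")
    return termination_grace_s - pre_stop_sleep_s


# Values from the test matrix: preStop sleep 15, terminationGracePeriodSeconds 60.
remaining = drain_budget(termination_grace_s=60, pre_stop_sleep_s=15)
print(remaining)
```

If the sleep is lengthened, the grace period needs to grow by the same amount to keep the drain window unchanged.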

Behavioral test matrix

| Config | Expected behavior |
| --- | --- |
| `maxSurge: 1, maxUnavailable: 0` | New pod reaches ready before old pod receives SIGTERM |
| `preStop: sleep 15` + `terminationGracePeriodSeconds: 60` | 15 s for endpoint slice propagation; 45 s for in-flight request drain |
| `startupProbe failureThreshold: 120, periodSeconds: 5` | Up to 10 min startup allowed before pod is killed |
| `readinessProbe failureThreshold: 4, periodSeconds: 15` | Pod removed from rotation after ~60 s of DB/cache unavailability |
| `livenessProbe failureThreshold: 3, periodSeconds: 35` | Pod killed only after ~105 s of liveness failure |
| `--max_requests_before_restart 10000` (Gunicorn) | Workers recycled one at a time via Gunicorn `max_requests` |
| `--max_requests_before_restart 10000` (Uvicorn) | Single worker recycled; no thundering herd risk on 1-worker-per-pod |
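The probe windows in the matrix follow from `failureThreshold × periodSeconds`. The short check below reproduces that arithmetic; it is an approximation that ignores `timeoutSeconds` and any initial delay, and the dictionary layout and function name are illustrative rather than part of the docs change.

```python
# Probe settings as listed in the matrix: (failureThreshold, periodSeconds).
PROBES = {
    "startupProbe": (120, 5),
    "readinessProbe": (4, 15),
    "livenessProbe": (3, 35),
}


def failure_window_seconds(failure_threshold, period_seconds):
    """Approximate time of consecutive failures before Kubernetes acts."""
    return failure_threshold * period_seconds


windows = {
    name: failure_window_seconds(threshold, period)
    for name, (threshold, period) in PROBES.items()
}

for name, seconds in windows.items():
    print(f"{name}: ~{seconds} s")
```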

https://claude.ai/code/session_011QYab7MJL5uDvkxEr1RJ2j

Generated by Claude Code

…g, and hitless restart config

Resolves LIT-3440

Expands section 3 of the production best practices page to cover three previously underdocumented topics:

- Decision table for when to choose Gunicorn vs. Uvicorn (Kubernetes HPA vs. non-Kubernetes, worker recycling stability)
- How to configure --max_requests_before_restart (CLI and ENV) with an explanation of Gunicorn's max_requests_jitter for multi-worker hosts
- Full hitless-restart recipe: rolling update strategy (maxSurge/maxUnavailable), recommended probe settings (startup, readiness, liveness with timeouts drawn from LIT-2458), and preStop hook with terminationGracePeriodSeconds with an annotated shutdown sequence diagram

https://claude.ai/code/session_011QYab7MJL5uDvkxEr1RJ2j

vercel · 2026-05-29T20:02:57Z

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| litellm | Ready | Preview, Comment | May 29, 2026 8:04pm |

yassin-berriai · 2026-05-29T20:02:59Z

@greptileai

Generated by Claude Code

vercel Bot deployed to Preview · May 29, 2026 20:04

yassin-berriai wants to merge 1 commit into main from litellm_fix/LIT-3440-gunicorn-uvicorn-hitless-restart-docs
