docs(proxy/prod): clarify Gunicorn vs Uvicorn choice, worker recycling, and hitless restarts #259
Draft
yassin-berriai wants to merge 1 commit into
…g, and hitless restart config

Resolves LIT-3440

Expands section 3 of the production best practices page to cover three previously underdocumented topics:

- Decision table for when to choose Gunicorn vs. Uvicorn (Kubernetes HPA vs. non-Kubernetes, worker recycling stability)
- How to configure --max_requests_before_restart (CLI and ENV), with an explanation of Gunicorn's max_requests_jitter for multi-worker hosts
- Full hitless-restart recipe: rolling update strategy (maxSurge/maxUnavailable), recommended probe settings (startup, readiness, liveness, with timeouts drawn from LIT-2458), and a preStop hook plus terminationGracePeriodSeconds, with an annotated shutdown sequence diagram

https://claude.ai/code/session_011QYab7MJL5uDvkxEr1RJ2j
Generated by Claude Code
Relevant issues
BerriAI/litellm#29282 (Claude Code users hitting `output_config` drop-params issue)
https://linear.app/litellm-ai/issue/LIT-2458 (probe recommendations)
https://linear.app/litellm-ai/issue/LIT-3307 (graceful shutdown feature)
Linear ticket
Resolves LIT-3440
Pre-Submission checklist
Type
📖 Documentation
Changes
Expands section 3 of `docs/proxy/prod.md` ("On Kubernetes — Use 1 Uvicorn Worker per Pod") to cover three previously underdocumented topics:

When to use Gunicorn vs. Uvicorn. A decision table covers the three main scenarios: Kubernetes with HPA (1 Uvicorn worker per pod), non-Kubernetes hosts (Gunicorn with multiple workers, so that dead workers are respawned automatically), and deployments that need worker recycling (Gunicorn is more stable here).
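As a rough illustration, the three scenarios in the decision table can be encoded as a small Python function. The `Deployment` type and `recommend_server` helper are hypothetical and not part of LiteLLM; checking the worker-recycling row first is an assumption, since the table does not say which row wins when several apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical description of where and how the proxy runs."""
    on_kubernetes: bool
    uses_hpa: bool
    needs_worker_recycling: bool


def recommend_server(d: Deployment) -> str:
    """Encode the three scenarios from the decision table.

    ASSUMPTION: the worker-recycling row is checked first; the table
    does not state how the rows interact when several apply.
    """
    if d.needs_worker_recycling:
        # Gunicorn is described as more stable for worker recycling.
        return "gunicorn"
    if d.on_kubernetes and d.uses_hpa:
        # Scale by adding pods via HPA rather than workers inside a pod.
        return "uvicorn (1 worker per pod)"
    if not d.on_kubernetes:
        # Gunicorn's master process respawns workers that die.
        return "gunicorn (multiple workers)"
    # Kubernetes without HPA is not one of the three listed scenarios.
    return "not covered by the table"


print(recommend_server(Deployment(on_kubernetes=True, uses_hpa=True, needs_worker_recycling=False)))
```

For a Kubernetes deployment scaled by HPA with no recycling requirement, this returns `"uvicorn (1 worker per pod)"`.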

How to configure max requests before restart. Shows both the CLI flag (`--max_requests_before_restart`) and the env var (`MAX_REQUESTS_BEFORE_RESTART`), with Uvicorn and Gunicorn examples side by side. Adds a tip explaining Gunicorn's `max_requests_jitter` option for multi-worker hosts, which staggers recycling so that all workers do not restart at the same time.

Hitless restarts on Kubernetes. A new subsection covers the three components that must work together for zero-downtime restarts: the rolling update strategy (`maxSurge: 1`, `maxUnavailable: 0`), recommended probe settings (startup, readiness, and liveness, with timeout values from LIT-2458), and the `preStop` sleep hook with `terminationGracePeriodSeconds`, including an annotated shutdown sequence diagram.

Behavioral test matrix

- `maxSurge: 1`, `maxUnavailable: 0`
- `preStop: sleep 15` + `terminationGracePeriodSeconds: 60`
- `startupProbe` `failureThreshold: 120`, `periodSeconds: 5`
- `readinessProbe` `failureThreshold: 4`, `periodSeconds: 15`
- `livenessProbe` `failureThreshold: 3`, `periodSeconds: 35`
- `--max_requests_before_restart 10000` (Gunicorn): `max_requests`
- `--max_requests_before_restart 10000` (Uvicorn)

https://claude.ai/code/session_011QYab7MJL5uDvkxEr1RJ2j
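The `max_requests_jitter` tip can be illustrated with a short simulation. Per Gunicorn's documentation, each worker's restart point is randomized by `randint(0, max_requests_jitter)` on top of `max_requests`, so workers that start together do not all recycle in lockstep. The sketch below only mimics that rule in plain Python and does not call Gunicorn; `per_worker_limits` is a hypothetical helper and the jitter of 1000 is an assumed example value.

```python
import random


def per_worker_limits(max_requests: int, jitter: int, workers: int, seed: int = 0) -> list[int]:
    """Mimic Gunicorn's per-worker limit: max_requests + randint(0, jitter).

    Hypothetical helper for illustration; Gunicorn draws this value itself
    when each worker starts. A fixed seed keeps the output reproducible.
    """
    rng = random.Random(seed)
    return [max_requests + rng.randint(0, jitter) for _ in range(workers)]


# 10000 matches the --max_requests_before_restart value in the test matrix;
# the jitter of 1000 is an assumed example value.
no_jitter = per_worker_limits(10_000, jitter=0, workers=8)
with_jitter = per_worker_limits(10_000, jitter=1_000, workers=8)

print("without jitter:", no_jitter)        # every worker recycles at request 10000
print("with jitter:   ", sorted(with_jitter))  # limits spread over 10000..11000
```

With evenly balanced traffic, the unjittered workers would reach their limit at roughly the same moment and restart together, leaving the host with little spare capacity during that window; the jittered limits spread those restarts out.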
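To see why `maxSurge: 1` with `maxUnavailable: 0` avoids capacity dips during a rollout, the rolling update can be modeled step by step. This is a deliberately simplified, hypothetical model (`simulate_rollout` is not a Kubernetes API): it assumes each new pod passes its readiness probe before any old pod is terminated, and it ignores percentage values, pod start time, and controller batching.

```python
def simulate_rollout(replicas: int, max_surge: int = 1, max_unavailable: int = 0) -> tuple[int, int]:
    """Simplified model of a Deployment rolling update.

    Returns (minimum Ready pods seen, maximum total pods seen).
    Assumes new pods become Ready as soon as they are created.
    """
    if max_surge == 0 and max_unavailable == 0:
        # Kubernetes rejects this combination: the rollout could never progress.
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = replicas, 0
    min_ready, max_total = replicas, replicas
    floor = replicas - max_unavailable  # fewest Ready pods allowed
    while old > 0 or new < replicas:
        # Surge: add new pods without exceeding replicas + max_surge in total.
        room = (replicas + max_surge) - (old + new)
        new += max(0, min(room, replicas - new))
        max_total = max(max_total, old + new)
        # Scale down old pods while keeping at least `floor` Ready pods.
        old -= max(0, min(old, (old + new) - floor))
        min_ready = min(min_ready, old + new)
    return min_ready, max_total


print(simulate_rollout(3))                                   # (3, 4)
print(simulate_rollout(1, max_surge=0, max_unavailable=1))   # (0, 1)
```

With the recommended settings, three replicas never drop below three Ready pods and briefly run a fourth. With a single replica and `maxUnavailable: 1`, the model reports a moment with zero Ready pods, which is the kind of gap `maxUnavailable: 0` is meant to prevent.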
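The probe and shutdown values in the test matrix translate into rough time windows. As an approximation that ignores `timeoutSeconds`, `initialDelaySeconds`, and where in the probe period a failure begins, Kubernetes acts after about `failureThreshold × periodSeconds` seconds of consecutive failures. The grace-period countdown also starts before the `preStop` hook runs, so the 15 s sleep comes out of the 60 s `terminationGracePeriodSeconds`, leaving roughly 45 s for the proxy to finish in-flight requests after SIGTERM before it is killed. The `Probe`, `failure_window`, and `drain_budget` names are hypothetical helpers that only do this arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Hypothetical stand-in for the two probe fields used in the matrix."""
    failure_threshold: int
    period_seconds: int


def failure_window(probe: Probe) -> int:
    """Approximate seconds of consecutive failures before Kubernetes acts.

    Ignores timeoutSeconds, initialDelaySeconds and probe phase alignment.
    """
    return probe.failure_threshold * probe.period_seconds


def drain_budget(grace_period_seconds: int, prestop_sleep_seconds: int) -> int:
    """Seconds left after SIGTERM before SIGKILL.

    The grace-period countdown starts before the preStop hook runs,
    so the preStop sleep is subtracted from the budget.
    """
    return max(0, grace_period_seconds - prestop_sleep_seconds)


probes = {
    "startupProbe": Probe(failure_threshold=120, period_seconds=5),   # container killed if never started
    "readinessProbe": Probe(failure_threshold=4, period_seconds=15),  # pod removed from Service endpoints
    "livenessProbe": Probe(failure_threshold=3, period_seconds=35),   # container restarted
}

for name, probe in probes.items():
    print(f"{name}: ~{failure_window(probe)}s")
print(f"drain after SIGTERM: ~{drain_budget(60, 15)}s")
# Prints:
# startupProbe: ~600s
# readinessProbe: ~60s
# livenessProbe: ~105s
# drain after SIGTERM: ~45s
```

The generous startup window (about 10 minutes) tolerates slow boots, while the readiness window is shorter than the liveness window, so a struggling pod stops receiving traffic before it is restarted.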