This document provides comprehensive instructions for AI agents working in the LiteLLM repository.
LiteLLM is a unified interface for 100+ LLMs that:
- Translates inputs to provider-specific completion, embedding, and image generation endpoints
- Provides consistent OpenAI-format output across all providers
- Includes retry/fallback logic across multiple deployments (Router)
- Offers a proxy server (LLM Gateway) with budgets, rate limits, and authentication
- Supports advanced features like function calling, streaming, caching, and observability
- `litellm/` - Main library code
  - `llms/` - Provider-specific implementations (OpenAI, Anthropic, Azure, etc.)
  - `proxy/` - Proxy server implementation (LLM Gateway)
  - `router_utils/` - Load balancing and fallback logic
  - `types/` - Type definitions and schemas
  - `integrations/` - Third-party integrations (observability, caching, etc.)
- `tests/` - Comprehensive test suites
- `docs/my-website/` - Documentation website
- `ui/litellm-dashboard/` - Admin dashboard UI
- `enterprise/` - Enterprise-specific features
- Provider Implementations: When adding/modifying LLM providers:
  - Follow existing patterns in `litellm/llms/{provider}/`
  - Implement proper transformation classes that inherit from `BaseConfig`
  - Support both sync and async operations
  - Handle streaming responses appropriately
  - Include proper error handling with provider-specific exceptions
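As a rough sketch of the transformation-class pattern (the `BaseConfig` stand-in and the provider payload below are illustrative, not the real base-class API; see the existing configs under `litellm/llms/` for the actual interface):

```python
from typing import Any, Dict, List


class BaseConfig:
    """Illustrative stand-in for litellm's BaseConfig; the real class has a
    richer interface (async support, streaming, error mapping, etc.)."""

    def transform_request(
        self,
        model: str,
        messages: List[Dict[str, Any]],
        optional_params: Dict[str, Any],
    ) -> Dict[str, Any]:
        raise NotImplementedError


class ExampleProviderConfig(BaseConfig):
    """Maps OpenAI-format inputs onto a hypothetical provider's payload."""

    def transform_request(self, model, messages, optional_params):
        return {
            "model_id": model,
            # Translate OpenAI-style messages to the provider's chat shape.
            "chat": [
                {"speaker": m["role"], "text": m["content"]} for m in messages
            ],
            # Translate OpenAI param names to the provider's names.
            "max_output_tokens": optional_params.get("max_tokens"),
        }


config = ExampleProviderConfig()
payload = config.transform_request(
    "example-model-v1",
    [{"role": "user", "content": "hi"}],
    {"max_tokens": 128},
)
print(payload["chat"][0]["speaker"])  # "user"
```

The key idea is that all provider quirks live inside the config class, so callers only ever see OpenAI-format inputs and outputs.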
- Type Safety:
  - Use proper type hints throughout
  - Update type definitions in `litellm/types/`
  - Ensure compatibility with both Pydantic v1 and v2
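One common v1/v2 compatibility pattern (a sketch, not a prescribed litellm helper) is to branch on `model_dump`, which exists only in Pydantic v2; the stub classes below stand in for real models so the example runs without pydantic installed:

```python
from typing import Any, Dict


def model_to_dict(obj: Any) -> Dict[str, Any]:
    """Serialize a Pydantic model under either major version.

    Pydantic v2 renamed `.dict()` to `.model_dump()`; checking for the
    newer method first keeps the code working on both.
    """
    if hasattr(obj, "model_dump"):  # Pydantic v2
        return obj.model_dump()
    return obj.dict()  # Pydantic v1


# Stand-ins so the sketch runs without pydantic installed:
class V1Style:
    def dict(self):
        return {"api": "v1"}


class V2Style:
    def model_dump(self):
        return {"api": "v2"}


print(model_to_dict(V1Style()), model_to_dict(V2Style()))
```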
- Testing:
  - Add tests in appropriate `tests/` subdirectories
  - Include both unit tests and integration tests
  - Test provider-specific functionality thoroughly
  - Consider adding load tests for performance-critical changes
- Tremor is DEPRECATED: do not use Tremor components in new features/changes
  - The only exception is the Tremor Table component and its required Tremor Table sub-components
- Use common components as much as possible:
  - These are usually defined in the `common_components` directory
  - Prefer them over building new components unless genuinely needed
- Testing:
  - The codebase uses Vitest and React Testing Library
  - Query priority order: use query methods in this order: `getByRole`, `getByLabelText`, `getByPlaceholderText`, `getByText`, `getByTestId`
  - Always use `screen` instead of destructuring from `render()` (e.g., use `screen.getByText()`, not `getByText`)
  - Wrap user interactions in `act()`: always wrap `fireEvent` calls with `act()` to ensure React state updates are properly handled
  - Use `query*` methods for absence checks: use `queryBy*` methods (not `getBy*`) when expecting an element NOT to be present
  - Test names must start with "should": all test names should follow the pattern `it("should ...")`
  - Mock external dependencies: check `setupTests.ts` for global mocks, and mock child components/networking calls as needed
  - Structure tests properly:
    - The first test should verify the component renders successfully
    - Subsequent tests should focus on functionality and user interactions
  - Use `waitFor` for async operations that aren't already awaited
  - Avoid `querySelector`: prefer React Testing Library queries over direct DOM manipulation
- Function/Tool Calling:
  - LiteLLM standardizes tool calling across providers
  - OpenAI format is the standard, with transformations for other providers
  - See `litellm/llms/anthropic/chat/transformation.py` for complex tool handling
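To illustrate the idea (a simplified sketch, not the actual code in `transformation.py`): an OpenAI-format tool definition nests the schema under `function.parameters`, while Anthropic's format flattens it and calls the schema `input_schema`:

```python
def openai_tool_to_anthropic(tool: dict) -> dict:
    """Map an OpenAI-format tool definition to Anthropic's tool shape."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # OpenAI calls the JSON schema "parameters"; Anthropic calls it
        # "input_schema".
        "input_schema": fn["parameters"],
    }


openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}
print(openai_tool_to_anthropic(openai_tool)["name"])  # "get_weather"
```

The real transformation also has to handle tool-choice options, tool-result messages, and streaming tool-call deltas, which is why the Anthropic file is a useful reference.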
- Streaming:
  - All providers should support streaming where possible
  - Use consistent chunk formatting across providers
  - Handle both sync and async streaming
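Because every provider emits the same chunk shape, callers can assemble output identically regardless of backend. A sketch of the sync pattern, using stub chunks in place of a real `stream=True` response:

```python
from types import SimpleNamespace


def fake_stream():
    """Stub chunks shaped like OpenAI streaming deltas."""
    for piece in ["Hel", "lo", "!"]:
        yield SimpleNamespace(
            choices=[SimpleNamespace(delta=SimpleNamespace(content=piece))]
        )
    # Final chunk carries no content (only a finish reason in real responses).
    yield SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=None))]
    )


parts = []
for chunk in fake_stream():  # with litellm: completion(..., stream=True)
    delta = chunk.choices[0].delta.content
    if delta is not None:  # guard: final chunks have no content
        parts.append(delta)

text = "".join(parts)
print(text)  # "Hello!"
```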
- Error Handling:
  - Use provider-specific exception classes
  - Maintain consistent error formats across providers
  - Include proper retry logic and fallback mechanisms
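The Router implements this for real deployments; as a self-contained sketch of the retry-then-fallback idea (the names below are hypothetical, not the Router API):

```python
import time


class ProviderRateLimitError(Exception):
    """Stand-in for a provider-specific exception class."""


def call_with_fallbacks(call_fn, deployments, max_retries=2, backoff=0.0):
    """Try each deployment in order, retrying transient errors per deployment."""
    last_err = None
    for deployment in deployments:
        for attempt in range(max_retries + 1):
            try:
                return call_fn(deployment)
            except ProviderRateLimitError as err:
                last_err = err
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise last_err  # every deployment exhausted its retries


calls = []


def flaky(deployment):
    calls.append(deployment)
    if deployment == "primary":
        raise ProviderRateLimitError("429")
    return f"ok from {deployment}"


result = call_with_fallbacks(flaky, ["primary", "fallback"])
print(result)  # "ok from fallback"
```

Only transient, retryable exceptions should trigger retries; permanent errors (bad auth, invalid request) should propagate immediately.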
- Configuration:
  - Support both environment variables and programmatic configuration
  - Use `BaseConfig` classes for provider configurations
  - Allow dynamic parameter passing
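For example, a provider key can come from the environment or be passed per call; the resolution sketch below is illustrative of the precedence (explicit value wins over environment), not litellm's actual lookup code:

```python
import os


def resolve_api_key(explicit_key=None, env_var="OPENAI_API_KEY"):
    """Programmatic value wins; otherwise fall back to the environment."""
    if explicit_key is not None:
        return explicit_key
    return os.environ.get(env_var)


os.environ["OPENAI_API_KEY"] = "sk-from-env"
print(resolve_api_key())             # "sk-from-env"
print(resolve_api_key("sk-direct"))  # "sk-direct"
```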
The proxy server is a critical component that provides:
- Authentication and authorization
- Rate limiting and budget management
- Load balancing across multiple models/deployments
- Observability and logging
- Admin dashboard UI
- Enterprise features
Key files:
- `litellm/proxy/proxy_server.py` - Main server implementation
- `litellm/proxy/auth/` - Authentication logic
- `litellm/proxy/management_endpoints/` - Admin API endpoints
Database (proxy): Use Prisma model methods (`prisma_client.db.<model>.upsert`, `.find_many`, `.find_unique`, etc.), not raw SQL (`execute_raw`/`query_raw`). See COMMON PITFALLS for details.
LiteLLM supports MCP for agent workflows:
- MCP server integration for tool calling
- Transformation between OpenAI and MCP tool formats
- Support for external MCP servers (Zapier, Jira, Linear, etc.)
- See `litellm/experimental_mcp_client/` and `litellm/proxy/_experimental/mcp_server/`
Use `poetry run python script.py` to run Python scripts in the project environment (for non-test files).
When opening issues or pull requests, follow these templates:

Bug reports:
- Describe what happened vs. expected behavior
- Include relevant log output
- Specify the LiteLLM version
- Indicate if you're part of an ML Ops team (helps with prioritization)

Feature requests:
- Clearly describe the feature
- Explain the motivation and use case with concrete examples
- Add at least 1 test in `tests/litellm/`
- Ensure `make test-unit` passes
- Provider Tests: Test against real provider APIs when possible
- Proxy Tests: Include authentication, rate limiting, and routing tests
- Performance Tests: Load testing for high-throughput scenarios
- Integration Tests: End-to-end workflows including tool calling
- Keep documentation in sync with code changes
- Update provider documentation when adding new providers
- Include code examples for new features
- Update changelog and release notes
- Handle API keys securely
- Validate all inputs, especially for proxy endpoints
- Consider rate limiting and abuse prevention
- Follow security best practices for authentication
- Some features are enterprise-only
- Check the `enterprise/` directory for enterprise-specific code
- Maintain compatibility between open-source and enterprise versions
- Breaking Changes: LiteLLM has many users - avoid breaking existing APIs
- Provider Specifics: Each provider has unique quirks - handle them properly
- Rate Limits: Respect provider rate limits in tests
- Memory Usage: Be mindful of memory usage in streaming scenarios
- Dependencies: Keep dependencies minimal and well-justified
- UI/Backend Contract Mismatch: When adding a new entity type to the UI, always check whether the backend endpoint accepts a single value or an array. Match the UI control accordingly (single-select vs. multi-select) to avoid silently dropping user selections
- Missing Tests for New Entity Types: When adding a new entity type (e.g., in `EntityUsage`, `UsageViewSelect`), always add corresponding tests in the existing test files and update any icon/component mocks
- Raw SQL in proxy DB code: Do not use `execute_raw` or `query_raw` for proxy database access. Use Prisma model methods (e.g. `prisma_client.db.litellm_tooltable.upsert()`, `.find_many()`, `.find_unique()`) so behavior stays consistent with the schema, the client stays mockable in tests, and you avoid the pitfalls of hand-written SQL (parameter ordering, type casting, schema drift)
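A sketch of the preferred shape (the `where`/`data` payloads here are illustrative; check the Prisma schema for the real field names):

```python
async def upsert_tool(prisma_client, tool_name: str, spec: dict):
    """Upsert via the generated Prisma model method, not hand-written SQL.

    `prisma_client` is assumed to be the proxy's Prisma client instance;
    the field names below are placeholders, not the actual schema.
    """
    return await prisma_client.db.litellm_tooltable.upsert(
        where={"tool_name": tool_name},
        data={
            "create": {"tool_name": tool_name, **spec},
            "update": spec,
        },
    )
```

Because the call goes through the generated client, tests can swap in a mock `prisma_client` without a database.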
- Do not hardcode model-specific flags: Put model-specific capability flags in `model_prices_and_context_window.json` and read them via `get_model_info` (or existing helpers like `supports_reasoning`). This prevents users from needing to upgrade LiteLLM each time a new model supports a feature.

  Example of BAD (hardcoded model checks):

  ```python
  @staticmethod
  def _is_effort_supported_model(model: str) -> bool:
      """Check if the model supports the output_config.effort parameter..."""
      model_lower = model.lower()
      if AnthropicConfig._is_claude_4_6_model(model):
          return True
      return any(
          v in model_lower
          for v in ("opus-4-5", "opus_4_5", "opus-4.5", "opus_4.5")
      )
  ```
  Example of GOOD (config-driven or helper that reads from config):

  ```python
  if (
      "claude-3-7-sonnet" in model
      or AnthropicConfig._is_claude_4_6_model(model)
      or supports_reasoning(
          model=model,
          custom_llm_provider=self.custom_llm_provider,
      )
  ):
      ...
  ```
  Using helpers like `supports_reasoning` (which read from `model_prices_and_context_window.json` / `get_model_info`) allows future model updates to "just work" without code changes.
- Never close HTTP/SDK clients on cache eviction: Do not add `close()`, `aclose()`, or `create_task(close_fn())` inside `LLMClientCache._remove_key()` or any cache eviction path. Evicted clients may still be held by in-flight requests; closing them causes `RuntimeError: Cannot send a request, as the client has been closed.` in production after the cache TTL (1 hour) expires. Connection cleanup is handled at shutdown by `close_litellm_async_clients()`. See PR #22247 for the full incident history.
- Main documentation: https://docs.litellm.ai/
- Provider-specific docs in `docs/my-website/docs/providers/`
- Admin UI for testing proxy features
- Follow existing patterns in the codebase
- Check similar provider implementations
- Ensure comprehensive test coverage
- Update documentation appropriately
- Consider backward compatibility impact
- Poetry is installed in `~/.local/bin`; the update script ensures it is on `PATH`.
- Python 3.12 and Node 22 are pre-installed.
- The virtual environment lives under `~/.cache/pypoetry/virtualenvs/`.
Start the proxy with a config file:
```shell
poetry run litellm --config dev_config.yaml --port 4000
```

The proxy takes ~15-20 seconds to fully start (it runs Prisma migrations on boot). Wait for `/health` to return before sending requests. Without a PostgreSQL `DATABASE_URL`, the proxy connects to a default Neon dev database embedded in the `litellm-proxy-extras` package.
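For orientation, a minimal proxy config has this general shape (the model names and key below are placeholders, not the contents of the repo's actual `dev_config.yaml`):

```yaml
model_list:
  - model_name: gpt-4o                 # name clients request through the proxy
    litellm_params:
      model: openai/gpt-4o             # provider/model the request is routed to
      api_key: os.environ/OPENAI_API_KEY

general_settings:
  master_key: sk-1234                  # proxy admin key (placeholder)
```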
See CLAUDE.md and the Makefile for standard commands. Key notes:
- `psycopg-binary` must be installed (`poetry run pip install psycopg-binary`) because the pytest-postgresql plugin requires it and the lock file only includes `psycopg` (no binary).
- `openapi-core` must be installed (`poetry run pip install openapi-core`) for the OpenAPI compliance tests in `tests/test_litellm/interactions/`.
- The `--timeout` pytest flag is NOT available; don't pass it.
- Unit tests: `poetry run pytest tests/test_litellm/ -x -vv -n 4`
- Black `--check` may report pre-existing formatting issues; this does not block test runs.
- If `poetry install` fails with "pyproject.toml changed significantly since poetry.lock was last generated", run `poetry lock` first to regenerate the lock file.
```shell
cd litellm && poetry run ruff check .
```

Ruff is the primary fast linter. For the full lint suite (including mypy, black, circular imports), run `make lint` per CLAUDE.md.
- The UI is at `ui/litellm-dashboard/`. Run `npm run dev` from that directory for the Next.js dev server on port 3000.
- The proxy at port 4000 serves a pre-built static UI from `litellm/proxy/_experimental/out/`. After making UI code changes, you must run `npm run build` in the dashboard directory and copy the output (`cp -r ui/litellm-dashboard/out/* litellm/proxy/_experimental/out/`) for the proxy to serve the updated UI.
- SVGs used as provider logos (loaded via `<img>` tags) must NOT use `fill="currentColor"`: replace it with an explicit color like `#000000` or use the `-color` variant from lobehub icons, since CSS color inheritance does not work inside `<img>` elements.
- Provider logos live in `ui/litellm-dashboard/public/assets/logos/` (source) and `litellm/proxy/_experimental/out/assets/logos/` (pre-built). Both locations must have the file for it to work in dev and proxy-served modes.
- UI Vitest tests: `cd ui/litellm-dashboard && npx vitest run`