This document provides comprehensive instructions for AI agents working in the LiteLLM repository.
LiteLLM is a unified interface for 100+ LLMs that:
- Translates inputs to provider-specific completion, embedding, and image generation endpoints
- Provides consistent OpenAI-format output across all providers
- Includes retry/fallback logic across multiple deployments (Router)
- Offers a proxy server (LLM Gateway) with budgets, rate limits, and authentication
- Supports advanced features like function calling, streaming, caching, and observability
- `litellm/` - Main library code
  - `llms/` - Provider-specific implementations (OpenAI, Anthropic, Azure, etc.)
  - `proxy/` - Proxy server implementation (LLM Gateway)
  - `router_utils/` - Load balancing and fallback logic
  - `types/` - Type definitions and schemas
  - `integrations/` - Third-party integrations (observability, caching, etc.)
- `tests/` - Comprehensive test suites
- `docs/my-website/` - Documentation website
- `ui/litellm-dashboard/` - Admin dashboard UI
- `enterprise/` - Enterprise-specific features
- Provider Implementations: When adding/modifying LLM providers:
  - Follow existing patterns in `litellm/llms/{provider}/`
  - Implement proper transformation classes that inherit from `BaseConfig`
  - Support both sync and async operations
  - Handle streaming responses appropriately
  - Include proper error handling with provider-specific exceptions
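As a rough sketch of the transformation-class pattern (the `BaseConfig` stand-in and the provider payload below are illustrative, not the real base-class API; see the existing configs under `litellm/llms/` for the actual interface):

```python
from typing import Any, Dict, List


class BaseConfig:
    """Illustrative stand-in for litellm's BaseConfig; the real class has a
    richer interface (async support, streaming, error mapping, etc.)."""

    def transform_request(
        self,
        model: str,
        messages: List[Dict[str, Any]],
        optional_params: Dict[str, Any],
    ) -> Dict[str, Any]:
        raise NotImplementedError


class ExampleProviderConfig(BaseConfig):
    """Maps OpenAI-format inputs onto a hypothetical provider's payload."""

    def transform_request(self, model, messages, optional_params):
        return {
            "model_id": model,
            # Translate OpenAI-style messages to the provider's chat shape.
            "chat": [
                {"speaker": m["role"], "text": m["content"]} for m in messages
            ],
            # Translate OpenAI param names to the provider's names.
            "max_output_tokens": optional_params.get("max_tokens"),
        }


config = ExampleProviderConfig()
payload = config.transform_request(
    "example-model-v1",
    [{"role": "user", "content": "hi"}],
    {"max_tokens": 128},
)
print(payload["chat"][0]["speaker"])  # "user"
```

The key idea is that all provider quirks live inside the config class, so callers only ever see OpenAI-format inputs and outputs.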
- Type Safety:
  - Use proper type hints throughout
  - Update type definitions in `litellm/types/`
  - Ensure compatibility with both Pydantic v1 and v2
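One common v1/v2 compatibility pattern (a sketch, not a prescribed litellm helper) is to branch on `model_dump`, which exists only in Pydantic v2; the stub classes below stand in for real models so the example runs without pydantic installed:

```python
from typing import Any, Dict


def model_to_dict(obj: Any) -> Dict[str, Any]:
    """Serialize a Pydantic model under either major version.

    Pydantic v2 renamed `.dict()` to `.model_dump()`; checking for the
    newer method first keeps the code working on both.
    """
    if hasattr(obj, "model_dump"):  # Pydantic v2
        return obj.model_dump()
    return obj.dict()  # Pydantic v1


# Stand-ins so the sketch runs without pydantic installed:
class V1Style:
    def dict(self):
        return {"api": "v1"}


class V2Style:
    def model_dump(self):
        return {"api": "v2"}


print(model_to_dict(V1Style()), model_to_dict(V2Style()))
```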
- Testing:
  - Add tests in appropriate `tests/` subdirectories
  - Include both unit tests and integration tests
  - Test provider-specific functionality thoroughly
  - Consider adding load tests for performance-critical changes
- Tremor is DEPRECATED: do not use Tremor components in new features/changes
  - The only exception is the Tremor Table component and its required Tremor Table sub-components
- Use common components as much as possible:
  - These are usually defined in the `common_components` directory
  - Prefer them over building new components unless genuinely needed
- Testing:
  - The codebase uses Vitest and React Testing Library
  - Query priority order: use query methods in this order: `getByRole`, `getByLabelText`, `getByPlaceholderText`, `getByText`, `getByTestId`
  - Always use `screen` instead of destructuring from `render()` (e.g., use `screen.getByText()`, not `getByText`)
  - Wrap user interactions in `act()`: always wrap `fireEvent` calls with `act()` to ensure React state updates are properly handled
  - Use `query*` methods for absence checks: use `queryBy*` methods (not `getBy*`) when expecting an element NOT to be present
  - Test names must start with "should": all test names should follow the pattern `it("should ...")`
  - Mock external dependencies: check `setupTests.ts` for global mocks, and mock child components/networking calls as needed
  - Structure tests properly:
    - The first test should verify the component renders successfully
    - Subsequent tests should focus on functionality and user interactions
  - Use `waitFor` for async operations that aren't already awaited
  - Avoid `querySelector`: prefer React Testing Library queries over direct DOM manipulation
- Function/Tool Calling:
  - LiteLLM standardizes tool calling across providers
  - OpenAI format is the standard, with transformations for other providers
  - See `litellm/llms/anthropic/chat/transformation.py` for complex tool handling
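To illustrate the idea (a simplified sketch, not the actual code in `transformation.py`): an OpenAI-format tool definition nests the schema under `function.parameters`, while Anthropic's format flattens it and calls the schema `input_schema`:

```python
def openai_tool_to_anthropic(tool: dict) -> dict:
    """Map an OpenAI-format tool definition to Anthropic's tool shape."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # OpenAI calls the JSON schema "parameters"; Anthropic calls it
        # "input_schema".
        "input_schema": fn["parameters"],
    }


openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}
print(openai_tool_to_anthropic(openai_tool)["name"])  # "get_weather"
```

The real transformation also has to handle tool-choice options, tool-result messages, and streaming tool-call deltas, which is why the Anthropic file is a useful reference.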
- Streaming:
  - All providers should support streaming where possible
  - Use consistent chunk formatting across providers
  - Handle both sync and async streaming
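Because every provider emits the same chunk shape, callers can assemble output identically regardless of backend. A sketch of the sync pattern, using stub chunks in place of a real `stream=True` response:

```python
from types import SimpleNamespace


def fake_stream():
    """Stub chunks shaped like OpenAI streaming deltas."""
    for piece in ["Hel", "lo", "!"]:
        yield SimpleNamespace(
            choices=[SimpleNamespace(delta=SimpleNamespace(content=piece))]
        )
    # Final chunk carries no content (only a finish reason in real responses).
    yield SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=None))]
    )


parts = []
for chunk in fake_stream():  # with litellm: completion(..., stream=True)
    delta = chunk.choices[0].delta.content
    if delta is not None:  # guard: final chunks have no content
        parts.append(delta)

text = "".join(parts)
print(text)  # "Hello!"
```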
- Error Handling:
  - Use provider-specific exception classes
  - Maintain consistent error formats across providers
  - Include proper retry logic and fallback mechanisms
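The Router implements this for real deployments; as a self-contained sketch of the retry-then-fallback idea (the names below are hypothetical, not the Router API):

```python
import time


class ProviderRateLimitError(Exception):
    """Stand-in for a provider-specific exception class."""


def call_with_fallbacks(call_fn, deployments, max_retries=2, backoff=0.0):
    """Try each deployment in order, retrying transient errors per deployment."""
    last_err = None
    for deployment in deployments:
        for attempt in range(max_retries + 1):
            try:
                return call_fn(deployment)
            except ProviderRateLimitError as err:
                last_err = err
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise last_err  # every deployment exhausted its retries


calls = []


def flaky(deployment):
    calls.append(deployment)
    if deployment == "primary":
        raise ProviderRateLimitError("429")
    return f"ok from {deployment}"


result = call_with_fallbacks(flaky, ["primary", "fallback"])
print(result)  # "ok from fallback"
```

Only transient, retryable exceptions should trigger retries; permanent errors (bad auth, invalid request) should propagate immediately.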
- Configuration:
  - Support both environment variables and programmatic configuration
  - Use `BaseConfig` classes for provider configurations
  - Allow dynamic parameter passing
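For example, a provider key can come from the environment or be passed per call; the resolution sketch below is illustrative of the precedence (explicit value wins over environment), not litellm's actual lookup code:

```python
import os


def resolve_api_key(explicit_key=None, env_var="OPENAI_API_KEY"):
    """Programmatic value wins; otherwise fall back to the environment."""
    if explicit_key is not None:
        return explicit_key
    return os.environ.get(env_var)


os.environ["OPENAI_API_KEY"] = "sk-from-env"
print(resolve_api_key())             # "sk-from-env"
print(resolve_api_key("sk-direct"))  # "sk-direct"
```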
The proxy server is a critical component that provides:
- Authentication and authorization
- Rate limiting and budget management
- Load balancing across multiple models/deployments
- Observability and logging
- Admin dashboard UI
- Enterprise features
Key files:
- `litellm/proxy/proxy_server.py` - Main server implementation
- `litellm/proxy/auth/` - Authentication logic
- `litellm/proxy/management_endpoints/` - Admin API endpoints
Database (proxy): Use Prisma model methods (`prisma_client.db.<model>.upsert`, `.find_many`, `.find_unique`, etc.), not raw SQL (`execute_raw`/`query_raw`). See COMMON PITFALLS for details.
LiteLLM supports MCP for agent workflows:
- MCP server integration for tool calling
- Transformation between OpenAI and MCP tool formats
- Support for external MCP servers (Zapier, Jira, Linear, etc.)
- See `litellm/experimental_mcp_client/` and `litellm/proxy/_experimental/mcp_server/`
Use `poetry run python script.py` to run Python scripts in the project environment (for non-test files).
When opening issues or pull requests, follow these templates:

Bug reports:
- Describe what happened vs. expected behavior
- Include relevant log output
- Specify the LiteLLM version
- Indicate if you're part of an ML Ops team (helps with prioritization)

Feature requests:
- Clearly describe the feature
- Explain the motivation and use case with concrete examples
- Add at least 1 test in `tests/litellm/`
- Ensure `make test-unit` passes
- Provider Tests: Test against real provider APIs when possible
- Proxy Tests: Include authentication, rate limiting, and routing tests
- Performance Tests: Load testing for high-throughput scenarios
- Integration Tests: End-to-end workflows including tool calling
- Keep documentation in sync with code changes
- Update provider documentation when adding new providers
- Include code examples for new features
- Update changelog and release notes
- Handle API keys securely
- Validate all inputs, especially for proxy endpoints
- Consider rate limiting and abuse prevention
- Follow security best practices for authentication
- Some features are enterprise-only
- Check the `enterprise/` directory for enterprise-specific code
- Maintain compatibility between open-source and enterprise versions
- Breaking Changes: LiteLLM has many users - avoid breaking existing APIs
- Provider Specifics: Each provider has unique quirks - handle them properly
- Rate Limits: Respect provider rate limits in tests
- Memory Usage: Be mindful of memory usage in streaming scenarios
- Dependencies: Keep dependencies minimal and well-justified
- UI/Backend Contract Mismatch: When adding a new entity type to the UI, always check whether the backend endpoint accepts a single value or an array. Match the UI control accordingly (single-select vs. multi-select) to avoid silently dropping user selections
- Missing Tests for New Entity Types: When adding a new entity type (e.g., in `EntityUsage`, `UsageViewSelect`), always add corresponding tests in the existing test files and update any icon/component mocks
- Raw SQL in proxy DB code: Do not use `execute_raw` or `query_raw` for proxy database access. Use Prisma model methods (e.g. `prisma_client.db.litellm_tooltable.upsert()`, `.find_many()`, `.find_unique()`) so behavior stays consistent with the schema, the client stays mockable in tests, and you avoid the pitfalls of hand-written SQL (parameter ordering, type casting, schema drift)
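A sketch of the preferred shape (the `where`/`data` payloads here are illustrative; check the Prisma schema for the real field names):

```python
async def upsert_tool(prisma_client, tool_name: str, spec: dict):
    """Upsert via the generated Prisma model method, not hand-written SQL.

    `prisma_client` is assumed to be the proxy's Prisma client instance;
    the field names below are placeholders, not the actual schema.
    """
    return await prisma_client.db.litellm_tooltable.upsert(
        where={"tool_name": tool_name},
        data={
            "create": {"tool_name": tool_name, **spec},
            "update": spec,
        },
    )
```

Because the call goes through the generated client, tests can swap in a mock `prisma_client` without a database.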
- Do not hardcode model-specific flags: Put model-specific capability flags in `model_prices_and_context_window.json` and read them via `get_model_info` (or existing helpers like `supports_reasoning`). This prevents users from needing to upgrade LiteLLM each time a new model supports a feature.

  Example of BAD (hardcoded model checks):

  ```python
  @staticmethod
  def _is_effort_supported_model(model: str) -> bool:
      """Check if the model supports the output_config.effort parameter..."""
      model_lower = model.lower()
      if AnthropicConfig._is_claude_4_6_model(model):
          return True
      return any(
          v in model_lower
          for v in ("opus-4-5", "opus_4_5", "opus-4.5", "opus_4.5")
      )
  ```
  Example of GOOD (config-driven or helper that reads from config):

  ```python
  if (
      "claude-3-7-sonnet" in model
      or AnthropicConfig._is_claude_4_6_model(model)
      or supports_reasoning(
          model=model,
          custom_llm_provider=self.custom_llm_provider,
      )
  ):
      ...
  ```
  Using helpers like `supports_reasoning` (which read from `model_prices_and_context_window.json` / `get_model_info`) allows future model updates to "just work" without code changes.
- Never close HTTP/SDK clients on cache eviction: Do not add `close()`, `aclose()`, or `create_task(close_fn())` inside `LLMClientCache._remove_key()` or any cache eviction path. Evicted clients may still be held by in-flight requests; closing them causes `RuntimeError: Cannot send a request, as the client has been closed.` in production after the cache TTL (1 hour) expires. Connection cleanup is handled at shutdown by `close_litellm_async_clients()`. See PR #22247 for the full incident history.
- Main documentation: https://docs.litellm.ai/
- Provider-specific docs in `docs/my-website/docs/providers/`
- Admin UI for testing proxy features
- Follow existing patterns in the codebase
- Check similar provider implementations
- Ensure comprehensive test coverage
- Update documentation appropriately
- Consider backward compatibility impact
- Poetry is installed in `~/.local/bin`; the update script ensures it is on `PATH`.
- Python 3.12 and Node 22 are pre-installed.
- The virtual environment lives under `~/.cache/pypoetry/virtualenvs/`.
Start the proxy with a config file:
```shell
poetry run litellm --config dev_config.yaml --port 4000
```

The proxy takes ~15-20 seconds to fully start (it runs Prisma migrations on boot). Wait for `/health` to return before sending requests. Without a PostgreSQL `DATABASE_URL`, the proxy connects to a default Neon dev database embedded in the `litellm-proxy-extras` package.
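For orientation, a minimal proxy config has this general shape (the model names and key below are placeholders, not the contents of the repo's actual `dev_config.yaml`):

```yaml
model_list:
  - model_name: gpt-4o                 # name clients request through the proxy
    litellm_params:
      model: openai/gpt-4o             # provider/model the request is routed to
      api_key: os.environ/OPENAI_API_KEY

general_settings:
  master_key: sk-1234                  # proxy admin key (placeholder)
```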
See CLAUDE.md and the Makefile for standard commands. Key notes:
- `psycopg-binary` must be installed (`poetry run pip install psycopg-binary`) because the pytest-postgresql plugin requires it and the lock file only includes `psycopg` (no binary).
- `openapi-core` must be installed (`poetry run pip install openapi-core`) for the OpenAPI compliance tests in `tests/test_litellm/interactions/`.
- The `--timeout` pytest flag is NOT available; don't pass it.
- Unit tests: `poetry run pytest tests/test_litellm/ -x -vv -n 4`
- Black `--check` may report pre-existing formatting issues; this does not block test runs.
- If `poetry install` fails with "pyproject.toml changed significantly since poetry.lock was last generated", run `poetry lock` first to regenerate the lock file.
```shell
cd litellm && poetry run ruff check .
```

Ruff is the primary fast linter. For the full lint suite (including mypy, black, circular imports), run `make lint` per CLAUDE.md.
- The UI is at `ui/litellm-dashboard/`. Run `npm run dev` from that directory for the Next.js dev server on port 3000.
- The proxy at port 4000 serves a pre-built static UI from `litellm/proxy/_experimental/out/`. After making UI code changes, you must run `npm run build` in the dashboard directory and copy the output (`cp -r ui/litellm-dashboard/out/* litellm/proxy/_experimental/out/`) for the proxy to serve the updated UI.
- SVGs used as provider logos (loaded via `<img>` tags) must NOT use `fill="currentColor"`: replace it with an explicit color like `#000000` or use the `-color` variant from lobehub icons, since CSS color inheritance does not work inside `<img>` elements.
- Provider logos live in `ui/litellm-dashboard/public/assets/logos/` (source) and `litellm/proxy/_experimental/out/assets/logos/` (pre-built). Both locations must have the file for it to work in dev and proxy-served modes.
- UI Vitest tests: `cd ui/litellm-dashboard && npx vitest run`