INSTRUCTIONS FOR LITELLM

This document provides comprehensive instructions for AI agents working in the LiteLLM repository.

OVERVIEW

LiteLLM is a unified interface for 100+ LLMs that:

  • Translates inputs to provider-specific completion, embedding, and image generation endpoints
  • Provides consistent OpenAI-format output across all providers
  • Includes retry/fallback logic across multiple deployments (Router)
  • Offers a proxy server (LLM Gateway) with budgets, rate limits, and authentication
  • Supports advanced features like function calling, streaming, caching, and observability
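
For orientation, the basic unified call looks like this (a minimal sketch; the model string is illustrative, and any supported provider prefix works the same way):

    import litellm

    # Same call shape for every provider; only the model string changes.
    response = litellm.completion(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)  # OpenAI-format output for all providers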

REPOSITORY STRUCTURE

Core Components

  • litellm/ - Main library code
    • llms/ - Provider-specific implementations (OpenAI, Anthropic, Azure, etc.)
    • proxy/ - Proxy server implementation (LLM Gateway)
    • router_utils/ - Load balancing and fallback logic
    • types/ - Type definitions and schemas
    • integrations/ - Third-party integrations (observability, caching, etc.)

Key Directories

  • tests/ - Comprehensive test suites
  • docs/my-website/ - Documentation website
  • ui/litellm-dashboard/ - Admin dashboard UI
  • enterprise/ - Enterprise-specific features

DEVELOPMENT GUIDELINES

MAKING CODE CHANGES

  1. Provider Implementations: When adding/modifying LLM providers:

    • Follow existing patterns in litellm/llms/{provider}/
    • Implement proper transformation classes that inherit from BaseConfig (see the sketch after this list)
    • Support both sync and async operations
    • Handle streaming responses appropriately
    • Include proper error handling with provider-specific exceptions
  2. Type Safety:

    • Use proper type hints throughout
    • Update type definitions in litellm/types/
    • Ensure compatibility with both Pydantic v1 and v2
  3. Testing:

    • Add tests in appropriate tests/ subdirectories
    • Include both unit tests and integration tests
    • Test provider-specific functionality thoroughly
    • Consider adding load tests for performance-critical changes
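
A minimal sketch of the transformation-class shape described in item 1. The import path and method signatures are assumptions based on existing providers; verify them against litellm/llms/base_llm/ and a neighboring provider before copying:

    from litellm.llms.base_llm.chat.transformation import BaseConfig  # import path is an assumption

    class MyProviderConfig(BaseConfig):
        """Maps OpenAI-format params to my_provider's native request format."""

        def get_supported_openai_params(self, model: str) -> list:
            return ["temperature", "max_tokens", "stream", "tools"]

        def map_openai_params(
            self, non_default_params: dict, optional_params: dict, model: str, drop_params: bool
        ) -> dict:
            # Translate OpenAI param names into provider-native ones.
            for param, value in non_default_params.items():
                if param == "max_tokens":
                    optional_params["max_output_tokens"] = value  # hypothetical provider param name
                elif param in self.get_supported_openai_params(model):
                    optional_params[param] = value
            return optional_params

    # Real configs typically also implement request/response transformation and
    # error-handling hooks; mirror whichever existing provider is closest to yours.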

MAKING CODE CHANGES FOR THE UI (IGNORE FOR BACKEND)

  1. Tremor is DEPRECATED; do not use Tremor components in new features or changes

    • The only exception is the Tremor Table component and its required Tremor Table subcomponents.
  2. Use Common Components as much as possible:

    • These are usually defined in the common_components directory
    • Use these components as much as possible and avoid building new components unless needed
  3. Testing:

    • The codebase uses Vitest and React Testing Library
    • Query Priority Order: Use query methods in this order: getByRole, getByLabelText, getByPlaceholderText, getByText, getByTestId
    • Always use screen instead of destructuring from render() (e.g., use screen.getByText() not getByText)
    • Wrap user interactions in act(): Always wrap fireEvent calls with act() to ensure React state updates are properly handled
    • Use query methods for absence checks: Use queryBy* methods (not getBy*) when expecting an element to NOT be present
    • Test names must start with "should": All test names should follow the pattern it("should ...")
    • Mock external dependencies: Check setupTests.ts for global mocks and mock child components/networking calls as needed
    • Structure tests properly:
      • First test should verify the component renders successfully
      • Subsequent tests should focus on functionality and user interactions
      • Use waitFor for async operations that aren't already awaited
    • Avoid using querySelector: Prefer React Testing Library queries over direct DOM manipulation

IMPORTANT PATTERNS

  1. Function/Tool Calling:

    • LiteLLM standardizes tool calling across providers
    • OpenAI format is the standard, with transformations for other providers
    • See litellm/llms/anthropic/chat/transformation.py for complex tool handling (and the sketch after this list)
  2. Streaming:

    • All providers should support streaming where possible
    • Use consistent chunk formatting across providers
    • Handle both sync and async streaming
  3. Error Handling:

    • Use provider-specific exception classes
    • Maintain consistent error formats across providers
    • Include proper retry logic and fallback mechanisms
  4. Configuration:

    • Support both environment variables and programmatic configuration
    • Use BaseConfig classes for provider configurations
    • Allow dynamic parameter passing
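
To make patterns 1 and 2 concrete, here is a hedged sketch of cross-provider tool calling and streaming (model strings are illustrative; the tools list follows the standard OpenAI schema):

    import litellm

    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ]

    # OpenAI-format tools are transformed to each provider's native format internally.
    response = litellm.completion(
        model="anthropic/claude-3-5-sonnet-20240620",
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=tools,
    )
    print(response.choices[0].message.tool_calls)

    # Streaming is the same call with stream=True, yielding OpenAI-style chunks.
    stream = litellm.completion(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")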

PROXY SERVER (LLM GATEWAY)

The proxy server is a critical component that provides:

  • Authentication and authorization
  • Rate limiting and budget management
  • Load balancing across multiple models/deployments
  • Observability and logging
  • Admin dashboard UI
  • Enterprise features

Key files:

  • litellm/proxy/proxy_server.py - Main server implementation
  • litellm/proxy/auth/ - Authentication logic
  • litellm/proxy/management_endpoints/ - Admin API endpoints

Database (proxy): Use Prisma model methods (prisma_client.db.<model>.upsert, .find_many, .find_unique, etc.), not raw SQL (execute_raw/query_raw). See COMMON PITFALLS for details.
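
A short sketch of the preferred pattern (model and field names are illustrative; check schema.prisma for the real models):

    # Good: Prisma model methods stay consistent with the schema and are mockable in tests.
    # (token_hash and org_id are placeholders.)
    token_row = await prisma_client.db.litellm_verificationtoken.find_unique(
        where={"token": token_hash}
    )
    team_rows = await prisma_client.db.litellm_teamtable.find_many(
        where={"organization_id": org_id}
    )

    # Bad: raw SQL, see COMMON PITFALLS.
    # await prisma_client.db.query_raw("SELECT * FROM ...")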

MCP (MODEL CONTEXT PROTOCOL) SUPPORT

LiteLLM supports MCP for agent workflows:

  • MCP server integration for tool calling
  • Transformation between OpenAI and MCP tool formats
  • Support for external MCP servers (Zapier, Jira, Linear, etc.)
  • See litellm/experimental_mcp_client/ and litellm/proxy/_experimental/mcp_server/
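
As a rough illustration of the tool-format mapping, a hypothetical helper (not LiteLLM's actual transformation code; see the directories above for the real implementation):

    def mcp_tool_to_openai(mcp_tool: dict) -> dict:
        """Convert an MCP tool (name / description / inputSchema) to an OpenAI function tool."""
        return {
            "type": "function",
            "function": {
                "name": mcp_tool["name"],
                "description": mcp_tool.get("description", ""),
                "parameters": mcp_tool.get("inputSchema", {"type": "object", "properties": {}}),
            },
        }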

RUNNING SCRIPTS

Use poetry run python script.py to run Python scripts in the project environment (for non-test files).

GITHUB TEMPLATES

When opening issues or pull requests, follow these templates:

Bug Reports (.github/ISSUE_TEMPLATE/bug_report.yml)

  • Describe what happened vs. expected behavior
  • Include relevant log output
  • Specify LiteLLM version
  • Indicate if you're part of an ML Ops team (helps with prioritization)

Feature Requests (.github/ISSUE_TEMPLATE/feature_request.yml)

  • Clearly describe the feature
  • Explain motivation and use case with concrete examples

Pull Requests (.github/pull_request_template.md)

  • Add at least 1 test in tests/litellm/
  • Ensure make test-unit passes

TESTING CONSIDERATIONS

  1. Provider Tests: Test against real provider APIs when possible
  2. Proxy Tests: Include authentication, rate limiting, and routing tests
  3. Performance Tests: Load testing for high-throughput scenarios
  4. Integration Tests: End-to-end workflows including tool calling

DOCUMENTATION

  • Keep documentation in sync with code changes
  • Update provider documentation when adding new providers
  • Include code examples for new features
  • Update changelog and release notes

SECURITY CONSIDERATIONS

  • Handle API keys securely
  • Validate all inputs, especially for proxy endpoints
  • Consider rate limiting and abuse prevention
  • Follow security best practices for authentication

ENTERPRISE FEATURES

  • Some features are enterprise-only
  • Check enterprise/ directory for enterprise-specific code
  • Maintain compatibility between open-source and enterprise versions

COMMON PITFALLS TO AVOID

  1. Breaking Changes: LiteLLM has many users - avoid breaking existing APIs

  2. Provider Specifics: Each provider has unique quirks - handle them properly

  3. Rate Limits: Respect provider rate limits in tests

  4. Memory Usage: Be mindful of memory usage in streaming scenarios

  5. Dependencies: Keep dependencies minimal and well-justified

  6. UI/Backend Contract Mismatch: When adding a new entity type to the UI, always check whether the backend endpoint accepts a single value or an array, and match the UI control accordingly (single-select vs. multi-select) to avoid silently dropping user selections.

  7. Missing Tests for New Entity Types: When adding a new entity type (e.g., in EntityUsage, UsageViewSelect), always add corresponding tests in the existing test files and update any icon/component mocks.

  8. Raw SQL in proxy DB code: Do not use execute_raw or query_raw for proxy database access. Use Prisma model methods (e.g. prisma_client.db.litellm_tooltable.upsert(), .find_many(), .find_unique()) so behavior stays consistent with the schema, the client stays mockable in tests, and you avoid the pitfalls of hand-written SQL (parameter ordering, type casting, schema drift).

  9. Do not hardcode model-specific flags: Put model-specific capability flags in model_prices_and_context_window.json and read them via get_model_info (or existing helpers like supports_reasoning). This prevents users from needing to upgrade LiteLLM each time a new model supports a feature. (A get_model_info usage sketch follows this list.)

    Example of BAD (hardcoded model checks):

    @staticmethod
    def _is_effort_supported_model(model: str) -> bool:
        """Check if the model supports the output_config.effort parameter..."""
        model_lower = model.lower()
        if AnthropicConfig._is_claude_4_6_model(model):
            return True
        return any(
            v in model_lower for v in ("opus-4-5", "opus_4_5", "opus-4.5", "opus_4.5")
        )

    Example of GOOD (config-driven or helper that reads from config):

    if (
        "claude-3-7-sonnet" in model
        or AnthropicConfig._is_claude_4_6_model(model)
        or supports_reasoning(
            model=model,
            custom_llm_provider=self.custom_llm_provider,
        )
    ):
        ...

    Using helpers like supports_reasoning (which read from model_prices_and_context_window.json / get_model_info) allows future model updates to "just work" without code changes.

  10. Never close HTTP/SDK clients on cache eviction: Do not add close(), aclose(), or create_task(close_fn()) inside LLMClientCache._remove_key() or any cache eviction path. Evicted clients may still be held by in-flight requests; closing them causes "RuntimeError: Cannot send a request, as the client has been closed." in production after the cache TTL (1 hour) expires. Connection cleanup is handled at shutdown by close_litellm_async_clients(). See PR #22247 for the full incident history.
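
A usage sketch for pitfall 9 (the model string is illustrative; verify the exact signature of get_model_info before relying on it):

    import litellm

    # Reads capability flags from model_prices_and_context_window.json.
    info = litellm.get_model_info(
        model="claude-3-7-sonnet-20250219", custom_llm_provider="anthropic"
    )
    if info.get("supports_reasoning"):
        ...  # enable reasoning-specific parameters here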

HELPFUL RESOURCES

  • Main documentation: https://docs.litellm.ai/
  • Provider-specific docs in docs/my-website/docs/providers/
  • Admin UI for testing proxy features

WHEN IN DOUBT

  • Follow existing patterns in the codebase
  • Check similar provider implementations
  • Ensure comprehensive test coverage
  • Update documentation appropriately
  • Consider backward compatibility impact

Cursor Cloud specific instructions

Environment

  • Poetry is installed in ~/.local/bin; the update script ensures it is on PATH.
  • Python 3.12, Node 22 are pre-installed.
  • The virtual environment lives under ~/.cache/pypoetry/virtualenvs/.

Running the proxy server

Start the proxy with a config file:

poetry run litellm --config dev_config.yaml --port 4000

The proxy takes ~15-20 seconds to fully start (it runs Prisma migrations on boot). Wait for /health to return before sending requests. Without a PostgreSQL DATABASE_URL, the proxy connects to a default Neon dev database embedded in the litellm-proxy-extras package.
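
For example, a small readiness poll (a sketch using the requests library; the port and path come from the notes above):

    import time

    import requests

    # Poll until the proxy accepts connections (it runs Prisma migrations first).
    for _ in range(30):
        try:
            requests.get("http://localhost:4000/health", timeout=2)
            break  # any HTTP response means the server is up
        except requests.exceptions.ConnectionError:
            time.sleep(2)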

Running tests

See CLAUDE.md and the Makefile for standard commands. Key notes:

  • psycopg-binary must be installed (poetry run pip install psycopg-binary) because the pytest-postgresql plugin requires it and the lock file only includes psycopg (no binary).
  • openapi-core must be installed (poetry run pip install openapi-core) for the OpenAPI compliance tests in tests/test_litellm/interactions/.
  • The --timeout pytest flag is NOT available; don't pass it.
  • Unit tests: poetry run pytest tests/test_litellm/ -x -vv -n 4
  • Black --check may report pre-existing formatting issues; this does not block test runs.
  • If poetry install fails with "pyproject.toml changed significantly since poetry.lock was last generated", run poetry lock first to regenerate the lock file.

Lint

cd litellm && poetry run ruff check .

Ruff is the primary fast linter. For the full lint suite (including mypy, black, circular imports), run make lint per CLAUDE.md.

UI Dashboard development

  • The UI is at ui/litellm-dashboard/. Run npm run dev from that directory for the Next.js dev server on port 3000.
  • The proxy at port 4000 serves a pre-built static UI from litellm/proxy/_experimental/out/. After making UI code changes, you must run npm run build in the dashboard directory and copy the output (cp -r ui/litellm-dashboard/out/* litellm/proxy/_experimental/out/) so the proxy serves the updated UI.
  • SVGs used as provider logos (loaded via <img> tags) must NOT use fill="currentColor" — replace with an explicit color like #000000 or use the -color variant from lobehub icons, since CSS color inheritance does not work inside <img> elements.
  • Provider logos live in ui/litellm-dashboard/public/assets/logos/ (source) and litellm/proxy/_experimental/out/assets/logos/ (pre-built). Both locations must have the file for it to work in dev and proxy-served modes.
  • UI Vitest tests: cd ui/litellm-dashboard && npx vitest run