Model fallback chains #45

@christianromeni

Problem / Motivation

Load balancing works across deployments of the same model, but there is no way to define cross-model failover. If a primary model is down, the request fails instead of falling back to an alternative.

Proposed Solution

Allow defining fallback chains at the model level. When the primary model is unavailable (i.e., all of its deployments are down), the proxy tries the next model in the chain. Example:

```yaml
models:
  - name: gpt-4o
    fallback: claude-sonnet
    # ...
  - name: claude-sonnet
    fallback: llama-70b
    # ...
```
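A minimal sketch of how the proxy might turn the per-model `fallback` field into an ordered chain, assuming "depth" means the number of fallback hops after the primary and that cycles (e.g. `a -> b -> a`) should be rejected. The function name `resolve_chain`, the `FallbackMap` type, and the error handling are illustrative assumptions, not part of the proposal; `llama-70b` is assumed to be defined elsewhere in the config with no fallback of its own.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory form of the `models:` config:
# model name -> name of its fallback model (None if it has no fallback).
FallbackMap = Dict[str, Optional[str]]

DEFAULT_MAX_DEPTH = 3  # proposed default chain depth


def resolve_chain(primary: str, fallbacks: FallbackMap,
                  max_depth: int = DEFAULT_MAX_DEPTH) -> List[str]:
    """Return [primary, fallback1, ...], following at most `max_depth` hops.

    Stops early at a model with no fallback and raises on cycles or
    references to undefined models, so misconfigurations surface early.
    """
    if primary not in fallbacks:
        raise KeyError(f"unknown model: {primary}")
    chain = [primary]
    current = primary
    for _ in range(max_depth):
        nxt = fallbacks.get(current)
        if nxt is None:
            break  # end of the chain
        if nxt in chain:
            raise ValueError(f"fallback cycle: {' -> '.join(chain + [nxt])}")
        if nxt not in fallbacks:
            raise KeyError(f"{current} falls back to unknown model: {nxt}")
        chain.append(nxt)
        current = nxt
    return chain
```

With `gpt-4o -> claude-sonnet -> llama-70b`, `resolve_chain("gpt-4o", fallbacks)` returns all three models, while `max_depth=1` stops after `claude-sonnet`. Resolving chains once when the config is loaded, rather than per request, is one way to catch cycles before any traffic depends on them.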

Acceptance Criteria

  • Models can reference another model as fallback
  • Fallback triggers when all deployments of the primary model are unavailable
  • Fallback chain depth is configurable (default: 3)
  • Usage tracking records which model actually handled the request
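The routing behavior in the acceptance criteria can be sketched as follows, assuming a model counts as unavailable only once every one of its deployments has failed, and that usage tracking stores both the requested model and the model that actually served the request. `Deployment`, `UsageRecord`, `AllDeploymentsUnavailable`, `call_model`, and `route` are hypothetical names; `ConnectionError` stands in for whatever upstream failure the proxy treats as "unavailable", and deployments are tried in order instead of through the real load balancer. The chain is passed in as an already-resolved list of model names, so the depth limit is assumed to be applied before routing.

```python
from dataclasses import dataclass
from typing import Dict, List


class AllDeploymentsUnavailable(Exception):
    """Hypothetical error: every deployment of a model (or chain) failed."""


@dataclass
class Deployment:
    name: str
    healthy: bool = True

    def call(self, prompt: str) -> str:
        # Stand-in for the real upstream request.
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name}: {prompt}"


@dataclass
class UsageRecord:
    requested_model: str  # model the client asked for
    handled_by: str       # model that actually served the request
    attempts: List[str]   # every model tried, in order


def call_model(deployments: List[Deployment], prompt: str) -> str:
    """Try each deployment of one model; fail only if all of them fail."""
    for dep in deployments:
        try:
            return dep.call(prompt)
        except ConnectionError:
            continue  # try the next deployment of the same model
    raise AllDeploymentsUnavailable("no healthy deployment")


def route(chain: List[str], deployments: Dict[str, List[Deployment]],
          prompt: str, usage_log: List[UsageRecord]) -> str:
    """Walk the fallback chain, moving on only when a whole model is down."""
    tried: List[str] = []
    for model in chain:
        tried.append(model)
        try:
            response = call_model(deployments[model], prompt)
        except AllDeploymentsUnavailable:
            continue  # all deployments down: fall back to the next model
        usage_log.append(UsageRecord(chain[0], model, list(tried)))
        return response
    raise AllDeploymentsUnavailable(f"entire chain failed: {tried}")
```

Keeping `requested_model` next to `handled_by` lets usage reports show how often a request was served by a fallback instead of the model the client asked for.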