Model fallback chains #45
Open
Labels
area:proxy (Proxy hot path), enhancement (New feature or request), enterprise (Enterprise-only features), priority:low (Nice to have, no urgency)
Problem / Motivation
Load balancing works across deployments of the same model, but there is no way to define cross-model failover. If a primary model is down, the request fails instead of falling back to an alternative.
Proposed Solution
Allow defining fallback chains at the model level. When every deployment of the primary model is unavailable, the proxy tries the next model in the chain, and so on until a model responds or the chain is exhausted.
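To make the proposed behavior concrete, here is a minimal Python sketch of walking a fallback chain. Everything in it is hypothetical and not taken from the proxy's codebase: the `FallbackRouter` class, the `chains` mapping, the `ModelUnavailable` exception, and the `send` callable (which stands in for the existing per-model, load-balanced call) are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


class ModelUnavailable(Exception):
    """Hypothetical error: every deployment of a model has failed."""


@dataclass
class FallbackRouter:
    # Hypothetical config: primary model name -> ordered fallback model names.
    chains: Dict[str, List[str]] = field(default_factory=dict)

    def candidates(self, model: str) -> List[str]:
        """Return the primary model followed by its fallbacks, without repeats."""
        ordered = [model]
        for name in self.chains.get(model, []):
            if name not in ordered:
                ordered.append(name)
        return ordered

    def call(self, model: str, send: Callable[[str], str]) -> str:
        """Try each candidate in order and return the first successful result.

        `send` is assumed to raise ModelUnavailable only after all
        deployments of the given model have been tried.
        """
        last_error: Optional[ModelUnavailable] = None
        for name in self.candidates(model):
            try:
                return send(name)
            except ModelUnavailable as exc:
                last_error = exc  # remember the failure, move to the next model
        raise ModelUnavailable(
            f"all models in the chain for {model!r} failed"
        ) from last_error


def demo_send(name: str) -> str:
    # Simulate an outage of the primary model only.
    if name == "model-a":
        raise ModelUnavailable(name)
    return f"served by {name}"


router = FallbackRouter(chains={"model-a": ["model-b", "model-c"]})
result = router.call("model-a", demo_send)
```

In this sketch the fallback only triggers on a model-wide failure, so ordinary load balancing across deployments of the same model is left untouched. Whether other error types (timeouts, rate limits) should also advance the chain is left open here.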
Acceptance Criteria