Problem Statement
To test WVA (workload-variant-autoscaler) optimization strategies, we would like to run multiple base models in the same system. The setup would include a single gateway, two endpoint pickers (EPPs), two inference pools, and a custom HTTPRoute with multiple backendRefs. Is such a scenario supported by modelservice?
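For illustration, the routing piece of this setup might look roughly like the HTTPRoute below. This is only a sketch of what we have in mind, not something modelservice generates today: the route, gateway, and pool names are placeholders, the weights are arbitrary, and the InferencePool API group is an assumption that depends on which version of the Gateway API Inference Extension is installed.

```yaml
# Hypothetical sketch: all names below are placeholders, not modelservice output.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: multi-model-route            # placeholder name
spec:
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: inference-gateway        # placeholder: the single shared gateway
  rules:
    - backendRefs:
        # One backendRef per base model; each InferencePool has its own EPP.
        - group: inference.networking.k8s.io   # assumption: varies with the installed InferencePool API version
          kind: InferencePool
          name: model-a-pool         # placeholder: pool for the first base model
          weight: 50                 # arbitrary split, for illustration only
        - group: inference.networking.k8s.io
          kind: InferencePool
          name: model-b-pool         # placeholder: pool for the second base model
          weight: 50
```

An alternative would be one rule per model, each with its own match and a single backendRef. Either way, the question is whether modelservice can create the two pools and EPPs that such a route points at.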
Proposed Solution
ModelService should be able to launch multiple base models.
Alternatives Considered
llm-d/llm-d-workload-variant-autoscaler#1014
Willingness to Contribute
Yes, with guidance
Additional Context
No response