Open
Description
P0
- [Loadgen] Multi-model or LoRA support and traffic splitting
- [Datagen] Shared prefix datagen produces incorrect prompt length #230
P1
- [Datagen] Support different input / output distributions for different stages (helpful for autoscaling)
- [Metrics] GPU utilization / other hardware metrics from Prometheus
- [Metrics] SLO support and conformance of specific latency SLOs
P2