What would you like to be added:
A scorer that is configurable to score based on an arbitrary metric exposed by the model server. To do this we need to configure two things: the metric spec (name, labels, type, etc.) and the scoring algorithm. To begin with, we will only support the "gauge" metric type and a "linear_normalization" algorithm, which produces a normalized [0-1] score based on either a lower_is_better or higher_is_better direction.
An example:
plugins:
- type: metric-scorer
  parameters:
    metric:
      name: "num_requests_running"
      type: "gauge"
      labels: # labels are a list of key-value pairs to match
        k1: v1
    algo:
      type: "linear_normalization"
      direction: "lower_is_better" # higher raw metric = lower final score
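
To make the scoring behavior concrete, here is a minimal, self-contained Go sketch of the linear_normalization idea under the assumptions above. It is not the project's actual scorer plugin API; the `Direction` type, the `linearNormalize` function, and the pod names are hypothetical. It min-max normalizes the raw gauge values observed across candidate endpoints and flips the result when the direction is lower_is_better.

```go
package main

import "fmt"

// Direction controls whether a higher or lower raw metric value is preferred.
type Direction string

const (
	LowerIsBetter  Direction = "lower_is_better"
	HigherIsBetter Direction = "higher_is_better"
)

// linearNormalize maps raw gauge values to [0, 1] scores across candidate
// endpoints. With lower_is_better, the endpoint with the smallest raw value
// scores 1.0 and the one with the largest scores 0.0; higher_is_better
// inverts that.
func linearNormalize(raw map[string]float64, dir Direction) map[string]float64 {
	scores := make(map[string]float64, len(raw))
	if len(raw) == 0 {
		return scores
	}

	// Find the observed range across candidates.
	lo, hi := 0.0, 0.0
	first := true
	for _, v := range raw {
		if first {
			lo, hi = v, v
			first = false
			continue
		}
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}

	for name, v := range raw {
		if hi == lo {
			// All candidates report the same value: no signal to differentiate on.
			scores[name] = 1.0
			continue
		}
		s := (v - lo) / (hi - lo) // 0 at the minimum, 1 at the maximum
		if dir == LowerIsBetter {
			s = 1.0 - s
		}
		scores[name] = s
	}
	return scores
}

func main() {
	// Raw "num_requests_running" gauge values per candidate endpoint (hypothetical).
	running := map[string]float64{"pod-a": 3, "pod-b": 7, "pod-c": 5}
	fmt.Println(linearNormalize(running, LowerIsBetter))
	// pod-a (fewest running requests) scores 1.0, pod-b scores 0.0, pod-c scores 0.5.
}
```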
Why is this needed:
While we have implemented several in-tree scorers based on well-known metrics such as KV cache utilization, some use cases may call for a different metric. Having a generic metric scorer avoids the friction of writing a new scorer each time.
One potential use case: for latency-sensitive applications, the number of running requests may be a more useful signal than queue depth.
TODO: This feature would require the ability to configure the data layer to scrape arbitrary metrics. I will need to look deeper into this.