🚨 Unsigned commits detected! Please sign your commits. For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.
c170ff9 to bea0ea5
Signed-off-by: Pravein Govindan Kannan <pravein.govindan.kannan@ibm.com>
bea0ea5 to 5883271
Signed-off-by: Dasari Surya Sai Venkatesh <suryasai.venkatesh@gmail.com> Signed-off-by: Pravein Govindan Kannan <pravein.govindan.kannan@ibm.com>
…plete The x-program-id response header is not consumed by any downstream code. Remove the ResponseReceived implementation, its interface assertion, and associated tests. Add a nil-request guard to ResponseComplete to match the upstream framework pattern. Signed-off-by: Dasari Surya Sai Venkatesh <suryasai.venkatesh@gmail.com> Signed-off-by: Pravein Govindan Kannan <pravein.govindan.kannan@ibm.com>
5883271 to 6b202e2
// For each queue in the band, the configured ScoringStrategy is given a chance
// to update its per-program state (OnPickStart), then the queue with the highest
// score is selected for dispatch.
func (p *ProgramAwarePlugin) Pick(_ context.Context, band flowcontrol.PriorityBandAccessor) (flowcontrol.FlowQueueAccessor, error) {
question: it's true that nothing in this method consumes ctx, but should we still perform this work if ctx.Err() is non-nil?
if band == nil {
	return nil, nil //nolint:nilnil
}
question: is a nil band not an error? This means that even when err == nil, the caller must still nil-check the returned queue. Also, won't this give inaccurate latency metrics for work that was not done?
We return (nil, nil) instead of (nil, err) because this is how the contract for the fairness policy is defined. The existing fairness plugins, for example round-robin, follow the same convention.
From a behaviour point of view, both cases are the same: no item is dispatched from that band. The only difference is that in the error case the error is logged.
	"sync/atomic"
)

const ewmaAlpha = 0.2
question: how did you arrive at 0.2 for the EWMA alpha? is this based on benchmarking, or should it be configurable via the plugin config alongside strategy?
We are still experimenting with the parameters. Ideally, we can make it a plugin config once we finalize the programmable parameters for each strategy.
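As a point of reference for making alpha configurable, a minimal sketch of an EWMA with the smoothing factor passed in rather than hard-coded. The `ewma` type, `newEWMA`, and the choice to seed the average with the first sample are assumptions for illustration, not the PR's implementation.

```go
package main

import "fmt"

// ewma is a hypothetical helper holding an exponentially weighted moving average.
type ewma struct {
	alpha  float64 // weight given to the newest sample, in (0, 1]
	value  float64
	seeded bool
}

// newEWMA validates alpha so a bad plugin config fails early.
func newEWMA(alpha float64) (*ewma, error) {
	if alpha <= 0 || alpha > 1 {
		return nil, fmt.Errorf("alpha must be in (0, 1], got %v", alpha)
	}
	return &ewma{alpha: alpha}, nil
}

// Observe folds a new sample into the average and returns the updated value.
// The first sample seeds the average directly (an assumption of this sketch).
func (e *ewma) Observe(x float64) float64 {
	if !e.seeded {
		e.value, e.seeded = x, true
		return e.value
	}
	e.value = e.alpha*x + (1-e.alpha)*e.value
	return e.value
}

func main() {
	avg, err := newEWMA(0.2)
	if err != nil {
		panic(err)
	}
	fmt.Println(avg.Observe(10))
	fmt.Println(avg.Observe(20)) // 0.2*20 + 0.8*10, roughly 12
}
```

With alpha = 0.2, each new sample contributes 20% of the updated value, so the average reacts slowly to bursts; a larger alpha would track recent behaviour more closely.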
bd28492 to 567ec2d
Signed-off-by: Dasari Surya Sai Venkatesh <suryasai.venkatesh@gmail.com>
…d-inference-scheduler into program-aware-plugin-test
567ec2d to 9392bfa
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Simplify strategy interface
Add cycle-aware quantum allocation to DRRStrategy. The first Pick() in a dispatch cycle allocates quantum to all non-empty queues; subsequent Pick() calls only allocate to unseen programs. OnPreRequest resets the cycle flag so the next dispatch gets fresh quantum. Signed-off-by: Dasari Surya Sai Venkatesh <suryasai.venkatesh@gmail.com>
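A simplified model of the cycle-aware allocation described in this commit message, as a sketch only. The `drrState` type, its field names, and the use of program IDs as plain strings are hypothetical; the actual DRRStrategy may track this differently.

```go
package main

import "fmt"

// drrState is a hypothetical, simplified model of per-program deficit counters.
type drrState struct {
	quantum int
	deficit map[string]int
	seen    map[string]bool // programs that already received quantum this cycle
	inCycle bool            // the "cycle flag" reset by onPreRequest
}

func newDRRState(quantum int) *drrState {
	return &drrState{quantum: quantum, deficit: map[string]int{}, seen: map[string]bool{}}
}

// onPick grants quantum to the programs with non-empty queues.
// The first call in a cycle grants to all of them; later calls in the
// same cycle grant only to programs not seen yet.
func (s *drrState) onPick(nonEmpty []string) {
	if !s.inCycle {
		s.seen = map[string]bool{}
		s.inCycle = true
	}
	for _, p := range nonEmpty {
		if s.seen[p] {
			continue
		}
		s.deficit[p] += s.quantum
		s.seen[p] = true
	}
}

// onPreRequest marks a real dispatch, so the next pick starts a fresh cycle.
func (s *drrState) onPreRequest() { s.inCycle = false }

func main() {
	s := newDRRState(100)
	s.onPick([]string{"a", "b"})      // first pick: a and b get quantum
	s.onPick([]string{"a", "b", "c"}) // same cycle: only c is new
	fmt.Println(s.deficit["a"], s.deficit["b"], s.deficit["c"])
	s.onPreRequest()         // dispatch happened, cycle ends
	s.onPick([]string{"a"}) // new cycle: a gets quantum again
	fmt.Println(s.deficit["a"], s.deficit["b"], s.deficit["c"])
}
```

Without the cycle flag, repeated speculative Pick() calls before a dispatch would keep adding quantum to the same programs.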
Move token extraction from ResponseComplete into each strategy's OnCompleted, making the hook signature symmetric with OnPreRequest. Signed-off-by: Dasari Surya Sai Venkatesh <suryasai.venkatesh@gmail.com>
When enabled, Pick() stores the selected program in pendingCursor instead of advancing lastSelected. OnPreRequest() commits it after a real dispatch, preventing speculative picks from moving the cursor. Signed-off-by: Dasari Surya Sai Venkatesh <suryasai.venkatesh@gmail.com>
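The deferred cursor commit described in this commit message could look roughly like the sketch below. The `cursor` type, the `deferCommit` flag name, and the use of an empty string to mean "no pending pick" are assumptions for illustration; only `pendingCursor` and `lastSelected` are names taken from the commit message.

```go
package main

import "fmt"

// cursor is a hypothetical model of the round-robin position.
type cursor struct {
	deferCommit   bool   // when true, picks are speculative until a real dispatch
	lastSelected  string // committed position
	pendingCursor string // speculative position; "" means none pending
}

// pick records the selected program, either speculatively or directly.
func (c *cursor) pick(program string) {
	if c.deferCommit {
		c.pendingCursor = program
		return
	}
	c.lastSelected = program
}

// onPreRequest commits the pending pick once a request is actually dispatched.
func (c *cursor) onPreRequest() {
	if c.deferCommit && c.pendingCursor != "" {
		c.lastSelected = c.pendingCursor
		c.pendingCursor = ""
	}
}

func main() {
	c := &cursor{deferCommit: true, lastSelected: "a"}
	c.pick("b")
	c.pick("c") // speculative picks do not move the committed cursor
	fmt.Println(c.lastSelected)
	c.onPreRequest() // a real dispatch commits the latest pick
	fmt.Println(c.lastSelected)
}
```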
Drr prerequest fix
This PR introduces a program-aware plugin that identifies requests from an agentic program by their program ID (x-gateway-inference-fairness-id) and makes scheduling decisions based on program-level metrics. The plugin also captures program-level metrics and exports them to the Prometheus endpoint.
The program-aware plugin implements the following plugin interfaces:
Prepare Data Interface: Extracts program information from request headers, then updates the relevant program metrics and request metadata. The plugin assumes that a request arrives with a fairness ID (x-gateway-inference-fairness-id) that identifies an agentic program.
Fairness Interface (Flow Control): We implement the flow-control fairness plugin's Pick interface. Multiple strategies are supported and can be selected through configuration. Currently, two strategies are implemented: a simple EWMA-based strategy and Deficit Round Robin (DRR).
Pre Request Interface: Updates program metrics immediately before dispatch. For example, this hook is used to calculate the per-program wait time (time spent in the EPP/queue) of requests and to keep track of requests sent to the vLLM inference pod. In future, this hook could also be used to add a vLLM priority to the request.
Response Received Interface: Updates program metrics, such as the DRR deficit counters, for tokens used.
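As an illustration of the Prepare Data step above, a minimal sketch of pulling the program ID out of request headers. The `programID` helper and the plain `map[string]string` header representation are hypothetical; only the header name comes from this description.

```go
package main

import (
	"fmt"
	"strings"
)

// fairnessIDHeader is the header named in the PR description.
const fairnessIDHeader = "x-gateway-inference-fairness-id"

// programID is a hypothetical helper that looks up the fairness ID header
// case-insensitively and reports whether a non-empty ID was found.
func programID(headers map[string]string) (string, bool) {
	for k, v := range headers {
		if strings.EqualFold(k, fairnessIDHeader) {
			id := strings.TrimSpace(v)
			return id, id != ""
		}
	}
	return "", false
}

func main() {
	id, ok := programID(map[string]string{"X-Gateway-Inference-Fairness-Id": "agent-42"})
	fmt.Println(id, ok)

	// Requests without the header are not attributed to any program.
	_, ok = programID(map[string]string{"content-type": "application/json"})
	fmt.Println(ok)
}
```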
We are currently evaluating the scheduling strategies using inference-perf benchmarks.