Move SLO routing functionality to a single multi-plugin #1780
Signed-off-by: Nir Rozenbaum <[email protected]>
Signed-off-by: Nir Rozenbaum <[email protected]>
…-sigs#1549) Bumps [golang.org/x/sync](https://github.com/golang/sync) from 0.16.0 to 0.17.0. - [Commits](golang/sync@v0.16.0...v0.17.0) --- updated-dependencies: - dependency-name: golang.org/x/sync dependency-version: 0.17.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ubernetes-sigs#1548) Bumps [sigs.k8s.io/controller-tools](https://github.com/kubernetes-sigs/controller-tools) from 0.18.0 to 0.19.0. - [Release notes](https://github.com/kubernetes-sigs/controller-tools/releases) - [Changelog](https://github.com/kubernetes-sigs/controller-tools/blob/main/envtest-releases.yaml) - [Commits](kubernetes-sigs/controller-tools@v0.18.0...v0.19.0) --- updated-dependencies: - dependency-name: sigs.k8s.io/controller-tools dependency-version: 0.19.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Nir Rozenbaum <[email protected]>
* uniquely name CRBAC Signed-off-by: greg pereira <[email protected]> * bugfix with testing Signed-off-by: greg pereira <[email protected]> --------- Signed-off-by: greg pereira <[email protected]>
…#1518) * Update type for priority from uint to int in EPP flow control * Update tests to accommodate priority changes * Avoid using Reverse to sort in descending order, update comments
… other than main (kubernetes-sigs#1570) Signed-off-by: Nir Rozenbaum <[email protected]>
…mpletions (kubernetes-sigs#1446)
* - added more useful fields to types.LLMRequest:
    1. cleaner API declaration
    2. data fields are preserved, after-read transformations are done in plugins
    3. prefix-cache scorer does not need naive templating
  - minor bugfixes and improvements
  Signed-off-by: Maroon Ayoub <[email protected]>
* removed LLMRequestData::String
  Signed-off-by: Maroon Ayoub <[email protected]>
* - rename LLMRequestData to LLMRequestBody
  - rename LLMRequest.Data to LLMRequest.Body
  - test refactoring after rebase
  Signed-off-by: Maroon Ayoub <[email protected]>
---------
Signed-off-by: Maroon Ayoub <[email protected]>
* epp servicemonitor and clusterpodmonitor templates Signed-off-by: sallyom <[email protected]> * add monitoring chart doc Signed-off-by: sallyom <[email protected]> --------- Signed-off-by: sallyom <[email protected]>
* Replace Gateway API with Inference Extension This replaces references regarding Gateway API with references regarding Gateway API Inference Extension (or vice versa, as appropriate) in site-src/gieps/overview.md * Replace spec with x-spec in a link This fixes a link to the reference page for Inference Model.
Signed-off-by: Maroon Ayoub <[email protected]>
Replacing InferenceModel with InferenceObjective
Signed-off-by: Daneyon Hansen <[email protected]>
…sigs#1579) Signed-off-by: Daneyon Hansen <[email protected]>
* Updating the guides in the doc site * adding priority and capacity section
* feat(flowcontrol): Refactor FlowRegistry contracts
This commit refactors some of the core Flow Control contracts to improve
clarity and better align with their intended roles. The goal is to
create a more intuitive and robust interface for the upcoming top-level
FlowController.
Key changes include:
- The `FlowRegistryClient` interface is renamed to
`FlowRegistryDataPlane` to more accurately reflect its role in the
high-throughput request path.
- The `FlowRegistryAdmin` interface is renamed to `FlowRegistryObserver`
to clarify its read-only, observational nature.
- The `ActiveFlowConnection.Shards()` method is renamed to
`ActiveFlowConnection.ActiveShards()` to make it explicit that it
returns only active, schedulable shards. This removes ambiguity for
the distributor logic.
- `ShardStats` is enriched with `ID` and `IsActive` fields, providing
consumers with more context about the shard's state at the time the
snapshot was taken.
- The registry implementation has been updated to match these new
contract definitions.
* refactor: Adapt ShardProcessor to a worker role
This commit refactors the `ShardProcessor` to function as a stateful
worker managed by a higher-level supervisor. This is a preparatory step
for the introduction of the new top-level `FlowController`.
The public API of the processor is changed from a direct `Enqueue`
method to a more sophisticated, channel-based submission model with
`Submit` (non-blocking) and `SubmitOrBlock` (blocking). This decouples
the producer from the processor's main loop, enabling better
backpressure signals and higher throughput.
Key changes include:
- Introduction of `Submit` and `SubmitOrBlock` for asynchronous request
handoff.
- `FlowItem`'s finalization logic is improved to be more robust and
channel-based.
- Error handling within the dispatch cycle is refactored (no logic
change) to be more clear about how it promotes work conservation by
isolating failures to a single priority band.
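The difference between the two submission paths can be sketched with a buffered channel and a `select`. This is a minimal illustration, not the project's code: the `processor` and `item` types, the `ErrBufferFull` error, and the method signatures are assumptions made for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrBufferFull is a hypothetical error for the non-blocking path.
var ErrBufferFull = errors.New("processor buffer full")

// item stands in for the real FlowItem; only an ID is modeled here.
type item struct{ id int }

// processor is a simplified stand-in for ShardProcessor.
type processor struct {
	in chan item // buffered handoff channel that a run loop would drain
}

func newProcessor(buf int) *processor {
	return &processor{in: make(chan item, buf)}
}

// Submit hands the item off without blocking. If the buffer has no room,
// the caller gets ErrBufferFull immediately as a backpressure signal.
func (p *processor) Submit(it item) error {
	select {
	case p.in <- it:
		return nil
	default:
		return ErrBufferFull
	}
}

// SubmitOrBlock waits for room in the buffer until ctx is done.
func (p *processor) SubmitOrBlock(ctx context.Context, it item) error {
	select {
	case p.in <- it:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demo fills a one-slot buffer and then exercises both paths against it.
func demo() (first, second, blocked error) {
	p := newProcessor(1)
	first = p.Submit(item{id: 1})  // buffer has room
	second = p.Submit(item{id: 2}) // buffer is full
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // an already-cancelled context makes the blocking path return
	blocked = p.SubmitOrBlock(ctx, item{id: 3})
	return first, second, blocked
}

func main() {
	fmt.Println(demo())
}
```

In this sketch a caller would try `Submit` first and fall back to `SubmitOrBlock` only when it is willing to wait, which keeps the producer decoupled from the processor's loop.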
* feat: Introduce the FlowController supervisor
This commit introduces the `FlowController`, a high-throughput, sharded
supervisor that orchestrates a pool of stateful `ShardProcessor`
workers. This new component is the central processing engine of the Flow
Control system, implementing a "supervisor-worker" pattern.
Key features of the `FlowController` include:
- Supervisor-Worker Architecture: Acts as a stateless supervisor,
managing the lifecycle of stateful `ShardProcessor` workers. It
includes a reconciliation loop to garbage-collect workers for stale
shards.
- Flow-Aware Load Balancing: Implements a "Join-Shortest-Queue-by-Bytes"
(JSQ-Bytes) algorithm to distribute incoming requests to the
least-loaded worker, promoting emergent fairness.
- Synchronous API: Exposes a blocking `EnqueueAndWait` method, which
simplifies client integration (e.g., with Envoy `ext_proc`) and
provides direct backpressure.
- Lazy Worker Initialization: Workers are created on-demand when a shard
first becomes active to conserve resources and reduce contention
on the hot path.
- Configuration: A new `Config` object allows for tuning parameters like
TTLs, buffer sizes, and reconciliation intervals.
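The JSQ-Bytes selection step named above can be sketched as a scan for the shard with the fewest queued bytes. The `shardLoad` type, its field names, and the tie-breaking rule (first shard wins) are assumptions made for illustration; the actual distributor may select and retry differently.

```go
package main

import "fmt"

// shardLoad is a hypothetical snapshot of one worker's queue.
type shardLoad struct {
	ID          string
	QueuedBytes uint64
}

// pickLeastLoaded returns the index of the shard with the fewest queued
// bytes, or -1 when there are no shards. Ties go to the earliest shard.
func pickLeastLoaded(shards []shardLoad) int {
	best := -1
	for i, s := range shards {
		if best == -1 || s.QueuedBytes < shards[best].QueuedBytes {
			best = i
		}
	}
	return best
}

func main() {
	shards := []shardLoad{
		{ID: "shard-a", QueuedBytes: 4096},
		{ID: "shard-b", QueuedBytes: 512},
		{ID: "shard-c", QueuedBytes: 2048},
	}
	if i := pickLeastLoaded(shards); i >= 0 {
		fmt.Println("dispatch to", shards[i].ID)
	}
}
```

Ranking by bytes rather than by request count means a shard holding a few very large requests is treated as busier than one holding many small ones.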
* docs: Update comments to align with FlowController
This commit updates documentation and code comments across various
framework components to align with the concepts and architecture
introduced by the `FlowController`.
Key changes include:
- FCFS Policy: Clarified the distinction between "logical" and
"physical" enqueue time and the behavioral trade-offs when pairing
with different queue capabilities.
- ListQueue: Expanded the documentation to explain its role as a
high-performance, approximate FCFS queue in the context of the
`FlowController`'s retry mechanics.
- Request Types: Refined the comments for `QueueItemAccessor` to be more
precise about the meaning of `EnqueueTime`.
* refactor: Simplify controller lifecycle
This commit refactors the `FlowController` to simplify its startup and
shutdown lifecycle, making it more robust and easier to reason about.
It also incorporates several smaller improvements based on reviewer
feedback.
The primary change addresses a complex lifecycle implementation that
used an `atomic.Bool` (`isRunning`) and a `ready` channel to manage
state.
Key changes:
- **Simplified Lifecycle:** The controller's lifecycle is now tied
directly to a `context` passed into `NewFlowController`. The `Run`
method has been unexported, and the main `run` loop is started as a
goroutine from the constructor. This eliminates the `ready` channel
and `isRunning` flag in addition to simplifying the interface for
callers.
- **Robust Worker Creation:** The `getOrStartWorker` logic has been
improved to ensure that in a race to create a worker, the "losing"
goroutine correctly cleans up its resources and does not start a
redundant processor. This fixes a bug where the losing worker, on
shutdown, would evict all items from its queues. Because those queues
were instances shared with the winning worker, requests were finalized
prematurely.
- **Comment Reduction:** The extensive explanatory comments in
`distributeRequest` have been condensed to be more concise while
retaining the essential details of the algorithm.
- **Minor Cleanups:**
- The initial, unnecessary call to `reconcileProcessors()` at
startup has been removed.
- Error messages have been clarified (e.g., "acquire lease" instead
of "establish connection").
- A typed error for nil requests was replaced with a standard
`errors.New`.
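The two main ideas in this commit, a lifecycle bound to the constructor's `context` and a worker-creation race in which the loser cleans up without starting, can be sketched as follows. This is an illustration under assumptions: the `controller` and `worker` types, the use of `sync.Map.LoadOrStore`, and the `starts` counter are chosen for the example and are not taken from the PR.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
)

// worker is a hypothetical stand-in for a ShardProcessor and its resources.
type worker struct {
	shardID string
	cancel  context.CancelFunc
}

// controller is a simplified stand-in for FlowController.
type controller struct {
	ctx     context.Context
	workers sync.Map       // shardID (string) -> *worker
	wg      sync.WaitGroup // tracks the run loop and every started worker
	starts  int32          // workers actually started (for the demo only)
}

// newController starts the run loop from the constructor, so callers have
// no separate Run method to call; cancelling ctx shuts everything down.
func newController(ctx context.Context) *controller {
	c := &controller{ctx: ctx}
	c.wg.Add(1)
	go c.run()
	return c
}

// run stands in for the main loop; here it only waits for cancellation.
func (c *controller) run() {
	defer c.wg.Done()
	<-c.ctx.Done()
}

// getOrStartWorker returns the worker for shardID, starting one if needed.
func (c *controller) getOrStartWorker(shardID string) *worker {
	if v, ok := c.workers.Load(shardID); ok {
		return v.(*worker)
	}
	wctx, cancel := context.WithCancel(c.ctx)
	candidate := &worker{shardID: shardID, cancel: cancel}
	actual, loaded := c.workers.LoadOrStore(shardID, candidate)
	if loaded {
		// Lost the race: release our resources and never start a goroutine.
		cancel()
		return actual.(*worker)
	}
	atomic.AddInt32(&c.starts, 1)
	c.wg.Add(1)
	go func() { // the winner's processing loop, reduced to a wait
		defer c.wg.Done()
		<-wctx.Done()
	}()
	return candidate
}

// demo races several callers for one shard, then shuts the controller down.
func demo() (started int32, same bool) {
	ctx, cancel := context.WithCancel(context.Background())
	c := newController(ctx)
	const callers = 8
	got := make([]*worker, callers)
	var callersWG sync.WaitGroup
	for i := 0; i < callers; i++ {
		callersWG.Add(1)
		go func(i int) {
			defer callersWG.Done()
			got[i] = c.getOrStartWorker("shard-0")
		}(i)
	}
	callersWG.Wait()
	same = true
	for _, w := range got {
		if w != got[0] {
			same = false
		}
	}
	cancel()    // shutdown is driven entirely by the context
	c.wg.Wait() // run loop and worker goroutines have exited
	return atomic.LoadInt32(&c.starts), same
}

func main() {
	started, same := demo()
	fmt.Println("workers started:", started, "all callers share one worker:", same)
}
```

Because only the goroutine whose `LoadOrStore` call stored the value ever starts a processing loop, the losing candidates have nothing running that could touch shared queues when they are discarded.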
…s#1550) Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.23.0 to 1.23.2. - [Release notes](https://github.com/prometheus/client_golang/releases) - [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md) - [Commits](prometheus/client_golang@v1.23.0...v1.23.2) --- updated-dependencies: - dependency-name: github.com/prometheus/client_golang dependency-version: 1.23.2 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
) Signed-off-by: Daneyon Hansen <[email protected]>
Signed-off-by: Daneyon Hansen <[email protected]>
Signed-off-by: Daneyon Hansen <[email protected]>
Signed-off-by: Nir Rozenbaum <[email protected]>
…s#1568)
* add latency predictor
* add cv in model and update epp deployment
* bug fix
* track mape for predictions
* add running queue size to metrics
* add xgboost regressor and update tpot sampling logic
* emit predicted and actual ttft tpot in body
* separate servers for training and prediction
* add latency predictor; put the predictor functions in director in a helper function; add scores to reqcxt; record prediction duration metrics; add prefix cache score to model input; slo based routing changes; retrieve request priority queue from the datastore; update scoring logic
* better initial implementation; Add scheduling profile, working state; remove latencypredictor from director; Move all latency prediction logic out of director and into scheduling profile. Make all Request/Response plugins take in RequestContext
* progress towards fixing up merge conflicts from latency predictor merge
* More refactor progress, fixing and adding tests
* working state, latency prediction
* Clean up changes, remove unneeded files, working functionality without latency flag and scheduling plugins
* Rebase cleanup, remove duplicate lines
* Integrate new alpha-beta slo scoring into scoring plugin
* Fix prefix cache scoring for slo-aware routing
* Add pycache or latency predictor to gitignore
* Rebase with main
* Fix prefix cache scoring being piped to latencyprediction_helper
* add dependencies in scorer
* change to single profile
* change to single profile
* restore two profiles
* restore two profiles
* restore two profiles
* update admit request to shed based on predictions
* add TODOs for future changes
* Change artifact registry references to personal compiled images
* Fix existing non-slo aware routing unit tests
* update latency predictor with better eval metrics
* Fix saturation detector unit test
* Change naming of SLO headers and prediction based routing header
* Remove port 9002 service on InferencePool causing make test to fail
* Fix epp hermetic integration test to expect ProcessingMode Send in response header
---------
Co-authored-by: kaushikmitr <[email protected]>
…nstants from director
Hi @BenjaminBraunDev. Thanks for your PR.
I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Keywords which can automatically close issues and at(@) or hashtag(#) mentions are not allowed in commit messages.
The list of commits with invalid commit messages:
/ok-to-test
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: BenjaminBraunDev, kfswain
The full list of commands accepted by this bot can be found here. The pull request process is described here
This PR moves SLO-aware routing functionality into a single multi-plugin that provides both a scorer (scheduling hook) and request-tracking plugins (requestcontrol hooks). This removes the need to change endpoint/podmetric/datastore, because everything can be tracked in the plugin itself.
It also rebases close to the current main, hence the large number of changes.
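As a rough illustration of the multi-plugin idea, the sketch below shows one struct that serves both as a scorer and as a pair of request-tracking hooks, keeping all of its state internally. Everything in it is hypothetical: the `sloPlugin` type, the `PreRequest`/`ResponseComplete`/`Score` method signatures, and the in-flight-count scoring rule are stand-ins chosen for brevity. The PR itself scores on latency predictions, which this example does not attempt to model.

```go
package main

import (
	"fmt"
	"sync"
)

// sloPlugin is a hypothetical multi-plugin: one object registered for both
// scheduling (Score) and request-control hooks (PreRequest, ResponseComplete).
type sloPlugin struct {
	mu       sync.Mutex
	byReq    map[string]string // request ID -> pod serving it
	inFlight map[string]int    // pod -> number of running requests
}

func newSLOPlugin() *sloPlugin {
	return &sloPlugin{byReq: map[string]string{}, inFlight: map[string]int{}}
}

// PreRequest records that a request was dispatched to a pod.
func (p *sloPlugin) PreRequest(requestID, pod string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.byReq[requestID] = pod
	p.inFlight[pod]++
}

// ResponseComplete removes a finished request from the tracked state.
func (p *sloPlugin) ResponseComplete(requestID string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	pod, ok := p.byReq[requestID]
	if !ok {
		return // unknown or already completed request
	}
	delete(p.byReq, requestID)
	p.inFlight[pod]--
}

// Score reads only the plugin's own state: fewer running requests on a pod
// gives a higher score (a placeholder for a latency-prediction-based score).
func (p *sloPlugin) Score(pods []string) map[string]float64 {
	p.mu.Lock()
	defer p.mu.Unlock()
	scores := make(map[string]float64, len(pods))
	for _, pod := range pods {
		scores[pod] = 1.0 / float64(1+p.inFlight[pod])
	}
	return scores
}

// demo dispatches three requests, completes one, and scores both pods.
func demo() map[string]float64 {
	p := newSLOPlugin()
	p.PreRequest("r1", "pod-a")
	p.PreRequest("r2", "pod-a")
	p.PreRequest("r3", "pod-b")
	p.ResponseComplete("r3")
	return p.Score([]string{"pod-a", "pod-b"})
}

func main() {
	fmt.Println(demo())
}
```

Because the hooks and the scorer share one struct, the scorer can consult per-request bookkeeping without that data being added to shared endpoint or datastore types.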