Allow ability for MPS to have single replica #1655
Open

vrdn-23 wants to merge 1 commit into NVIDIA:main from
Conversation
Signed-off-by: Vinay Damodaran <vrdn@hey.com>
## Summary

Allow `replicas: 1` in the MPS sharing configuration so the MPS daemon can provide concurrent GPU access without per-client resource throttling. Fixes #1548.

## Motivation

When using MPS purely as a concurrency layer, where an external device plugin handles scheduling, the current minimum of `replicas: 2` forces the MPS daemon to impose unnecessary per-client limits:

- Compute: `100 / replicas` = 50% per client
- Memory: `total_memory / replicas` = half per client

This means every MPS client is capped at 50% of GPU compute even when it is the only process running; the remaining capacity sits idle.
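For illustration, the arithmetic behind these per-client caps can be sketched as follows (`perClientLimits` is a hypothetical helper for this example, not the actual mps-control-daemon code):

```go
package main

import "fmt"

// perClientLimits sketches how MPS per-client caps scale with the
// replica count: each client receives an equal share of compute
// (active thread percentage) and device memory.
// Hypothetical helper for illustration only.
func perClientLimits(replicas int, totalMemoryMiB uint64) (threadPct float64, memMiB uint64) {
	threadPct = 100.0 / float64(replicas)
	memMiB = totalMemoryMiB / uint64(replicas)
	return threadPct, memMiB
}

func main() {
	// With the current minimum of two replicas, each client is capped at 50%.
	pct, mem := perClientLimits(2, 81920)
	fmt.Printf("replicas=2: %.0f%% compute, %d MiB per client\n", pct, mem)

	// With a single replica, a client would get the full GPU.
	pct, mem = perClientLimits(1, 81920)
	fmt.Printf("replicas=1: %.0f%% compute, %d MiB per client\n", pct, mem)
}
```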
With `replicas: 1`, the daemon sets a 100% thread percentage and the full memory limit per client: MPS provides spatial sharing (multiple CUDA processes executing concurrently on different SMs) without artificial throttling.

## How `replicas: 1` differs from no MPS

Both configurations result in no per-client throttling, but the execution model is different:

| Configuration | Compute mode | Execution model |
| --- | --- | --- |
| No MPS | `DEFAULT` | Processes share the GPU by time-slicing |
| MPS with `replicas: 1` | `EXCLUSIVE_PROCESS` | Clients execute concurrently through the MPS daemon |

MPS with `replicas: 1` is the right choice when you want true concurrent GPU execution for multiple pods (which are scheduled based on some other resource, such as GPU memory) without artificially limiting any individual client's resource usage.

## Changes
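For reference, a sharing configuration enabling this would follow the plugin's existing `sharing.mps` format; a minimal sketch (exact fields may vary by plugin version):

```yaml
version: v1
sharing:
  mps:
    resources:
      - name: nvidia.com/gpu
        replicas: 1
```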
- `replicas.go`: update `isReplicated()` to recognize `replicas = 1` as a valid sharing configuration, so `SharingStrategy()` correctly returns MPS and the daemon starts.
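A minimal standalone sketch of the behavioral change described above (the real `isReplicated()` and `SharingStrategy()` live in the plugin's config package and operate on config types; these free functions are illustrative only):

```go
package main

import "fmt"

// isReplicated reports whether a replica count enables a sharing
// strategy. Previously the check effectively required more than one
// replica; the change accepts a single replica as well.
// Illustrative sketch, not the plugin's actual implementation.
func isReplicated(replicas int) bool {
	return replicas >= 1
}

// sharingStrategy sketches how the strategy is derived: a replicated
// MPS resource selects the MPS strategy, otherwise sharing is disabled.
func sharingStrategy(mpsConfigured bool, replicas int) string {
	if mpsConfigured && isReplicated(replicas) {
		return "mps"
	}
	return "none"
}

func main() {
	fmt.Println(sharingStrategy(true, 1)) // a single replica now selects MPS
	fmt.Println(sharingStrategy(true, 0)) // invalid counts still disable sharing
}
```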
## Test plan

- `go test ./api/config/v1/... ./cmd/mps-control-daemon/mps/...`
- A `replicas: 1` config parses successfully
- `assertReplicas()` passes for a single replica on pre-Volta and Volta+ devices
- `replicas: 0` and `replicas: -1` are still rejected
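The boundary cases in the plan map naturally onto a table-driven check; a sketch, assuming a hypothetical `validateReplicas` helper rather than the repo's actual test code:

```go
package main

import "fmt"

// validateReplicas is a hypothetical stand-in for the config
// validation: one or more replicas is now valid, while zero or
// negative counts are still rejected.
func validateReplicas(n int) error {
	if n < 1 {
		return fmt.Errorf("invalid replica count: %d", n)
	}
	return nil
}

func main() {
	cases := []struct {
		replicas int
		wantErr  bool
	}{
		{1, false}, // newly allowed single replica
		{2, false}, // existing minimum still works
		{0, true},  // still rejected
		{-1, true}, // still rejected
	}
	for _, c := range cases {
		err := validateReplicas(c.replicas)
		fmt.Printf("replicas=%d rejected=%v (want %v)\n",
			c.replicas, err != nil, c.wantErr)
	}
}
```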