
Allow ability for MPS to have single replica #1655

Open
vrdn-23 wants to merge 1 commit into NVIDIA:main from vrdn-23:vidamoda/mps-single-replica

Conversation


@vrdn-23 vrdn-23 commented Mar 11, 2026

Summary

Allow replicas: 1 in MPS sharing configuration so the MPS daemon can provide concurrent GPU access without per-client resource throttling. Fixes #1548

Motivation

When using MPS purely as a concurrency layer — where an external device plugin handles scheduling — the current minimum of replicas: 2 forces the MPS daemon to impose unnecessary per-client limits:

  • Active thread percentage: 100 / replicas = 50% per client
  • Pinned memory limit: total_memory / replicas = half per client

This means every MPS client is capped at 50% GPU compute, even when it's the only process running. The remaining capacity sits idle.
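The arithmetic behind those per-client limits is plain integer division on the replica count. A minimal sketch in Go (the function name and the 80 GiB memory figure are illustrative, not taken from the plugin's code):

```go
package main

import "fmt"

// perClientLimits mirrors the arithmetic described above: the MPS daemon
// divides 100% active-thread capacity and total pinned memory evenly
// across replicas. Names and the memory figure are illustrative only.
func perClientLimits(replicas, totalMemoryMiB int) (threadPct, memMiB int) {
	return 100 / replicas, totalMemoryMiB / replicas
}

func main() {
	// The current minimum of replicas: 2 halves every client's share.
	pct, mem := perClientLimits(2, 81920)
	fmt.Printf("replicas=2: %d%% threads, %d MiB\n", pct, mem) // 50% threads, 40960 MiB

	// With replicas: 1 each client keeps the whole GPU.
	pct, mem = perClientLimits(1, 81920)
	fmt.Printf("replicas=1: %d%% threads, %d MiB\n", pct, mem) // 100% threads, 81920 MiB
}
```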

With replicas: 1, the daemon sets 100% thread percentage and full memory per client — MPS provides spatial sharing (multiple CUDA processes executing concurrently on different SMs) without artificial throttling.
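In the device plugin's sharing configuration this would be requested roughly as follows. The field layout is based on the plugin's documented config format; treat it as a sketch rather than a verbatim sample:

```yaml
version: v1
sharing:
  mps:
    resources:
      - name: nvidia.com/gpu
        replicas: 1   # previously rejected; with this PR: concurrency without throttling
```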

How replicas: 1 differs from no MPS

Both configurations result in no per-client throttling, but the execution model is different:

| | No MPS | MPS with replicas: 1 |
| --- | --- | --- |
| Compute mode | DEFAULT | EXCLUSIVE_PROCESS |
| Concurrent execution | Time-slicing (one process at a time, context switches) | Spatial sharing (kernels from different clients run concurrently on different SMs) |
| Per-client limits | None | None (100% thread, full memory) |
| GPU access | Any process on the node | Only processes connecting through the MPS pipe |
| MPS daemon | Not running | Running |

MPS with replicas: 1 is the right choice when you want true concurrent GPU execution for multiple pods (which are scheduled based on some other resource like GPU memory) without artificially limiting any individual client's resource usage.

Changes

  • Lower minimum replicas from 2 to 1 in config validation (replicas.go)
  • Update isReplicated() to recognize replicas = 1 as a valid sharing configuration, so SharingStrategy() correctly returns MPS and the daemon starts
  • Add test coverage for single-replica MPS configuration
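A hedged sketch of what the relaxed check could look like. The function name echoes assertReplicas() from the test plan below, but the body here is illustrative, not the plugin's actual implementation:

```go
package main

import "fmt"

// assertReplicas sketches the relaxed validation: the minimum replica
// count is now 1 instead of 2, while zero and negative values are
// still rejected. Illustrative only, not the plugin's actual code.
func assertReplicas(replicas int) error {
	if replicas < 1 {
		return fmt.Errorf("number of replicas must be >= 1, got %d", replicas)
	}
	return nil
}

func main() {
	for _, r := range []int{-1, 0, 1, 2} {
		fmt.Printf("replicas=%d: err=%v\n", r, assertReplicas(r))
	}
}
```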

Test plan

  • Existing tests pass (go test ./api/config/v1/... ./cmd/mps-control-daemon/mps/...)
  • New test: replicas: 1 config parses successfully
  • New test: assertReplicas() passes for single replica on pre-Volta and Volta+ devices
  • replicas: 0 and replicas: -1 still rejected


copy-pr-bot bot commented Mar 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


Signed-off-by: Vinay Damodaran <vrdn@hey.com>
@vrdn-23 vrdn-23 force-pushed the vidamoda/mps-single-replica branch from 7f4202b to b22d56d on March 11, 2026 at 12:53


Development

Successfully merging this pull request may close these issues.

Setting MPS replicas to 1
