Hi,
I have a requirement to run two models on a single NVIDIA ADA6000 GPU using the GPU Operator and MPS (Multi-Process Service). One model needs roughly 30 GB of GPU memory and the other roughly 18 GB.
I'd like to know whether MPS can be configured via the GPU Operator so that the GPU is split into these two "memory slices" (30 GB + 18 GB) and both models run simultaneously (a rough sketch of the config I have in mind is below the questions).
- Does MPS support explicit memory quotas or limits for each process when launched this way? (See the second sketch after this list for what I came across so far.)
- Can I schedule both of my Pods on this node at the same time?
- If not, is there another recommended approach (e.g., MIG, CUDA_VISIBLE_DEVICES tricks, or GPU Operator configuration) to achieve similar memory partitioning on an ADA6000?
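
For context, this is roughly the sharing config I had in mind after reading the k8s-device-plugin and GPU Operator docs. The ConfigMap name, namespace, and config key are placeholders I made up, and as far as I can tell the `replicas` mechanism splits the GPU evenly, so I don't see how to express an uneven 30 GB / 18 GB split this way:

```yaml
# Hypothetical ConfigMap holding a device-plugin sharing config that enables MPS.
# The names ("mps-config", "gpu-sharing") are placeholders I chose; "replicas: 2"
# would give two equal shares, not the 30 GB + 18 GB split I actually want.
apiVersion: v1
kind: ConfigMap
metadata:
  name: mps-config
  namespace: gpu-operator
data:
  gpu-sharing: |-
    version: v1
    sharing:
      mps:
        resources:
        - name: nvidia.com/gpu
          replicas: 2
---
# Only the relevant part of the ClusterPolicy is shown; my understanding is that
# devicePlugin.config points the operator's device plugin at the ConfigMap above.
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  devicePlugin:
    config:
      name: mps-config
      default: gpu-sharing
```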
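For the per-model memory limits, one mechanism I came across in the MPS documentation is the CUDA_MPS_PINNED_DEVICE_MEM_LIMIT environment variable, which is supposed to be set in each client process's environment. Something like the Pod below is what I would try for the larger model; the image name is a placeholder and I'm not sure how (or whether) this interacts with the device plugin's even split:

```yaml
# Hypothetical Pod for the 30 GB model: it requests one shared-GPU resource and
# sets a per-client MPS memory limit of 30 GB on device 0 ("0=30G").
# The image name is a placeholder.
apiVersion: v1
kind: Pod
metadata:
  name: model-a
spec:
  containers:
  - name: model-a
    image: my-registry/model-a:latest
    env:
    - name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
      value: "0=30G"
    resources:
      limits:
        nvidia.com/gpu: 1
```

The second Pod would be identical except with `"0=18G"`. Is that the intended way to do this, or am I completely off track?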
Thanks in advance for any guidance; I'm pretty new to all of this.