Add Nvidia MPS component for managing Nvidia GPU resources #647
@@ -15,7 +15,62 @@
[Unreleased](https://github.com/bird-house/birdhouse-deploy/tree/master) (latest)
------------------------------------------------------------------------------------------------------------------

[//]: # (list changes here, using '-' for each new entry, remove this when items are added)

## Changes

- Add Nvidia MPS component for managing Nvidia GPU resources

  This creates a container running Nvidia's Multi Process Service ([MPS](https://docs.nvidia.com/deploy/mps/index.html)),
  which helps manage multi-user GPU access.
  It runs an alternative CUDA interface which manages resource allocation when multiple processes are running
  simultaneously on the same GPU.
  It also allows the node admin to set additional per-user limits through the `JUPYTERHUB_RESOURCE_LIMITS` variable,
  which configures Jupyterlab containers:

  - `"gpu_device_mem_limit"`: sets the `CUDA_MPS_PINNED_DEVICE_MEM_LIMIT` environment variable
  - `"gpu_active_thread_percentage"`: sets the `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` environment variable

  For example, the following will give all users in the group named `"users"` access to three GPUs in their Jupyterlab
  container. On the first one (id = 0) only 1GB of memory is available, on the second (id = 1) only 5GB, and on the third
  (id = 2) only 10GB. Additionally, the container will be able to use 10% of the available threads on the GPUs.

  ```shell
  export JUPYTERHUB_RESOURCE_LIMITS='
  [{
    "type": "group",
    "name": "users",
    "limits": {
      "gpu_ids": ["0", "1", "2"],
      "gpu_count": 3,
      "gpu_device_mem_limit": "0=1G,1=5G,2=10G",
      "gpu_active_thread_percentage": "10"
    }
  }]
  '
  ```

  Note that leaving any of these limits unset will default to allowing the user full access to the given resource.

- Update `CustomDockerSpawner` to make pre-spawn hooks and resource limits more configurable

  Introduce `pre_spawn_hooks` and `resource_limit_callbacks` attributes on the `CustomDockerSpawner` class which
  can be used to further customize the `CustomDockerSpawner` from optional components. This gives us a way to
  add additional functionality without having to directly modify existing functions which may be overwritten by the
  user when they configure the spawner in `JUPYTERHUB_CONFIG_OVERRIDE`.

  This also introduces the `JUPYTERHUB_CONFIG_OVERRIDE_INTERNAL` variable, which is identical to the
  `JUPYTERHUB_CONFIG_OVERRIDE` variable except that it is intended to be set only by other components (not by the
  user in the local environment file). This allows components to customize Jupyterhub deployments without interfering
  with custom settings created by the user. Note that `JUPYTERHUB_CONFIG_OVERRIDE` has precedence over
  `JUPYTERHUB_CONFIG_OVERRIDE_INTERNAL`. A sketch of the intended usage pattern follows below.

## Fixes

- Update GPU limit examples to show expected syntax

  Fixes some examples that showed `gpu_ids` being given as integers, since they are meant to be indexes. However,
  due to a limitation of docker, they must be given as strings. This modifies the examples so that it is clear that
  strings must be used, and also updates the code to ensure that only string values are ever passed to docker when
  spawning a new jupyterlab server (roughly the coercion sketched below).
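
The coercion presumably boils down to something like the following (an illustrative sketch, not the actual spawner code; `limits` stands for one entry of `JUPYTERHUB_RESOURCE_LIMITS`):

```python
def normalize_gpu_ids(limits):
    """Return gpu_ids coerced to strings, as docker expects (illustrative sketch)."""
    return [str(gpu_id) for gpu_id in limits.get("gpu_ids", [])]

# e.g. normalize_gpu_ids({"gpu_ids": [0, 1, 2]}) -> ["0", "1", "2"]
```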

[2.22.0](https://github.com/bird-house/birdhouse-deploy/tree/2.22.0) (2026-02-09)
------------------------------------------------------------------------------------------------------------------

@@ -0,0 +1,2 @@

```shell
readonly CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
readonly CUDA_MPS_ACTIVE_THREAD_PERCENTAGE
```
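
As a quick illustration of the effect (assuming the script is sourced from `/etc/profile.d` in the user's login shell; the values below are made up):

```shell
export CUDA_MPS_PINNED_DEVICE_MEM_LIMIT="0=1G"   # injected by the spawner
readonly CUDA_MPS_PINNED_DEVICE_MEM_LIMIT        # what this profile script does
CUDA_MPS_PINNED_DEVICE_MEM_LIMIT="0=100G"        # bash: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT: readonly variable
unset CUDA_MPS_PINNED_DEVICE_MEM_LIMIT           # bash: unset: cannot unset: readonly variable
```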
@@ -0,0 +1,4 @@

```yaml
services:
  jupyterhub:
    environment:
      - NVIDIA_MPS_PROFILE_SCRIPT=${COMPOSE_DIR}/optional-components/nvidia-multi-process-service/02-readonly-cuda-vars.sh
```
@@ -0,0 +1,50 @@

```shell
export NVIDIA_MULTIPROCESS_SERVICE_DOCKER=debian
export NVIDIA_MULTIPROCESS_SERVICE_VERSION=bookworm-slim
export NVIDIA_MULTIPROCESS_SERVICE_IMAGE='${NVIDIA_MULTIPROCESS_SERVICE_DOCKER}:${NVIDIA_MULTIPROCESS_SERVICE_VERSION}'

export DELAYED_EVAL="
$DELAYED_EVAL
NVIDIA_MULTIPROCESS_SERVICE_IMAGE
"

export JUPYTERHUB_CONFIG_OVERRIDE_INTERNAL="
${JUPYTERHUB_CONFIG_OVERRIDE_INTERNAL}

def _gpu_device_mem_limit(spawner, value):
    '''
    Set memory limits for GPUs allocated to this user.

    See: https://docs.nvidia.com/deploy/mps/appendix-tools-and-interface-reference.html#cuda-mps-pinned-device-mem-limit
    '''
    spawner.environment['CUDA_MPS_PINNED_DEVICE_MEM_LIMIT'] = value

def _gpu_active_thread_percentage(spawner, value):
    '''
    Set active thread percentage for GPUs allocated to this user.

    See: https://docs.nvidia.com/deploy/mps/appendix-tools-and-interface-reference.html#cuda-mps-active-thread-percentage
    '''
    spawner.environment['CUDA_MPS_ACTIVE_THREAD_PERCENTAGE'] = str(value)

c.CustomDockerSpawner.resource_limit_callbacks.update({
    'gpu_device_mem_limit': _gpu_device_mem_limit,
    'gpu_active_thread_percentage': _gpu_active_thread_percentage,
})

def _gpu_set_mps_configs(spawner):
    '''
    Set configurations so this container uses the multi-process service running in the container named mps.

    See: https://gitlab.com/nvidia/container-images/samples/-/blob/master/mps/docker-compose.yml
    '''
    spawner.extra_host_config['ipc_mode'] = 'container:mps'
    spawner.volumes['nvidia_mps'] = '/tmp/nvidia-mps'

c.CustomDockerSpawner.pre_spawn_hooks.append(_gpu_set_mps_configs)

# This sets the variables as readonly so that users can't unset/update the environment variables
# that set these limits in the jupyterlab docker container.
c.CustomDockerSpawner.volumes.update({
    os.environ['NVIDIA_MPS_PROFILE_SCRIPT']: '/etc/profile.d/02-readonly-cuda-vars.sh'
})
"
```
@@ -0,0 +1,24 @@

```yaml
services:
  mps:
    image: ${NVIDIA_MULTIPROCESS_SERVICE_IMAGE}
    container_name: mps
    restart: always
    ipc: shareable
    volumes:
      - nvidia_mps:/tmp/nvidia-mps
    init: true
    command: ["nvidia-cuda-mps-control", "-f"]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Comment on lines +15 to +17

Member: Should this be configurable as well? For example, some GPUs reserved for Jupyter and others reserved for other operations (e.g. Weaver Workers)? Is it better to have all GPU-enabled operations connected to this MPS regardless of the way they are used, and have limited

Collaborator (Author): I'm going to say that it's better to make everything go through the MPS and then divide up the GPUs when they're assigned to containers (jupyterlab or weaver workers). The only exception I can think of is if a user has a subset of GPUs that they want to use for birdhouse and another set that they want to use for something else entirely on the same machine. I guess I can make this configurable, but if a user is doing something other than the default they have to really, really know what they're doing so they don't break things.

Collaborator (Author): Actually, you know what... the problem here is actually how docker compose configures this. The only way to do this would be to create another optional component with something like:

```yaml
mps:
  deploy:
    resources:
      reservations:
        devices: !override
          - capabilities: [gpu]
            driver: nvidia
            device_ids: ["0", "1"]
```

if you want to only allow a subset of GPUs. For now I'll document this with a comment, but actually configuring this would require a whole other component.

Member: OK to address in another PR with documentation for the time being. I guess that also raises another question: how are other non-Jupyter/DockerSpawner services supposed to map with MPS?

Collaborator (Author): Yeah, that's going to be the subject of a future PR. But to summarize here, you'd need to (for all containers that access GPUs):

There's a reference to this in the PR, but I find that this example docker compose project outlines the setup nicely: https://gitlab.com/nvidia/container-images/samples/-/blob/master/mps/docker-compose.yml (note that the syntax in that file is slightly outdated but it gives the right idea)

The compose file continues:

```yaml
volumes:
  nvidia_mps:
    driver: local
    driver_opts:
      type: tmpfs
      device: tmpfs
```
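
As a rough illustration of the setup discussed in the thread above (a sketch only, based on the spawner hook in this PR and the linked NVIDIA sample; the `gpu-worker` service name and image are placeholders), any other GPU-enabled service would join the `mps` container's IPC namespace and mount the shared MPS directory:

```yaml
services:
  gpu-worker:                        # placeholder name for some other GPU-enabled service
    image: my-org/gpu-worker:latest  # placeholder image
    ipc: container:mps               # share the IPC namespace of the mps container
    volumes:
      - nvidia_mps:/tmp/nvidia-mps   # share the MPS pipe directory
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```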
@@ -0,0 +1,3 @@

```shell
if [ "$(nvidia-smi --query-gpu=compute_mode --format=csv,noheader | grep -vc 'Exclusive_Process')" -ne 0 ]; then
    log WARN "Nvidia GPUs with compute mode set to something other than EXCLUSIVE_PROCESS detected. We recommend you set the compute mode to EXCLUSIVE_PROCESS when enabling nvidia's Multi Process Service (MPS)."
fi
```

Comment on lines +6 to +8

Member: Is that a hard requirement for MPS to work, or some other efficiency reason? If it is a hard requirement, maybe it should not just be a warning. Activating this component without a GPU/Nvidia-SMI will cause a command error. That is fine since it won't work anyway, but maybe the error should be more gracefully handled?

Collaborator (Author): It is not a hard requirement, which is why it's only a warning. Again, the documentation is awful so I'm only 90% sure this is the reason. There are valid use-cases for running MPS on a GPU without EXCLUSIVE_PROCESS. True, I'll add better error handling for this.

Member: OK, thanks for the details.
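
For admins who want to follow the warning's recommendation, the compute mode can be changed with `nvidia-smi` (illustrative commands; requires root, and the setting generally does not persist across reboots):

```shell
# Set GPU 0 to EXCLUSIVE_PROCESS so that only one process (the MPS control daemon)
# can create a CUDA context on it directly
sudo nvidia-smi -i 0 -c EXCLUSIVE_PROCESS

# Verify the compute mode of all GPUs
nvidia-smi --query-gpu=index,compute_mode --format=csv
```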

Comment: Does the `<gpu-id>=<thread-count>,...` variant work for this also?

Reply: As far as I can tell, no... the Nvidia documentation is pretty horrendous, but all the documentation and examples I've seen only show that you can specify limits for specific GPUs for the memory limit, not the active thread percentage.