
Commit 147bf55

Update doc
1 parent ceba181 commit 147bf55

2 files changed: +32 −25 lines

docs/en/docs/dynamic-gpu-partitioning/getting-started-mig.md (2 additions, 2 deletions)

@@ -58,8 +58,8 @@ ConfigMap by editing the field `nos-gpu-partitioner.knownMigGeometries` of the
 ## Create pods requesting MIG resources

 !!! tip
-    There is no need to manually create and manage MIG configurations.
-    Simply submit your Pods to the cluster and the requested MIG devices are automatically provisioned.
+    There is no need to manually create and manage MIG configurations.
+    You can simply submit your Pods to the cluster and the requested MIG devices are automatically provisioned.

 You can make your pods request slices of GPU by specifying MIG devices in their containers requests:
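A request of this kind can be sketched as a minimal Pod manifest. Note that this example is not part of the commit: the MIG resource name follows the NVIDIA device plugin convention (`nvidia.com/mig-<profile>`), and the profile, Pod name, and container image are illustrative assumptions — adjust them to the MIG profiles your GPUs actually support.

```yaml
# Hypothetical example: a Pod requesting one 1g.10gb MIG slice.
# The resource name nvidia.com/mig-1g.10gb is an assumption based on
# the NVIDIA device plugin naming convention, not taken from the commit.
apiVersion: v1
kind: Pod
metadata:
  name: mig-example
spec:
  restartPolicy: Never
  containers:
    - name: cuda-workload
      image: nvcr.io/nvidia/cuda:12.0.0-base-ubuntu22.04  # illustrative image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/mig-1g.10gb: 1  # one isolated MIG device
```

Once submitted, the partitioner is expected to provision the requested MIG device automatically, as the tip above describes; no manual MIG configuration is needed.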

docs/en/docs/dynamic-gpu-partitioning/partitioning-modes-comparison.md (30 additions, 23 deletions)
@@ -7,7 +7,7 @@ cluster according to your needs and available hardware.
 | Partitioning mode | Supported by `nos` | Workload isolation level | Pros | Cons |
 |----------------------------|:-------------------|:-------------------------|-----------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | Multi-instance GPU (MIG) || Best | <ul><li>Processes are executed in parallel</li><li>Full isolation (dedicated memory and compute resources)</li></ul> | <ul><li>Supported by fewer GPU models (only Ampere or more recent architectures)</li><li>Coarse-grained control over memory and compute resources</li></ul> |
-| Multi-process server (MPS) || Good | <ul><li>Processes are executed parallel</li><li>Fine-grained control over memory and compute resources allocation</li></ul> | <ul><li>No error isolation and memory protection</li></ul> |
+| Multi-process server (MPS) || Medium | <ul><li>Processes are executed in parallel</li><li>Fine-grained control over memory and compute resource allocation</li></ul> | <ul><li>No error isolation and memory protection</li></ul> |
 | Time-slicing || None | <ul><li>Processes are executed concurrently</li><li>Supported by older GPU architectures (Pascal or newer)</li></ul> | <ul><li>No resource limits</li><li>No memory isolation</li><li>Lower performance due to context-switching overhead</li></ul> |

 ## Multi-instance GPU (MIG)
@@ -16,17 +16,16 @@ Multi-instance GPU (MIG) is a technology available on NVIDIA Ampere or more rece
 partition a GPU into separate GPU instances for CUDA applications, each fully isolated with its own high-bandwidth
 memory, cache, and compute cores.

-The isolated GPU slices created through MIG are called MIG devices, and they are named according to the following
-naming convention: `<gpu-instance>g.<gpu-memory>gb`, where the GPU instance part corresponds to the computing
-resources of the device, while the GPU Memory indicates its GB of memory. Example: `2g.20gb`
+The isolated GPU slices are called MIG devices, and they are named adopting a format that indicates the compute and
+memory resources of the device. For example, `2g.20gb` corresponds to a GPU slice with 20 GB of memory.

-Each GPU model allows only a pre-defined set of MIG Geometries (e.g. set of MIG devices with the respective max quantity),
-which limits the granularity of the partitioning. Moreover, the MIG devices allowed by a certain geometry must be
-created in a specific order, further limiting the flexibility of the partitioning.
+MIG does not allow creating GPU slices of custom sizes and quantities, as each GPU model only supports a
+[specific set of MIG profiles](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#supported-profiles).
+This reduces the granularity with which you can partition the GPUs.
+Additionally, the MIG devices must be created respecting certain placement rules, which further limits flexibility of use.

-Even though MIG partitioning is less flexible, it is important to point out that it is the technology that offers the
-highest level of isolation among the created GPU slices, which are equivalent to independent and fully-isolated
-different GPUs.
+MIG is the GPU sharing approach that offers the highest level of isolation among processes.
+However, it lacks flexibility and is compatible only with a few GPU architectures (Ampere and Hopper).

 You can find out more on how MIG technology works in the official
 [NVIDIA MIG User Guide](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/).
@@ -46,23 +45,31 @@ specify arbitrary limits on both the amount of allocatable memory and the availa
 takes advantage of this feature for exposing to Kubernetes GPU resources with an arbitrary amount of allocatable
 memory defined by the user.

-Additionally, MPS eliminates the context-switching overhead by executing processes in parallel through
-*spatial sharing*, resulting in higher workloads performance.
+Compared to time-slicing, MPS eliminates the overhead of context switching by running processes in parallel
+through *spatial sharing*, and therefore leads to better compute performance. Moreover, MPS provides each
+client with its own GPU memory address space, which makes it possible to enforce memory limits on the processes,
+overcoming the limitations of time-slicing sharing.

-It is however important to point out that, even though allocatable memory and compute resources limits are enforced,
-processes sharing a GPU through MPS are not fully isolated from each other. For instance, MPS does not provide error
-isolation and memory protection, which means that a process can crash and cause the entire GPU to be reset (this
-can however often been avoided by gracefully handling CUDA errors and SIGTERM signals).
+It is however important to point out that processes sharing a GPU through MPS are not fully isolated from each other.
+Indeed, even though MPS can limit clients' compute and memory resources, it does not provide error isolation and
+memory protection. This means that a client process can crash and cause the entire GPU to reset,
+impacting all other processes running on the GPU. However, this issue can often be addressed by properly handling
+CUDA errors and SIGTERM signals.

 ## Time-slicing

 Time-slicing consists of oversubscribing a GPU leveraging its time-slicing scheduler, which executes multiple CUDA
-processes concurrently through *temporal sharing*. This means that the GPU shares its compute resources among the
-different processes in a fair-sharing manner by switching between them at regular intervals of time. This brings
-the cost of context-switching overhead, which translates into jitter and higher latency that affects the workloads.
+processes concurrently through *temporal sharing*.

+This means that the GPU shares its compute resources among the different processes in a fair-sharing manner
+by switching between processes at regular intervals of time. This continuous context switching generates a
+computing time overhead, which translates into jitter and higher latency.

-Time-slicing also does not provide any level of memory isolation between the different processes sharing a GPU, nor
-any memory allocation limits, which can lead to frequent out-of-memory (OOM) errors.
+Time-slicing is supported by virtually every GPU architecture and is the simplest solution for sharing a GPU in
+a Kubernetes cluster. However, time-slicing does not provide any level of memory isolation among the processes
+sharing a GPU, nor any memory allocation limits, which can lead to frequent out-of-memory (OOM) errors.

-Given the drawbacks above the availability of more robust technologies such as MIG and MPS, at the moment we
-decided to not support time-slicing partitioning in `nos`.
+!!! info
+    Given the drawbacks above and the availability of more robust technologies such as MIG and MPS, we have
+    decided not to support time-slicing GPU sharing in `nos` at the moment.
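To make the MPS description above concrete, a memory-limited GPU slice can be requested from Kubernetes in the same way as any extended resource. This sketch is not part of the commit: the resource name `nvidia.com/gpu-10gb`, the Pod name, and the image are illustrative assumptions about how nos exposes MPS slices, and should be checked against the nos documentation for your version.

```yaml
# Hypothetical example: a Pod requesting a GPU slice with a 10 GB
# memory limit enforced through MPS. The resource name
# nvidia.com/gpu-10gb is an assumption, not taken from the commit.
apiVersion: v1
kind: Pod
metadata:
  name: mps-example
spec:
  restartPolicy: Never
  containers:
    - name: cuda-workload
      image: nvcr.io/nvidia/cuda:12.0.0-base-ubuntu22.04  # illustrative image
      resources:
        limits:
          nvidia.com/gpu-10gb: 1  # one MPS slice with 10 GB of GPU memory
```

Because MPS lacks error isolation, a workload deployed this way should handle CUDA errors and SIGTERM gracefully, as the comparison above notes, so that a crash does not take down the other clients sharing the GPU.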
