
Commit 147bf55

Update doc
1 parent ceba181 commit 147bf55

2 files changed: +32 −25 lines

docs/en/docs/dynamic-gpu-partitioning/getting-started-mig.md (2 additions, 2 deletions)

@@ -58,8 +58,8 @@ ConfigMap by editing the field `nos-gpu-partitioner.knownMigGeometries` of the
 ## Create pods requesting MIG resources

 !!! tip
-    There is no need to manually create and manage MIG configurations.
-    Simply submit your Pods to the cluster and the requested MIG devices are automatically provisioned.
+    There is no need to manually create and manage MIG configurations.
+    You can simply submit your Pods to the cluster and the requested MIG devices are automatically provisioned.

 You can make your pods request slices of GPU by specifying MIG devices in their containers requests:
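A request of this kind can be sketched as a minimal Pod manifest. Note that this example is not part of the commit: the MIG resource name follows the NVIDIA device plugin convention (`nvidia.com/mig-<profile>`), and the profile, Pod name, and container image are illustrative assumptions — adjust them to the MIG profiles your GPUs actually support.

```yaml
# Hypothetical example: a Pod requesting one 1g.10gb MIG slice.
# The resource name nvidia.com/mig-1g.10gb is an assumption based on
# the NVIDIA device plugin naming convention, not taken from the commit.
apiVersion: v1
kind: Pod
metadata:
  name: mig-example
spec:
  restartPolicy: Never
  containers:
    - name: cuda-workload
      image: nvcr.io/nvidia/cuda:12.0.0-base-ubuntu22.04  # illustrative image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/mig-1g.10gb: 1  # one isolated MIG device
```

Once submitted, the partitioner is expected to provision the requested MIG device automatically, as the tip above describes; no manual MIG configuration is needed.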

docs/en/docs/dynamic-gpu-partitioning/partitioning-modes-comparison.md (30 additions, 23 deletions)
@@ -7,7 +7,7 @@ cluster according to your needs and available hardware.
 | Partitioning mode | Supported by `nos` | Workload isolation level | Pros | Cons |
 |----------------------------|:-------------------|:-------------------------|-----------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | Multi-instance GPU (MIG) || Best | <ul><li>Processes are executed in parallel</li><li>Full isolation (dedicated memory and compute resources)</li></ul> | <ul><li>Supported by fewer GPU models (only Ampere or more recent architectures)</li><li>Coarse-grained control over memory and compute resources</li></ul> |
-| Multi-process server (MPS) || Good | <ul><li>Processes are executed parallel</li><li>Fine-grained control over memory and compute resources allocation</li></ul> | <ul><li>No error isolation and memory protection</li></ul> |
+| Multi-process server (MPS) || Medium | <ul><li>Processes are executed in parallel</li><li>Fine-grained control over memory and compute resource allocation</li></ul> | <ul><li>No error isolation and memory protection</li></ul> |
 | Time-slicing || None | <ul><li>Processes are executed concurrently</li><li>Supported by older GPU architectures (Pascal or newer)</li></ul> | <ul><li>No resource limits</li><li>No memory isolation</li><li>Lower performance due to context-switching overhead</li></ul> |

 ## Multi-instance GPU (MIG)
@@ -16,17 +16,16 @@ Multi-instance GPU (MIG) is a technology available on NVIDIA Ampere or more rece
 partition a GPU into separate GPU instances for CUDA applications, each fully isolated with its own high-bandwidth
 memory, cache, and compute cores.

-The isolated GPU slices created through MIG are called MIG devices, and they are named according to the following
-naming convention: `<gpu-instance>g.<gpu-memory>gb`, where the GPU instance part corresponds to the computing
-resources of the device, while the GPU Memory indicates its GB of memory. Example: `2g.20gb`
+The isolated GPU slices are called MIG devices, and they are named adopting a format that indicates the compute and
+memory resources of the device. For example, `2g.20gb` corresponds to a GPU slice with 20 GB of memory.

-Each GPU model allows only a pre-defined set of MIG Geometries (e.g. set of MIG devices with the respective max quantity),
-which limits the granularity of the partitioning. Moreover, the MIG devices allowed by a certain geometry must be
-created in a specific order, further limiting the flexibility of the partitioning.
+MIG does not allow creating GPU slices of custom sizes and quantities, as each GPU model only supports a
+[specific set of MIG profiles](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#supported-profiles).
+This reduces the granularity with which you can partition the GPUs.
+Additionally, the MIG devices must be created respecting certain placement rules, which further limits flexibility of use.

-Even though MIG partitioning is less flexible, it is important to point out that it is the technology that offers the
-highest level of isolation among the created GPU slices, which are equivalent to independent and fully-isolated
-different GPUs.
+MIG is the GPU sharing approach that offers the highest level of isolation among processes.
+However, it lacks flexibility and is compatible only with a few GPU architectures (Ampere and Hopper).

 You can find out more on how MIG technology works in the official
 [NVIDIA MIG User Guide](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/).
@@ -46,23 +45,31 @@ specify arbitrary limits on both the amount of allocatable memory and the availa
 takes advantage of this feature for exposing to Kubernetes GPU resources with an arbitrary amount of allocatable
 memory defined by the user.

-Additionally, MPS eliminates the context-switching overhead by executing processes in parallel through
-*spatial sharing*, resulting in higher workloads performance.
+Compared to time-slicing, MPS eliminates the overhead of context switching by running processes in parallel
+through *spatial sharing*, and therefore leads to better compute performance. Moreover, MPS provides each
+client with its own GPU memory address space, which makes it possible to enforce memory limits on the processes,
+overcoming the limitations of time-slicing sharing.

-It is however important to point out that, even though allocatable memory and compute resources limits are enforced,
-processes sharing a GPU through MPS are not fully isolated from each other. For instance, MPS does not provide error
-isolation and memory protection, which means that a process can crash and cause the entire GPU to be reset (this
-can however often been avoided by gracefully handling CUDA errors and SIGTERM signals).
+It is however important to point out that processes sharing a GPU through MPS are not fully isolated from each other.
+Indeed, even though MPS can limit clients' compute and memory resources, it does not provide error isolation and
+memory protection. This means that a client process can crash and cause the entire GPU to reset,
+impacting all other processes running on the GPU. However, this issue can often be addressed by properly handling
+CUDA errors and SIGTERM signals.

 ## Time-slicing

 Time-slicing consists of oversubscribing a GPU leveraging its time-slicing scheduler, which executes multiple CUDA
-processes concurrently through *temporal sharing*. This means that the GPU shares its compute resources among the
-different processes in a fair-sharing manner by switching between them at regular intervals of time. This brings
-the cost of context-switching overhead, which translates into jitter and higher latency that affects the workloads.
+processes concurrently through *temporal sharing*.

+This means that the GPU shares its compute resources among the different processes in a fair-sharing manner
+by switching between processes at regular intervals of time. This continuous context switching generates a
+computing time overhead, which translates into jitter and higher latency.

-Time-slicing also does not provide any level of memory isolation between the different processes sharing a GPU, nor
-any memory allocation limits, which can lead to frequent out-of-memory (OOM) errors.
+Time-slicing is supported by virtually every GPU architecture and is the simplest solution for sharing a GPU in
+a Kubernetes cluster. However, time-slicing does not provide any level of memory isolation among the processes
+sharing a GPU, nor any memory allocation limits, which can lead to frequent out-of-memory (OOM) errors.

-Given the drawbacks above the availability of more robust technologies such as MIG and MPS, at the moment we
-decided to not support time-slicing partitioning in `nos`.
+!!! info
+    Given the drawbacks above and the availability of more robust technologies such as MIG and MPS, we have
+    decided not to support time-slicing GPU sharing in `nos` at the moment.
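To make the MPS description above concrete, a memory-limited GPU slice can be requested from Kubernetes in the same way as any extended resource. This sketch is not part of the commit: the resource name `nvidia.com/gpu-10gb`, the Pod name, and the image are illustrative assumptions about how nos exposes MPS slices, and should be checked against the nos documentation for your version.

```yaml
# Hypothetical example: a Pod requesting a GPU slice with a 10 GB
# memory limit enforced through MPS. The resource name
# nvidia.com/gpu-10gb is an assumption, not taken from the commit.
apiVersion: v1
kind: Pod
metadata:
  name: mps-example
spec:
  restartPolicy: Never
  containers:
    - name: cuda-workload
      image: nvcr.io/nvidia/cuda:12.0.0-base-ubuntu22.04  # illustrative image
      resources:
        limits:
          nvidia.com/gpu-10gb: 1  # one MPS slice with 10 GB of GPU memory
```

Because MPS lacks error isolation, a workload deployed this way should handle CUDA errors and SIGTERM gracefully, as the comparison above notes, so that a crash does not take down the other clients sharing the GPU.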
