73 changes: 44 additions & 29 deletions docs/manual/olares/settings/gpu-resource.md
@@ -12,13 +12,13 @@ Olares allows you to harness the full power of your GPUs to accelerate demanding
This guide helps you understand and configure GPU allocation modes to maximize hardware performance.

::: tip GPU support
Olares supports **only Nvidia GPUs** of **Turing architecture or later** (Turing, Ampere, Ada Lovelace, and Blackwell).

- Quick check: GTX/RTX **16 series and newer** consumer cards are supported.
- For other models, cross-check with the [compatible GPU table](https://github.com/NVIDIA/open-gpu-kernel-modules?tab=readme-ov-file#compatible-gpus).
- Other models: Cross-check with the [compatible GPU table](https://github.com/NVIDIA/open-gpu-kernel-modules?tab=readme-ov-file#compatible-gpus).
- Unknown model: Run `lspci | grep -i nvidia` to query the GPU architecture code and determine compatibility.
:::
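
As a rough illustration of the "unknown model" check, the sketch below scans one line of `lspci` output for a chip code name and maps its two-letter prefix to an architecture. The prefix table (`TU`, `GA`, `AD`, `GB`) and the function name `supported_architecture` are assumptions made for this example and are not part of Olares; the compatible GPU table linked above remains the authoritative source.

```python
import re

# Assumed mapping from NVIDIA chip code-name prefixes to supported architectures.
ARCH_PREFIXES = {
    "TU": "Turing",
    "GA": "Ampere",
    "AD": "Ada Lovelace",
    "GB": "Blackwell",
}

# Matches code names such as TU116, GA102, AD102, GB202, or GA104GL.
CODE_NAME = re.compile(r"\b([A-Z]{2})\d{3}[A-Z]*\b")


def supported_architecture(lspci_line):
    """Return the architecture name if the line names a supported chip, else None."""
    if "nvidia" not in lspci_line.lower():
        return None
    match = CODE_NAME.search(lspci_line)
    if match is None:
        return None  # No code name printed, for example "Device 2684"
    return ARCH_PREFIXES.get(match.group(1))
```

A line without a recognizable code name returns `None`, which means "check the table manually" rather than "unsupported".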

:::warning AI Performance
Even if your GPU architecture is supported, **low VRAM capacity may cause AI applications to fail**. Ensure your GPU has enough memory for your workloads.
@@ -28,26 +28,36 @@ Even if your GPU architecture is supported, **low VRAM capacity may cause AI app

Olares supports three GPU allocation modes. Choosing the right mode helps optimize performance based on your needs.

### Time Slicing

In this mode, the GPU's processing power is shared among multiple applications.
In this mode, a GPU can be bound to multiple applications and rotates execution in time slices.

* Acts as a default resource pool. Any application not explicitly assigned to a specific GPU will automatically use a time-slicing GPU if available.
* At any instant, only one application uses all available compute and VRAM of the GPU.
* Other apps enter a wait queue. Their VRAM contents (for example, the CUDA context) may be temporarily swapped out to system memory.

* Suitable for General-purpose use and running multiple lightweight applications.
:::info Default GPU allocation
By default, GPUs run in time-slicing mode. Applications without allocated GPU resources automatically join the time-sliced GPU queue. If no time-sliced GPU is available, the application pauses after a startup timeout. In this case, you need to allocate a GPU (for example, set a GPU to time-slicing mode, or assign a VRAM quota to the application), then manually resume the application.
:::
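
The rotation described above can be pictured with a small simulation. This is only a sketch: it assumes a plain round-robin order, which the documentation does not specify, and the function name `time_slice_schedule` is hypothetical.

```python
from collections import deque


def time_slice_schedule(apps, slices):
    """Return which app holds the GPU in each time slice (assumed round-robin)."""
    queue = deque(apps)
    timeline = []
    for _ in range(slices):
        if not queue:
            break  # No apps are bound to this GPU
        current = queue.popleft()  # This app gets the whole GPU for one slice
        timeline.append(current)
        queue.append(current)  # It then rejoins the back of the wait queue
    return timeline
```

Each entry in the returned list is the single app that owns the GPU's compute and VRAM during that slice; every other app is waiting.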

### App Exclusive

In this mode, the entire GPU processing power and memory is dedicated to a single application.
In this mode, the entire GPU is allocated to a single application.

* Best for intensive, performance-critical applications like AI-generated imagery or high-performance gaming servers.
* Large memory demands may limit availability for other tasks.
* During execution, the app can use all compute and VRAM of the bound GPU.
* There is no cross-app contention or scheduling overhead, so the app gets the best performance the GPU can deliver.

### Memory Slicing
In this mode, GPU memory (VRAM) is partitioned into fixed, dedicated amounts for specific applications.
In this mode, VRAM of the GPU is partitioned into fixed quotas for multiple designated applications.

* Ideal for running multiple GPU-intensive applications simultaneously, each with guaranteed VRAM allocation.
* Prevents memory conflicts between applications running on the same GPU.
* Users need to manually set a quota for each app.
* The sum of all quotas must not exceed the physical VRAM of the bound GPU. Oversubscription is not supported.
* Apps with quota assigned can run concurrently, each limited to its own quota.
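
The quota rule can be expressed as a simple check. The sketch below is illustrative only: `validate_quotas` and its arguments are hypothetical names, and it assumes quotas are given in GB as in the settings UI.

```python
def validate_quotas(total_vram_gb, quotas_gb):
    """Return the unallocated VRAM in GB, or raise ValueError if the quotas do not fit."""
    for app, quota in quotas_gb.items():
        if quota <= 0:
            raise ValueError(f"{app}: quota must be positive")
    requested = sum(quotas_gb.values())
    if requested > total_vram_gb:
        # Oversubscription is not supported, so reject the whole plan.
        raise ValueError(
            f"requested {requested} GB exceeds physical VRAM of {total_vram_gb} GB"
        )
    return total_vram_gb - requested
```

For example, on a 24 GB GPU, quotas of 8 GB and 12 GB fit and leave 4 GB unallocated, while quotas of 16 GB and 12 GB are rejected.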

:::tip Multi-GPU allocation
- All three allocation modes support assigning multiple GPUs to the same application. Olares only exposes the assigned GPUs to the application's container and does not fuse their VRAM or compute in any way. Whether multiple GPUs are actually used depends on the application or framework itself.

- In multi-node environments, you can't assign multiple GPUs across nodes to the same application simultaneously.
:::
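
The cross-node restriction amounts to requiring that every GPU assigned to one application sits on the same node. A minimal sketch, assuming a hypothetical `gpu_to_node` mapping from GPU identifiers to node names (neither the mapping nor the function is an Olares API):

```python
def is_valid_multi_gpu_assignment(gpu_to_node, assigned_gpus):
    """Return True if all GPUs assigned to one app belong to a single node."""
    nodes = {gpu_to_node[gpu] for gpu in assigned_gpus}
    # Zero or one distinct node is acceptable; two or more would span nodes.
    return len(nodes) <= 1
```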

## View GPU status

@@ -56,8 +66,10 @@ To view your GPU status:
1. Navigate to **Settings** > **GPU**. The GPU list shows each GPU’s model, associated node, total VRAM, and current GPU mode.
2. Click on a specific GPU to visit its details.

![GPU overview](/images/manual/olares/gpu-overview.png#bordered)

::: tip Note
If your Olares only has one GPU, navigating to the GPU section will take you directly to the GPU details page. If you have multiple GPUs, you will see a list first.
If your Olares only has one GPU, navigating to the GPU section will take you directly to the GPU details page.
:::

## Configure GPU mode
@@ -70,26 +82,29 @@ On the **GPU details** page, select your desired mode from the **GPU mode** drop

![Time slicing](/images/manual/olares/gpu-time-slicing.png#bordered)

:::tip Note
No manual pinning is required if you only have one GPU in your cluster.
:::
:::tip Note
No manual binding is required if you only have one GPU in your cluster.
:::

* **App Exclusive**
1. Select this mode from the GPU mode dropdown.
2. In the **Select exclusive app** dropdown, choose your target application.
3. Click **Confirm**.
![App exclusive](/images/manual/olares/gpu-app-exclusive.png#bordered)

* **Memory Slicing**
1. Select this mode from the dropdown.
2. In the **Allocate VRAM** section, click **Add an application**.
3. Select your target application and assign it a specific amount of VRAM (in GB).
4. Repeat for other applications and click **Confirm**.
![VRAM slicing](/images/manual/olares/gpu-memory-slicing.png#bordered)

::: tip Note
You can't assign a VRAM that's larger than the total VRAM.
:::
![App exclusive](/images/manual/olares/gpu-app-exclusive.png#bordered)

* **Memory Slicing**
1. Select this mode from the dropdown.
2. In the **Allocate VRAM** section, click **Add an application**.
3. Select your target application and assign it a specific amount of VRAM in GB.
4. Repeat for other applications and click **Confirm**.
![VRAM slicing](/images/manual/olares/gpu-memory-slicing.png#bordered)

:::tip Unbinding
- After binding a GPU or part of its VRAM to an application, you can manually unbind it under the corresponding GPU mode to release GPU resources.

- When you switch a GPU’s allocation mode, all applications allocated under that mode are unbound, and the application containers will restart.
:::
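
The effect of a mode switch on bound applications can be modeled as follows. This is a toy model under stated assumptions, not how Olares is implemented: the `Gpu` class, the mode strings, and `switch_mode` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the three allocation modes.
MODES = ("time_slicing", "app_exclusive", "memory_slicing")


@dataclass
class Gpu:
    mode: str = "time_slicing"
    bound_apps: set = field(default_factory=set)

    def switch_mode(self, new_mode):
        """Switch modes and return the apps that are unbound and restarted."""
        if new_mode not in MODES:
            raise ValueError(f"unknown mode: {new_mode}")
        if new_mode == self.mode:
            return []  # Nothing changes, so nothing is unbound
        restarted = sorted(self.bound_apps)
        self.bound_apps.clear()  # Every binding under the old mode is released
        self.mode = new_mode
        return restarted
```

The returned list is the set of applications that would need to be bound again under the new mode.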


## Learn more
- [Monitor GPU usage in Olares](../resources-usage.md)
Binary file added docs/public/images/manual/olares/gpu-overview.png
40 changes: 26 additions & 14 deletions docs/zh/manual/olares/settings/gpu-resource.md
@@ -30,30 +30,40 @@ Olares provides three allocation modes that you can choose from depending on the scenario.

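### Time slicing mode

In this mode, the GPU's processing power is shared among multiple applications.

- In this mode, the GPU provides a default VRAM resource pool. Applications that have not been assigned an exclusive GPU or dedicated VRAM automatically use a GPU in time slicing mode, if one is available.
- Suitable for general-purpose tasks and for running multiple lightweight applications at the same time.
In this mode, a single GPU is allocated to multiple applications in time slices.
- At any moment, only one application occupies all of the compute and available VRAM.
- The remaining applications enter a wait queue, and their VRAM contents (such as CUDA contexts) may be temporarily swapped out to system memory.

::: tip Default GPU allocation
GPUs are in time slicing mode by default. Applications that have not been allocated GPU resources automatically join the time-sliced GPU queue. If no time-sliced GPU is available in the system, the application is paused after a startup timeout. In that case, first allocate a GPU to the application (for example, set a GPU to time slicing mode, or allocate VRAM to the application), and then manually resume the application.
:::

### App exclusive mode

In this mode, the entire GPU's compute and VRAM are dedicated to a single application.
In this mode, each GPU's compute and VRAM are allocated to a single application.

- Best suited to high-performance, resource-intensive applications such as AI image generation or high-performance game servers.
- Heavy memory usage may limit the running of other tasks.
- While running, the application can use all of the GPU's compute and VRAM.
- Applications running in this mode get the best performance.

### Memory slicing mode
In this mode, each GPU's VRAM is divided into fixed quotas that are allocated to multiple designated applications.

In this mode, GPU memory (VRAM) is divided into fixed quotas that are allocated to designated applications.
- A quota must be set manually for each application.
- The sum of the quotas must not exceed the physical VRAM of the corresponding GPU. (Oversubscription is not currently supported.)
- Applications that have a quota can run in parallel, and each can use only its own quota.

- Suitable for running multiple GPU-intensive applications at the same time (such as multiple AI models), each with its own VRAM quota.
- Avoids memory conflicts when multiple applications run on the same GPU.
:::tip Multi-GPU allocation
- All three modes support allocating multiple GPUs to the same application. Olares only assigns the GPUs to the application's container and does not fuse VRAM or compute; whether multiple GPUs are actually used depends on the application or framework itself.
- In a multi-node environment, the same application cannot be allocated GPUs across nodes at the same time.
:::

## View GPU status

1. Go to **Settings > GPU**. The GPU list shows each GPU's model, node, total VRAM, and current allocation mode.
2. Click a GPU to open its details page.

![GPU overview](/images/zh/manual/olares/gpu-overview.png#bordered)

::: tip Note
If your Olares cluster has only one GPU, opening the GPU page takes you directly to the details page. If there are multiple GPUs, the GPU list is shown.
:::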
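@@ -71,15 +81,17 @@ Olares provides three allocation modes that you can choose from depending on the scenario.
2. In the **Select exclusive app** dropdown, choose the target application.
3. Click **Confirm**.
![App exclusive](/images/zh/manual/olares/gpu-app-exclusive.png#bordered)
- **Memory slicing**:
* **Memory slicing**:
1. Select this mode from the dropdown menu.
2. In the **Allocate VRAM** window, click **+ Add an application**.
3. Select the target application and specify the amount of VRAM to allocate to it (in GB).
4. To allocate VRAM to other applications, repeat the steps above, and then click **Confirm**.
![Memory slicing](/images/zh/manual/olares/gpu-memory-slicing.png#bordered)
::: tip Note
The allocated VRAM must be less than the GPU's total VRAM.
:::

:::tip Unbinding
- After binding an application, you can manually unbind it under the corresponding GPU mode to release GPU resources.
- When you switch a GPU's allocation mode, all applications allocated to that GPU under the current mode are unbound, and the application containers restart.
:::

## Learn more
- [Monitor GPU usage in Olares](../resources-usage.md)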