`docs/manual/olares/settings/gpu-resource.md`: 24 changes (10 additions, 14 deletions)
@@ -15,7 +15,6 @@ This guide helps you understand and configure GPU allocation modes to maximize h
 Olares supports **only Nvidia GPUs** of **Turing architecture or later** (Turing, Ampere, Ada Lovelace, and Blackwell).

 - Quick check: GTX/RTX **16 series and newer** consumer cards are supported.
-- For other models, cross-check with the [compatible GPU table](https://github.com/NVIDIA/open-gpu-kernel-modules?tab=readme-ov-file#compatible-gpus).
 - Other models: Cross-check with the [compatible GPU table](https://github.com/NVIDIA/open-gpu-kernel-modules?tab=readme-ov-file#compatible-gpus).
 - Unknown model: Run `lspci | grep -i nvidia` to query the GPU architecture code and determine compatibility.
 :::
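This rough illustration shows the `lspci` check from the hunk above. The Python sketch extracts the chip codename (for example `TU106`) from one `lspci` line and maps its prefix to an architecture. The prefix table is an assumption based on NVIDIA's usual codename scheme (TU = Turing, GA = Ampere, AD = Ada Lovelace, GB = Blackwell). It treats only the four architectures the docs list as supported. Anything it does not recognize should still be cross-checked against the compatible GPU table.

```python
import re
from typing import Optional, Tuple

# Assumed mapping from NVIDIA chip-codename prefixes to architectures.
# Only the four architectures listed in the docs are treated as supported.
SUPPORTED = {"TU": "Turing", "GA": "Ampere", "AD": "Ada Lovelace", "GB": "Blackwell"}
UNSUPPORTED = {"GK": "Kepler", "GM": "Maxwell", "GP": "Pascal", "GV": "Volta"}

# A codename is a known prefix, three digits, and an optional suffix
# such as "GL" (e.g. "TU106", "GA102GL").
CODENAME = re.compile(r"\b(TU|GA|AD|GB|GK|GM|GP|GV)\d{3}[A-Z]*\b")


def classify(lspci_line: str) -> Optional[Tuple[str, bool]]:
    """Return (architecture, supported) for one lspci line, or None if no known codename is found."""
    match = CODENAME.search(lspci_line)
    if match is None:
        # Unrecognized: fall back to NVIDIA's compatible GPU table.
        return None
    prefix = match.group(1)
    if prefix in SUPPORTED:
        return SUPPORTED[prefix], True
    return UNSUPPORTED[prefix], False


if __name__ == "__main__":
    line = "01:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2070] (rev a1)"
    print(classify(line))  # ('Turing', True)
```

Returning `None` for unknown codenames, rather than guessing, mirrors the docs' advice to fall back to the compatible GPU table.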
@@ -28,26 +27,23 @@ Even if your GPU architecture is supported, **low VRAM capacity may cause AI app

 Olares supports three GPU allocation modes. Choosing the right mode helps optimize performance based on your needs.

-### Time Slicing
+### App Exclusive

-In this mode, the GPU's processing power is shared among multiple applications.
+In this mode, the GPU’s full compute capacity and VRAM are allocated to a single application to maximize its performance.

-* Acts as a default resource pool. Any application not explicitly assigned to a specific GPU will automatically use a time-slicing GPU if available.
+### Memory Slicing

-* Suitable for General-purpose use and running multiple lightweight applications.
+In this mode, GPU VRAM is allocated to multiple applications according to specified VRAM quotas:

-### App Exclusive
+- Applications with assigned VRAM can run concurrently on the GPU.
+- The sum of all assigned VRAM quotas must not exceed the GPU’s physical VRAM.

-In this mode, the entire GPU processing power and memory is dedicated to a single application.
+### Time Slicing

-* Best for intensive, performance-critical applications like AI-generated imagery or high-performance gaming servers.
-* Large memory demands may limit availability for other tasks.
-
-### Memory Slicing
-In this mode, GPU memory (VRAM) is partitioned into fixed, dedicated amounts for specific applications.
+In this mode, any number of applications can be bound to the same GPU:

-* Ideal for running multiple GPU-intensive applications simultaneously, each with guaranteed VRAM allocation.
-* Prevents memory conflicts between applications running on the same GPU.
+- At any instant, only one application fully occupies the GPU’s compute and VRAM.
+- The VRAM contents of the other applications are temporarily swapped out to system memory.

 ## View GPU status

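This change introduces three allocation rules, which can be expressed as a small validator:

- App Exclusive gives one application the whole GPU.
- Memory Slicing requires that quotas sum to no more than the physical VRAM.
- Time Slicing allows any number of bound applications.

This is a hypothetical Python model for illustration only. `Mode`, `Gpu`, `can_bind`, and `bind` are invented names, not an Olares API. Requiring a positive quota in Memory Slicing is also an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Mode(Enum):
    APP_EXCLUSIVE = "app-exclusive"
    MEMORY_SLICING = "memory-slicing"
    TIME_SLICING = "time-slicing"


@dataclass
class Gpu:
    """Hypothetical model of one GPU and the applications bound to it."""

    vram_mib: int
    mode: Mode
    # Application name -> VRAM quota in MiB (only meaningful in Memory Slicing).
    bindings: Dict[str, int] = field(default_factory=dict)

    def can_bind(self, app: str, quota_mib: int = 0) -> bool:
        """Return True if binding `app` would respect this GPU's allocation mode."""
        if app in self.bindings:
            return False  # already bound
        if self.mode is Mode.APP_EXCLUSIVE:
            # The full compute capacity and VRAM go to a single application.
            return not self.bindings
        if self.mode is Mode.MEMORY_SLICING:
            # Assumption: every app needs a positive quota, and the quotas
            # together must not exceed the physical VRAM.
            used = sum(self.bindings.values())
            return quota_mib > 0 and used + quota_mib <= self.vram_mib
        # Time Slicing: any number of applications may be bound.
        return True

    def bind(self, app: str, quota_mib: int = 0) -> None:
        if not self.can_bind(app, quota_mib):
            raise ValueError(f"cannot bind {app!r} in {self.mode.value} mode")
        self.bindings[app] = quota_mib


if __name__ == "__main__":
    gpu = Gpu(vram_mib=24576, mode=Mode.MEMORY_SLICING)  # e.g. a 24 GiB card
    gpu.bind("llm", 16384)
    gpu.bind("image-gen", 8192)  # 16384 + 8192 = 24576, exactly the physical VRAM
    print(gpu.can_bind("tts", 1024))  # False: would exceed 24576 MiB
```

The quota check uses `<=`, so filling the card exactly is allowed. Any further request is rejected.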
`docs/zh/manual/olares/settings/gpu-resource.md` (Chinese version): 23 changes (9 additions, 14 deletions)
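@@ -28,26 +28,21 @@ Olares supports only **NVIDIA GPUs**, and the architecture must be **Turing or later** (T

 Olares offers three allocation modes, which you can choose flexibly according to your scenario.

-### Time Slicing Mode
-
-In this mode, the GPU's processing power is shared among multiple applications.
-
-- In this mode, the GPU provides the default VRAM resource pool. Applications that have not been assigned an exclusive GPU or dedicated VRAM automatically use a GPU in Time Slicing mode (if available).
-- Suitable for general-purpose tasks and for running multiple lightweight applications simultaneously.
-
 ### App Exclusive Mode

-In this mode, the entire GPU's compute power and VRAM are dedicated to a single application.
-
-- Best for high-performance, resource-intensive applications, such as AI image generation or high-performance gaming servers.
-- Large memory usage may limit other tasks from running.
+In this mode, the compute power and VRAM of a single GPU are allocated to one application to ensure optimal performance.

 ### Memory Slicing Mode

-In this mode, GPU memory (VRAM) is divided into fixed quotas that are allocated to specified applications.
+In this mode, GPU VRAM can be allocated to multiple applications in specified amounts.
+- All applications that are allocated VRAM can use the GPU concurrently.
+- The total allocated VRAM must not exceed the total physical VRAM.

+### Time Slicing Mode
+
-- Suitable for running multiple GPU-intensive applications simultaneously (such as multiple AI models), each with its own VRAM quota.
-- Prevents memory conflicts when multiple applications run on the same GPU.
+In this mode, any number of applications can be bound to the same GPU:
+- At any given moment, only one application fully occupies the GPU's compute power and VRAM.
+- Meanwhile, the VRAM contents of the other applications are temporarily swapped out to system memory.

 ## View GPU status
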
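The Time Slicing behavior described in both hunks can be modeled with a toy Python class. In this model, one bound application is resident in VRAM at a time, and the others' VRAM contents sit in system memory. This is a hypothetical sketch. `TimeSlicedGpu`, `bind`, `run`, and `swapped_out` are invented names. It does not reproduce how Olares or the NVIDIA driver actually schedules work or swaps memory. The rule that each application must fit in VRAM on its own is an assumption.

```python
from typing import Dict, List, Optional


class TimeSlicedGpu:
    """Toy model of Time Slicing: one resident app at a time, the rest swapped out.

    Hypothetical illustration only; it does not reproduce how Olares or the
    NVIDIA driver actually schedules work or swaps GPU memory.
    """

    def __init__(self, vram_mib: int) -> None:
        self.vram_mib = vram_mib
        self.footprints: Dict[str, int] = {}  # bound app -> VRAM it needs (MiB)
        self.resident: Optional[str] = None   # app whose data is currently in VRAM
        self.events: List[str] = []           # swap log, in order

    def bind(self, app: str, footprint_mib: int) -> None:
        # Any number of apps can be bound. Assumption: each one must fit in
        # VRAM on its own, since it gets the whole GPU while it runs.
        if footprint_mib > self.vram_mib:
            raise ValueError(f"{app!r} needs {footprint_mib} MiB but the GPU has {self.vram_mib} MiB")
        self.footprints[app] = footprint_mib

    def swapped_out(self) -> List[str]:
        """Bound apps whose VRAM contents currently sit in system memory."""
        return sorted(a for a in self.footprints if a != self.resident)

    def run(self, app: str) -> None:
        """Give `app` the GPU, swapping the previous resident out to system memory."""
        if app not in self.footprints:
            raise KeyError(f"{app!r} is not bound to this GPU")
        if app == self.resident:
            return
        if self.resident is not None:
            self.events.append(f"swap out {self.resident}")
        self.events.append(f"swap in {app}")
        self.resident = app


if __name__ == "__main__":
    gpu = TimeSlicedGpu(vram_mib=12288)
    gpu.bind("chat", 10240)
    gpu.bind("image", 9216)  # combined 19456 MiB > 12288 MiB is fine here
    gpu.run("chat")
    gpu.run("image")
    print(gpu.events)         # ['swap in chat', 'swap out chat', 'swap in image']
    print(gpu.swapped_out())  # ['chat']
```

In this model, the combined footprints of the bound applications may exceed physical VRAM, unlike in Memory Slicing. The cost is a swap every time the GPU switches to a different application.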