Commit 817116d

grammar
1 parent 6bd187e commit 817116d

1 file changed

Lines changed: 8 additions & 8 deletions

playbooks/supplemental/clustering-rccl/README.md

@@ -13,15 +13,15 @@ SPDX-License-Identifier: MIT
 
 ## Overview
 
-Your Ryzen™ AI Halo is already capable of running large language models locally. Clustering takes this further by combining the GPU memory of multiple systems over a local network, giving you access to even larger models with stronger reasoning, better code generation, and deeper multilingual understanding, all entirely on your own hardware.
+Your Ryzen™ AI Halo is already capable of running large language models locally. Clustering takes this further by combining the GPU memory of multiple systems over a local network, giving you access to even larger models with stronger reasoning, better code generation, and deeper multilingual understanding, all entirely on your own hardware.
 
-This playbook teaches you how to cluster two Ryzen™ AI Halo systems using RCCL (ROCm Communication Collectives Library) with vLLM and run Qwen3.5-397B, a 397B parameter model, across both machines with ROCm acceleration.
+This playbook teaches you how to cluster two Ryzen™ AI Halo systems using RCCL (ROCm Communication Collectives Library) with vLLM and run Qwen3.5-397B, a 397B parameter model, across both machines with ROCm acceleration.
 
 ## What You'll Learn
 
-- How to extend VRAM allocation on Ryzen™ AI Halo systems
+- How to extend VRAM allocation on Ryzen™ AI Halo systems
 - Launching vLLM with ROCm support
-- Configuring RCCL for multi-node tensor-parallel inference across two Ryzen™ AI Halo systems
+- Configuring RCCL for multi-node tensor-parallel inference across two Ryzen™ AI Halo systems
 - Running a 397B parameter model across two networked Ryzen™ AI Halo systems
 
 ## Prerequisites
@@ -92,7 +92,7 @@ You should see a speed of `10000Mb/s`:
 
 On Linux, ROCm utilizes a shared system memory pool, and this pool is configured by default to half the system memory.
 
-This amount can be increased by changing the kernel's Translation Table Manager (TTM) page setting, with the following instructions. AMD recommends setting the minimum dedicated VRAM in the BIOS (0.5GB)
+This amount can be increased by changing the kernel's Translation Table Manager (TTM) page setting, with the following instructions. AMD recommends setting the minimum dedicated VRAM in the BIOS (0.5 GB).
 
 * Install the pipx utility and add the path for pipx installed wheels into the system search path.
 
@@ -101,7 +101,7 @@ This amount can be increased by changing the kernel's Translation Table Manager
 pipx ensurepath
 ```
 
-* Install the amd-debug-tools wheel from PyPi.
+* Install the amd-debug-tools wheel from PyPI.
 ```bash
 pipx install amd-debug-tools
 ```
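
The TTM sizing discussed in the README text above comes down to simple arithmetic. As a minimal sketch, assuming the TTM limit is expressed as a count of 4 KiB pages (the page size, the 100 GiB figure, and the helper name `gib_to_pages` are illustrative assumptions, not taken from the playbook), a target pool size in GiB converts to pages like this:

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a target pool size in GiB to a page count.
# Assumes 4096-byte pages unless a different size is passed as the second argument.
gib_to_pages() {
    local gib=$1
    local page_bytes=${2:-4096}
    echo $(( gib * 1024 * 1024 * 1024 / page_bytes ))
}

# Example: a 100 GiB pool (an arbitrary illustrative figure)
gib_to_pages 100
```

This only produces the raw page count. How the value is applied, for example through the amd-debug-tools utility installed above, is left to the playbook's own instructions.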
@@ -224,12 +224,12 @@ To connect Open WebUI to your vLLM endpoint:
 
 ![Open WebUI connection settings for the vLLM endpoint](assets/openwebui-connection.png)
 
-Once connected, select the model from the model dropdown in Open WebUI and start chatting. The model is now running across both of your Ryzen™ AI Halo nodes:
+Once connected, select the model from the model dropdown in Open WebUI and start chatting. The model is now running across both of your Ryzen™ AI Halo nodes:
 
 ![Chatting with Qwen3.5-397B in Open WebUI](assets/openwebui-chat.png)
 
 ## Next Steps
 
 - **Explore other models**: Discover new models on [Hugging Face](https://huggingface.co/models?&sort=trending) that fit within your cluster's combined GPU memory
-- **Scale to four nodes**: Add two more Ryzen™ AI Halo systems as additional Ray workers to shard models across even more GPUs. This requires an Ethernet switch with at least four ports, one for each node. Follow [Step 2: Join the Cluster](#step-2-join-the-cluster-machine-2) on each additional worker and increase `--tensor-parallel-size` accordingly
+- **Scale to four nodes**: Add two more Ryzen™ AI Halo systems as additional Ray workers to shard models across even more GPUs. This requires an Ethernet switch with at least four ports, one for each node. Follow [Step 2: Join the Cluster](#step-2-join-the-cluster-machine-2) on each additional worker and increase `--tensor-parallel-size` accordingly
 - **Try other parallelism strategies**: vLLM supports [expert parallel](https://docs.vllm.ai/en/latest/serving/expert_parallel_deployment/) for mixture-of-experts models and [data parallel](https://docs.vllm.ai/en/latest/serving/data_parallel_deployment/) for higher throughput. Experiment with `--enable-expert-parallel` and `--data-parallel-size` to find the best configuration for your workload
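
Whether a model "fits within your cluster's combined GPU memory" can be estimated with back-of-the-envelope arithmetic before downloading anything. The sketch below assumes that weights dominate memory use and that tensor parallelism splits them evenly across nodes. The 4-bit weight width and the helper names are illustrative assumptions, since the diff does not state which quantization the playbook uses, and KV cache and runtime overhead are ignored.

```bash
#!/usr/bin/env bash
# Hypothetical helper: weight memory in GB = parameters (billions) x bits per weight / 8.
# Ignores KV cache, activations, and runtime overhead.
estimate_weights_gb() {
    local params_billion=$1
    local bits_per_weight=$2
    awk -v p="$params_billion" -v b="$bits_per_weight" 'BEGIN { printf "%.2f\n", p * b / 8 }'
}

# Hypothetical helper: per-node share when weights are split evenly across nodes.
per_node_gb() {
    local total_gb=$1
    local nodes=$2
    awk -v t="$total_gb" -v n="$nodes" 'BEGIN { printf "%.2f\n", t / n }'
}

# Example: a 397B parameter model at an assumed 4 bits per weight on two nodes
total=$(estimate_weights_gb 397 4)
echo "weights: ${total} GB total, $(per_node_gb "$total" 2) GB per node"
```

Re-running `per_node_gb` with four nodes gives a first impression of how much headroom the four-node configuration would free up, though real usage will be higher than this weights-only figure.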
