Commit dabd23a

Merge pull request #29 from amd/dholanda/apu_mem_tuner
Add `apu-memory-tuner` skill
2 parents af5eee4 + 1ee4f81 commit dabd23a

8 files changed

Lines changed: 1657 additions & 1 deletion

.claude-plugin/marketplace.json

Lines changed: 6 additions & 0 deletions
@@ -8,6 +8,12 @@
 "version": "0.1.0"
 },
 "plugins": [
+{
+"name": "apu-memory-tuner",
+"source": "./skills/apu-memory-tuner",
+"skills": "./",
+"description": "Inspect and tune the shared-vs-dedicated memory split (GTT / UMA Frame Buffer) on AMD Ryzen APUs so larger LLMs and image models fit on the iGPU."
+},
 {
 "name": "local-ai-app-integration",
 "source": "./skills/local-ai-app-integration",

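A quick way to sanity-check the new entry is to parse the manifest and confirm that each plugin's `source` follows the `./skills/<name>` pattern both entries in this hunk use. This is a hypothetical helper, not a repository script, and the required-key list is an assumption rather than the marketplace schema.

```python
import json
from typing import List

# Assumption: these keys are required; the real marketplace schema may differ.
REQUIRED_KEYS = ("name", "source", "description")


def check_plugin_entries(manifest_text: str) -> List[str]:
    """Return a list of problems found in the "plugins" array (empty if none)."""
    manifest = json.loads(manifest_text)
    problems: List[str] = []
    for index, plugin in enumerate(manifest.get("plugins", [])):
        for key in REQUIRED_KEYS:
            if key not in plugin:
                problems.append(f"plugins[{index}] is missing {key!r}")
        name = plugin.get("name", "")
        source = plugin.get("source", "")
        # Convention observed in this diff: source == "./skills/<name>".
        if name and source != f"./skills/{name}":
            problems.append(
                f"plugins[{index}] source {source!r} does not match ./skills/{name}"
            )
    return problems
```

Pass it the text of `.claude-plugin/marketplace.json`; an empty list means nothing was flagged.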
.github/ISSUE_TEMPLATE/skill-bug-or-improvement.yml

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ body:
 label: Which skill?
 description: One issue per skill. Pick **Other** if it isn't listed yet.
 options:
+- apu-memory-tuner
 - local-ai-app-integration
 - local-ai-use
 - Other (name it in the details below)

README.md

Lines changed: 2 additions & 1 deletion
@@ -69,6 +69,7 @@ Diagnose, configure, and tune AMD devices directly.
 
 | Skill | What it does |
 | --- | --- |
+| `apu-memory-tuner` | Inspect and tune the shared-vs-dedicated memory split (GTT / UMA Frame Buffer) on AMD Ryzen APUs. |
 | `rocm-doctor` | Detect driver / kernel / ROCm / framework mismatches and propose fixes. |
 | `mi300x-tuner` | Opinionated inference tuning for MI300X, including TunableOp, FSDP, and FlashAttention. |
 | `gfx-target-chooser` | Pick the right `gfx942` / `gfx90a` / `gfx1100` target and matching compiler flags. |
@@ -199,7 +200,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for step-by-step instructions, the full a
 
 ## Status
 
-This repository is in its early days. In-repo skills include `skills/local-ai-app-integration/` and `skills/local-ai-use/`, seeding the **Application integration** focus area. The Hardware-native, Cross-stack porting, and Profiling and delivery focus areas are being built out incrementally alongside manifests and CI. Expect rapid iteration. File an issue if there is a workflow you want covered, or open a PR with a skill you have been wanting to share.
+This repository is in its early days. In-repo skills include `skills/local-ai-app-integration/` and `skills/local-ai-use/`, seeding the **Application integration** focus area, and `skills/apu-memory-tuner/`, seeding the **Hardware-native** focus area. The remaining Hardware-native, Cross-stack porting, and Profiling and delivery skills are being built out incrementally alongside manifests and CI. Expect rapid iteration. File an issue if there is a workflow you want covered, or open a PR with a skill you have been wanting to share.
 
 ## License

skills/apu-memory-tuner/SKILL.md

Lines changed: 220 additions & 0 deletions
@@ -0,0 +1,220 @@
---
name: apu-memory-tuner
description: >-
  Inspects and tunes the shared-vs-dedicated memory split on AMD Ryzen APUs
  with unified memory (UMA) so larger LLMs and image-gen models fit on the
  iGPU, or so reserved GPU memory is returned to the CPU. Use when the user
  mentions Ryzen AI, Strix Halo / Strix Point / Krackan / Phoenix / Hawk
  Point, Ryzen AI Max, gfx1150 / gfx1151 / gfx1152, integrated Radeon, iGPU
  memory, UMA Frame Buffer Size, AMD Variable Graphics Memory, VGM, GTT,
  GART, TTM, pages_limit, amd-ttm, amd-debug-tools, "shared GPU memory",
  "dedicated GPU memory", carve-out, "not enough VRAM", "out of VRAM",
  "GPU OOM", llama.cpp on iGPU, ROCm on APU; or asks how much memory the
  iGPU can use, how to give the iGPU more memory, how to balance memory
  between CPU and GPU on UMA, or how to change the BIOS UMA reservation.
  Read-only diagnostics work on both Linux and Windows; tuning runs
  automatically on Linux via `amd-ttm` and prints guided BIOS steps on
  Windows. Do not use for discrete Radeon cards, NVIDIA or Intel GPUs, or
  Apple Silicon; it is APU-only.
---

# APU Memory Tuner

Help the user inspect and tune the shared-vs-dedicated memory split on AMD
Ryzen APUs with unified memory architecture (UMA). The user states intent
("I want to run a 70B model", "I just want my games to be smooth"); this
skill picks the numbers, explains the trade-offs, and applies the change
where it can.

## What's actually going on

On a UMA APU (Strix Halo, Strix Point, Krackan, Phoenix, Hawk Point, etc.)
the CPU and integrated Radeon share **one physical pool of DRAM**. There is
no separate VRAM chip. Two logical knobs divide that pool:

| Knob | Where set | What it does | Reversible? |
|---|---|---|---|
| **Dedicated VRAM** (a.k.a. UMA Frame Buffer / carve-out) | BIOS; requires reboot | Reserves DRAM for the GPU at boot. The CPU can't see it. | Only via BIOS. |
| **Shared GPU memory** (a.k.a. GTT) | Kernel/driver-managed | Dynamic cap on how much system RAM the GPU may map at a time; RAM the GPU isn't using stays available to the OS. | Yes. Linux: `amd-ttm`. Windows: no user knob. |

The key insight the rest of this skill rests on: **"VRAM" and shared RAM run
at the same speed on UMA**, because they're the same DRAM. So the right
default for AI workloads is a small VRAM carve-out plus a large GTT cap, not
the other way around.

## When to use

Use this skill when the user wants to:

- Run a model that doesn't fit ("out of VRAM", "GPU OOM" on an iGPU).
- Inspect the current memory split before changing anything.
- Move the slider toward more shared memory (LLMs, large image gen) or
  toward more dedicated VRAM (gaming, predictable framebuffer).
- Revert changes this skill made earlier.

Do not use it for discrete Radeon cards (they have their own physical VRAM,
so there is no UMA split to tune), Apple Silicon (a different architecture
entirely), or NVIDIA/Intel GPUs.

## Prerequisites

State these to the user up front so a missing one doesn't surface as a
mystery script error halfway through:

- **GPU architecture**: AMD APU with integrated Radeon. Officially tuned
  for RDNA3.5 (`gfx1150` / `gfx1151` / `gfx1152`); RDNA3 / RDNA2 APUs work
  but get more conservative profile numbers.
- **OS**: Linux (kernel + `amd-ttm` path) or Windows (guided BIOS path).
  macOS is not supported.
- **Linux kernel**: see the live AMD matrix linked from `reference.md`.
  The detection script enforces a conservative floor (mainline 6.18.4 /
  Ubuntu HWE 6.17.0 / Ubuntu OEM 6.14.0).
- **Linux extras**: `pipx install amd-debug-tools` for the `amd-ttm` CLI.
  The skill prints this command but never installs it silently. Reading
  the BIOS carve-out from `dmesg` typically requires `sudo`.
- **Windows extras**: AMD Adrenalin Software 25.x or newer if the user
  wants the Variable Graphics Memory slider as an alternative to BIOS.
- **Reboot tolerance**: every tuning change requires a reboot, so don't
  start one while the user is mid-workload.
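The kernel floor above boils down to a version comparison. A minimal sketch, assuming the caller already knows which kernel flavor is installed (the `mainline` / `ubuntu-hwe` / `ubuntu-oem` keys are hypothetical labels) and that only the leading `major.minor.patch` of the release string matters. It is not the logic `detect_platform.py` actually uses.

```python
import re
from typing import Tuple

# Floors quoted in the prerequisites list; the dictionary keys are hypothetical labels.
KERNEL_FLOORS = {
    "mainline": (6, 18, 4),
    "ubuntu-hwe": (6, 17, 0),
    "ubuntu-oem": (6, 14, 0),
}


def parse_kernel_version(release: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a release string like '6.14.0-1007-oem'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def meets_kernel_floor(release: str, flavor: str) -> bool:
    """True if the kernel is at or above the floor for its flavor (tuple comparison)."""
    return parse_kernel_version(release) >= KERNEL_FLOORS[flavor]
```

For example, `meets_kernel_floor(platform.release(), "ubuntu-oem")` checks the running kernel against the OEM floor.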

Silent footguns to surface when relevant:

- `HSA_OVERRIDE_GFX_VERSION`: users running ROCm/PyTorch on an APU often
  set this to convince ROCm the iGPU is a supported target. It does not
  affect memory tuning, but if the user reports `HIP error: invalid device`
  after raising GTT, this env var is usually the cause, not the GTT change.
- Windows `Win32_VideoController.AdapterRAM` is an unsigned 32-bit field, so
  it saturates just under 4 GiB. If the user's reported "dedicated VRAM" is
  about 4096 MB, that's the field's limit, not the real BIOS reservation.
  The real value is shown in Task Manager > Performance > GPU.
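Both footguns can be flagged mechanically. The sketch below is an illustration, not part of the skill's scripts: `adapter_ram_looks_saturated` and `hsa_override_warning` are hypothetical names, and the 1 MiB tolerance is an assumed heuristic for catching values pinned at the top of the 32-bit range.

```python
import os
from typing import Mapping, Optional

UINT32_MAX = 2**32 - 1  # AdapterRAM is an unsigned 32-bit byte count
MIB = 1024 * 1024


def adapter_ram_looks_saturated(adapter_ram_bytes: int) -> bool:
    """Heuristic: a value within 1 MiB of the uint32 ceiling is probably clipped."""
    return adapter_ram_bytes >= UINT32_MAX - MIB


def hsa_override_warning(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a warning if HSA_OVERRIDE_GFX_VERSION is set, otherwise None."""
    env = os.environ if env is None else env
    value = env.get("HSA_OVERRIDE_GFX_VERSION")
    if value:
        return (
            f"HSA_OVERRIDE_GFX_VERSION={value} is set; if ROCm reports an invalid "
            "device, check this variable before blaming the GTT change."
        )
    return None
```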

## The four-step flow

Run these in order. Steps 1 to 3 are read-only; only Step 4 changes anything.

```
[ ] 1. Detect platform and support level
[ ] 2. Show current configuration
[ ] 3. Pick a profile from the user's intent
[ ] 4. Apply (Linux) or print BIOS guidance (Windows); verify after reboot
```

### Step 1: detect platform

```bash
python scripts/detect_platform.py
```

Add `--json` for parseable output. Exit codes:

| Exit | Meaning | Next action |
|---|---|---|
| 0 | Supported AMD APU. | Continue to Step 2. |
| 2 | Wrong hardware (not an AMD APU, or unclassifiable). | Stop. Tell the user this skill can't help them. |
| 3 | AMD APU, but a hard prerequisite is missing (Linux kernel too old). | Stop and tell the user which prerequisite is missing. Do not attempt to upgrade the kernel from this skill. |

The script reports the OS, CPU, GPU LLVM target (e.g. `gfx1151`), the
generation bucket (RDNA3.5 / RDNA3 / RDNA2 / older), total RAM, and, on
Linux, the kernel version compared against the minimums in the AMD doc.
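The generation bucket can be derived from the LLVM target by prefix. This sketch is an assumption about how such a mapping could look, not the detection script's actual code: it treats `gfx115x` as RDNA3.5, `gfx110x` as RDNA3 (Phoenix and Hawk Point report `gfx1103`), `gfx103x` as RDNA2, three-digit `gfx9` targets as older Vega-era parts, and everything else as unclassified.

```python
import re


def generation_bucket(llvm_target: str) -> str:
    """Map a GPU LLVM target such as 'gfx1151' to a coarse generation bucket."""
    match = re.fullmatch(r"gfx([0-9a-f]+)", llvm_target.strip().lower())
    if match is None:
        return "unclassified"
    digits = match.group(1)
    if digits.startswith("115"):
        return "RDNA3.5"
    if digits.startswith("110"):
        return "RDNA3"
    if digits.startswith("103"):
        return "RDNA2"
    if len(digits) == 3 and digits.startswith("9"):
        return "older"  # e.g. gfx90c Vega-era APUs
    return "unclassified"
```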

### Step 2: show current configuration

```bash
python scripts/show_config.py
```

Reports current dedicated VRAM, current shared-GPU cap, total RAM, and, on
Linux, the raw `pages_limit` value plus a `rocminfo` sanity check of what
the runtime actually sees. Relay any warnings it prints: Linux often needs
`sudo` for the `dmesg` read of the BIOS carve-out, and on Windows the
32-bit `AdapterRAM` field saturates at about 4 GiB (the real value is in
Task Manager).
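`pages_limit` is a count of pages, not bytes, so turning it into a readable GTT cap means multiplying by the page size. A minimal sketch with hypothetical function names; the pure conversion assumes the usual 4 KiB pages on x86-64, while the sysfs reader asks the OS for the real page size (Linux only).

```python
import os

PAGES_LIMIT_PATH = "/sys/module/ttm/parameters/pages_limit"


def pages_to_gib(pages: int, page_size: int = 4096) -> float:
    """Convert a TTM page count to GiB."""
    return pages * page_size / 1024**3


def read_gtt_cap_gib(path: str = PAGES_LIMIT_PATH) -> float:
    """Read the TTM pages_limit parameter and return the cap in GiB."""
    with open(path, encoding="ascii") as fh:
        pages = int(fh.read().strip())
    return pages_to_gib(pages, os.sysconf("SC_PAGE_SIZE"))
```

For example, a `pages_limit` of 25165824 with 4 KiB pages corresponds to a 96 GiB GTT cap.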

### Step 3: pick a profile

Ask the user what they want in workload terms, not numbers. Map their
answer to one of these:

| Profile | What it does | Use when the user says... |
|---|---|---|
| `large-models` **(default)** | GTT to ~75% of RAM, BIOS VRAM at the floor (0.5 GB). | "Run a big model", "fit Llama 70B", "I keep getting OOM on the iGPU", "give the GPU as much memory as possible". |
| `balanced` | GTT at the kernel default (~50%), 1 GB BIOS VRAM. | "I just do mixed dev work", "back to defaults", "don't waste RAM on the GPU". |
| `graphics` | GTT at default; BIOS VRAM raised to the larger of 8 GB or 25% of RAM. | "I'm gaming", "I want a predictable framebuffer", "stuttering in games". |
| `reset` | Revert all changes this skill made. | "Undo it", "go back to stock". |
| `custom` | Use the explicit `--gtt-gb` / `--vram-gb` values the user passed. | "I want exactly N GB". |

**Default**: if the user didn't specify, they almost always want
`large-models`. It's the only profile that meaningfully changes the
experience for the workload that usually brings them here (running a model
that didn't fit). Use it unless they explicitly mentioned gaming, asked for
defaults, or gave an exact number.

If you still can't tell, use the `AskQuestion` tool with the five options
above labeled in plain English; do not invent a sixth.
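The profile table reduces to a few lines of arithmetic. The sketch below mirrors the table's percentages; `plan_profile` is a hypothetical name, the rounding is an assumption, and `None` stands for "leave at the kernel or BIOS default" rather than any value `apply_profile.py` is known to use. Accepting `custom` with only one of the two values is also an assumption.

```python
from typing import Optional, Tuple


def plan_profile(
    total_ram_gb: float,
    profile: str,
    gtt_gb: Optional[float] = None,
    vram_gb: Optional[float] = None,
) -> Tuple[Optional[float], Optional[float]]:
    """Return (gtt_gb, bios_vram_gb) targets; None means keep the default/undo."""
    if profile == "large-models":
        return round(total_ram_gb * 0.75, 1), 0.5
    if profile == "balanced":
        return None, 1.0  # GTT stays at the kernel default (~50% of RAM)
    if profile == "graphics":
        return None, max(8.0, total_ram_gb * 0.25)
    if profile == "reset":
        return None, None
    if profile == "custom":
        if gtt_gb is None and vram_gb is None:
            raise ValueError("custom needs --gtt-gb and/or --vram-gb")
        return gtt_gb, vram_gb
    raise ValueError(f"unknown profile: {profile!r}")
```

On a 128 GB Strix Halo system, `large-models` works out to a 96 GB GTT cap with a 0.5 GB carve-out; `graphics` on a 64 GB machine asks for 16 GB of BIOS VRAM.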

### Step 4: apply or guide

```bash
python scripts/apply_profile.py --profile <choice>
```

Run with `--dry-run` first if the user wants to see the planned change
before committing to it.

What happens on each OS:

- **Linux**: writes the new GTT cap via `amd-ttm --set <N>`, which persists
  it to `/etc/modprobe.d/ttm.conf`. A reboot is required; the script tells
  the user but never reboots automatically. If `amd-ttm` is missing, the
  script exits with a clear install hint (`pipx install amd-debug-tools`).
  Do not install it yourself without confirming with the user.
- **Windows**: prints step-by-step BIOS instructions for the UMA Frame
  Buffer Size, plus a note about AMD Adrenalin's "Variable Graphics Memory"
  slider as an alternative on supported laptops. Nothing is written to disk
  or the registry, and the script never modifies the BIOS.

After the reboot, re-run `python scripts/show_config.py` to verify.
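The Linux half of Step 4 amounts to "check for `amd-ttm`, then run `amd-ttm --set <N>` unless this is a dry run". A hedged sketch: the helper names are hypothetical, it assumes `<N>` is a whole number of GB, and it does not handle privilege elevation, which `amd-ttm` may need to write under `/etc/modprobe.d/`. It deliberately never installs anything or reboots, matching the safety rules.

```python
import shutil
import subprocess
from typing import List

INSTALL_HINT = "amd-ttm not found; install it with: pipx install amd-debug-tools"


def build_amd_ttm_command(gtt_gb: int) -> List[str]:
    """Build the `amd-ttm --set <N>` command described in Step 4."""
    if gtt_gb <= 0:
        raise ValueError("GTT target must be a positive number of GB")
    return ["amd-ttm", "--set", str(gtt_gb)]


def apply_gtt(gtt_gb: int, dry_run: bool = True) -> List[str]:
    """Print (dry run) or run the amd-ttm command; never installs or reboots."""
    if shutil.which("amd-ttm") is None:
        raise SystemExit(INSTALL_HINT)  # hint only; installing needs user consent
    command = build_amd_ttm_command(gtt_gb)
    if dry_run:
        print("would run:", " ".join(command))
    else:
        subprocess.run(command, check=True)
        print("GTT cap written. Reboot, then re-run scripts/show_config.py.")
    return command
```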

## OS-specific reality check

| Capability | Linux | Windows |
|---|---|---|
| Inspect dedicated VRAM | Yes (`dmesg`/`journalctl`, may need sudo) | Yes (Task Manager / dxdiag; `AdapterRAM` is truncated) |
| Inspect shared cap | Yes (`/sys/module/ttm/parameters/pages_limit`) | Yes (dxdiag "Shared Memory") |
| Change shared cap automatically | Yes (`amd-ttm`) | **No**: WDDM-managed, not user-tunable |
| Change dedicated VRAM automatically | No (BIOS only) | No (BIOS only; VGM via Adrenalin is the closest UI) |

Net effect: on Windows, raising the BIOS UMA Frame Buffer Size is the only
real way to give the GPU more memory. On Linux you almost always want to
*lower* BIOS VRAM and raise GTT instead.
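On Linux, the carve-out and GTT sizes the driver settled on can be read back from the amdgpu boot messages. This sketch assumes lines of the form `512M of VRAM memory ready` and `31744M of GTT memory ready.`; that wording is driver log output, not a stable interface, so treat a missing key as "unknown" rather than zero. The sizes are in MiB, and the function name is hypothetical.

```python
import re
from typing import Dict

# Matches e.g. "amdgpu: 512M of VRAM memory ready" and "... 31744M of GTT memory ready."
MEMORY_READY = re.compile(r"(\d+)M of (VRAM|GTT) memory ready")


def parse_amdgpu_memory(log_text: str) -> Dict[str, int]:
    """Return {'VRAM': MiB, 'GTT': MiB} from amdgpu boot messages (last match wins)."""
    sizes: Dict[str, int] = {}
    for match in MEMORY_READY.finditer(log_text):
        sizes[match.group(2)] = int(match.group(1))
    return sizes
```

Feed it the output of `sudo dmesg` or `journalctl -k -b`.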

## Safety rules

- Never auto-reboot. Always tell the user a reboot is needed and let them
  do it.
- Never touch BIOS programmatically. The script prints instructions; the
  user navigates the firmware menu.
- Never silently install `amd-debug-tools`. Print the `pipx install` line
  and ask before running it.
- Never apply a profile that fails validation. The script refuses GTT
  targets above 95% of RAM or VRAM targets above 50% of RAM.
- Never claim a change took effect before the user has rebooted and
  re-verified with `show_config.py`.
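The validation limits above are simple ratio checks. A minimal sketch; `validate_targets` is a hypothetical name, and the real script's rounding and error messages may differ.

```python
from typing import Optional


def validate_targets(
    total_ram_gb: float,
    gtt_gb: Optional[float] = None,
    vram_gb: Optional[float] = None,
) -> None:
    """Raise ValueError if a target exceeds the limits in the safety rules."""
    if total_ram_gb <= 0:
        raise ValueError("total RAM must be positive")
    if gtt_gb is not None and gtt_gb > 0.95 * total_ram_gb:
        raise ValueError(f"GTT {gtt_gb} GB exceeds 95% of {total_ram_gb} GB RAM")
    if vram_gb is not None and vram_gb > 0.50 * total_ram_gb:
        raise ValueError(f"VRAM {vram_gb} GB exceeds 50% of {total_ram_gb} GB RAM")
```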

## Verification checklist

Mark this skill complete only when **all** of these are true:

- [ ] `python scripts/detect_platform.py` exits 0.
- [ ] `python scripts/show_config.py` reports the *new* values after the
  reboot following Step 4.
- [ ] The user has tried the workload that motivated the change (loaded
  the model, launched the game) and confirmed the new headroom helps.
- [ ] On Linux, `cat /sys/module/ttm/parameters/pages_limit` matches what
  `apply_profile.py` reported.

If any box is unchecked, either the change didn't take effect or the user
hasn't validated it yet. Say so explicitly rather than declaring success.
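The last checklist item compares a page count against a size. This sketch converts the expected cap back to pages and allows one page of rounding slack; it assumes `amd-ttm`'s `<N>` is in GiB and that pages are 4 KiB, both of which should be confirmed against `reference.md` and the tool's own output. The function names are hypothetical.

```python
def gib_to_pages(gib: float, page_size: int = 4096) -> int:
    """Convert a GiB target into a TTM page count (4 KiB pages by default)."""
    return int(gib * 1024**3 // page_size)


def pages_limit_matches(sysfs_value: str, expected_gib: float, page_size: int = 4096) -> bool:
    """Compare the text of pages_limit with an expected GTT cap in GiB (±1 page)."""
    actual_pages = int(sysfs_value.strip())
    return abs(actual_pages - gib_to_pages(expected_gib, page_size)) <= 1
```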

## Reference

For the full glossary, the link to AMD's authoritative kernel-version /
ROCm-compatibility matrix, per-OEM BIOS notes, profile math, and
troubleshooting, see [reference.md](reference.md).
