|
| 1 | +--- |
| 2 | +name: apu-memory-tuner |
| 3 | +description: >- |
| 4 | + Inspects and tunes the shared-vs-dedicated memory split on AMD Ryzen APUs |
| 5 | + with unified memory (UMA) so larger LLMs and image-gen models fit on the |
| 6 | + iGPU, or so reserved GPU memory is returned to the CPU. Use when the user |
| 7 | + mentions Ryzen AI, Strix Halo / Strix Point / Krackan / Phoenix / Hawk |
| 8 | + Point, Ryzen AI Max, gfx1150 / gfx1151 / gfx1152, integrated Radeon, iGPU |
| 9 | + memory, UMA Frame Buffer Size, AMD Variable Graphics Memory, VGM, GTT, |
| 10 | + GART, TTM, pages_limit, amd-ttm, amd-debug-tools, "shared GPU memory", |
| 11 | + "dedicated GPU memory", carve-out, "not enough VRAM", "out of VRAM", |
| 12 | + "GPU OOM", llama.cpp on iGPU, ROCm on APU; or asks how much memory the |
| 13 | + iGPU can use, how to give the iGPU more memory, how to balance memory |
| 14 | + between CPU and GPU on UMA, or how to change the BIOS UMA reservation. |
| 15 | + Read-only diagnostics work everywhere; tuning runs automatically on Linux |
| 16 | + via `amd-ttm` and prints guided BIOS steps on Windows. Do not use for |
| 17 | + discrete Radeon cards, Intel iGPUs, or Apple Silicon -- it is APU-only. |
| 18 | +--- |
| 19 | + |
| 20 | +# APU Memory Tuner |
| 21 | + |
| 22 | +Help the user inspect and tune the shared-vs-dedicated memory split on AMD |
| 23 | +Ryzen APUs with unified memory architecture (UMA). The user states intent |
| 24 | +("I want to run a 70B model", "I just want my games to be smooth"); this |
| 25 | +skill picks the numbers, explains the trade-offs, and applies the change |
| 26 | +where it can. |
| 27 | + |
| 28 | +## What's actually going on |
| 29 | + |
| 30 | +On a UMA APU (Strix Halo, Strix Point, Krackan, Phoenix, Hawk Point, etc.) |
| 31 | +the CPU and integrated Radeon share **one physical pool of DRAM**. There is |
| 32 | +no separate VRAM chip. Two logical knobs cut that pool up: |
| 33 | + |
| 34 | +| Knob | Where set | What it does | Reversible? | |
| 35 | +|---|---|---|---| |
| 36 | +| **Dedicated VRAM** (a.k.a. UMA Frame Buffer / carve-out / GART) | BIOS, requires reboot | Permanently reserves DRAM for the GPU. CPU can't see it. | Only via BIOS. | |
| 37 | +| **Shared GPU memory** (a.k.a. GTT) | Kernel/driver-managed | Dynamic, OS-reclaimable cap on how much system RAM the GPU may map at a time. | Yes; Linux: `amd-ttm`. Windows: no user knob. | |
| 38 | + |
| 39 | +Key insight the rest of this skill rests on: **"VRAM" and shared RAM run at |
| 40 | +the same speed on UMA**, because they're the same DRAM. So the right default |
| 41 | +for AI workloads is small VRAM + large GTT, not the other way around. |
| 42 | + |
| 43 | +## When to use |
| 44 | + |
| 45 | +Use this skill when the user wants to: |
| 46 | + |
| 47 | +- Run a model that doesn't fit ("out of VRAM", "GPU OOM" on an iGPU). |
| 48 | +- Inspect the current memory split before changing anything. |
| 49 | +- Move the slider toward more shared memory (LLMs, large image gen) or |
| 50 | + toward more dedicated VRAM (gaming, predictable framebuffer). |
| 51 | +- Revert any prior changes. |
| 52 | + |
| 53 | +Do not use it for: discrete Radeon cards (no GTT/UMA on those), Apple |
| 54 | +Silicon (different architecture entirely), or NVIDIA/Intel GPUs. |
| 55 | + |
| 56 | +## Prerequisites |
| 57 | + |
| 58 | +State these to the user up front so a missing one doesn't surface as a |
| 59 | +mystery script error halfway through: |
| 60 | + |
| 61 | +- **GPU architecture**: AMD APU with integrated Radeon. Officially tuned |
| 62 | + for RDNA3.5 (`gfx1150` / `gfx1151` / `gfx1152`); RDNA3 / RDNA2 APUs work |
| 63 | + but with conservative profile numbers. |
| 64 | +- **OS**: Linux (kernel + `amd-ttm` path) or Windows (guided BIOS path). |
| 65 | + macOS is not supported. |
| 66 | +- **Linux kernel**: see the live AMD matrix linked from `reference.md`. |
| 67 | + The detection script enforces a conservative floor (mainline 6.18.4 / |
| 68 | + Ubuntu HWE 6.17.0 / Ubuntu OEM 6.14.0). |
| 69 | +- **Linux extras**: `pipx install amd-debug-tools` for the `amd-ttm` CLI. |
| 70 | + The skill prints this command but never installs it silently. Reading |
| 71 | + the BIOS carve-out from `dmesg` typically requires `sudo`. |
| 72 | +- **Windows extras**: AMD Adrenalin Software 25.x or newer if the user |
| 73 | + wants the Variable Graphics Memory slider as an alternative to BIOS. |
| 74 | +- **Reboot tolerance**: any tuning change requires a reboot. Don't propose |
| 75 | + this skill mid-workload. |
| 76 | + |
| 77 | +Silent footguns to surface when relevant: |
| 78 | + |
| 79 | +- `HSA_OVERRIDE_GFX_VERSION` — users running ROCm/PyTorch on an APU often |
| 80 | + set this to convince ROCm the iGPU is a supported target. It does not |
| 81 | + affect memory tuning, but if the user reports `HIP error: invalid |
| 82 | + device` after raising GTT, this env var is usually the cause, not the |
| 83 | + GTT change. |
| 84 | +- Windows `Win32_VideoController.AdapterRAM` is a 32-bit field capped at |
| 85 | + 4 GiB. If the user's reported "dedicated VRAM" is exactly 4096 MB, that's |
| 86 | + the WDDM truncation, not the real BIOS reservation. The real value lives |
| 87 | + in Task Manager > Performance > GPU. |
| 88 | + |
| 89 | +## The four-step flow |
| 90 | + |
| 91 | +Run these in order. Each one is read-only until step 4. |
| 92 | + |
| 93 | +``` |
| 94 | +[ ] 1. Detect platform and support level |
| 95 | +[ ] 2. Show current configuration |
| 96 | +[ ] 3. Pick a profile from the user's intent |
| 97 | +[ ] 4. Apply (Linux) or print BIOS guidance (Windows); verify after reboot |
| 98 | +``` |
| 99 | + |
| 100 | +### Step 1: detect platform |
| 101 | + |
| 102 | +```bash |
| 103 | +python scripts/detect_platform.py |
| 104 | +``` |
| 105 | + |
| 106 | +Add `--json` for parseable output. Exit codes: |
| 107 | + |
| 108 | +| Exit | Meaning | Next action | |
| 109 | +|---|---|---| |
| 110 | +| 0 | Supported AMD APU. | Continue to Step 2. | |
| 111 | +| 2 | Wrong hardware (not an AMD APU, or unclassifiable). | Stop. Tell the user this skill can't help them. | |
| 112 | +| 3 | AMD APU but a hard prerequisite is missing (Linux kernel too old). | Stop. Tell the user the prereq and stop. Do not attempt to upgrade the kernel from this skill. | |
| 113 | + |
| 114 | +The script reports the OS, CPU, GPU LLVM target (e.g. `gfx1151`), the |
| 115 | +generation bucket (RDNA3.5 / RDNA3 / RDNA2 / older), total RAM, and on |
| 116 | +Linux the kernel version vs. the minimums in the AMD doc. |
| 117 | + |
| 118 | +### Step 2: show current configuration |
| 119 | + |
| 120 | +```bash |
| 121 | +python scripts/show_config.py |
| 122 | +``` |
| 123 | + |
| 124 | +Reports current dedicated VRAM, current shared-GPU cap, total RAM, and on |
| 125 | +Linux the raw `pages_limit` value plus a `rocminfo` sanity-check of what |
| 126 | +the runtime actually sees. Note any messages it prints — Linux often needs |
| 127 | +`sudo` for the dmesg read of the BIOS carve-out, and Windows' `AdapterRAM` |
| 128 | +field is capped at 4 GiB by WDDM (real value lives in Task Manager). |
| 129 | + |
| 130 | +### Step 3: pick a profile |
| 131 | + |
| 132 | +Ask the user what they want, in workload terms, not numbers. Map their |
| 133 | +answer to one of these: |
| 134 | + |
| 135 | +| Profile | What it does | Use when the user says... | |
| 136 | +|---|---|---| |
| 137 | +| `large-models` **(default)** | GTT to ~75% of RAM, BIOS VRAM at the floor (0.5 GB). | "Run a big model", "fit Llama 70B", "I keep getting OOM on the iGPU", "give the GPU as much memory as possible". | |
| 138 | +| `balanced` | GTT at the kernel default (~50%), 1 GB BIOS VRAM. | "I just do mixed dev work", "back to defaults", "don't waste RAM on the GPU". | |
| 139 | +| `graphics` | GTT at default; BIOS VRAM raised to the larger of 8 GB or 25% of RAM. | "I'm gaming", "I want a predictable framebuffer", "stuttering in games". | |
| 140 | +| `reset` | Revert all changes this skill made. | "Undo it", "go back to stock". | |
| 141 | +| `custom` | Use the explicit `--gtt-gb` / `--vram-gb` the user passed. | "I want exactly N GB". | |
| 142 | + |
| 143 | +**Default**: if the user is here at all and didn't specify, they almost |
| 144 | +always want `large-models` -- that's the only profile that meaningfully |
| 145 | +changes the experience for the workload that brought them here (running a |
| 146 | +model that didn't fit). Use it unless they explicitly said gaming, said |
| 147 | +they want defaults, or said an exact number. |
| 148 | + |
| 149 | +If you still can't tell, use the `AskQuestion` tool with the five options |
| 150 | +above labeled in plain English; do not invent a sixth. |
| 151 | + |
| 152 | +### Step 4: apply or guide |
| 153 | + |
| 154 | +```bash |
| 155 | +python scripts/apply_profile.py --profile <choice> |
| 156 | +``` |
| 157 | + |
| 158 | +Add `--dry-run` first if the user wants to see the planned change before |
| 159 | +committing. |
| 160 | + |
| 161 | +What happens on each OS: |
| 162 | + |
| 163 | +- **Linux**: writes the new GTT cap via `amd-ttm --set <N>` (which persists |
| 164 | + to `/etc/modprobe.d/ttm.conf`). Reboot is required; the script tells the |
| 165 | + user but never reboots automatically. If `amd-ttm` is missing, the script |
| 166 | + exits with a clear install hint (`pipx install amd-debug-tools`) — do not |
| 167 | + install it yourself without confirming with the user. |
| 168 | +- **Windows**: prints step-by-step BIOS instructions for the UMA Frame |
| 169 | + Buffer Size, plus a note about AMD Adrenalin's "Variable Graphics Memory" |
| 170 | + slider as an alternative on supported laptops. Nothing is written to disk |
| 171 | + or registry. The script never modifies BIOS for the user. |
| 172 | + |
| 173 | +Then re-run `python scripts/show_config.py` after reboot to verify. |
| 174 | + |
| 175 | +## OS-specific reality check |
| 176 | + |
| 177 | +| Capability | Linux | Windows | |
| 178 | +|---|---|---| |
| 179 | +| Inspect dedicated VRAM | Yes (`dmesg`/`journalctl`, may need sudo) | Yes (Task Manager / dxdiag; AdapterRAM is truncated) | |
| 180 | +| Inspect shared cap | Yes (`/sys/module/ttm/parameters/pages_limit`) | Yes (dxdiag "Shared Memory") | |
| 181 | +| Change shared cap automatically | Yes (`amd-ttm`) | **No** — WDDM-managed, not user-tunable | |
| 182 | +| Change dedicated VRAM automatically | No (BIOS only) | No (BIOS only; VGM via Adrenalin is the closest UI) | |
| 183 | + |
| 184 | +Net effect: on Windows, raising the BIOS UMA Frame Buffer Size is the only |
| 185 | +real way to give the GPU more memory. On Linux you almost always want to |
| 186 | +*lower* BIOS VRAM and raise GTT instead. |
| 187 | + |
| 188 | +## Safety rules |
| 189 | + |
| 190 | +- Never auto-reboot. Always tell the user the reboot is needed and let them |
| 191 | + do it. |
| 192 | +- Never touch BIOS programmatically. The script prints instructions; the |
| 193 | + user navigates the firmware menu. |
| 194 | +- Never silently install `amd-debug-tools`. Print the `pipx install` line |
| 195 | + and ask before running it. |
| 196 | +- Never set a profile whose validation fails. The script refuses GTT |
| 197 | + targets above 95% of RAM or VRAM targets above 50% of RAM. |
| 198 | +- Never claim a change took effect before the user has rebooted and |
| 199 | + re-verified with `show_config.py`. |
| 200 | + |
| 201 | +## Verification checklist |
| 202 | + |
| 203 | +Mark this skill complete only when **all** are true: |
| 204 | + |
| 205 | +- [ ] `python scripts/detect_platform.py` exits 0. |
| 206 | +- [ ] `python scripts/show_config.py` reports the *new* values after the |
| 207 | + reboot following Step 4. |
| 208 | +- [ ] The user has tried the workload that motivated the change (loaded |
| 209 | + the model, launched the game) and confirmed the new headroom helps. |
| 210 | +- [ ] On Linux, `cat /sys/module/ttm/parameters/pages_limit` matches what |
| 211 | + `apply_profile.py` reported. |
| 212 | + |
| 213 | +If any box is unchecked the change either didn't take effect or the user |
| 214 | +hasn't validated it yet — say so out loud rather than declaring success. |
| 215 | + |
| 216 | +## Reference |
| 217 | + |
| 218 | +For the full glossary, the link to AMD's authoritative kernel-version / |
| 219 | +ROCm-compatibility matrix, per-OEM BIOS notes, profile math, and |
| 220 | +troubleshooting, see [reference.md](reference.md). |
0 commit comments