Add ROCm Strix Halo Bonsai demo path #71
Added a second commit with the full Strix Halo ROCm benchmark matrix and raw JSONL artifacts. Coverage now includes Ternary-Bonsai 1.7B, 4B, and 8B Q2_0 with isolated pp/tg plus combined prompt+generation workloads. Head commit: 1a1cc92.

Added one more follow-up commit with flash-attention on/off comparison data across 1.7B, 4B, and 8B at pp512/tg128. Short version: keep FA on for this ROCm path; 8B improves from ~1142 to ~1303 tok/s at pp512 and from ~70 to ~78 tok/s at tg128. Head commit: efade28.
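For a quick sense of scale, the rounded 8B figures quoted above can be turned into relative gains. This is only a sketch: `pct_gain` is a hypothetical helper name, not something in the PR, and the inputs are the approximate values from the comment rather than the raw JSONL data.

```shell
#!/usr/bin/env bash
# pct_gain OLD CUR: percent change from OLD to CUR, one decimal place.
# Hypothetical helper for illustration only; not part of the PR.
pct_gain() {
  awk -v old="$1" -v cur="$2" 'BEGIN { printf "%.1f", (cur - old) / old * 100 }'
}

# Rounded 8B figures from the comment above (FA off -> FA on).
echo "pp512: +$(pct_gain 1142 1303)%"
echo "tg128: +$(pct_gain 70 78)%"
```

Both gains come out in the low double digits, which is consistent with the "keep FA on" recommendation.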
| @@ -0,0 +1,12 @@ | |||
| {"build_commit": "d104cf1b6", "build_number": 8846, "cpu_info": "AMD RYZEN AI MAX+ 395 w/ Radeon 8060S", "gpu_info": "AMD Radeon Graphics", "backends": "ROCm", "model_filename": "models/ternary-gguf/1.7B/Ternary-Bonsai-1.7B-Q2_0.gguf", "model_type": "qwen3 1.7B Q2_0", "model_size": 457345184, "model_n_params": 1720028160, "n_batch": 2048, "n_ubatch": 512, "n_threads": 16, "cpu_mask": "0x0", "cpu_strict": false, "poll": 50, "type_k": "f16", "type_v": "f16", "n_gpu_layers": 99, "n_cpu_moe": 0, "split_mode": "layer", "main_gpu": 0, "no_kv_offload": false, "flash_attn": false, "devices": "auto", "tensor_split": "0.00", "tensor_buft_overrides": "none", "use_mmap": true, "use_direct_io": false, "embeddings": false, "no_op_offload": 0, "no_host": false, "fit_target": 0, "fit_min_ctx": 0, "n_prompt": 512, "n_gen": 0, "n_depth": 0, "test_time": "2026-05-07T00:56:16Z", "avg_ns": 105712742, "stddev_ns": 797694, "avg_ts": 4843.498607, "stddev_ts": 36.703200, "samples_ns": [ 106239328, 106103931, 104794967 ],"samples_ts": [ 4819.31, 4825.46, 4885.73 ]} | |||
Do we want the JSONL files? They might be too much info.
| @@ -0,0 +1,18 @@ | |||
| {"build_commit": "d104cf1b6", "build_number": 8846, "cpu_info": "AMD RYZEN AI MAX+ 395 w/ Radeon 8060S", "gpu_info": "AMD Radeon Graphics", "backends": "ROCm", "model_filename": "models/ternary-gguf/1.7B/Ternary-Bonsai-1.7B-Q2_0.gguf", "model_type": "qwen3 1.7B Q2_0", "model_size": 457345184, "model_n_params": 1720028160, "n_batch": 2048, "n_ubatch": 512, "n_threads": 16, "cpu_mask": "0x0", "cpu_strict": false, "poll": 50, "type_k": "f16", "type_v": "f16", "n_gpu_layers": 99, "n_cpu_moe": 0, "split_mode": "layer", "main_gpu": 0, "no_kv_offload": false, "flash_attn": true, "devices": "auto", "tensor_split": "0.00", "tensor_buft_overrides": "none", "use_mmap": true, "use_direct_io": false, "embeddings": false, "no_op_offload": 0, "no_host": false, "fit_target": 0, "fit_min_ctx": 0, "n_prompt": 512, "n_gen": 0, "n_depth": 0, "test_time": "2026-05-07T00:32:57Z", "avg_ns": 94925047, "stddev_ns": 339642, "avg_ts": 5393.775104, "stddev_ts": 19.266487, "samples_ns": [ 94868643, 94617421, 95289079 ],"samples_ts": [ 5396.94, 5411.27, 5373.12 ]} | |||
Same here: do we want the JSONL files? They might be too much info.
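If the raw JSONL turns out to be too verbose to keep, one option is to condense each record into a single summary line. The sketch below assumes flat `"key": value` pairs formatted like the llama-bench records quoted above. `json_field` and `summarize_bench` are hypothetical helper names, and the `sed` expression is not a general JSON parser.

```shell
#!/usr/bin/env bash
# json_field KEY: read one single-line JSON record on stdin and print the
# scalar value stored under KEY. Hypothetical helper, sed only; it assumes
# flat `"key": value` pairs with no commas or quotes inside the value.
json_field() {
  sed -n 's/.*"'"$1"'": *"\{0,1\}\([^",]*\)"\{0,1\}[,}].*/\1/p'
}

# summarize_bench: turn each JSONL record on stdin into one compact line.
summarize_bench() {
  local line
  while IFS= read -r line; do
    printf '%s | fa=%s | pp%s tg%s | %s t/s\n' \
      "$(printf '%s\n' "$line" | json_field model_type)" \
      "$(printf '%s\n' "$line" | json_field flash_attn)" \
      "$(printf '%s\n' "$line" | json_field n_prompt)" \
      "$(printf '%s\n' "$line" | json_field n_gen)" \
      "$(printf '%s\n' "$line" | json_field avg_ts)"
  done
}

# Trimmed copy of the FA-on pp512 record quoted above (most fields omitted).
record='{"model_type": "qwen3 1.7B Q2_0", "flash_attn": true, "n_prompt": 512, "n_gen": 0, "avg_ts": 5393.775104, "stddev_ts": 19.266487}'
printf '%s\n' "$record" | summarize_bench
```

Running `summarize_bench < file.jsonl` over one of the data files would give one line per benchmark row, which may be easier to review than the full records.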
| inputs: | ||
| enable_linux_amd: | ||
| description: "Run optional self-hosted Linux AMD/ROCm build" | ||
| required: false | ||
| default: false | ||
| type: boolean |
We don't have that ourselves; is that something in your setup?
I guess we can push it here anyway and it just won't run? We might want to make sure it does not cause the GitHub Action to fail.
| @@ -0,0 +1,15 @@ | |||
| {"build_commit": "d104cf1b6", "build_number": 8846, "cpu_info": "AMD RYZEN AI MAX+ 395 w/ Radeon 8060S", "gpu_info": "AMD Radeon Graphics", "backends": "ROCm", "model_filename": "models/ternary-gguf/1.7B/Ternary-Bonsai-1.7B-Q2_0.gguf", "model_type": "qwen3 1.7B Q2_0", "model_size": 457345184, "model_n_params": 1720028160, "n_batch": 2048, "n_ubatch": 512, "n_threads": 16, "cpu_mask": "0x0", "cpu_strict": false, "poll": 50, "type_k": "f16", "type_v": "f16", "n_gpu_layers": 99, "n_cpu_moe": 0, "split_mode": "layer", "main_gpu": 0, "no_kv_offload": false, "flash_attn": true, "devices": "auto", "tensor_split": "0.00", "tensor_buft_overrides": "none", "use_mmap": true, "use_direct_io": false, "embeddings": false, "no_op_offload": 0, "no_host": false, "fit_target": 0, "fit_min_ctx": 0, "n_prompt": 128, "n_gen": 0, "n_depth": 0, "test_time": "2026-05-07T00:30:49Z", "avg_ns": 28543656, "stddev_ns": 173748, "avg_ts": 4484.491304, "stddev_ts": 27.212874, "samples_ns": [ 28520343, 28468025, 28590041, 28334580, 28805294 ],"samples_ts": [ 4488.02, 4496.27, 4477.08, 4517.45, 4443.63 ]} | |||
Same here: do we want the JSONL files? They might be too much info.
| $FamilyDisplay = "Ternary-Bonsai" | ||
| } else { | ||
| $ModelDir = Join-Path $DemoDir "models\gguf\$BonsaiModel" | ||
| $FamilyDisplay = "Bonsai" | ||
| } | ||
I made some changes here recently; I think the conflict is related to this.
Pull request overview
Adds ROCm/HIP support to the Bonsai demo toolchain (build + runtime), plus Strix Halo (gfx1151) validation artifacts and optional self-hosted CI coverage.
Changes:
- Introduces a Linux ROCm/HIP source build script and updates docs to reference it.
- Fixes/standardizes Bonsai vs Ternary-Bonsai display/model-path handling and improves one-shot prompt execution by auto-enabling `--single-turn` when a prompt/file is provided.
- Adds Strix Halo ROCm benchmark/validation results and an optional self-hosted ROCm smoke job.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| scripts/start_llama_server.ps1 | Simplifies family display handling for GGUF discovery/errors. |
| scripts/run_llama.sh | Auto-adds --single-turn when -p/--prompt or -f/--file is used. |
| scripts/run_llama.ps1 | Auto-adds --single-turn when prompt/file args are present. |
| scripts/common.sh | Prepends /opt/rocm/bin and /opt/rocm/lib to PATH/LD_LIBRARY_PATH when present. |
| scripts/build_rocm_linux.sh | New: builds PrismML llama.cpp with ROCm/HIP and installs to bin/rocm. |
| README.md | Documents ROCm/HIP support and points to ROCm benchmark write-up; updates build instructions. |
| community-benchmarks/ternary-bonsai/rocm-hip-strix-halo-128gb-linux.md | New: Strix Halo ROCm HIP benchmark/validation report. |
| community-benchmarks/ternary-bonsai/README.md | Adds Strix Halo ROCm result entry. |
| community-benchmarks/README.md | Adds combined-table Ternary-Bonsai ROCm result entry. |
| benchmarks/data/ternary-bonsai-rocm-strix-halo-fa-compare-20260507T005616Z.jsonl | New: raw JSONL data for FA on/off comparison. |
| benchmarks/data/ternary-bonsai-rocm-strix-halo-combined-20260507T003257Z.jsonl | New: raw JSONL data for combined prompt+gen runs. |
| benchmarks/data/ternary-bonsai-rocm-strix-halo-20260507T003049Z.jsonl | New: raw JSONL data for isolated prompt/decode runs. |
| .github/workflows/check-env-vars.yml | Expands syntax checks to additional shell scripts incl. ROCm build script. |
| .github/workflows/build-from-source-smoke.yml | Adds optional workflow_dispatch-controlled self-hosted ROCm build+smoke job. |
| .github/CI.md | Documents the optional self-hosted ROCm job. |
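The `scripts/common.sh` row above describes prepending the ROCm directories only when they are present. A minimal sketch of that pattern might look like the following. `prepend_path` is a hypothetical helper name, the duplicate check is an added assumption, and the actual script in the PR may be structured differently.

```shell
#!/usr/bin/env bash
# prepend_path DIR CURRENT: print CURRENT with DIR prepended, but only when
# DIR exists and is not already one of the colon-separated entries.
# Hypothetical helper for illustration; the PR's common.sh may differ.
prepend_path() {
  local dir="$1" current="$2"
  if [ -d "$dir" ]; then
    case ":$current:" in
      *":$dir:"*) ;;                               # already listed: keep as is
      *) current="$dir${current:+:$current}" ;;    # avoid a trailing colon
    esac
  fi
  printf '%s' "$current"
}

PATH="$(prepend_path /opt/rocm/bin "$PATH")"
export PATH

# Only export LD_LIBRARY_PATH when there is something to put in it.
ROCM_LIBS="$(prepend_path /opt/rocm/lib "${LD_LIBRARY_PATH:-}")"
if [ -n "$ROCM_LIBS" ]; then
  export LD_LIBRARY_PATH="$ROCM_LIBS"
fi
```

On machines without `/opt/rocm`, both variables are left unchanged, so sourcing the snippet should be harmless for non-ROCm setups.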
| while [[ $# -gt 0 ]]; do | ||
| case "$1" in | ||
| --rocm-path) ROCM_PATH="$2"; shift 2 ;; | ||
| --targets) AMDGPU_TARGETS="$2"; shift 2 ;; | ||
| --output) OUTPUT_DIR="$2"; shift 2 ;; | ||
| *) REPO_DIR="$1"; shift ;; |
| export LD_LIBRARY_PATH="$BIN_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" | ||
| NGL=$(bonsai_llama_ngl) | ||
| SINGLE_TURN_ARGS="" |
What's the purpose of this one? I thought single turn was already set up.
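For context, the Copilot overview describes this as auto-adding `--single-turn` when `-p`/`--prompt` or `-f`/`--file` is passed. A rough sketch of that kind of detection is shown below. `single_turn_args` is a hypothetical function name, and skipping the flag when the caller already supplied it is an assumption rather than something confirmed by the diff.

```shell
#!/usr/bin/env bash
# single_turn_args ARGS...: print "--single-turn" when ARGS contain a prompt
# or prompt-file option and do not already include --single-turn; print
# nothing otherwise. Hypothetical helper; the PR's logic may differ.
single_turn_args() {
  local has_prompt=0 arg
  for arg in "$@"; do
    case "$arg" in
      --single-turn) return 0 ;;              # caller already asked for it
      -p|--prompt|-f|--file) has_prompt=1 ;;
    esac
  done
  if [ "$has_prompt" -eq 1 ]; then
    echo "--single-turn"
  fi
  return 0
}

# Usage in the launcher: compute once, then append to the llama command line.
SINGLE_TURN_ARGS="$(single_turn_args "$@")"
echo "extra args: ${SINGLE_TURN_ARGS:-<none>}"
```

With this shape, interactive runs (no prompt argument) are unaffected, while one-shot invocations exit after the first response.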
… <223556219+Copilot@users.noreply.github.com>
There are some merge conflicts to be resolved, and there seem to be too many files that are not needed.
Adds ROCm/HIP build support for the Bonsai demo, fixes ternary launch paths, records Strix Halo gfx1151 Ternary-Bonsai 8B Q2_0 validation, and adds optional self-hosted ROCm CI coverage. Local validation: setup completed, ROCm build completed, llama-bench Ternary-Bonsai-8B Q2_0 on Strix Halo hit pp512 1323.29 +/- 10.55 t/s and tg128 79.04 +/- 0.57 t/s; run_llama one-shot prompt exits cleanly.