Add ROCm Strix Halo Bonsai demo path #71

Open
bong-water-water-bong wants to merge 5 commits into PrismML-Eng:main from bong-water-water-bong:main

Conversation

@bong-water-water-bong

Adds ROCm/HIP build support for the Bonsai demo, fixes ternary launch paths, records Strix Halo gfx1151 Ternary-Bonsai 8B Q2_0 validation, and adds optional self-hosted ROCm CI coverage. Local validation: setup completed, ROCm build completed, llama-bench Ternary-Bonsai-8B Q2_0 on Strix Halo hit pp512 1323.29 +/- 10.55 t/s and tg128 79.04 +/- 0.57 t/s; run_llama one-shot prompt exits cleanly.

@bong-water-water-bong

Author

Added a second commit with the full Strix Halo ROCm benchmark matrix and raw JSONL artifacts. Coverage now includes Ternary-Bonsai 1.7B, 4B, and 8B Q2_0 with isolated pp/tg plus combined prompt+generation workloads. Head commit: 1a1cc92.
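
For reference, a matrix like this is typically produced with `llama-bench`. The sketch below only prints a command line rather than running it. The helper name `bench_cmd`, the `-pg 512,128` combined workload size, and the `-r 3` repetition count are assumptions, not values confirmed by this PR, and the fork's `llama-bench` may accept different options than upstream.

```shell
# Print (do not run) a llama-bench command line for one model file.
# Hypothetical helper; flag values other than -ngl 99 and -p 512 are assumed.
bench_cmd() {
  model="$1"   # path to a .gguf file
  fa="$2"      # flash attention: 0 (off) or 1 (on)
  printf '%s\n' "llama-bench -m $model -ngl 99 -fa $fa -p 512 -n 128 -pg 512,128 -r 3 -o jsonl"
}

bench_cmd models/ternary-gguf/1.7B/Ternary-Bonsai-1.7B-Q2_0.gguf 1
```

Redirecting the real command's output to a file would yield one JSON record per test, which matches the shape of the JSONL artifacts attached to this PR.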

@bong-water-water-bong

Author

Added one more follow-up commit with flash-attention on/off comparison data across 1.7B, 4B, and 8B at pp512/tg128. Short version: keep FA on for this ROCm path; 8B improves from ~1142 to ~1303 tok/s pp512 and ~70 to ~78 tok/s tg128. Head commit: efade28.
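
As a quick check on the figures above, the relative gains work out to roughly 14% for pp512 and 11% for tg128. A minimal sketch of the arithmetic, using the rounded numbers from this comment (the helper name `pct_gain` is illustrative and not part of the PR):

```shell
# Relative gain in percent, rounded to one decimal place.
# $1 = baseline tok/s (FA off), $2 = new tok/s (FA on)
pct_gain() {
  LC_ALL=C awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (after - before) / before * 100 }'
}

pct_gain 1142 1303   # 8B pp512 -> 14.1
pct_gain 70 78       # 8B tg128 -> 11.4
```

Because the inputs are the approximate values quoted in the comment, the percentages are approximate as well; the raw JSONL records would give more precise ratios.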

@@ -0,0 +1,12 @@
{"build_commit": "d104cf1b6", "build_number": 8846, "cpu_info": "AMD RYZEN AI MAX+ 395 w/ Radeon 8060S", "gpu_info": "AMD Radeon Graphics", "backends": "ROCm", "model_filename": "models/ternary-gguf/1.7B/Ternary-Bonsai-1.7B-Q2_0.gguf", "model_type": "qwen3 1.7B Q2_0", "model_size": 457345184, "model_n_params": 1720028160, "n_batch": 2048, "n_ubatch": 512, "n_threads": 16, "cpu_mask": "0x0", "cpu_strict": false, "poll": 50, "type_k": "f16", "type_v": "f16", "n_gpu_layers": 99, "n_cpu_moe": 0, "split_mode": "layer", "main_gpu": 0, "no_kv_offload": false, "flash_attn": false, "devices": "auto", "tensor_split": "0.00", "tensor_buft_overrides": "none", "use_mmap": true, "use_direct_io": false, "embeddings": false, "no_op_offload": 0, "no_host": false, "fit_target": 0, "fit_min_ctx": 0, "n_prompt": 512, "n_gen": 0, "n_depth": 0, "test_time": "2026-05-07T00:56:16Z", "avg_ns": 105712742, "stddev_ns": 797694, "avg_ts": 4843.498607, "stddev_ts": 36.703200, "samples_ns": [ 106239328, 106103931, 104794967 ],"samples_ts": [ 4819.31, 4825.46, 4885.73 ]}

Collaborator

Do we want the JSONL files? They might be too much info.
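
For context on this question: each record carries several dozen fields, of which only a handful describe the result. One possible middle ground, sketched below under the assumption that fields appear in the same order as in the records shown in this PR, is to reduce each record to a one-line summary. The function name `summarize_record` is hypothetical, and a JSON-aware tool would be more robust than this `sed` pattern.

```shell
# Reduce llama-bench JSONL records (read from stdin) to one summary line each.
# Assumes the field order seen in this PR's records; lines that do not match are dropped.
summarize_record() {
  sed -n 's|.*"model_type": "\([^"]*\)".*"flash_attn": \([a-z]*\),.*"n_prompt": \([0-9]*\), "n_gen": \([0-9]*\),.*"avg_ts": \([0-9.]*\), "stddev_ts": \([0-9.]*\).*|\1 fa=\2 pp=\3 tg=\4 \5 +/- \6 t/s|p'
}
```

Usage would look like `summarize_record < benchmarks/data/ternary-bonsai-rocm-strix-halo-combined-20260507T003257Z.jsonl`, printing the model type, flash-attention setting, prompt and generation lengths, and mean throughput with its standard deviation.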

@@ -0,0 +1,18 @@
{"build_commit": "d104cf1b6", "build_number": 8846, "cpu_info": "AMD RYZEN AI MAX+ 395 w/ Radeon 8060S", "gpu_info": "AMD Radeon Graphics", "backends": "ROCm", "model_filename": "models/ternary-gguf/1.7B/Ternary-Bonsai-1.7B-Q2_0.gguf", "model_type": "qwen3 1.7B Q2_0", "model_size": 457345184, "model_n_params": 1720028160, "n_batch": 2048, "n_ubatch": 512, "n_threads": 16, "cpu_mask": "0x0", "cpu_strict": false, "poll": 50, "type_k": "f16", "type_v": "f16", "n_gpu_layers": 99, "n_cpu_moe": 0, "split_mode": "layer", "main_gpu": 0, "no_kv_offload": false, "flash_attn": true, "devices": "auto", "tensor_split": "0.00", "tensor_buft_overrides": "none", "use_mmap": true, "use_direct_io": false, "embeddings": false, "no_op_offload": 0, "no_host": false, "fit_target": 0, "fit_min_ctx": 0, "n_prompt": 512, "n_gen": 0, "n_depth": 0, "test_time": "2026-05-07T00:32:57Z", "avg_ns": 94925047, "stddev_ns": 339642, "avg_ts": 5393.775104, "stddev_ts": 19.266487, "samples_ns": [ 94868643, 94617421, 95289079 ],"samples_ts": [ 5396.94, 5411.27, 5373.12 ]}

Collaborator

Same here: do we want the JSONL files? They might be too much info.

Comment on lines +5 to +10
inputs:
  enable_linux_amd:
    description: "Run optional self-hosted Linux AMD/ROCm build"
    required: false
    default: false
    type: boolean

Collaborator

We don't have that ourselves; is that something in your setup?
I guess we can push it here anyway and it just won't run? We might want to make sure it does not cause the GitHub Action to fail.

@@ -0,0 +1,15 @@
{"build_commit": "d104cf1b6", "build_number": 8846, "cpu_info": "AMD RYZEN AI MAX+ 395 w/ Radeon 8060S", "gpu_info": "AMD Radeon Graphics", "backends": "ROCm", "model_filename": "models/ternary-gguf/1.7B/Ternary-Bonsai-1.7B-Q2_0.gguf", "model_type": "qwen3 1.7B Q2_0", "model_size": 457345184, "model_n_params": 1720028160, "n_batch": 2048, "n_ubatch": 512, "n_threads": 16, "cpu_mask": "0x0", "cpu_strict": false, "poll": 50, "type_k": "f16", "type_v": "f16", "n_gpu_layers": 99, "n_cpu_moe": 0, "split_mode": "layer", "main_gpu": 0, "no_kv_offload": false, "flash_attn": true, "devices": "auto", "tensor_split": "0.00", "tensor_buft_overrides": "none", "use_mmap": true, "use_direct_io": false, "embeddings": false, "no_op_offload": 0, "no_host": false, "fit_target": 0, "fit_min_ctx": 0, "n_prompt": 128, "n_gen": 0, "n_depth": 0, "test_time": "2026-05-07T00:30:49Z", "avg_ns": 28543656, "stddev_ns": 173748, "avg_ts": 4484.491304, "stddev_ts": 27.212874, "samples_ns": [ 28520343, 28468025, 28590041, 28334580, 28805294 ],"samples_ts": [ 4488.02, 4496.27, 4477.08, 4517.45, 4443.63 ]}

Collaborator

Same here: do we want the JSONL files? They might be too much info.

Comment thread scripts/run_llama.ps1
Comment on lines -21 to -27

    $FamilyDisplay = "Ternary-Bonsai"
} else {
    $ModelDir = Join-Path $DemoDir "models\gguf\$BonsaiModel"
    $FamilyDisplay = "Bonsai"
}

Collaborator

I made some changes here recently; I think the conflict is related to this.

Copilot AI left a comment

Contributor

Pull request overview

Adds ROCm/HIP support to the Bonsai demo toolchain (build + runtime), plus Strix Halo (gfx1151) validation artifacts and optional self-hosted CI coverage.

Changes:

  • Introduces a Linux ROCm/HIP source build script and updates docs to reference it.
  • Fixes/standardizes Bonsai vs Ternary-Bonsai display/model-path handling and improves one-shot prompt execution by auto-enabling --single-turn when a prompt/file is provided.
  • Adds Strix Halo ROCm benchmark/validation results and an optional self-hosted ROCm smoke job.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| scripts/start_llama_server.ps1 | Simplifies family display handling for GGUF discovery/errors. |
| scripts/run_llama.sh | Auto-adds --single-turn when -p/--prompt or -f/--file is used. |
| scripts/run_llama.ps1 | Auto-adds --single-turn when prompt/file args are present. |
| scripts/common.sh | Prepends /opt/rocm/bin and /opt/rocm/lib to PATH/LD_LIBRARY_PATH when present. |
| scripts/build_rocm_linux.sh | New: builds PrismML llama.cpp with ROCm/HIP and installs to bin/rocm. |
| README.md | Documents ROCm/HIP support and points to ROCm benchmark write-up; updates build instructions. |
| community-benchmarks/ternary-bonsai/rocm-hip-strix-halo-128gb-linux.md | New: Strix Halo ROCm HIP benchmark/validation report. |
| community-benchmarks/ternary-bonsai/README.md | Adds Strix Halo ROCm result entry. |
| community-benchmarks/README.md | Adds combined-table Ternary-Bonsai ROCm result entry. |
| benchmarks/data/ternary-bonsai-rocm-strix-halo-fa-compare-20260507T005616Z.jsonl | New: raw JSONL data for FA on/off comparison. |
| benchmarks/data/ternary-bonsai-rocm-strix-halo-combined-20260507T003257Z.jsonl | New: raw JSONL data for combined prompt+gen runs. |
| benchmarks/data/ternary-bonsai-rocm-strix-halo-20260507T003049Z.jsonl | New: raw JSONL data for isolated prompt/decode runs. |
| .github/workflows/check-env-vars.yml | Expands syntax checks to additional shell scripts incl. ROCm build script. |
| .github/workflows/build-from-source-smoke.yml | Adds optional workflow_dispatch-controlled self-hosted ROCm build+smoke job. |
| .github/CI.md | Documents the optional self-hosted ROCm job. |
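
The scripts/common.sh entry in the file summary describes prepending the ROCm directories only when they exist. The actual change is not shown in this conversation; a minimal sketch of that pattern, with a hypothetical helper named `prepend_path`, could look like this:

```shell
# Prepend a directory to a colon-separated path list, but only if it exists.
# Prints the resulting value; the original value is printed unchanged otherwise.
prepend_path() {
  dir="$1"
  current="$2"
  if [ -d "$dir" ]; then
    # ${current:+:$current} adds the separator only when the list is non-empty
    printf '%s\n' "$dir${current:+:$current}"
  else
    printf '%s\n' "$current"
  fi
}

PATH="$(prepend_path /opt/rocm/bin "$PATH")"
LD_LIBRARY_PATH="$(prepend_path /opt/rocm/lib "${LD_LIBRARY_PATH:-}")"
export PATH LD_LIBRARY_PATH
```

The existence check keeps the environment untouched on machines without ROCm, so the same script can be sourced on other backends.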

Comment on lines +30 to +35
while [[ $# -gt 0 ]]; do
    case "$1" in
        --rocm-path) ROCM_PATH="$2"; shift 2 ;;
        --targets) AMDGPU_TARGETS="$2"; shift 2 ;;
        --output) OUTPUT_DIR="$2"; shift 2 ;;
        *) REPO_DIR="$1"; shift ;;

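
The text of the review comment on this excerpt is not preserved in this capture. One thing loops of this shape commonly need is a guard for an option given without a value: in bash, `shift 2` with only one argument left returns non-zero and shifts nothing, so the loop either aborts (under `set -e`) or never advances. A minimal sketch of such a guard follows; the function wrapper and the default values are assumptions for illustration, not the script's actual defaults.

```shell
# Parse the same options as the excerpt above, rejecting an option with no value.
# Defaults below are placeholders, not taken from build_rocm_linux.sh.
parse_args() {
  ROCM_PATH="/opt/rocm"; AMDGPU_TARGETS="gfx1151"; OUTPUT_DIR="bin/rocm"; REPO_DIR=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --rocm-path|--targets|--output)
        if [ $# -lt 2 ]; then
          echo "error: $1 requires a value" >&2
          return 2
        fi
        case "$1" in
          --rocm-path) ROCM_PATH="$2" ;;
          --targets)   AMDGPU_TARGETS="$2" ;;
          --output)    OUTPUT_DIR="$2" ;;
        esac
        shift 2 ;;
      *) REPO_DIR="$1"; shift ;;   # any other argument is treated as the repo dir
    esac
  done
}

parse_args "$@"
```

Checking `$#` before consuming `$2` turns a trailing `--output` into a clear error instead of an empty `OUTPUT_DIR`.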
Comment thread scripts/run_llama.sh
export LD_LIBRARY_PATH="$BIN_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

NGL=$(bonsai_llama_ngl)
SINGLE_TURN_ARGS=""

Collaborator

What's the purpose of this one? I thought single-turn was already set up.
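
For context, the file summary describes this variable as supporting the auto-added `--single-turn` flag when `-p`/`--prompt` or `-f`/`--file` is passed. The full diff is not shown here; a minimal sketch of how such detection might work, with a hypothetical helper named `single_turn_args`, is:

```shell
# Print "--single-turn" when the arguments include a prompt or prompt-file flag
# and the caller has not already supplied --single-turn; print nothing otherwise.
single_turn_args() {
  wants=0
  for arg in "$@"; do
    case "$arg" in
      --single-turn) return 0 ;;             # already present; add nothing
      -p|--prompt|-f|--file) wants=1 ;;
    esac
  done
  if [ "$wants" -eq 1 ]; then
    printf '%s\n' "--single-turn"
  fi
  return 0
}

SINGLE_TURN_ARGS="$(single_turn_args "$@")"
```

Skipping the flag when it is already present avoids passing it twice, which may address the concern that single-turn handling was set up elsewhere.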

@khosravipasha

Collaborator

There are some merge conflicts to be resolved, and there seem to be too many files that are not needed.
To merge this, I would need a minimal set of code and Markdown files.

3 participants