Fix crash in CLI mode: Update README.md by zynzynack · Pull Request #65 · kyuz0/amd-strix-halo-toolboxes

zynzynack · 2026-03-01T14:16:40Z

Crash:

$ llama-cli --no-mmap -ngl 999 -fa 1 -m models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -p "Write a Strix Halo toolkit haiku."
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32

Loading model... \
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 3048.00 MiB on device 0: cudaMalloc failed: out of memory
alloc_tensor_range: failed to allocate ROCm0 buffer of size 3196059648
llama_init_from_model: failed to initialize the context: failed to allocate buffer for kv cache
common_init_result: failed to create context with model 'models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf'
common_init_from_params: failed to create context with model 'models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf'
Segmentation fault (core dumped) llama-cli --no-mmap -ngl 999 -fa 1 -m models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -p "Write a Strix Halo toolkit haiku."

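Fixed by adding -c (llama-cli's context-size option) to the command in the README, which bounds the KV cache allocation that fails in the log above.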
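To see why the context size matters here, the following is a rough sketch of how the KV cache grows with context length. The formula (K and V tensors, per layer, per token) is a common back-of-the-envelope estimate, and the hyperparameters used (48 layers, 4 KV heads, head dimension 128, 2-byte f16 cache entries) are assumptions about this model, not values stated in this PR. The helper name `kv_cache_bytes` is hypothetical.

```shell
#!/usr/bin/env bash
# Rough KV cache size estimate in bytes:
#   2 (K and V) * layers * KV heads * head dim * bytes per element * context tokens
# Every hyperparameter is supplied by the caller and is an assumption here.
kv_cache_bytes() {
  local n_ctx=$1 n_layer=$2 n_kv_heads=$3 head_dim=$4 bytes_per_elem=$5
  echo $(( 2 * n_layer * n_kv_heads * head_dim * bytes_per_elem * n_ctx ))
}

# Assumed model shape: 48 layers, 4 KV heads, head dim 128, f16 cache (2 bytes).
# A 32512-token context gives 3196059648 bytes, the buffer size in the log.
kv_cache_bytes 32512 48 4 128 2

# An illustrative smaller context (8192 tokens) gives 805306368 bytes (768 MiB).
kv_cache_bytes 8192 48 4 128 2
```

Under these assumptions the failed 3048.00 MiB allocation corresponds to a context of roughly 32k tokens, so passing an explicit smaller value with -c (for example `-c 8192`, an illustrative value rather than the one used in this PR) would shrink the buffer proportionally.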


zynzynack wants to merge 1 commit into kyuz0:main from zynzynack:patch-2
