
Fix crash in CLI mode: Update README.md#65

Open
zynzynack wants to merge 1 commit into kyuz0:main from zynzynack:patch-2

Conversation

@zynzynack
Contributor

Crash:

$ llama-cli --no-mmap -ngl 999 -fa 1 -m models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -p "Write a Strix Halo toolkit haiku."
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32

Loading model... \ggml_backend_cuda_buffer_type_alloc_buffer: allocating 3048.00 MiB on device 0: cudaMalloc failed: out of memory
alloc_tensor_range: failed to allocate ROCm0 buffer of size 3196059648
llama_init_from_model: failed to initialize the context: failed to allocate buffer for kv cache
common_init_result: failed to create context with model 'models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf'
common_init_from_params: failed to create context with model 'models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf'
Segmentation fault (core dumped) llama-cli --no-mmap -ngl 999 -fa 1 -m models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -p "Write a Strix Halo toolkit haiku."

Fixed by adding -c to request a smaller context size: the context creation fails with an out-of-memory error while allocating the KV cache, after which llama-cli segfaults instead of exiting cleanly.
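As a rough sanity check on why shrinking `-c` avoids the OOM, the KV cache grows linearly with the context length. A minimal sketch of the usual estimate, where every model parameter below is an illustrative placeholder and not the real Qwen3-Coder-30B-A3B configuration:

```python
def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per_el=2):
    """Estimate KV-cache size: a K and a V tensor per layer, per position.

    bytes_per_el=2 assumes an fp16/bf16 cache; quantized caches are smaller.
    """
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_el

# Illustrative numbers only (hypothetical layer/head counts):
full = kv_cache_bytes(n_ctx=32768, n_layers=48, n_kv_heads=4, head_dim=128)
half = kv_cache_bytes(n_ctx=16384, n_layers=48, n_kv_heads=4, head_dim=128)
print(full / 2**20, "MiB")  # halving n_ctx halves the cache
```

Under these assumptions the cache lands in the low-gigabyte range, the same order of magnitude as the ~3 GiB buffer the log fails to allocate, so passing a smaller `-c` shrinks the allocation proportionally.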
