Open
Labels
bugSomething isn't workingSomething isn't working
Description
🐛 Bug Description
The following build configurations were each tested separately:
cargo install shimmy --features llama-vulkan # Cross-platform Vulkan
cargo install shimmy --features llama-opencl # AMD/Intel OpenCL
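Note: even in the Vulkan and OpenCL builds, the log below shows ggml_metal_init selecting the Metal backend at runtime on macOS, so the crash is likely in the Metal path regardless of the feature flag. A minimal sketch to confirm which devices a given build actually registers, assuming the vendored sys crate (aliased here as llama_cpp_sys_2) exposes the upstream ggml backend-registry functions:

```rust
// Minimal sketch (untested): list the ggml devices a build registers.
// Assumption: the vendored sys crate exposes the upstream ggml
// backend-registry API (ggml_backend_dev_count and friends).
use std::ffi::CStr;
use llama_cpp_sys_2 as sys;

fn main() {
    unsafe {
        for i in 0..sys::ggml_backend_dev_count() {
            let dev = sys::ggml_backend_dev_get(i);
            let name = CStr::from_ptr(sys::ggml_backend_dev_name(dev));
            let desc = CStr::from_ptr(sys::ggml_backend_dev_description(dev));
            println!("device {}: {} ({})", i, name.to_string_lossy(), desc.to_string_lossy());
        }
    }
}
```

If only Metal and CPU show up, the --features flags did not change the active backend on this system.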
🔄 Steps to Reproduce
1. macOS Sonoma 14.8.1 Hackintosh with an AMD Radeon RX 6750 GRE, using the NootRX graphics driver (https://github.com/ChefKissInc/NootRX)
2. Enable hardware acceleration
3. Load a model; the process aborts during Metal synchronization (see logs below)
✅ Expected Behavior
Normal operation: the model loads and inference runs with GPU acceleration.
❌ Actual Behavior
Unexpected exit: the process aborts with a Metal GPU timeout (command buffer status 5, kIOAccelCommandBufferCallbackErrorTimeout) right after context setup.
📦 Shimmy Version
Latest (main branch)
💻 Operating System
macOS
📥 Installation Method
cargo install shimmy
🌍 Environment Details
No response
📋 Logs/Error Messages
...........................................................................................
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = auto
llama_context: kv_unified = false
llama_context: freq_base = 10000.0
llama_context: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: found device: AMD Radeon RX 6750 GRE
ggml_metal_init: picking default device: AMD Radeon RX 6750 GRE
ggml_metal_init: use bfloat = true
ggml_metal_init: use fusion = true
ggml_metal_init: use concurrency = true
ggml_metal_init: use graph optimize = true
set_abort_callback: call
llama_context: CPU output buffer size = 0.12 MiB
create_memory: n_ctx = 4096 (padded)
llama_kv_cache: layer 0: dev = Metal
llama_kv_cache: layer 1: dev = Metal
llama_kv_cache: layer 2: dev = Metal
llama_kv_cache: layer 3: dev = Metal
llama_kv_cache: layer 4: dev = Metal
llama_kv_cache: layer 5: dev = Metal
llama_kv_cache: layer 6: dev = Metal
llama_kv_cache: layer 7: dev = Metal
llama_kv_cache: layer 8: dev = Metal
llama_kv_cache: layer 9: dev = Metal
llama_kv_cache: layer 10: dev = Metal
llama_kv_cache: layer 11: dev = Metal
llama_kv_cache: layer 12: dev = Metal
llama_kv_cache: layer 13: dev = Metal
llama_kv_cache: layer 14: dev = Metal
llama_kv_cache: layer 15: dev = Metal
llama_kv_cache: layer 16: dev = Metal
llama_kv_cache: layer 17: dev = Metal
llama_kv_cache: layer 18: dev = Metal
llama_kv_cache: layer 19: dev = Metal
llama_kv_cache: layer 20: dev = Metal
llama_kv_cache: layer 21: dev = Metal
llama_kv_cache: layer 22: dev = Metal
llama_kv_cache: layer 23: dev = Metal
llama_kv_cache: layer 24: dev = Metal
llama_kv_cache: layer 25: dev = Metal
llama_kv_cache: layer 26: dev = Metal
llama_kv_cache: layer 27: dev = Metal
llama_kv_cache: layer 28: dev = Metal
llama_kv_cache: layer 29: dev = Metal
llama_kv_cache: layer 30: dev = Metal
llama_kv_cache: layer 31: dev = Metal
llama_kv_cache: Metal_Private KV buffer size = 1536.00 MiB
llama_kv_cache: size = 1536.00 MiB ( 4096 cells, 32 layers, 1/1 seqs), K (f16): 768.00 MiB, V (f16): 768.00 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 2
llama_context: max_nodes = 1560
llama_context: reserving full memory module
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
llama_context: layer 0 is assigned to device Metal but the Flash Attention tensor is assigned to device CPU (usually due to missing support)
llama_context: Flash Attention was auto, set to disabled
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
llama_context: Metal_Private compute buffer size = 300.01 MiB
llama_context: CPU compute buffer size = 14.01 MiB
llama_context: graph nodes = 1094
llama_context: graph splits = 2
ggml_metal_synchronize: error: command buffer 0 failed with status 5
error: Caused GPU Timeout Error (00000002:kIOAccelCommandBufferCallbackErrorTimeout)
~/.cargo/registry/src/rsproxy.cn-e3de039b2554c837/shimmy-llama-cpp-sys-2-0.1.123/llama.cpp/ggml/src/ggml-metal/ggml-metal-context.m:241: fatal error
(lldb) process attach --pid 7603
Process 7603 stopped
* thread #1, name = 'main', stop reason = signal SIGSTOP
frame #0: 0xffffffffffffffff
Target 0: (No executable module.) stopped.
Architecture set to: .
(lldb) bt
* thread #1, name = 'main', stop reason = signal SIGSTOP
frame #0: 0xffffffffffffffff
(lldb) quit
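For reference, status 5 is MTLCommandBufferStatusError, and kIOAccelCommandBufferCallbackErrorTimeout indicates the GPU command buffer hit the kernel watchdog timeout before completing; ggml_metal_synchronize treats that as fatal. Since NootRX's Metal support for this RDNA 2 card is third-party, a useful isolation step is to confirm that CPU-only inference works by loading the model with n_gpu_layers = 0. A minimal sketch against the raw llama.cpp C API (the crate alias and model path are assumptions, not shimmy's actual code):

```rust
// Minimal sketch (untested): force CPU-only inference by loading the model
// with n_gpu_layers = 0, so no tensors are offloaded to the Metal backend.
// Assumption: the vendored sys crate (aliased llama_cpp_sys_2) exposes the
// upstream llama.cpp C API; "model.gguf" is a placeholder path.
use std::ffi::CString;
use llama_cpp_sys_2 as sys;

fn main() {
    unsafe {
        sys::llama_backend_init();
        let mut params = sys::llama_model_default_params();
        params.n_gpu_layers = 0; // keep every layer on the CPU backend
        let path = CString::new("model.gguf").unwrap();
        let model = sys::llama_model_load_from_file(path.as_ptr(), params);
        assert!(!model.is_null(), "failed to load model");
        // ... create a context and run generation as usual ...
        sys::llama_model_free(model);
        sys::llama_backend_free();
    }
}
```

If the CPU-only run completes, the timeout points at the Metal backend and driver combination on this Hackintosh rather than shimmy's own code.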
📝 Additional Context
No response