
I'm using a Hackintosh with an RX 6750 GRE graphics card, and the program unexpectedly quit #124

@21307369

🐛 Bug Description

The following build configurations have each been tested separately:

 cargo install shimmy --features llama-vulkan  # Cross-platform Vulkan
 cargo install shimmy --features llama-opencl  # AMD/Intel OpenCL
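
The log below shows the Metal backend being initialized (ggml_metal_init). A minimal sketch for checking which ggml backends a given shimmy binary actually contains, assuming the default cargo install path of ~/.cargo/bin/shimmy (the exact backend string names may differ between llama.cpp versions):

 # Sketch only: list ggml backend identifiers compiled into the binary.
 # Assumes the default `cargo install` location; adjust the path if needed.
 strings ~/.cargo/bin/shimmy | grep -ioE 'ggml_(metal|vulkan|vk|opencl)[a-z_]*' | sort -u | head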

🔄 Steps to Reproduce

Sonoma 14.8.1 Hackintosh with an RX 6750 GRE, using the NootRX graphics driver: https://github.com/ChefKissInc/NootRX
Enable hardware acceleration and run shimmy (see the hypothetical invocation sketch below)
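
A hypothetical reproduction sketch; the subcommand is an assumption and has not been confirmed against the shimmy CLI, so substitute the actual command used:

 # Hypothetical invocation (subcommand/flags are placeholders):
 shimmy serve
 # then send any generation request against the server so the Metal
 # backend is exercised; the resulting crash is shown in the logs below.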

✅ Expected Behavior

Normal use

❌ Actual Behavior

Unexpected exit: shimmy aborts with a GPU timeout error in the Metal backend (see the logs below)

📦 Shimmy Version

Latest (main branch)

💻 Operating System

macOS

📥 Installation Method

cargo install shimmy

🌍 Environment Details

No response

📋 Logs/Error Messages

...........................................................................................
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch       = 2048
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = auto
llama_context: kv_unified    = false
llama_context: freq_base     = 10000.0
llama_context: freq_scale    = 1
ggml_metal_init: allocating
ggml_metal_init: found device: AMD Radeon RX 6750 GRE
ggml_metal_init: picking default device: AMD Radeon RX 6750 GRE
ggml_metal_init: use bfloat         = true
ggml_metal_init: use fusion         = true
ggml_metal_init: use concurrency    = true
ggml_metal_init: use graph optimize = true
set_abort_callback: call
llama_context:        CPU  output buffer size =     0.12 MiB
create_memory: n_ctx = 4096 (padded)
llama_kv_cache: layer   0: dev = Metal
llama_kv_cache: layer   1: dev = Metal
llama_kv_cache: layer   2: dev = Metal
llama_kv_cache: layer   3: dev = Metal
llama_kv_cache: layer   4: dev = Metal
llama_kv_cache: layer   5: dev = Metal
llama_kv_cache: layer   6: dev = Metal
llama_kv_cache: layer   7: dev = Metal
llama_kv_cache: layer   8: dev = Metal
llama_kv_cache: layer   9: dev = Metal
llama_kv_cache: layer  10: dev = Metal
llama_kv_cache: layer  11: dev = Metal
llama_kv_cache: layer  12: dev = Metal
llama_kv_cache: layer  13: dev = Metal
llama_kv_cache: layer  14: dev = Metal
llama_kv_cache: layer  15: dev = Metal
llama_kv_cache: layer  16: dev = Metal
llama_kv_cache: layer  17: dev = Metal
llama_kv_cache: layer  18: dev = Metal
llama_kv_cache: layer  19: dev = Metal
llama_kv_cache: layer  20: dev = Metal
llama_kv_cache: layer  21: dev = Metal
llama_kv_cache: layer  22: dev = Metal
llama_kv_cache: layer  23: dev = Metal
llama_kv_cache: layer  24: dev = Metal
llama_kv_cache: layer  25: dev = Metal
llama_kv_cache: layer  26: dev = Metal
llama_kv_cache: layer  27: dev = Metal
llama_kv_cache: layer  28: dev = Metal
llama_kv_cache: layer  29: dev = Metal
llama_kv_cache: layer  30: dev = Metal
llama_kv_cache: layer  31: dev = Metal
llama_kv_cache: Metal_Private KV buffer size =  1536.00 MiB
llama_kv_cache: size = 1536.00 MiB (  4096 cells,  32 layers,  1/1 seqs), K (f16):  768.00 MiB, V (f16):  768.00 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 2
llama_context: max_nodes = 1560
llama_context: reserving full memory module
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens =    1, n_seqs =  1, n_outputs =    1
llama_context: layer 0 is assigned to device Metal but the Flash Attention tensor is assigned to device CPU (usually due to missing support)
llama_context: Flash Attention was auto, set to disabled
graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
graph_reserve: reserving a graph for ubatch with n_tokens =    1, n_seqs =  1, n_outputs =    1
graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
llama_context: Metal_Private compute buffer size =   300.01 MiB
llama_context:        CPU compute buffer size =    14.01 MiB
llama_context: graph nodes  = 1094
llama_context: graph splits = 2
ggml_metal_synchronize: error: command buffer 0 failed with status 5
error: Caused GPU Timeout Error (00000002:kIOAccelCommandBufferCallbackErrorTimeout)
~/.cargo/registry/src/rsproxy.cn-e3de039b2554c837/shimmy-llama-cpp-sys-2-0.1.123/llama.cpp/ggml/src/ggml-metal/ggml-metal-context.m:241: fatal error
(lldb) process attach --pid 7603
Process 7603 stopped
* thread #1, name = 'main', stop reason = signal SIGSTOP
    frame #0: 0xffffffffffffffff
Target 0: (No executable module.) stopped.
Architecture set to: .
(lldb) bt
* thread #1, name = 'main', stop reason = signal SIGSTOP
  frame #0: 0xffffffffffffffff
(lldb) quit
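
The backtrace above was captured by attaching after the abort and contains no usable frames (frame #0 = 0xffffffffffffffff, no executable module). A sketch for capturing a symbolized backtrace instead, assuming the default cargo install path and the hypothetical serve invocation above:

 # Launch shimmy under lldb from the start so ggml's fatal abort yields
 # a real backtrace (path and subcommand are assumptions, as noted above).
 lldb -- ~/.cargo/bin/shimmy serve
 (lldb) run
 # ...reproduce the crash, then:
 (lldb) bt all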

📝 Additional Context

No response
