
I'm using a Hackintosh with an RX 6750 GRE graphics card, and the program unexpectedly quit #124

@21307369

🐛 Bug Description

The following build configurations have each been tested separately:

 cargo install shimmy --features llama-vulkan  # Cross-platform Vulkan
 cargo install shimmy --features llama-opencl  # AMD/Intel OpenCL
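
The log below shows the Metal backend being initialized (ggml_metal_init). A minimal sketch for checking which ggml backends a given shimmy binary actually contains, assuming the default cargo install path of ~/.cargo/bin/shimmy (the exact backend string names may differ between llama.cpp versions):

 # Sketch only: list ggml backend identifiers compiled into the binary.
 # Assumes the default `cargo install` location; adjust the path if needed.
 strings ~/.cargo/bin/shimmy | grep -ioE 'ggml_(metal|vulkan|vk|opencl)[a-z_]*' | sort -u | head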

🔄 Steps to Reproduce

Sonoma 14.8.1 Hackintosh with an RX 6750 GRE, using the NootRX graphics driver: https://github.com/ChefKissInc/NootRX
Enable hardware acceleration and run shimmy (see the hypothetical invocation sketch below)
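
A hypothetical reproduction sketch; the subcommand is an assumption and has not been confirmed against the shimmy CLI, so substitute the actual command used:

 # Hypothetical invocation (subcommand/flags are placeholders):
 shimmy serve
 # then send any generation request against the server so the Metal
 # backend is exercised; the resulting crash is shown in the logs below.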

✅ Expected Behavior

Normal use

❌ Actual Behavior

Unexpected exit: shimmy aborts with a GPU timeout error in the Metal backend (see the logs below)

📦 Shimmy Version

Latest (main branch)

💻 Operating System

macOS

📥 Installation Method

cargo install shimmy

🌍 Environment Details

No response

📋 Logs/Error Messages

...........................................................................................
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch       = 2048
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = auto
llama_context: kv_unified    = false
llama_context: freq_base     = 10000.0
llama_context: freq_scale    = 1
ggml_metal_init: allocating
ggml_metal_init: found device: AMD Radeon RX 6750 GRE
ggml_metal_init: picking default device: AMD Radeon RX 6750 GRE
ggml_metal_init: use bfloat         = true
ggml_metal_init: use fusion         = true
ggml_metal_init: use concurrency    = true
ggml_metal_init: use graph optimize = true
set_abort_callback: call
llama_context:        CPU  output buffer size =     0.12 MiB
create_memory: n_ctx = 4096 (padded)
llama_kv_cache: layer   0: dev = Metal
llama_kv_cache: layer   1: dev = Metal
llama_kv_cache: layer   2: dev = Metal
llama_kv_cache: layer   3: dev = Metal
llama_kv_cache: layer   4: dev = Metal
llama_kv_cache: layer   5: dev = Metal
llama_kv_cache: layer   6: dev = Metal
llama_kv_cache: layer   7: dev = Metal
llama_kv_cache: layer   8: dev = Metal
llama_kv_cache: layer   9: dev = Metal
llama_kv_cache: layer  10: dev = Metal
llama_kv_cache: layer  11: dev = Metal
llama_kv_cache: layer  12: dev = Metal
llama_kv_cache: layer  13: dev = Metal
llama_kv_cache: layer  14: dev = Metal
llama_kv_cache: layer  15: dev = Metal
llama_kv_cache: layer  16: dev = Metal
llama_kv_cache: layer  17: dev = Metal
llama_kv_cache: layer  18: dev = Metal
llama_kv_cache: layer  19: dev = Metal
llama_kv_cache: layer  20: dev = Metal
llama_kv_cache: layer  21: dev = Metal
llama_kv_cache: layer  22: dev = Metal
llama_kv_cache: layer  23: dev = Metal
llama_kv_cache: layer  24: dev = Metal
llama_kv_cache: layer  25: dev = Metal
llama_kv_cache: layer  26: dev = Metal
llama_kv_cache: layer  27: dev = Metal
llama_kv_cache: layer  28: dev = Metal
llama_kv_cache: layer  29: dev = Metal
llama_kv_cache: layer  30: dev = Metal
llama_kv_cache: layer  31: dev = Metal
llama_kv_cache: Metal_Private KV buffer size =  1536.00 MiB
llama_kv_cache: size = 1536.00 MiB (  4096 cells,  32 layers,  1/1 seqs), K (f16):  768.00 MiB, V (f16):  768.00 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 2
llama_context: max_nodes = 1560
llama_context: reserving full memory module
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens =    1, n_seqs =  1, n_outputs =    1
llama_context: layer 0 is assigned to device Metal but the Flash Attention tensor is assigned to device CPU (usually due to missing support)
llama_context: Flash Attention was auto, set to disabled
graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
graph_reserve: reserving a graph for ubatch with n_tokens =    1, n_seqs =  1, n_outputs =    1
graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
llama_context: Metal_Private compute buffer size =   300.01 MiB
llama_context:        CPU compute buffer size =    14.01 MiB
llama_context: graph nodes  = 1094
llama_context: graph splits = 2
ggml_metal_synchronize: error: command buffer 0 failed with status 5
error: Caused GPU Timeout Error (00000002:kIOAccelCommandBufferCallbackErrorTimeout)
~/.cargo/registry/src/rsproxy.cn-e3de039b2554c837/shimmy-llama-cpp-sys-2-0.1.123/llama.cpp/ggml/src/ggml-metal/ggml-metal-context.m:241: fatal error
(lldb) process attach --pid 7603
Process 7603 stopped
* thread #1, name = 'main', stop reason = signal SIGSTOP
    frame #0: 0xffffffffffffffff
Target 0: (No executable module.) stopped.
Architecture set to: .
(lldb) bt
* thread #1, name = 'main', stop reason = signal SIGSTOP
  frame #0: 0xffffffffffffffff
(lldb) quit
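
The backtrace above was captured by attaching after the abort and contains no usable frames (frame #0 = 0xffffffffffffffff, no executable module). A sketch for capturing a symbolized backtrace instead, assuming the default cargo install path and the hypothetical serve invocation above:

 # Launch shimmy under lldb from the start so ggml's fatal abort yields
 # a real backtrace (path and subcommand are assumptions, as noted above).
 lldb -- ~/.cargo/bin/shimmy serve
 (lldb) run
 # ...reproduce the crash, then:
 (lldb) bt all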

📝 Additional Context

No response
