
failures running many GGUF embedding models #1647

@arvindvenkataramani

Description


Which version of LM Studio?
Example: LM Studio 0.4.6

Which operating system?
macOS Sequoia 15.6.4

What is the bug?
I get errors with GGUF versions of various embedding models when trying to use them with knowledge stacks in Msty Studio.

The error claims an out-of-memory condition, but I have a 48 GB M4 Max and LM Studio reports only 10 GB in use (verified with Activity Monitor).

I am able to use text-embedding-qwen3-embedding-0.6b, text-embedding-nomic-embed-text-v1.5@q8_0, and text-embedding-mxbai-embed-large-v1 successfully.

Logs
Attempting to use the Qwen 3 4b and 8b embedding models:

```
2026-03-13 23:13:24 [DEBUG]
 [lmstudio-llama-cpp] LLM: Embedding failed: Failed during string embedding. Message: Unknown exception caused embedding to stop: Failed to decode batch! Error:
2026-03-13 23:13:24 [DEBUG]
 [WARNING] At least one last token in strings embedded is not SEP. 'tokenizer.ggml.add_eos_token' should be set to 'true' in the GGUF header
2026-03-13 23:13:24 [DEBUG]
 ggml_metal_synchronize: error: command buffer 0 failed with status 5
error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
ggml_metal_synchronize: error: command buffer 0 failed with status 5
error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
2026-03-13 23:13:24 [DEBUG]
 ggml_metal_synchronize: error: command buffer 0 failed with status 5
error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
      [repeated several times]
error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
ggml_metal_graph_compute: backend is in error state from a previous command buffer failure - recreate the backend to recover
graph_compute: ggml_backend_sched_graph_compute_async failed with error -1
process_ubatch: failed to compute graph, compute status: -1
decode: removing memory module entries for seq_id = 0, pos = [0, +inf)
llama_decode: failed to decode, ret = -3
BatchDecode : failed to decode
```
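For reference, the `[WARNING]` line above says the model's GGUF header should set `tokenizer.ggml.add_eos_token` to `true`. A minimal sketch of that check, assuming the metadata has already been dumped to a plain dict (e.g. via the `gguf` Python package or llama.cpp's `gguf_dump.py`; the helper name and sample values below are hypothetical, not part of LM Studio):

```python
def eos_flag_ok(metadata: dict) -> bool:
    """Return True if the GGUF metadata asks the tokenizer to append
    an EOS/SEP token to each embedded string (the key named in the
    LM Studio warning). A missing or false value triggers the warning."""
    return bool(metadata.get("tokenizer.ggml.add_eos_token", False))

# Hypothetical metadata dicts for illustration only:
working = {"tokenizer.ggml.add_eos_token": True}
failing = {"tokenizer.ggml.model": "gpt2"}  # flag missing entirely

print(eos_flag_ok(working))  # True
print(eos_flag_ok(failing))  # False
```

If the flag really is absent in the affected Qwen 3 GGUF conversions, that would distinguish a model-conversion problem from the Metal out-of-memory errors that follow.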

llama-embed-nemotron-8b (GGUF) does not auto-load on request, and when loaded manually it fails with:

```
No models loaded. Please load a model in the developer page or use the 'lms load' command.
```

To Reproduce

Steps to reproduce the behavior:
1. Install GGUF embedding models in LM Studio
2. Select an embedding model in Msty Studio
3. Attempt to compute embeddings
4. The request fails with one of the two categories of errors above
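To help isolate whether the failure sits in LM Studio or in Msty Studio's client, the same embedding call can be exercised directly against LM Studio's OpenAI-compatible local server. A sketch, assuming the server is running on its default port 1234 (the model identifier is an example; use whatever `lms ls` reports):

```python
import json
import urllib.request


def embedding_payload(texts: list[str], model: str) -> dict:
    """Build the JSON body for an OpenAI-compatible /v1/embeddings call."""
    return {"model": model, "input": texts}


def embed(texts: list[str], model: str,
          base_url: str = "http://localhost:1234/v1") -> dict:
    """POST to LM Studio's local /v1/embeddings endpoint and return
    the parsed JSON response. Port 1234 is LM Studio's default but
    may differ on a given machine."""
    req = urllib.request.Request(
        f"{base_url}/embeddings",
        data=json.dumps(embedding_payload(texts, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Usage (with the server running and the model loaded):
#   out = embed(["hello world"], "text-embedding-qwen3-embedding-0.6b")
#   print(len(out["data"][0]["embedding"]))
```

If this call reproduces the Metal out-of-memory errors with no client in the loop, the bug is in LM Studio's embedding path rather than in how Msty Studio drives it.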
