Description
Which version of LM Studio?
Example: LM Studio 0.4.6
Which operating system?
MacOS Sequoia 15.6.4
What is the bug?
I get errors with GGUF versions of various embedding models when trying to use them with knowledge stacks in Msty Studio.
The errors claim the system is out of memory, but I have a 48 GB M4 Max, and LM Studio reports only 10 GB in use (verified with Activity Monitor).
I am able to use text-embedding-qwen3-embedding-0.6b, text-embedding-nomic-embed-text-v1.5@q8_0, and text-embedding-mxbai-embed-large-v1 successfully.
Logs
Attempting to use the Qwen 3 4b and 8b embedding models:
```
2026-03-13 23:13:24 [DEBUG]
[lmstudio-llama-cpp] LLM: Embedding failed: Failed during string embedding. Message: Unknown exception caused embedding to stop: Failed to decode batch! Error:
2026-03-13 23:13:24 [DEBUG]
[WARNING] At least one last token in strings embedded is not SEP. 'tokenizer.ggml.add_eos_token' should be set to 'true' in the GGUF header
2026-03-13 23:13:24 [DEBUG]
ggml_metal_synchronize: error: command buffer 0 failed with status 5
error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
ggml_metal_synchronize: error: command buffer 0 failed with status 5
error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
2026-03-13 23:13:24 [DEBUG]
ggml_metal_synchronize: error: command buffer 0 failed with status 5
error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
[repeated several times]
error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
ggml_metal_graph_compute: backend is in error state from a previous command buffer failure - recreate the backend to recover
graph_compute: ggml_backend_sched_graph_compute_async failed with error -1
process_ubatch: failed to compute graph, compute status: -1
decode: removing memory module entries for seq_id = 0, pos = [0, +inf)
llama_decode: failed to decode, ret = -3
BatchDecode : failed to decode
```
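For context on the `tokenizer.ggml.add_eos_token` warning above: my understanding (an assumption, not verified against the llama.cpp source) is that when this GGUF metadata flag is true, the tokenizer appends the EOS/SEP token id after encoding each string, which is what the embedding path checks for. A minimal sketch of that behavior with a stand-in tokenizer:

```python
# Sketch of the assumed semantics of 'tokenizer.ggml.add_eos_token'.
# EOS_ID is a placeholder; the real id comes from the model's GGUF metadata.
EOS_ID = 2

def encode(text: str, add_eos_token: bool) -> list[int]:
    # Stand-in tokenizer: one id per character (illustration only).
    ids = [ord(c) for c in text]
    if add_eos_token:
        # With the flag set, every embedded string ends in EOS/SEP,
        # which is what the warning above says is missing.
        ids.append(EOS_ID)
    return ids

print(encode("hi", add_eos_token=True))   # ends with EOS_ID
print(encode("hi", add_eos_token=False))  # does not
```

If the flag really is absent or false in these models' GGUF headers, that would explain the "last token ... is not SEP" warning, though it is unclear whether it also causes the Metal out-of-memory errors.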
With llama-embed-nemotron-8b (GGUF), the model does not auto-load on request, and when loaded manually it fails with:
```
No models loaded. Please load a model in the developer page or use the 'lms load' command.
```
**To Reproduce**
Steps to reproduce the behavior:
1. Install GGUF embedding models in LM Studio
2. Select an embedding model in Msty Studio
3. Attempt to compute embeddings
4. Observe a failure with one of the two categories of errors above
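The steps above can also be reproduced without Msty Studio by calling LM Studio's OpenAI-compatible server directly. This is a sketch, assuming the default base URL `http://localhost:1234/v1` and a hypothetical model identifier; it only builds the request, since sending it requires a running server:

```python
import json
import urllib.request

# Base URL of LM Studio's OpenAI-compatible server (default port assumed).
BASE_URL = "http://localhost:1234/v1"

def embedding_request(model: str, texts: list[str]) -> bytes:
    """Build the JSON body for POST {BASE_URL}/embeddings."""
    return json.dumps({"model": model, "input": texts}).encode()

# Model identifier here is an assumption; use the one shown in LM Studio.
body = embedding_request("text-embedding-qwen3-embedding-4b", ["hello world"])
req = urllib.request.Request(
    f"{BASE_URL}/embeddings",
    data=body,
    headers={"Content-Type": "application/json"},
)
# With the server running and an affected model selected, sending this via
# urllib.request.urlopen(req) should surface the same embedding failure.
```

If the same errors appear with this direct request, the issue is in LM Studio's embedding path rather than in Msty Studio.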