Description
I’m running into what looks like an incorrect model memory estimate in LM Studio, and it becomes a real problem in headless use.
This machine is headless, so I access the GUI remotely through LM Link, which LM Studio documents as a way to use models running on another device as if they were local. In that GUI flow, I can load qwen3.5-9b at a context length of 110000 if I click “Load anyway”, and the model actually loads and works. But when I try to do the same thing through the CLI or the REST API, LM Studio refuses to load it because it estimates the memory requirement at about 22.92 GB.
What makes this look like a bug rather than just a strict safety check is that the model does in fact load successfully when I override the warning in the GUI. Also, with what is effectively the same setup in llama.cpp, I’m seeing memory usage closer to 11.5 GB, so the LM Studio estimate seems much too high in this case.
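For context on why the two estimates could differ by roughly 2x: at a 110000-token context, a large share of the memory is the KV cache, and its size depends on the assumed precision. Here is a back-of-envelope sketch; the architecture numbers (layer count, KV heads, head dimension) are purely illustrative guesses, not qwen3.5-9b's actual config:

```python
# Back-of-envelope KV-cache size for a long context window.
# Architecture numbers below are ILLUSTRATIVE, not the real qwen3.5-9b config.
N_LAYERS = 36      # hypothetical transformer layer count
N_KV_HEADS = 8     # hypothetical grouped-query KV heads
HEAD_DIM = 128     # hypothetical per-head dimension
CONTEXT = 110_000  # the context length from this report

def kv_cache_bytes(bytes_per_elem: int) -> int:
    # 2x for the K and V tensors, one entry per layer/head/position.
    return 2 * N_LAYERS * CONTEXT * N_KV_HEADS * HEAD_DIM * bytes_per_elem

fp16_gb = kv_cache_bytes(2) / 1e9   # fp16 KV cache
q8_gb = kv_cache_bytes(1) / 1e9     # 8-bit quantized KV cache

print(f"fp16 KV cache: {fp16_gb:.1f} GB")  # ~16.2 GB with these numbers
print(f"q8 KV cache:   {q8_gb:.1f} GB")    # ~8.1 GB with these numbers
```

If the estimator assumes an fp16 KV cache while the llama.cpp setup uses a quantized cache (or accounts for flash attention differently), a gap on the order of 22.92 GB vs 11.5 GB is plausible.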
Here is the CLI path I tried:

```shell
/home/jirka/.lmstudio/bin/lms daemon up
/home/jirka/.lmstudio/bin/lms load qwen3.5-9b --gpu max --context-length 110000 --identifier qwen35-9b-110k --yes
```

and the result is:
```text
Waking up LM Studio service...
llmster started (PID: 24162).
Error: Model loading was stopped due to insufficient system resources. Under the current settings, this model requires approximately 22.92 GB of memory, and continuing to load it would likely overload your system and cause it to freeze. If you think this is incorrect, you can adjust the model loading guardrails in settings.
```
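To back the report up with hard numbers, it may help to record how much memory the GUI-loaded model actually consumes, e.g. by sampling `MemAvailable` before and after clicking "Load anyway". A minimal Linux-only sketch (it only assumes the standard `/proc/meminfo` format, nothing LM Studio specific):

```python
from pathlib import Path

def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: kibibytes}."""
    out = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            out[key.strip()] = int(parts[0])
    return out

if __name__ == "__main__":
    info = parse_meminfo(Path("/proc/meminfo").read_text())
    for field in ("MemTotal", "MemAvailable"):
        print(f"{field}: {info[field] / 1024 / 1024:.2f} GiB")
```

Running this before and after the GUI load, and diffing the two `MemAvailable` values, gives the model's real footprint to compare against the 22.92 GB estimate.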
I also tried the native REST API load endpoint:
```shell
curl http://127.0.0.1:1234/api/v1/models/load \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5-9b",
    "context_length": 110000,
    "flash_attention": true,
    "offload_kv_cache_to_gpu": false,
    "echo_load_config": true
  }'
```

and got:
```json
{
  "error": {
    "type": "model_load_failed",
    "message": "Failed to load LLM 'qwen3.5-9b': Error: Model loading was stopped due to insufficient system resources. Under the current settings, this model requires approximately 22.92 GB of memory, and continuing to load it would likely overload your system and cause it to freeze. If you think this is incorrect, you can adjust the model loading guardrails in settings."
  }
}
```

I tried to work around this by editing the settings manually in both of these locations:
- `~/.config/LM Studio/settings.json`
- `~/.lmstudio/settings.json`
In both places I tried:

```json
"modelLoadingGuardrails": {
  "mode": "high",
  "customThresholdBytes": 4294967296,
  "alwaysAllowLoadAnyway": true
}
```

but it did not change the behavior of the CLI, daemon, or REST API.
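As a sanity check on the edit itself, it is worth confirming that the snippet merges into syntactically valid JSON and that the threshold is what was intended (4294967296 bytes is exactly 4 GiB). A small sketch; the key names are taken from the snippet above, and whether the headless daemon reads them at all is exactly what appears broken:

```python
import json

# Guardrail keys as used in this report; their semantics are unverified
# against LM Studio's source, since the daemon appears to ignore the file.
guardrails = {
    "modelLoadingGuardrails": {
        "mode": "high",
        "customThresholdBytes": 4 * 1024**3,  # 4294967296 = 4 GiB
        "alwaysAllowLoadAnyway": True,
    }
}

# Round-trip through JSON to confirm the fragment is syntactically valid.
encoded = json.dumps(guardrails, indent=2)
decoded = json.loads(encoded)
assert decoded["modelLoadingGuardrails"]["customThresholdBytes"] == 4294967296
print(encoded)
```

A round-trip like this rules out a malformed settings file as the cause, leaving the daemon not honoring the setting as the remaining explanation.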