Model load guardrail estimate seems wrong, and GUI “Load anyway” has no CLI / REST equivalent #499

@mistrjirka

Description

I’m running into what looks like an incorrect model memory estimate in LM Studio, and it becomes a real problem in headless use.

This machine is headless, so I access the GUI remotely through LM Link, which LM Studio documents as a way to use models running on another device as if they were local. In that GUI flow, I can load qwen3.5-9b at a context length of 110000 by clicking “Load anyway”, and the model actually loads and works. But when I try to do the same thing through the CLI or the REST API, LM Studio refuses to load it, estimating the memory requirement at about 22.92 GB.

What makes this look like a bug rather than just a strict safety check is that the model does in fact load successfully when I override the warning in the GUI. Also, with what is effectively the same setup in llama.cpp, I’m seeing memory usage closer to 11.5 GB, so the LM Studio estimate seems much too high in this case.
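For context on where a roughly 2× gap could come from: at a 110000-token context the KV cache dominates memory, so a different assumption about the cache's element size (fp16 vs an 8-bit cache) alone could account for it. A back-of-envelope sketch using the standard transformer KV-cache formula; the model dimensions below are hypothetical placeholders, not qwen3.5-9b's actual configuration:

```shell
# KV-cache size = 2 (K and V) * layers * kv_heads * head_dim * context * bytes/element.
# LAYERS/KV_HEADS/HEAD_DIM are made-up illustrative values, not the real model's.
LAYERS=48; KV_HEADS=8; HEAD_DIM=128; CTX=110000
FP16_BYTES=$((2 * LAYERS * KV_HEADS * HEAD_DIM * CTX * 2))  # 2 bytes/element (fp16)
Q8_BYTES=$((FP16_BYTES / 2))                                # 1 byte/element (q8 cache)
echo "fp16 KV cache: $((FP16_BYTES / 1024 / 1024 / 1024)) GiB"
echo "q8   KV cache: $((Q8_BYTES / 1024 / 1024 / 1024)) GiB"
```

Under these made-up dimensions the fp16 figure is roughly double the q8 one, which is the same shape of discrepancy as 22.92 GB vs 11.5 GB.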

Here is the CLI path I tried:

/home/jirka/.lmstudio/bin/lms daemon up
/home/jirka/.lmstudio/bin/lms load qwen3.5-9b --gpu max --context-length 110000 --identifier qwen35-9b-110k --yes

and the result is:

Waking up LM Studio service...
llmster started (PID: 24162).
Error: Model loading was stopped due to insufficient system resources. Under the current settings, this model requires approximately 22.92 GB of memory, and continuing to load it would likely overload your system and cause it to freeze. If you think this is incorrect, you can adjust the model loading guardrails in settings.

I also tried the native REST API load endpoint:

curl http://127.0.0.1:1234/api/v1/models/load \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5-9b",
    "context_length": 110000,
    "flash_attention": true,
    "offload_kv_cache_to_gpu": false,
    "echo_load_config": true
  }'

and got:

{
  "error": {
    "type": "model_load_failed",
    "message": "Failed to load LLM 'qwen3.5-9b': Error: Model loading was stopped due to insufficient system resources. Under the current settings, this model requires approximately 22.92 GB of memory, and continuing to load it would likely overload your system and cause it to freeze. If you think this is incorrect, you can adjust the model loading guardrails in settings."
  }
}
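For headless automation it is useful to reduce that error body to its message alone. A minimal jq sketch, assuming only the response shape shown above (the payload here is a trimmed copy for illustration):

```shell
# Extract .error.message from a load-failure response of the shape shown above.
RESPONSE='{"error":{"type":"model_load_failed","message":"Model loading was stopped due to insufficient system resources."}}'
MSG=$(printf '%s' "$RESPONSE" | jq -r '.error.message')
echo "$MSG"
```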

I tried to work around this by editing the settings manually in both of these locations:

  • ~/.config/LM Studio/settings.json
  • ~/.lmstudio/settings.json

In both places I tried:

"modelLoadingGuardrails": {
  "mode": "high",
  "customThresholdBytes": 4294967296,
  "alwaysAllowLoadAnyway": true
}

but it did not change the behavior of the CLI, daemon, or REST API.
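For reference, this is the edit applied non-destructively with jq so any other settings keys survive. The file location and key names are exactly the ones from this report; whether the daemon actually re-reads them on restart is the open question:

```shell
# Patch the guardrail override into settings.json without clobbering other keys.
# Key names and path are taken from this report; unverified that the daemon honors them.
SETTINGS="${SETTINGS:-$HOME/.lmstudio/settings.json}"
mkdir -p "$(dirname "$SETTINGS")"
[ -f "$SETTINGS" ] || printf '{}\n' > "$SETTINGS"
jq '.modelLoadingGuardrails = {
      mode: "high",
      customThresholdBytes: 4294967296,
      alwaysAllowLoadAnyway: true
    }' "$SETTINGS" > "$SETTINGS.tmp" && mv "$SETTINGS.tmp" "$SETTINGS"
```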
