Model load guardrail estimate seems wrong, and GUI “Load anyway” has no CLI / REST equivalent #499

@mistrjirka

Description

I’m running into what looks like an incorrect model memory estimate in LM Studio, and it becomes a real problem in headless use.

This machine is headless, so I access the GUI remotely through LM Link, which LM Studio documents as a way to use models running on another device as if they were local. In that GUI flow, I can load qwen3.5-9b at a context length of 110000 by clicking “Load anyway”, and the model actually loads and works. But when I try to do the same thing through the CLI or the REST API, LM Studio refuses to load it, estimating the memory requirement at about 22.92 GB.

What makes this look like a bug rather than just a strict safety check is that the model does in fact load successfully when I override the warning in the GUI. Also, with what is effectively the same setup in llama.cpp, I’m seeing memory usage closer to 11.5 GB, so the LM Studio estimate seems much too high in this case.
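For context on where a roughly 2× gap could come from: at a 110000-token context the KV cache dominates memory, so a different assumption about the cache's element size (fp16 vs an 8-bit cache) alone could account for it. A back-of-envelope sketch using the standard transformer KV-cache formula; the model dimensions below are hypothetical placeholders, not qwen3.5-9b's actual configuration:

```shell
# KV-cache size = 2 (K and V) * layers * kv_heads * head_dim * context * bytes/element.
# LAYERS/KV_HEADS/HEAD_DIM are made-up illustrative values, not the real model's.
LAYERS=48; KV_HEADS=8; HEAD_DIM=128; CTX=110000
FP16_BYTES=$((2 * LAYERS * KV_HEADS * HEAD_DIM * CTX * 2))  # 2 bytes/element (fp16)
Q8_BYTES=$((FP16_BYTES / 2))                                # 1 byte/element (q8 cache)
echo "fp16 KV cache: $((FP16_BYTES / 1024 / 1024 / 1024)) GiB"
echo "q8   KV cache: $((Q8_BYTES / 1024 / 1024 / 1024)) GiB"
```

Under these made-up dimensions the fp16 figure is roughly double the q8 one, which is the same shape of discrepancy as 22.92 GB vs 11.5 GB.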

Here is the CLI path I tried:

/home/jirka/.lmstudio/bin/lms daemon up
/home/jirka/.lmstudio/bin/lms load qwen3.5-9b --gpu max --context-length 110000 --identifier qwen35-9b-110k --yes

and the result is:

Waking up LM Studio service...
llmster started (PID: 24162).
Error: Model loading was stopped due to insufficient system resources. Under the current settings, this model requires approximately 22.92 GB of memory, and continuing to load it would likely overload your system and cause it to freeze. If you think this is incorrect, you can adjust the model loading guardrails in settings.

I also tried the native REST API load endpoint:

curl http://127.0.0.1:1234/api/v1/models/load \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5-9b",
    "context_length": 110000,
    "flash_attention": true,
    "offload_kv_cache_to_gpu": false,
    "echo_load_config": true
  }'

and got:

{
  "error": {
    "type": "model_load_failed",
    "message": "Failed to load LLM 'qwen3.5-9b': Error: Model loading was stopped due to insufficient system resources. Under the current settings, this model requires approximately 22.92 GB of memory, and continuing to load it would likely overload your system and cause it to freeze. If you think this is incorrect, you can adjust the model loading guardrails in settings."
  }
}
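For headless automation it is useful to reduce that error body to its message alone. A minimal jq sketch, assuming only the response shape shown above (the payload here is a trimmed copy for illustration):

```shell
# Extract .error.message from a load-failure response of the shape shown above.
RESPONSE='{"error":{"type":"model_load_failed","message":"Model loading was stopped due to insufficient system resources."}}'
MSG=$(printf '%s' "$RESPONSE" | jq -r '.error.message')
echo "$MSG"
```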

I tried to work around this by editing the settings manually in both of these locations:

  • ~/.config/LM Studio/settings.json
  • ~/.lmstudio/settings.json

In both places I tried:

"modelLoadingGuardrails": {
  "mode": "high",
  "customThresholdBytes": 4294967296,
  "alwaysAllowLoadAnyway": true
}

but it did not change the behavior of the CLI, daemon, or REST API.
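For reference, this is the edit applied non-destructively with jq so any other settings keys survive. The file location and key names are exactly the ones from this report; whether the daemon actually re-reads them on restart is the open question:

```shell
# Patch the guardrail override into settings.json without clobbering other keys.
# Key names and path are taken from this report; unverified that the daemon honors them.
SETTINGS="${SETTINGS:-$HOME/.lmstudio/settings.json}"
mkdir -p "$(dirname "$SETTINGS")"
[ -f "$SETTINGS" ] || printf '{}\n' > "$SETTINGS"
jq '.modelLoadingGuardrails = {
      mode: "high",
      customThresholdBytes: 4294967296,
      alwaysAllowLoadAnyway: true
    }' "$SETTINGS" > "$SETTINGS.tmp" && mv "$SETTINGS.tmp" "$SETTINGS"
```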
