
v2.2.1 — Hotfix: GPU Offloading & Model Unload


@PurpleDoubleD released this 04 Apr 15:37
· 172 commits to master since this release

What's Fixed

Model Unload Broken

The "Unload all models" button and automatic unload on model switch silently failed — the Ollama /generate call was missing the required prompt field. Models stayed in RAM indefinitely, accumulating memory usage.

No GPU Offloading

All Ollama chat calls were missing num_gpu — models could fall back to CPU-only inference, causing 100% CPU/RAM usage while the GPU sat idle. Now sends num_gpu: 99 by default, which tells Ollama to offload as many layers as possible to GPU. Ollama automatically splits between GPU and CPU if VRAM is insufficient, so this is safe for all hardware configurations.
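As a rough illustration, here is what a chat request with the new option looks like, assuming a direct fetch to Ollama's /api/chat endpoint; sendChat and the non-streaming shape are illustrative, not the project's actual code:

```ts
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Send a non-streaming chat request with GPU offloading enabled.
async function sendChat(model: string, messages: ChatMessage[]) {
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages,
      stream: false,
      options: {
        num_gpu: 99, // offload as many layers as fit in VRAM; Ollama spills the rest to CPU
      },
    }),
  });
  return res.json();
}
```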

Silent Error Swallowing

Unload errors were caught with .catch(() => {}) and discarded. Failures are now logged to console (console.warn) for debugging.
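In sketch form, using the hypothetical unloadModel helper from above, the change amounts to:

```ts
// Before: failures were discarded entirely.
unloadModel(previousModel).catch(() => {});

// After: failures are surfaced for debugging without blocking the model switch.
unloadModel(previousModel).catch((err) =>
  console.warn(`Unload of ${previousModel} failed:`, err)
);
```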

Impact

Users with dedicated GPUs (especially those with 16 GB RAM or less) should see dramatically lower CPU and RAM usage during inference. The unload button now actually frees memory.

Files Changed

  • src/api/ollama.ts — fixed unload, added num_gpu: 99 to all chat endpoints
  • src/api/providers/ollama-provider.ts — added num_gpu: 99 to provider chat calls
  • src/stores/modelStore.ts — error logging on model switch unload

Full Changelog: v2.2.0...v2.2.1