v2.2.1 — Hotfix: GPU Offloading & Model Unload
What's Fixed
Model Unload Broken
The "Unload all models" button and automatic unload on model switch silently failed — the Ollama /generate call was missing the required prompt field. Models stayed in RAM indefinitely, accumulating memory usage.
No GPU Offloading
All Ollama chat calls omitted the num_gpu option, so models could fall back to CPU-only inference, causing 100% CPU/RAM usage while the GPU sat idle. The app now sends num_gpu: 99 by default, which tells Ollama to offload as many layers as possible to the GPU. If VRAM is insufficient, Ollama automatically splits layers between GPU and CPU, so the default is safe for all hardware configurations.
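As a rough sketch (the model name and payload shape are illustrative), the option rides along in the options object of each chat request:

```ts
// Illustrative chat call showing where num_gpu is set; 99 asks Ollama
// to offload as many layers as will fit in VRAM.
async function chat(model: string, content: string) {
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content }],
      options: { num_gpu: 99 }, // previously omitted, allowing CPU-only fallback
      stream: false,
    }),
  });
  return res.json();
}
```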
Silent Error Swallowing
Unload errors were caught with .catch(() => {}) and discarded. Failures are now logged with console.warn so they can be debugged.
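In before/after form (a sketch, reusing the hypothetical unloadModel from above):

```ts
// Before: any failure was swallowed
unloadModel(modelName).catch(() => {});

// After: failures surface in the console (exact message may differ)
unloadModel(modelName).catch((err) => {
  console.warn(`Failed to unload model "${modelName}":`, err);
});
```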
Impact
Users with dedicated GPUs (especially those with 16 GB RAM or less) should see dramatically lower CPU and RAM usage during inference. The unload button now actually frees memory.
Files Changed
- `src/api/ollama.ts` — fixed unload, added `num_gpu: 99` to all chat endpoints
- `src/api/providers/ollama-provider.ts` — added `num_gpu: 99` to provider chat calls
- `src/stores/modelStore.ts` — error logging on model switch unload
Full Changelog: v2.2.0...v2.2.1