
v2.2.1 — Hotfix: GPU Offloading & Model Unload


@PurpleDoubleD released this 04 Apr 15:37
· 172 commits to master since this release

What's Fixed

Model Unload Broken

The "Unload all models" button and automatic unload on model switch silently failed — the Ollama /generate call was missing the required prompt field. Models stayed in RAM indefinitely, accumulating memory usage.

No GPU Offloading

All Ollama chat calls were missing num_gpu — models could fall back to CPU-only inference, causing 100% CPU/RAM usage while the GPU sat idle. Now sends num_gpu: 99 by default, which tells Ollama to offload as many layers as possible to GPU. Ollama automatically splits between GPU and CPU if VRAM is insufficient, so this is safe for all hardware configurations.
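As a rough illustration, here is what a chat request with the new option looks like, assuming a direct fetch to Ollama's /api/chat endpoint; sendChat and the non-streaming shape are illustrative, not the project's actual code:

```ts
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Send a non-streaming chat request with GPU offloading enabled.
async function sendChat(model: string, messages: ChatMessage[]) {
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages,
      stream: false,
      options: {
        num_gpu: 99, // offload as many layers as fit in VRAM; Ollama spills the rest to CPU
      },
    }),
  });
  return res.json();
}
```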

Silent Error Swallowing

Unload errors were caught with .catch(() => {}) and discarded. Failures are now logged to console (console.warn) for debugging.
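In sketch form, using the hypothetical unloadModel helper from above, the change amounts to:

```ts
// Before: failures were discarded entirely.
unloadModel(previousModel).catch(() => {});

// After: failures are surfaced for debugging without blocking the model switch.
unloadModel(previousModel).catch((err) =>
  console.warn(`Unload of ${previousModel} failed:`, err)
);
```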

Impact

Users with dedicated GPUs (especially those with 16 GB RAM or less) should see dramatically lower CPU and RAM usage during inference. The unload button now actually frees memory.

Files Changed

  • src/api/ollama.ts — fixed unload, added num_gpu: 99 to all chat endpoints
  • src/api/providers/ollama-provider.ts — added num_gpu: 99 to provider chat calls
  • src/stores/modelStore.ts — error logging on model switch unload

Full Changelog: v2.2.0...v2.2.1