
chore(main): release 0.9.7 (#440)

Merged
AlexsJones merged 1 commit into main from release-please--branches--main
Apr 13, 2026

Conversation

@github-actions
Contributor

🤖 I have created a release beep boop

0.9.7 (2026-04-13)

Features

  • add --force-runtime flag to override automatic runtime selection (c31099f)
  • add --memory flag to override GPU VRAM autodetection (9a35a76)
  • add --memory flag to override GPU VRAM autodetection (b6e323d)
  • add "Sort by Release Date" to TUI (f371cd8)
  • add "Sort by Release Date" to TUI (d55670a)
  • add 15 popular models from HuggingFace (eb886d6)
  • Add 15 popular models from HuggingFace (33→48 models) (e0c0b52)
  • add 18 new models to database (Feb 2026) (8612c0e)
  • add 18 new models to database (Feb 2026) (1c040f2)
  • add 18 new models to database (Feb 2026) — conflict-resolved (0970845)
  • add AWQ/GPTQ support with vLLM inference runtime (e398490)
  • add AWQ/GPTQ support with vLLM inference runtime (d7f2f11)
  • add Catppuccin color themes to TUI (f05d299)
  • add Catppuccin color themes to TUI (104b281)
  • add CLI/TUI model diff compare workflow (3cab068)
  • add context-length cap for memory estimation (--max-context) (0d9a0ef)
  • Add Docker containerization support (42ff7d3)
  • Add Docker containerization support (fbb483c)
  • add Docker Model Runner as a runtime provider (2a84138)
  • add fuzzy search with space-separated terms (89bbe3d)
  • add fuzzy search with space-separated terms (3d967f4), closes #65
  • add GGUF download source enrichment for models (967253b)
  • add Google Gemma 4 models and fix Gemma 3 capabilities (2cb21c3)
  • add homebrew tap support and update release workflow (67b6fcf)
  • add InferenceRuntime enum and MLX quantization support (186b95e)
  • add license filter for models (5bb4b1e)
  • add license filter for models (#186) (33bd708)
  • add Liquid AI LFM2/LFM2.5 models (83d038a)
  • add max-context cap for memory estimation (afba24a)
  • add MiniMax-M2.7 to curated model database (a5cda9b)
  • add MiniMax-M2.7 to curated model database (7c16b9d)
  • add MlxProvider for MLX model detection and downloads (7f0ee24)
  • add model comparison workflow (llmfit diff) and TUI compare mode (76a8a9c)
  • add Ollama mappings for new models (5c97e75)
  • add Qwen 3.5 series (397B/122B/35B/27B) to model database (0d550f3)
  • add Qwen3-Coder-Next (80B MoE) and Qwen 3.5 Ollama mappings (5f8a640)
  • add Qwen3.5 small model series (0.8B, 2B, 4B, 9B) (21cc8ac)
  • add Qwen3.5 small model series (0.8B, 2B, 4B, 9B) (226c5c2)
  • add runtime model database updates from HuggingFace (d465fb9)
  • add runtime model database updates from HuggingFace (1e20ec3)
  • add tensor parallelism awareness for multi-GPU model fitting (e8a6001)
  • add tensor parallelism awareness for multi-GPU model fitting (5d99ec2)
  • add theme switcher with 6 color schemes for TUI (317c44a)
  • add theme switcher with 6 color schemes for TUI (89bd63f)
  • add tok/s column sorting in TUI (41b7358)
  • add tok/s sort option to TUI sort cycle and table header highlighting (82bb526)
  • add use-case popup filter and use-case search matching to TUI (4b1aeab)
  • add use-case popup filter to TUI (4b0f452)
  • added arc support (0d5e991)
  • added in vim like bindings (a3df62f)
  • added LM studio (70e60cc)
  • added logo (86dd737)
  • added moe (4723782)
  • added Rest API (2adb593)
  • adding ollama as supported provider (e198c17)
  • adding release please (c68914a)
  • adding serve capabilities (503229e)
  • api: add capabilities, license, supports_tp to model response and new endpoints (c02e189)
  • append (WSL) to RAM label in tui when running under WSL (aa30f57)
  • availability filter [a] in TUI (All / GGUF Avail / Installed) (9356150)
  • bandwidth-based tok/s estimation for known GPUs (0c552b4)
  • bandwidth-based tok/s estimation for known GPUs (1c34600)
  • cargo fmt (cde77e3)
  • caught some unavailable models on ollama (1d68bba)
  • caught some unavailable models on ollama (6aaba9f)
  • cli: add --sort for fit output (8114446)
  • cli: add --sort option for fit output (3a06877)
  • crate version yank skip rebuild (b1b0628)
  • dashboard: ship embedded web UI and auto-start from CLI (96bd53c)
  • dashboard: ship embedded web UI and auto-start it from CLI (37dd0db)
  • detect installed Ollama models and support pulling from TUI (dc05d5a)
  • expand mlx-community model mappings (7a9e917)
  • expand mlx-community model mappings and improve fallback heuristic (5e8a676), closes #260
  • fix for qwen3_5moe (455709f)
  • fixed regression in LM studio model list (cbeccfa)
  • fixed regression in LM studio model list (734f556)
  • fixed up skill (38c8350)
  • fixed up skill (368fa47)
  • fixing vram on apple bug (bd59767)
  • fixing vram on apple bug (db666b6)
  • fixing vram on apple bug (c26d13b)
  • fixing vram on apple bug (085cd25)
  • GGUF download source enrichment (unsloth & bartowski) (6e3c828)
  • hardware simulation mode and CLI hardware overrides (002eeff), closes #322
  • hardware: detect NVIDIA Tegra/Grace Blackwell unified memory via ATS (rebased #93) (b454697)
  • hardware: detect NVIDIA Tegra/Grace Blackwell unified memory via ATS addressing mode (dd1caa6)
  • improvements based on #12 (7997525)
  • increased model count (5be634f)
  • increment version (d88bd3f)
  • installed status for lm studio (a1cad30)
  • mlx disablement on non-apple hw (c26bb49)
  • MLX-native inference support for Apple Silicon (993599e)
  • model detail modal + Ollama download in desktop app (c92053f)
  • model detail modal + Ollama download in desktop app (dacae77)
  • overhaul to the scoring system (e689f95)
  • overhaul to the scoring system (d411a8d)
  • overhaul to the scoring system (5ca2467)
  • persist filter state across sessions (#430) (c3962e4)
  • persist theme selection to ~/.config/llmfit/theme (765c87b)
  • plan: KV cache fidelity + KV quant flag + TurboQuant gating (1f1b6fc)
  • plumbing 2 (209f9d4)
  • plumbing 2 (ef18920)
  • pull functionality (e193d56)
  • rebased (ac6639d)
  • release plumbing (648a8c1)
  • release plumbing (8215ba8)
  • removed exact model count as it increases so often (1a19c42)
  • reworked available models for download (6d56b03)
  • show approximate disk space usage (#134) (#304) (32c835b)
  • split-pane TUI detail view for GGUF downloads (de5c53b)
  • support for windows vulkan (5a76a8f)
  • supporting 94 models (c68d7aa)
  • surface runtime info in TUI, CLI, and JSON output (4bdb4c4)
  • This PR is to add the ability for capabilities to be described (d09b8fe)
  • This PR is to add the ability for capabilities to be described (05768af)
  • TUI GGUF downloads section, default enrichment, caching (e7dc0b2)
  • tui: add runtime/backend filter and help popup (v0.9.2) (ad28138)
  • update key bindings and add Catppuccin themes to README files (c41567b)
  • updated build actions (59a8af4)
  • updated demo (6c32985)
  • updated demo (42fbeeb)
  • updated images (20205bd)
  • updated models (dfb9afb)
  • updated sorting and new models (744465d)
  • updated tui to support multiple providers better and also multiple GPU support (df338e8)
  • updated urls (ce14772)
  • updated version (4d1542b)
  • updating models (d4f0a0e)
  • version bump llama.cpp added (1643214)
  • version bump llama.cpp added (319fb90)
  • web: add 10 color themes from TUI theme.rs (ab14a06)
  • web: add side-by-side model comparison view (ccb56e9)
  • web: replace freeform limit input with preset options (96981ec)
  • web: split App.jsx into Header, SystemPanel, FilterBar, ModelTable, DetailPanel (950e620)
  • working on v0.8.1 (696107d)
  • workspace restructure + Tauri desktop app (83ded27)
  • workspace restructure + Tauri desktop app (3fb10d0)

Bug Fixes

  • accept OLLAMA_HOST values without URL scheme (#166) (c647b13)
  • add --local flag to install script and improve fallback logic (1e5cda1)
  • add --local flag to install script and improve fallback logic (23b500e), closes #56
  • add AMD RX 9060 series to VRAM estimation database (1c02da8)
  • add AMD RX 9060 series to VRAM estimation database (9305810), closes #55
  • add download confirmation to prevent accidental pulls (9e4e881)
  • add download confirmation to prevent accidental pulls (#95) (5b56d74)
  • address AlexsJones review comments (587252b)
  • address Copilot review comments (74384c9)
  • AMD Ryzen AI unified memory detection and --memory override (174fc10)
  • AMD unified memory detection and --memory override (c763717), closes #89 #91
  • bump Dockerfile Rust version to 1.88 for dependency compatibility (c25c02e)
  • bump Dockerfile Rust version to 1.88 for dependency compatibility (6304ba0)
  • cap default estimation context at 8192 tokens (#311) (4699ecb)
  • ci: switch release-please to simple type for workspace version inheritance (48be41a)
  • cli: default auto dashboard host to 0.0.0.0 (4553fbf)
  • cli: make dashboard auto-start controllable and self-cleaning (0c51ad6)
  • correct crates.io metadata and prepare for publishing (f8b08a0)
  • correct crates.io metadata and prepare for publishing (2ce5858), closes #58
  • correct PCI ID reading on Linux and improve AMD GPU identification (dcfde57)
  • correctly estimate VRAM for APU integrated GPUs (2df3298)
  • correctly estimate VRAM for APU integrated GPUs (Radeon Graphics) (5797f10), closes #25
  • Default theme uses terminal colors for light/dark compat (72a6d8d), closes #67
  • desktop: prevent XSS via inline onclick handler in modal (#323) (d20dbee)
  • detect installed Ollama tag variants with suffixes (#165) (7092fa6)
  • detect NVIDIA GB10/GB20 as unified memory GPUs (a77cf36)
  • detect NVIDIA GB10/GB20 as unified memory GPUs (#83, #17) (c0a584a)
  • discover GGUF files in HuggingFace subdirectories (#291) (4e8a596)
  • docker action version (7a38a25)
  • docker action version pin (da96503)
  • download: fetch all shards of multi-part GGUF models (b7494d9)
  • filter AWQ/GPTQ models by GPU compute capability (e720a0e), closes #257
  • find Tauri bundle in correct target directory (19c0ac3)
  • fmt: align providers formatting and enrich TUI compare metrics (a65e4c5)
  • hide MLX models on non-Apple-Silicon hardware (#113) (1e3cf8a)
  • hide MLX models on non-Apple-Silicon hardware (#113) (1052819)
  • hide MLX-only models on non-Apple Silicon systems (ab54fa9)
  • ignore GGUF tests that require network access (#306) (8fd8f4d)
  • iGPU inflating GPU count and force-runtime being ignored (#271) (f584d7e)
  • improve Android CPU and Vulkan GPU detection (0ebb92a)
  • improve download error messages for incompatible model formats (0e2f9cc), closes #123 #198
  • improve MoE tok/s estimate and clarify baseline speed labels (81f88c4)
  • incorrect installed model count in TUI status bar (307c74f)
  • invoke hf instead of huggingface-cli (752680f)
  • invoke hf instead of huggingface-cli (78f8268)
  • pr review fixes (25952da)
  • prefer discrete GPU over integrated on Windows (#303) (3312a40)
  • prefer exact matches in info selection (68272d4)
  • prefer exact matches in info selection (883ea1f)
  • query local providers in recommend CLI command (2477d1d)
  • regression in json only mode (36e8503)
  • remove unused function to eliminate build-time warning (4406582)
  • remove unused function to eliminate build-time warning (289c8b8)
  • Runtimes are installed but no downloadable artifact exists (9424204)
  • support owner-scoped MLX pulls and robust tag normalization (7a9e476)
  • support owner-scoped MLX repos and normalize MLX pull tags (f0e516c)
  • support sudo in piped install script (9fa1960)
  • support sudo in piped install script (cf61557), closes #158
  • surface MoE offloaded RAM in JSON output (96b73ff)
  • surface MoE offloaded RAM in JSON output (929eaad), closes #230
  • tui: avoid borrow conflict when marking compare model (ae60aa5)
  • tui: write dashboard pid file under ~/.llmfit, not /tmp (#324) (00967f1)
  • typo in CHANGELOG.md (suppor -> support) (1e26ec2)
  • typo in CHANGELOG.md (suppor -> support) (aa6cd00)
  • update (2b6767d)
  • update OpenClaw skill to match actual CLI output (df585a7)
  • update OpenClaw skill to match actual CLI output (d53c192)
  • use accurate model counts instead of len()/2 for installed display (7a273cc), closes #189
  • use aggregated VRAM for multi-GPU fit scoring (dc3db39)
  • use aggregated VRAM for multi-GPU fit scoring (f55882d), closes #68
  • use MoE active parameters for tok/s estimation and clarify baseline speed labels (47e70a1)
  • use per-card VRAM instead of summed for multi-GPU systems (2f1f9f4)
  • use per-card VRAM instead of summed for multi-GPU systems (f80b4b0)
  • use quantization in path selection for both dense and MoE models (#49) (8703d43)
  • web: align dashboard hero with project tagline (d4e77c1)

Performance Improvements

  • api: optimize embedded asset serving and cache policy (4d8c0cc)

This PR was generated with Release Please. See documentation.

@github-actions github-actions bot force-pushed the release-please--branches--main branch from 9d94be9 to da6a5f1 on April 13, 2026 at 09:33
@AlexsJones AlexsJones merged commit c9830fc into main Apr 13, 2026
@github-actions
Contributor Author

🤖 Created releases:

🌻
