Releases · Andyyyy64/whichllm

15 May 08:18

v0.5.2

2a566d3

v0.5.2

Hardening release: every Round 3 fix now has a regression test verified
to fail when reverted, the CI lint pipeline is green again (it was red
for the entire 0.5.1 release), and two correctness bugs found by
stress-testing previously unexercised axes are fixed.

Fixed

`--profile vision` generation inversion

Text leaderboards don't score VLMs, so the only model with a direct
benchmark hit was a two-generations-old Qwen2-VL-7B, which outranked
the current Qwen3-VL-32B even on an 80 GB H100. A curated
multimodal capability source (MMMU-Pro / MMBench, 2026-05) now scores
the Qwen3-VL / Qwen2.5-VL / Qwen2-VL / Llama-Vision / Phi-vision /
Gemma-3 / Pixtral / InternVL3 lines. Qwen3-VL-32B now leads vision at
73-76; the legacy 7B correctly drops to the low 30s.

Apple Silicon partial-offload speed (~3x under-estimate)

The flat 0.45x partial-offload penalty modelled a discrete GPU
spilling to CPU RAM across PCIe. Apple Silicon shares one unified-memory
pool, so spilled weights stay at full bandwidth. As a result,
DeepSeek-R1-class models on M2/M3 Ultra were estimated at ~1.7 t/s when
real-world throughput is 4-15 t/s. The penalty is now 0.85x for unified
memory; 0.45x is kept for discrete GPUs.
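
As a rough illustration of the change, the backend-dependent penalty could be sketched as below. The function and constant names are hypothetical and are not taken from the whichllm source; only the 0.85x and 0.45x factors come from these notes.

```python
# Hypothetical names; only the two factors are taken from the release notes.
UNIFIED_MEMORY_PENALTY = 0.85  # Apple Silicon: spilled weights stay in the shared pool
DISCRETE_GPU_PENALTY = 0.45    # discrete GPU: spilled weights cross PCIe to CPU RAM


def partial_offload_speed(full_speed_tps: float, unified_memory: bool) -> float:
    """Scale a full-GPU speed estimate (tokens/s) for a partially offloaded model."""
    penalty = UNIFIED_MEMORY_PENALTY if unified_memory else DISCRETE_GPU_PENALTY
    return full_speed_tps * penalty
```

With the same full-speed estimate as input, the unified-memory path now yields a figure roughly 1.9x higher than the discrete-GPU path (0.85 / 0.45).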

CI lint was red for all of 0.5.1

Qwen/Qwen3-Coder-30B-A3B-Instruct was a duplicate key in the
LiveBench fallback (silently scored 62 instead of 58) and 12 files were
unformatted — both broke the Lint job. Fixed; Lint + Tests are now
green on this release commit in actual GitHub CI.
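
The duplicate-key failure mode is easy to miss because Python accepts a repeated key in a dict literal and silently keeps the last value. The sketch below reproduces that behavior and shows one way to detect it with the standard `ast` module. The table name, the second model ID, and the order of the two entries are assumptions for illustration; the actual fallback table may look different.

```python
import ast

# Illustrative table only: the name, the second entry, and the entry order
# are assumptions, not the real LiveBench fallback.
SOURCE = '''
FALLBACK = {
    "Qwen/Qwen3-Coder-30B-A3B-Instruct": 58,
    "example-org/other-model": 40,
    "Qwen/Qwen3-Coder-30B-A3B-Instruct": 62,
}
'''


def duplicate_keys(source: str) -> list:
    """Return constant keys that appear more than once in any dict literal."""
    dupes = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            seen = set()
            for key in node.keys:
                # `**other` unpacking shows up as a None key and is skipped here.
                if isinstance(key, ast.Constant):
                    if key.value in seen:
                        dupes.append(key.value)
                    seen.add(key.value)
    return dupes


namespace = {}
exec(SOURCE, namespace)  # the later value wins without any error or warning
```

Here `duplicate_keys(SOURCE)` reports the repeated model ID, while the evaluated dict quietly holds the later value. Linters flag the same pattern, which is consistent with the Lint job failing.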

Added

- Round 3 regression suite (tests/test_r3_regressions.py, 20 tests).
  Every test was verified to go red when its fix is reverted, so the
  tests pin real bugs, not the current implementation.
- Benchmark snapshot date shown under every ranking, so a stale
  recommendation is self-evident instead of silently trusted.

CI

GitHub Actions runners updated to Node 24 (checkout@v5,
setup-python@v6); Node 20 actions are deprecated from 2026-06.

Full changelog: CHANGELOG.md


13 May 22:28

Andyyyy64

v0.5.1

7d247d1

v0.5.1

What's New

`whichllm upgrade` — Compare GPU upgrades side-by-side

whichllm upgrade --target "RTX 4090"

Shows the current machine and a target GPU together with delta scores
and a verdict (worth it / meaningful / marginal / flat / downgrade).

Apple Silicon support in `--gpu`

whichllm --gpu "M3 Max" --vram 64
whichllm --gpu "M2 Ultra" --vram 192

Simulator now understands every M1-M4 chip (base / Pro / Max / Ultra),
so Mac users can stress-test rankings without owning the hardware. No
more spurious "ROCm requires Linux" warnings on simulated Apple boxes.

Frontier-model coverage refresh

2026-Q2 releases that did not previously surface are now included:
Kimi-K2, MiMo, DeepSeek-V4, GLM-5, Qwen3.6 / Qwen3-Next, gpt-oss,
Llama-4, Mistral Small/Large, Devstral, Codestral, MiniMax,
Granite 3.3/4.0, Olmo-3, Nemotron-3, plus the reasoning lines
QwQ-32B, Qwen3-4B-Thinking, DeepSeek-R1 and the R1-Distill family.

Smarter VRAM / speed estimates

- KV cache scaling tuned to match real 128K-context runs.
- MoE models split correctly: total params drive VRAM and knowledge,
  active params drive speed.
- Per-backend speed multipliers (CUDA / Apple / AMD / Intel) and
  per-quant efficiency factors so Apple Silicon and partial-offload
  numbers stop overshooting.
- Lineage-aware demotion stops 2024-era leaderboards (OLLB v2, Arena
  ELO) from over-rewarding older generations against their newer
  siblings.
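
The MoE split above can be made concrete with a back-of-the-envelope sketch. It uses the common memory-bandwidth-bound approximation for decode speed, which is an assumption here and not necessarily the formula whichllm uses. It also ignores KV cache, activations, and runtime overhead. All function names are hypothetical.

```python
def weights_vram_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Weight memory in GB: every expert must be resident, so total params count."""
    return total_params_b * bits_per_weight / 8


def decode_speed_ceiling_tps(
    active_params_b: float, bits_per_weight: float, bandwidth_gb_s: float
) -> float:
    """Rough upper bound on tokens/s: only the active params are read per token."""
    active_gb = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / active_gb


# Example: a 30B-total / 3B-active MoE at 4 bits per weight on a 1000 GB/s device.
vram = weights_vram_gb(30, 4)
ceiling = decode_speed_ceiling_tps(3, 4, 1000)
```

Under these assumptions the model needs as much weight memory as a dense 30B model, while its speed ceiling matches that of a dense 3B model.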

Bug fixes

- Family inheritance no longer treats a 6.6B "imatrix-aligned" /
  MTP-head fork as the same model as its 158B base.
- Family grouping prefers the upstream model as the base, not whichever
  fork has the most downloads.
- httpx now uses follow_redirects=True, so case-mismatch HuggingFace
  URLs (307) no longer drop frontier IDs silently.
- Quality floor (≥ 20) and speed floor (≥ 1.5 t/s) drop junk Q1_0 /
  Bonsai-class candidates that previously slipped into low-VRAM
  recommendations.
- Removed 11 non-existent HF IDs from curated benchmark fallbacks.
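
A minimal sketch of how the two floors might be applied to a candidate list. The `Candidate` type, its field names, and the sample entries are hypothetical; only the thresholds (quality ≥ 20, speed ≥ 1.5 t/s) come from these notes.

```python
from dataclasses import dataclass

QUALITY_FLOOR = 20.0    # minimum quality score, from the release notes
SPEED_FLOOR_TPS = 1.5   # minimum estimated tokens/s, from the release notes


@dataclass
class Candidate:  # hypothetical shape, for illustration only
    name: str
    quality: float
    speed_tps: float


def apply_floors(candidates):
    """Keep only candidates that meet both the quality and the speed floor."""
    return [
        c for c in candidates
        if c.quality >= QUALITY_FLOOR and c.speed_tps >= SPEED_FLOOR_TPS
    ]


sample = [
    Candidate("usable-model", quality=45.0, speed_tps=12.0),
    Candidate("too-lossy-quant", quality=12.0, speed_tps=30.0),
    Candidate("too-slow-model", quality=60.0, speed_tps=0.8),
]
kept = apply_floors(sample)
```

Only the first sample entry survives: the second fails the quality floor and the third fails the speed floor.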

Full changelog: CHANGELOG.md


09 Mar 15:17

Andyyyy64

v0.5.0

9ff7519

v0.5.0

What's New

`whichllm run` — One-command chat

Download and chat with any model instantly. Auto-creates an isolated environment, installs dependencies, and starts an interactive session — zero manual setup.

whichllm run "qwen 2.5 1.5b gguf"
whichllm run  # auto-picks the best model for your hardware

Supports all formats: GGUF, AWQ, GPTQ, FP16/BF16.

`whichllm snippet` — Ready-to-run Python code

Print a copy-paste Python script for any model.

whichllm snippet "qwen 7b"

Improvements

- Smarter model search: auto-picks top match by downloads instead of erroring on ambiguous queries
- Shared helpers for model loading and search across commands
- Refactored plan command to use shared search logic
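
The "top match by downloads" behavior could look roughly like the sketch below. The matching rule (every query term must appear in the model ID), the record shape, and the sample IDs and download counts are all assumptions for illustration, not the shared search helper itself.

```python
def pick_top_match(query: str, models: list):
    """Return the most-downloaded model whose ID contains every query term."""
    terms = query.lower().split()
    matches = [m for m in models if all(t in m["id"].lower() for t in terms)]
    if not matches:
        return None
    # An ambiguous query no longer errors: the download count breaks the tie.
    return max(matches, key=lambda m: m["downloads"])


# Hypothetical records, not real HuggingFace data.
models = [
    {"id": "example-org/Qwen-7B-Instruct", "downloads": 5000},
    {"id": "example-fork/qwen-7b-instruct-gguf", "downloads": 900},
    {"id": "example-org/Llama-8B", "downloads": 9000},
]
best = pick_top_match("qwen 7b", models)
```

Two records match "qwen 7b" here, and the one with more downloads is returned.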
