Releases · Andyyyy64/whichllm · GitHub

18 Jun 12:58

Andyyyy64

v0.5.12 Latest

Added

Markdown ranking output with --markdown / -m for pasteable GitHub issues, READMEs, Slack, and Discord.
Runtime-first ranking tables now show memory, estimated speed, fit type, and published date by default.
--speed any|usable|fast and the shorter --fit gpu alias for full-GPU recommendations.
--vram-headroom and --ram-budget for safer fit planning when runtimes or background processes need memory.
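
A minimal sketch of how a VRAM headroom and a RAM budget could feed a fit decision. The function classify_fit, its GB units, and the three fit labels are hypothetical illustrations, not whichllm's internals. Rejecting a negative headroom before any GPU check mirrors the v0.5.12 fix, but the exact validation rule is assumed.

```python
from typing import Optional


def classify_fit(model_gb: float, vram_gb: float, ram_gb: float,
                 vram_headroom_gb: float = 0.0,
                 ram_budget_gb: Optional[float] = None) -> str:
    """Return 'full-gpu', 'partial-offload', or 'no-fit' (hypothetical labels)."""
    # Validate first, so a bad headroom is rejected even when vram_gb == 0 (CPU-only).
    if vram_headroom_gb < 0:
        raise ValueError("--vram-headroom must be >= 0")

    # VRAM the model may use after reserving headroom for the runtime.
    usable_vram = max(vram_gb - vram_headroom_gb, 0.0)
    # System RAM available for offloaded layers, capped by the optional budget.
    usable_ram = ram_gb if ram_budget_gb is None else min(ram_gb, ram_budget_gb)

    if model_gb <= usable_vram:
        return "full-gpu"
    if model_gb <= usable_vram + usable_ram:
        return "partial-offload"
    return "no-fit"
```

Under these assumptions, a 14 GB model on a 16 GB card is a full-GPU fit, but reserving 3 GB of headroom pushes it into partial offload.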

Changed

Speed colors now reflect practical generation speed: red below 4 tok/s, yellow from 4 to 10 tok/s, green from 10 to 30 tok/s, and bright green at 30 tok/s and above.
--details restores the download-focused metadata table when needed.
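
The speed color bands can be expressed as a small threshold function. The color names are hypothetical labels, and treating each lower bound as inclusive (exactly 10 tok/s counts as green) is an assumption, since the notes do not say which band a boundary value falls into.

```python
def speed_color(tok_per_s: float) -> str:
    """Map an estimated generation speed in tokens/second to a display color."""
    # Bands follow the v0.5.12 notes; inclusive lower bounds are an assumption.
    if tok_per_s < 4:
        return "red"
    if tok_per_s < 10:
        return "yellow"
    if tok_per_s < 30:
        return "green"
    return "bright_green"
```

For example, speeds of 2.5, 4, 12, and 45 tok/s map to red, yellow, green, and bright_green.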

Fixed

Invalid --vram-headroom values are now rejected even in CPU-only runs.

18 Jun 06:01

Andyyyy64

v0.5.11

Added

Multi-GPU simulation for repeated --gpu flags, comma-separated GPU specs, and count shorthand like 2x RTX 4090.
python -m whichllm now runs the CLI.
--gpu-only and --fit full-gpu filter recommendations to models that fit fully in GPU VRAM.
T5 lineage support for version-aware benchmark handling.
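
The three multi-GPU spellings (repeated --gpu flags, comma-separated specs, and an NxGPU count prefix) can all be flattened into one list with one entry per card. This is a sketch; the function name expand_gpu_specs and the exact regex are hypothetical, and how whichllm then combines the cards' VRAM is not described in these notes.

```python
import re
from typing import Iterable, List

# Optional "<count>x " prefix, e.g. "2x RTX 4090" or "2 x RTX 4090".
_COUNT_PREFIX = re.compile(r"^\s*(\d+)\s*[xX]\s+(.+?)\s*$")


def expand_gpu_specs(flag_values: Iterable[str]) -> List[str]:
    """Flatten repeated --gpu values, comma lists, and count shorthand."""
    gpus: List[str] = []
    for value in flag_values:            # one item per repeated --gpu flag
        for part in value.split(","):    # comma-separated specs inside one flag
            part = part.strip()
            if not part:
                continue
            match = _COUNT_PREFIX.match(part)
            if match:
                count, name = int(match.group(1)), match.group(2)
                if count < 1:
                    raise ValueError(f"GPU count must be >= 1: {part!r}")
                gpus.extend([name] * count)
            else:
                gpus.append(part)
    return gpus
```

With this sketch, ["2x RTX 4090"] expands to two "RTX 4090" entries, and ["RTX 4090, RTX 3090", "M3 Max"] becomes three separate entries.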

Fixed

Cached model and benchmark data are read as UTF-8.
GTX 1650 simulation distinguishes GDDR5 and GDDR6 variants by memory clock.
RAM reserve logic now uses a bounded reserve formula instead of a fixed 80% usable-RAM cap.
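
The notes do not spell out the new formula. One common shape for a bounded reserve is to hold back a fraction of total RAM for the OS and clamp that amount between a floor and a ceiling. The sketch below uses that shape with made-up constants (20%, 2 GB floor, 8 GB ceiling) purely to show how it differs from a flat 80% cap.

```python
def usable_ram_gb(total_gb: float, reserve_fraction: float = 0.2,
                  min_reserve_gb: float = 2.0, max_reserve_gb: float = 8.0) -> float:
    """Usable RAM after a bounded OS reserve (all constants are hypothetical)."""
    # Reserve a fraction of RAM, clamped to [min_reserve_gb, max_reserve_gb].
    reserve = min(max(total_gb * reserve_fraction, min_reserve_gb), max_reserve_gb)
    # Never report negative usable memory on very small machines.
    return max(total_gb - reserve, 0.0)
```

With these constants, an 8 GB machine keeps 6 GB usable (a flat 80% cap would give 6.4 GB), while a 128 GB machine keeps 120 GB instead of 102.4 GB, so large-memory systems no longer lose a fixed fifth of their RAM.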

11 Jun 07:52

Andyyyy64

v0.5.10

Fixed

Strong partial-offload candidates no longer get buried under weaker full-GPU models because the final sort no longer counts GPU fit twice.
Light partial offload is penalized less aggressively, while heavy dense offload still gets a strong discount.
MoE partial-offload scoring now gives a milder penalty when the active working set can plausibly stay on GPU.
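
A simple way to see why heavy dense offload deserves a stronger discount than light offload: if decoding is memory-bandwidth bound, each token reads the GPU-resident weights at GPU bandwidth and the offloaded weights at system-RAM bandwidth, so the per-token times add. This is a generic bandwidth-bound model swapped in for illustration, not whichllm's scoring formula, and the bandwidth figures in the example are made up.

```python
def offload_tok_per_s(model_gb: float, gpu_fraction: float,
                      gpu_bw_gb_s: float, cpu_bw_gb_s: float) -> float:
    """Bandwidth-bound decode estimate for a dense model split across GPU and CPU."""
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be in [0, 1]")
    gpu_gb = model_gb * gpu_fraction   # weights read from VRAM per token
    cpu_gb = model_gb - gpu_gb         # weights read from system RAM per token
    seconds_per_token = gpu_gb / gpu_bw_gb_s + cpu_gb / cpu_bw_gb_s
    return 1.0 / seconds_per_token
```

For a 20 GB dense model with 1000 GB/s VRAM and 80 GB/s system RAM, this gives 50 tok/s fully on GPU, about 23 tok/s with 10% offloaded, and about 7.4 tok/s with half offloaded.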

10 Jun 05:49

Andyyyy64

v0.5.9

Highlights

GPU bandwidth detection now falls back to the bundled TechPowerUp database (2,824 GPUs) when a card is missing from the curated catalog. Uncatalogued cards no longer show BW: N/A with 0.0 tok/s estimates and oversized recommendations, and a laptop card can never inherit its desktop sibling's bandwidth. (#74, #98)
Fixed AMD discrete GPU detection on Linux, including RX 6750 XT and the compound lspci name path. (#61)
Artificial Analysis Intelligence Index is fetched live again after the site's App Router migration. Live scores overlay the curated snapshot, so coverage can only grow. (#87)
Added MXFP4 and NVFP4 quantization support. These repos were previously labeled FP16, overestimating VRAM by about 3.5x. (#27)
Added Apple M5-family simulation entries and Kepler-era Quadro catalog coverage.
Community GGUF repos without base_model metadata now match official benchmark scores by name.
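
The roughly 3.5x overestimate for MXFP4 and NVFP4 repos follows from bits per weight. MXFP4 (OCP Microscaling) stores 4-bit E2M1 values with one 8-bit E8M0 scale per 32-element block; NVFP4 stores 4-bit E2M1 values with one FP8 (E4M3) scale per 16-element block. The sketch below ignores NVFP4's per-tensor scale, the KV cache, runtime overhead, and any tensors a checkpoint keeps at higher precision; the names are hypothetical.

```python
# Effective storage bits per weight, including per-block scale overhead.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "MXFP4": 4.0 + 8.0 / 32,  # 4-bit values + one 8-bit scale per 32 weights
    "NVFP4": 4.0 + 8.0 / 16,  # 4-bit values + one 8-bit scale per 16 weights
}


def weight_memory_gb(num_params: float, quant: str) -> float:
    """Approximate weight storage in GB (1e9 bytes) for a given format."""
    return num_params * BITS_PER_WEIGHT[quant] / 8 / 1e9
```

For 20 billion parameters this gives 40 GB at FP16, 10.625 GB at MXFP4, and 11.25 GB at NVFP4, ratios of about 3.8x and 3.6x.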

QA

CI lint: passed
CI tests: Python 3.11, 3.12, and 3.13 passed
Local: 329 tests passed; sdist and wheel built successfully
Real hardware smoke test on Apple M2

05 Jun 06:45

Andyyyy64

v0.5.8

Highlights

Fixed the A3000 Laptop 6GB ranking regression.
Added retry/backoff for transient Hugging Face and benchmark fetch failures.
Improved the "Error fetching models" output when a network exception carries no message.
Added GPU catalog coverage for A3000 Laptop, RTX 3050, RTX 5060, RTX 5070 Ti, RX 9070, and RX 9070 XT.
Added context-length shorthand and benchmark source/confidence metadata in JSON output.
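
Retry with exponential backoff, plus a readable error when the exception text is empty, might look like the sketch below. The function name, the set of "transient" exceptions, the attempt count, and the delays are all assumptions; only the "Error fetching models" wording comes from these notes.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

# Hypothetical set of exceptions treated as transient.
TRANSIENT: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError)


def fetch_with_retries(fetch: Callable[[], T], attempts: int = 3,
                       base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(), retrying transient errors with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except TRANSIENT as exc:
            if attempt == attempts - 1:
                # Fall back to the exception type when str(exc) is empty.
                detail = str(exc) or type(exc).__name__
                raise RuntimeError(f"Error fetching models: {detail}") from exc
            # Delays of 0.5 s, 1 s, 2 s, ... plus up to 10% random jitter.
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise AssertionError("unreachable")
```

Injecting sleep keeps the helper testable without real waits.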

QA

CI lint: passed
CI tests: Python 3.11, 3.12, and 3.13 passed
Local build: sdist and wheel built successfully

19 May 19:52

Andyyyy64

v0.5.7

What's Changed

Detect DGX Spark / NVIDIA GB10 as a shared-memory NVIDIA GPU when NVIDIA reports memory.total as unavailable.
Fix whichllm run crashes for large Transformers models by providing an offload_folder.
Respect XDG_CACHE_HOME for cache paths, while ignoring relative values per the XDG spec.
Treat Apple Silicon as shared memory in fit detection.
Inline LiveBench fallback data and speed up benchmark score fetching.
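
The XDG Base Directory spec says paths in XDG_* variables must be absolute and that a relative value should be treated as invalid and ignored, falling back to $HOME/.cache for the cache directory. A sketch of that rule follows; the "whichllm" subdirectory name is an assumption, and the sketch covers only the Linux-style default, not Windows or macOS conventions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def cache_dir(app: str = "whichllm", env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve a per-app cache directory following the XDG Base Directory spec."""
    env = os.environ if env is None else env
    xdg = env.get("XDG_CACHE_HOME", "")
    # Only an absolute XDG_CACHE_HOME overrides the default; relative values are ignored.
    if xdg and os.path.isabs(xdg):
        base = Path(xdg)
    else:
        base = Path(env.get("HOME") or Path.home()) / ".cache"
    return base / app
```

Passing env explicitly makes the rule easy to test without touching the real environment.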

Validation

ruff format --check .
ruff check .
pytest -q -s
python -m build
twine check dist/*

17 May 18:54

Andyyyy64

v0.5.6

What's Changed

Add speed estimate confidence metadata and estimated tok/s ranges.
Improve MoE speed estimates using active parameters and bandwidth-scaled read floors.
Add Windows AMD/Intel GPU detection fallback through Win32_VideoController and registry memory reads.
Treat Ryzen AI / Radeon 890M-class Windows iGPUs as shared-memory AMD GPUs.
Avoid summing dedicated GPU VRAM with shared-memory iGPU system RAM as one full-GPU target.
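
Token generation is usually memory-bandwidth bound, so a rough ceiling is bandwidth divided by the bytes of weights read per token. For a MoE model only the active experts are read per token, which is why active parameters, not total parameters, belong in the denominator. This is the generic bandwidth-bound estimate, not whichllm's exact formula; its read floors are not modeled, and the efficiency range that turns the ceiling into a low/high estimate is hypothetical.

```python
from typing import Tuple


def estimate_tok_per_s_range(active_params_b: float, bytes_per_param: float,
                             bandwidth_gb_s: float,
                             efficiency: Tuple[float, float] = (0.5, 0.8)
                             ) -> Tuple[float, float]:
    """Rough (low, high) decode speed for a memory-bandwidth-bound model.

    active_params_b: billions of parameters read per generated token
        (total parameters for a dense model, active parameters for a MoE).
    efficiency: hypothetical fraction of peak bandwidth actually achieved.
    """
    gb_per_token = active_params_b * bytes_per_param  # GB of weights read per token
    ceiling = bandwidth_gb_s / gb_per_token            # tok/s at 100% of peak bandwidth
    low_eff, high_eff = efficiency
    return ceiling * low_eff, ceiling * high_eff
```

At 256 GB/s with 4-bit weights (0.5 bytes per parameter) and ignoring KV-cache reads, a MoE with 3B active parameters lands around 85 to 137 tok/s, while a dense 30B model lands around 9 to 14 tok/s.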

Validation

ruff format --check .
ruff check .
pytest -q -s
python -m build

16 May 22:15

Andyyyy64

v0.5.5

Fixed whichllm run resolving auto-picked GGUF recommendations to the official Transformers repository instead of a real GGUF repo/file.
This fixes the accidental Transformers launch path for models such as Qwen/Qwen3.6-27B.

16 May 17:37

Andyyyy64

v0.5.4

Fixed

Fix Strix Halo / Ryzen AI MAX shared-memory APU handling.
Detect and model STRXLGEN, Radeon 8050S, Radeon 8060S, and related names with a 256 GB/s bandwidth estimate.
Use the shared system-memory pool for fit checks to avoid false CPU-only, 99%-offload, and 0 tok/s recommendations on these systems.
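
Name-based detection plus a shared-pool fit check for these APUs could be sketched as follows. The name fragments and the 256 GB/s figure come from these notes; the regex, the function name, and the tuple return shape are hypothetical. A real check would also subtract an OS reserve from the shared pool.

```python
import re
from typing import Optional, Tuple

# Name fragments from the v0.5.4 notes; the exact pattern is an assumption.
_STRIX_HALO = re.compile(
    r"STRXLGEN|Radeon(?:\(TM\))?\s*80[56]0S|Ryzen\s+AI\s+MAX", re.IGNORECASE)
STRIX_HALO_BW_GB_S = 256.0  # bandwidth estimate quoted in the notes


def apu_fit_profile(gpu_name: str, reported_vram_gb: float,
                    system_ram_gb: float) -> Tuple[float, Optional[float]]:
    """Return (memory pool for fit checks in GB, bandwidth override or None)."""
    if _STRIX_HALO.search(gpu_name):
        # Shared-memory APU: the GPU allocates from system RAM, so the small
        # carve-out reported as VRAM understates what can actually fit.
        return system_ram_gb, STRIX_HALO_BW_GB_S
    return reported_vram_gb, None
```

Using the system RAM pool instead of the reported carve-out is what avoids the false CPU-only and near-total-offload verdicts on these machines.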

Verification

CI green: lint, test (3.11), test (3.12), test (3.13).
Local verification: ruff check, ruff format --check, pytest, and whichllm --version.

16 May 16:23

Andyyyy64

v0.5.3

What's Changed

Added

Linux Intel integrated GPU detection via /sys/class/drm, so Intel iGPU systems are no longer treated as CPU-only by default.
NVIDIA nvidia-smi fallback detection when pynvml is missing, NVML init fails, or NVML reports no devices.
Apple-prefixed Apple Silicon simulator aliases, so --gpu "Apple M3 Max" works like --gpu "M3 Max".
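
When pynvml is unavailable, the nvidia-smi CLI can report the same basics through its query interface (--query-gpu with --format=csv,noheader,nounits, where memory.total is in MiB). The sketch below parses that output and keeps a missing memory value as None instead of failing; treating any non-numeric value (such as "[N/A]") as unavailable is an assumption, and the function names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"]


def parse_nvidia_smi(csv_text: str) -> List[Tuple[str, Optional[float]]]:
    """Parse 'name, memory.total' rows; memory is MiB, or None if unavailable."""
    gpus: List[Tuple[str, Optional[float]]] = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        name, _, mem = (field.strip() for field in line.rpartition(","))
        try:
            mem_mib: Optional[float] = float(mem)
        except ValueError:
            mem_mib = None  # e.g. "[N/A]" on shared-memory parts
        gpus.append((name, mem_mib))
    return gpus


def detect_nvidia_gpus() -> List[Tuple[str, Optional[float]]]:
    """Fallback probe via the nvidia-smi CLI; returns [] if missing or failing."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True,
                             check=True, timeout=10).stdout
    except (subprocess.SubprocessError, OSError):
        return []
    return parse_nvidia_smi(out)
```

A None memory value is the signal a caller could use to switch to a shared-memory model rather than assuming zero VRAM.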

Fixed

Fixed the whichllm run transformers chat path by passing tokenizer mappings into model.generate(**inputs), avoiding the KeyError: 'shape' crash.
RTX 5060 Ti bandwidth lookup now reports 448 GB/s instead of N/A.
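
Peak memory bandwidth figures like these follow from bus width and per-pin data rate: bandwidth in GB/s equals bus width in bits divided by 8, times the data rate in Gbps. The examples use published specs for the RTX 5060 Ti (128-bit GDDR7 at 28 Gbps) and the two GTX 1650 variants (128-bit GDDR5 at 8 Gbps, 128-bit GDDR6 at 12 Gbps); the function name is hypothetical.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak bandwidth: bytes per transfer across the bus times per-pin rate."""
    return bus_width_bits / 8 * data_rate_gbps


# (bus width in bits, effective data rate per pin in Gbps)
EXAMPLES = {
    "RTX 5060 Ti (GDDR7)": (128, 28.0),
    "GTX 1650 (GDDR5)": (128, 8.0),
    "GTX 1650 (GDDR6)": (128, 12.0),
}

for name, (bus, rate) in EXAMPLES.items():
    print(f"{name}: {peak_bandwidth_gb_s(bus, rate):.0f} GB/s")
```

This yields 448, 128, and 192 GB/s respectively, which is why the two GTX 1650 memory variants need different bandwidth entries.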

Docs and maintenance

Updated install guidance toward uvx / uv tool install.
Removed the old marketing note and added sponsor metadata.

Verification

uv run pytest — 138 passed
uv run --with ruff ruff check . — passed
uv run --with ruff ruff format --check . — passed
uv run whichllm --version — 0.5.3
uv run --with build python -m build — built wheel and sdist
