Description
When I ask plan, snippet, or run for a model by size, I sometimes get back a model of a completely different size.
For example, whichllm snippet "qwen 7b gguf" gave me a snippet for Qwen3-1.7B-GGUF (a 1.7B model) even though I asked for a 7B. Same thing with plan "gemma 2b", which planned hardware for the 12B gemma-3-12b-it.
Digging in, the matcher in _search_model (src/whichllm/cli.py) checks each query word as a plain substring of the model ID:
matches = [m for m in models if all(t in m.id.lower() for t in terms)]
The problem is that the size token "7b" is a substring of "1.7b", "27b", "17b", etc., so a search for qwen 7b also matches Qwen3-1.7B, Qwen3.6-27B, and so on. The tool then sorts those matches by download count and picks the top one, so whether you get the right size comes down to which repo happens to be most popular, not what you typed.
It bites three commands since they all go through _search_model: plan, snippet, and run (with an explicit model name). The main ranking command is fine because it doesn't take a name query.
Expected: a size like 7b should only match actual ~7B models, not 1.7B or 27B.
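One possible way to get that behaviour, as a sketch only: treat query words that look like a parameter count (e.g. 7b, 1.7b) specially and require them to sit on a token boundary in the model ID, while other words keep the substring check. The names SIZE_TOKEN, term_matches and filter_ids are hypothetical and not part of whichllm, and the sample IDs are illustrative rather than live data.

```python
import re

# A query word counts as a size token if it is digits, optional decimals, then "b".
SIZE_TOKEN = re.compile(r"\d+(?:\.\d+)?b")


def term_matches(term: str, model_id: str) -> bool:
    """Substring match for ordinary words, boundary-aware match for size tokens."""
    term = term.lower()
    model_id = model_id.lower()
    if SIZE_TOKEN.fullmatch(term):
        # Reject hits preceded by a digit, letter, or dot ("1.7b", "27b", "a3b")
        # or followed by another alphanumeric character.
        pattern = r"(?<![0-9a-z.])" + re.escape(term) + r"(?![0-9a-z])"
        return re.search(pattern, model_id) is not None
    return term in model_id


def filter_ids(query: str, model_ids: list[str]) -> list[str]:
    """Keep only the IDs where every query word matches."""
    terms = query.lower().split()
    return [m for m in model_ids if all(term_matches(t, m) for t in terms)]


sample_ids = [  # illustrative IDs, not fetched from Hugging Face
    "MaziyarPanahi/Qwen3-1.7B-GGUF",
    "Qwen/Qwen2.5-7B-Instruct-GGUF",
    "unsloth/Qwen3.6-27B-GGUF",
    "unsloth/Qwen3-30B-A3B-GGUF",
    "google/gemma-3-12b-it",
    "google/gemma-2b",
]

print(filter_ids("qwen 7b gguf", sample_ids))
print(filter_ids("gemma 2b", sample_ids))
print(filter_ids("qwen 3b gguf", sample_ids))
```

In this sketch the MoE active-parameter suffix (A3B) is deliberately not treated as a 3B model, so the last query returns nothing from the sample list instead of the 30B repo. Whether an empty result or a "closest size" fallback is the better behaviour there is a design choice for the maintainers.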
Steps to Reproduce
- Run
uvx whichllm@latest snippet "qwen 7b gguf"
→ resolves to MaziyarPanahi/Qwen3-1.7B-GGUF (1.7B), not a 7B model.
- Run
uvx whichllm@latest plan "gemma 2b"
→ resolves to google/gemma-3-12b-it (12B), because "2b" is inside "12b".
- Run
uvx whichllm@latest snippet "qwen 3b gguf"
→ resolves to Qwen3-30B-A3B-GGUF (30B), because "3b" is inside "A3B".
Minimal proof of the root cause (no models needed):
python3 -c "print('7b' in 'qwen3-1.7b', '2b' in 'gemma-3-12b')"
# True True <- both are spurious matches
Note: exact results depend on live Hugging Face data, so the specific repo returned may change over time, but the substring mismatch (ask for a small model and get a large one, or the reverse) reproduces consistently.
Hardware Info
GPU 0: NVIDIA GeForce RTX 4070 Laptop GPU — 8.0 GB (CC 8.9, CUDA 13.3) — BW: 256 GB/s
GPU 1: Raptor Lake-S UHD Graphics — shared memory — BW: N/A
CPU: Intel(R) Core(TM) i7-14650HX — 16 cores (AVX2)
RAM: 15.3 GB
Disk free: 353.6 GB
OS: linux
Python Version
3.12.13
Operating System
Arch Linux
whichllm Version
0.5.8