-|`--status`| Show VRAM/RAM, speed, and fit columns instead of published/download columns. Speed may include `~` for estimated range or `?` for low confidence |
+|`--status`| Compatibility option. Runtime columns are now shown by default |
+|`--details`| Show download metadata instead of runtime columns |
 |`--min-params`| Minimum model knowledge capacity in billions of parameters |
 |`--json`| Print machine-readable JSON |
+|`--markdown`, `-m`| Print a pasteable GitHub-Flavored Markdown table |
 |`--refresh`| Ignore caches and fetch models/benchmarks again |
 |`--cpu-only`| Ignore GPUs and rank for CPU-only use |
 |`--gpu`| Simulate GPU(s) by name. Accepts repeated flags, comma-separated values, and count shorthand |
 |`--vram`| Override simulated GPU VRAM in GB. Requires `--gpu`|
+|`--vram-headroom`| Reserve per-GPU memory for runtime overhead. Default: `auto`. Accepts `none`, byte values like `1.5GB`, or percentages like `10%`|
+|`--ram-budget`| Cap RAM available for partial offload. Accepts `available`, byte values like `8GB`, or percentages like `50%`|
 |`--version`| Print the installed package version |

 `--fit any` is the default. It can include full-GPU, partial-offload, and
-CPU-only candidates when they are runnable. `--fit full-gpu` and `--gpu-only`
-keep only rows whose `fit_type` is `full_gpu`.
+CPU-only candidates when they are runnable. `--fit gpu`, `--fit full-gpu`, and
+`--gpu-only` keep only rows whose `fit_type` is `full_gpu`.
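
The filtering rule described in the added lines can be sketched as follows. This is a minimal illustration, not the project's code: `filter_rows`, the row dictionaries, and the `partial_offload` / `cpu_only` values are hypothetical names chosen for the example. Only the `full_gpu` value and the flag spellings come from the text, and the sketch assumes every row passed in is already runnable.

```python
# Fit values that the text treats as equivalent to --gpu-only.
FULL_GPU_ALIASES = {"gpu", "full-gpu"}


def filter_rows(rows, fit="any", gpu_only=False):
    """Return the rows that survive the fit filter (hypothetical helper)."""
    if gpu_only or fit in FULL_GPU_ALIASES:
        # Keep only rows whose fit_type is full_gpu.
        return [row for row in rows if row["fit_type"] == "full_gpu"]
    if fit == "any":
        # Default: full-GPU, partial-offload, and CPU-only rows all pass.
        return list(rows)
    raise ValueError(f"fit mode not covered by this sketch: {fit!r}")


rows = [
    {"name": "model-a", "fit_type": "full_gpu"},
    {"name": "model-b", "fit_type": "partial_offload"},  # assumed value name
    {"name": "model-c", "fit_type": "cpu_only"},         # assumed value name
]
print([row["name"] for row in filter_rows(rows, fit="gpu")])
```

Treating `gpu` as one more alias keeps all three spellings on a single code path, which matches the wording that they keep the same rows.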
+
+The default table shows memory required, estimated generation speed, fit type,
+and published date. Use `--details` when you want download counts instead.
+Speed colors are absolute usability hints: red is under `4 tok/s`, yellow is
+`4-10 tok/s`, green is `10-30 tok/s`, and bright green is `30+ tok/s`. The `~`
+and `?` markers still refer to estimate confidence, not speed quality.
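
The speed color bands above can be read as a simple threshold lookup. The sketch below is an illustration under assumptions: `speed_color` is a hypothetical name, and because the text lists `4-10` and `10-30` without saying which band owns the shared boundary, the sketch assumes each band includes its lower bound and excludes its upper bound.

```python
def speed_color(tok_per_s: float) -> str:
    """Map a generation speed in tok/s to the color band named in the text."""
    if tok_per_s < 4:
        return "red"           # under 4 tok/s
    if tok_per_s < 10:
        return "yellow"        # 4-10 tok/s (assumed: 10 itself is green)
    if tok_per_s < 30:
        return "green"         # 10-30 tok/s (assumed: 30 itself is bright green)
    return "bright green"      # 30+ tok/s


for speed in (2.5, 4, 12, 30):
    print(speed, speed_color(speed))
```

The color depends only on the number. The `~` and `?` confidence markers would be handled separately, consistent with the note that they describe estimate confidence rather than speed quality.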
+
+`--vram-headroom auto` subtracts a small budget from each GPU before fit
+checks, so near-edge recommendations are less likely to overflow in tools such
+as LM Studio. Use `--vram-headroom none` to restore the raw detected VRAM.
+`--ram-budget available` caps offload planning to current available RAM.
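
One way to picture how a `--vram-headroom` value turns into usable VRAM is the sketch below. It rests on several assumptions that the text does not confirm: `headroom_bytes` and `usable_vram` are hypothetical names, `GB` is treated as 2**30 bytes, only the `GB` suffix is parsed, and the size of the `auto` budget is not stated anywhere in the diff, so the caller passes it in as `auto_bytes`.

```python
import re

GB = 1024 ** 3  # assumption: "GB" means 2**30 bytes in this sketch

_BYTES = re.compile(r"^(\d+(?:\.\d+)?)\s*GB$", re.IGNORECASE)
_PERCENT = re.compile(r"^(\d+(?:\.\d+)?)%$")


def headroom_bytes(spec: str, vram_bytes: int, auto_bytes: int) -> int:
    """Translate a headroom spec into bytes to reserve on one GPU."""
    spec = spec.strip()
    if spec == "none":
        return 0                      # raw detected VRAM, nothing reserved
    if spec == "auto":
        return auto_bytes             # unknown "small budget", supplied by caller
    match = _BYTES.match(spec)
    if match:
        return int(float(match.group(1)) * GB)
    match = _PERCENT.match(spec)
    if match:
        return int(vram_bytes * float(match.group(1)) / 100)
    raise ValueError(f"unrecognized headroom spec: {spec!r}")


def usable_vram(spec: str, vram_bytes: int, auto_bytes: int) -> int:
    """VRAM left for fit checks after subtracting the headroom."""
    return max(0, vram_bytes - headroom_bytes(spec, vram_bytes, auto_bytes))


print(usable_vram("10%", 20 * GB, auto_bytes=GB) / GB)
```

`--ram-budget` accepts byte values and percentages in the same shapes, so a similar parser could plausibly serve it, with `available` taking the place of `auto` and `none`.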