Commit a10168d

[Paged KV][Doc] Clarify memory env var for two path in readme (#140)
Signed-off-by: ran <hzz5361@psu.edu>
Signed-off-by: Yuan Lik Xun <lxyuan0420@gmail.com>
Co-authored-by: Yuan Lik Xun <lxyuan0420@gmail.com>
1 parent 6ecf38f commit a10168d

1 file changed

Lines changed: 14 additions & 0 deletions

README.md

@@ -101,3 +101,17 @@ Environment variables for customization:
 | `VLLM_METAL_MODELSCOPE_CACHE` | None | Specify the absolute path of the local model |
 | `VLLM_METAL_PREFIX_CACHE` | (unset) | Set to enable prefix caching for shared prompt reuse |
 | `VLLM_METAL_PREFIX_CACHE_FRACTION` | `0.05` | Fraction of MLX working set for prefix cache (0, 1] |
+
+
+## Paged KV vs MLX KV memory settings
+
+- MLX path (`VLLM_METAL_USE_PAGED_ATTENTION=0`): `VLLM_METAL_MEMORY_FRACTION` must be `auto`.
+- Paged KV path (`VLLM_METAL_USE_PAGED_ATTENTION=1`): `VLLM_METAL_MEMORY_FRACTION` can be `auto` or a numeric fraction in `(0, 1]`.
+- For paged KV with `VLLM_METAL_MEMORY_FRACTION=auto`, vllm-metal uses a default fraction of `0.9`.
+
+`VLLM_METAL_MEMORY_FRACTION` | `VLLM_METAL_USE_PAGED_ATTENTION` | Valid? | Notes
+-- | -- | -- | --
+`auto` | `0` | Yes | MLX path (default)
+`auto` | `1` | Yes | Paged KV path; defaults to 0.9 internally
+`0.7` | `1` | Yes | Paged KV path with explicit memory budget
+`0.7` | `0` | No | Explicit fraction without paged KV is invalid
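
The validity rules in the added README section can be read as a small decision function. The sketch below is only an illustration of those rules, not vllm-metal's actual implementation: the helper name `resolve_memory_fraction` is hypothetical, and the fallback values used when a variable is unset (`auto` and `0`) are an assumption inferred from the "MLX path (default)" row of the table.

```python
import os

# Default used for the paged KV path when the fraction is "auto" (per the README text).
DEFAULT_PAGED_FRACTION = 0.9


def resolve_memory_fraction(env=None):
    """Hypothetical helper mirroring the README table.

    Returns None for the MLX path (no explicit fraction applies) and a float
    in (0, 1] for the paged KV path. Raises ValueError for invalid combinations.
    """
    env = os.environ if env is None else env
    # Assumed fallbacks when unset: "auto" and "0" (the row marked as default).
    raw = env.get("VLLM_METAL_MEMORY_FRACTION", "auto").strip().lower()
    paged = env.get("VLLM_METAL_USE_PAGED_ATTENTION", "0").strip() == "1"

    if raw == "auto":
        return DEFAULT_PAGED_FRACTION if paged else None

    if not paged:
        # Explicit fraction without paged KV is the one invalid row in the table.
        raise ValueError(
            "VLLM_METAL_MEMORY_FRACTION must be 'auto' when "
            "VLLM_METAL_USE_PAGED_ATTENTION=0"
        )

    fraction = float(raw)  # raises ValueError for non-numeric input
    if not 0.0 < fraction <= 1.0:
        raise ValueError("VLLM_METAL_MEMORY_FRACTION must be in (0, 1]")
    return fraction
```

Passing a plain dict instead of reading `os.environ` makes each row of the table easy to check in isolation, for example `resolve_memory_fraction({"VLLM_METAL_MEMORY_FRACTION": "0.7", "VLLM_METAL_USE_PAGED_ATTENTION": "1"})`.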
