| `VLLM_METAL_MODELSCOPE_CACHE` | None | Specify the absolute path of the local model |
| `VLLM_METAL_PREFIX_CACHE` | (unset) | Set to enable prefix caching for shared prompt reuse |
| `VLLM_METAL_PREFIX_CACHE_FRACTION` | `0.05` | Fraction of the MLX working set used for the prefix cache, in `(0, 1]` |
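As a rough illustration of how the `VLLM_METAL_PREFIX_CACHE_FRACTION` default and valid range fit together, the sketch below turns the fraction into a byte budget. This is not vllm-metal's code: the helper name `prefix_cache_budget_bytes` and the `working_set_bytes` parameter are hypothetical, and it assumes the fraction is applied as a plain multiplier on the MLX working-set size.

```python
import os
from typing import Mapping, Optional

# Documented default for VLLM_METAL_PREFIX_CACHE_FRACTION.
DEFAULT_PREFIX_CACHE_FRACTION = 0.05


def prefix_cache_budget_bytes(
    working_set_bytes: int, env: Optional[Mapping[str, str]] = None
) -> int:
    """Hypothetical helper: bytes of the MLX working set given to the prefix cache.

    Assumes the budget is simply working_set_bytes * fraction.
    """
    env = os.environ if env is None else env
    raw = env.get("VLLM_METAL_PREFIX_CACHE_FRACTION")
    fraction = DEFAULT_PREFIX_CACHE_FRACTION if raw is None else float(raw)
    # Documented valid range is (0, 1]; NaN also fails this check.
    if not 0.0 < fraction <= 1.0:
        raise ValueError(
            f"VLLM_METAL_PREFIX_CACHE_FRACTION must be in (0, 1], got {fraction}"
        )
    return int(working_set_bytes * fraction)
```

For example, with a 16 GiB working set and the default fraction, `prefix_cache_budget_bytes(16 * 1024**3, {})` returns `858993459` bytes (about 0.8 GiB).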

## Paged KV vs MLX KV memory settings

- MLX path (`VLLM_METAL_USE_PAGED_ATTENTION=0`): `VLLM_METAL_MEMORY_FRACTION` must be `auto`.
- Paged KV path (`VLLM_METAL_USE_PAGED_ATTENTION=1`): `VLLM_METAL_MEMORY_FRACTION` can be `auto` or a numeric fraction in `(0, 1]`.
- For paged KV with `VLLM_METAL_MEMORY_FRACTION=auto`, vllm-metal uses a default fraction of `0.9`.

| `VLLM_METAL_MEMORY_FRACTION` | `VLLM_METAL_USE_PAGED_ATTENTION` | Valid? | Notes |
| -- | -- | -- | -- |
| `auto` | `0` | Yes | MLX path (default) |
| `auto` | `1` | Yes | Paged KV path; defaults to `0.9` internally |
| `0.7` | `1` | Yes | Paged KV path with an explicit memory budget |
| `0.7` | `0` | No | An explicit fraction without paged KV is invalid |
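The rules above can be expressed as a small validation function. This is a minimal sketch, not vllm-metal's implementation: `resolve_memory_fraction` is a hypothetical name, and it assumes an unset `VLLM_METAL_MEMORY_FRACTION` means `auto` and an unset `VLLM_METAL_USE_PAGED_ATTENTION` means `0`, matching the combination marked as the default.

```python
from typing import Mapping, Optional

# Documented fraction used by the paged KV path when the setting is "auto".
PAGED_KV_AUTO_FRACTION = 0.9


def resolve_memory_fraction(env: Mapping[str, str]) -> Optional[float]:
    """Hypothetical check of the memory settings.

    Returns the paged KV memory fraction, or None for the MLX path.
    Raises ValueError for combinations the table marks as invalid.
    """
    # Assumption: unset values fall back to the documented defaults.
    fraction = env.get("VLLM_METAL_MEMORY_FRACTION", "auto").strip().lower()
    # Assumption: only the literal "1" selects the paged KV path.
    paged = env.get("VLLM_METAL_USE_PAGED_ATTENTION", "0").strip() == "1"

    if not paged:
        # MLX path: only "auto" is accepted.
        if fraction != "auto":
            raise ValueError(
                "VLLM_METAL_MEMORY_FRACTION must be 'auto' "
                "when VLLM_METAL_USE_PAGED_ATTENTION=0"
            )
        return None

    if fraction == "auto":
        return PAGED_KV_AUTO_FRACTION
    value = float(fraction)  # non-numeric input also raises ValueError
    if not 0.0 < value <= 1.0:
        raise ValueError(f"VLLM_METAL_MEMORY_FRACTION must be in (0, 1], got {value}")
    return value
```

Passing `os.environ` would check the current process's settings before launch; a `None` result signals the MLX path, where no explicit fraction applies.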