You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
sycl: Add optional USM system allocations (#22526)
This introduces an optional feature to allocate large GPU buffers (≥ 1GB)
using USM system allocations if supported by the device. It allows using
buffers from the system allocator then letting the system manage memory
migrations between host and device as necessary.
This feature is disabled by default and requires the GGML_SYCL_USM_SYSTEM
environment variable to enable. If USM system allocations are not supported
by the device or the system, we fallback to regular allocations.
This feature can allow VRAM overcommit. For example, the test below fails
on B580 due to lack of memory for allocation, but it passes when enabling
USM system allocations:
./examples/sycl/test.sh -m Qwen3.5-27B-Q3_K_M.gguf -lv 4
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Copy file name to clipboardExpand all lines: docs/backend/SYCL.md
+1Lines changed: 1 addition & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -720,6 +720,7 @@ use 1 SYCL GPUs: [0] with Max compute units:512
720
720
| GGML_SYCL_ENABLE_VMM | 0 or 1 (default) | Enable the virtual-memory device pool. |
721
721
| ZES_ENABLE_SYSMAN | 0 (default) or 1 | Support to get free memory of GPU by sycl::aspect::ext_intel_free_memory.<br>Recommended to use when --split-mode = layer |
722
722
| UR_L0_ENABLE_RELAXED_ALLOCATION_LIMITS | 0 (default) or 1 | Allow SYCL/Unified Runtime Level Zero device allocations larger than 4 GiB. llama.cpp's direct Level Zero allocation path requests the relaxed maximum-size limit itself when GGML_SYCL_ENABLE_LEVEL_ZERO=1. |
723
+
| GGML_SYCL_USM_SYSTEM | 0 (default) or 1 | Enable experimental support for [USM system allocations](https://github.khronos.org/SYCL_Reference/iface/usm_basic_concept.html#system-allocations) for large GPU buffers. This requires enough host memory for model weights and caches, an Intel Xe2+ GPU such as BMG or newer and supported on Linux only, with CONFIG_DRM_XE_GPUSVM enabled. |
0 commit comments