diff --git a/design-proposals/dgpu-support.md b/design-proposals/dgpu-support.md
index cae895adc..2155a2f78 100644
--- a/design-proposals/dgpu-support.md
+++ b/design-proposals/dgpu-support.md
@@ -1,6 +1,6 @@
 # Design Proposal: Support for dGPU

-Author(s): Rajeev Ranjan, Sandeep Sharma
+Author(s): Rajeev Ranjan, Sandeep Sharma, Jagrat Acharya
 Last updated: 2025-05-16

 ## Abstract
@@ -44,7 +44,88 @@ Note that integration on Ubuntu 24.04 is expected in release 3.2. So, the curren
 Develop/update the cluster extensions supported by the platform to ease the consumption of GPUs.
 https://github.com/open-edge-platform/cluster-extensions/blob/main/README.md
 * **Intel**
+  * Intel Arc B580 Graphics is compatible with the Xe2 driver architecture.
+  * Upgrading the kernel to version 6.11 or one of its minor revisions enables Xe2 driver support.
+  * The DEB package will be installed during node onboarding.
+
+    ![BMG-Supported-Linux-kernel](images/BMG-driver.png)
+    https://dgpu-docs.intel.com/devices/hardware-table.html
+  * Refer to the official documentation for driver installation. The DEB packages install the GPU drivers as part of rolling updates.
+
+    https://dgpu-docs.intel.com/driver/installation-rolling.html
+
+  * **Ubuntu-22.04**
+    * Ubuntu 22.04 officially supports the 6.8 Linux kernel, whereas the B580 requires the 6.11 Linux kernel. Installing the B580 driver therefore requires a custom 6.11 kernel.
+    * The driver installation instructions for Ubuntu 22.04 did not work for the B580: the card failed to enumerate.
+
+  * **Ubuntu-24.04 Desktop**
+    * Ubuntu Desktop 24.04.2 includes the xe driver by default.
+    * For additional compute and media packages, Intel provides a PPA that can be set up from the official Intel site.
+
+      https://dgpu-docs.intel.com/driver/client/overview.html
+
+
+  * **Ubuntu-24.04 Server**
+    * Make sure the prerequisites for adding the package repository are installed.
+      ```
+      sudo apt update
+      sudo apt install -y gpg-agent wget
+      ```
+    * Add the online network package repository.
+      ```
+      . /etc/os-release
+      if [[ ! " jammy noble " =~ " ${VERSION_CODENAME} " ]]; then
+        echo "Ubuntu version ${VERSION_CODENAME} not supported"
+      else
+        wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
+          sudo gpg --yes --dearmor --output /usr/share/keyrings/intel-graphics.gpg
+        echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu ${VERSION_CODENAME} unified" | \
+          sudo tee /etc/apt/sources.list.d/intel-gpu-${VERSION_CODENAME}.list
+        sudo apt update
+      fi
+      ```
+    * Install the kernel and Intel® XPU System Management Interface (XPU-SMI) packages on the bare-metal host. Installing on the host is sufficient for hardware management and for supporting the runtimes both in containers and on bare metal.
+      ```
+      sudo apt install -y \
+        linux-headers-$(uname -r) \
+        linux-modules-extra-$(uname -r) \
+        flex bison \
+        intel-fw-gpu intel-i915-dkms xpu-smi
+      sudo reboot
+      ```
+    * Install the compute and media runtime packages.
+      ```
+      sudo apt install -y \
+        intel-opencl-icd libze-intel-gpu1 libze1 \
+        intel-media-va-driver-non-free libmfx-gen1 libvpl2 \
+        libegl-mesa0 libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
+        libglapi-mesa libgles2-mesa-dev libglx-mesa0 libigdgmm12 libxatracker2 mesa-va-drivers \
+        mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo hwinfo clinfo
+      ```
+    * Install the development packages.
+      ```
+      sudo apt install -y \
+        libigc-dev intel-igc-cm libigdfcl-dev libigfxcmrt-dev libze-dev
+      ```
+    * List the group that owns the render nodes and the groups you are a member of. Users must belong to specific groups to access certain GPU functionality. The render group grants access to GPU resources for rendering tasks without granting full access to display management or other potentially more sensitive operations.
+      ```
+      stat -c "%G" /dev/dri/render*
+      groups ${USER}
+      ```
+    * If you are not a member of the group that owns the DRM render nodes, add your user to the render group.
+      ```
+      sudo gpasswd -a ${USER} render
+      ```
+    * Change the group ID of the current shell.
+      ```
+      newgrp render
+      ```
+
+  * Existing extensions, such as the device-operator and gpu-plugin, will be updated. The Intel GPU device plugin for Kubernetes facilitates access to Intel discrete and integrated GPUs, registering resources like gpu.intel.com/i915 and gpu.intel.com/xe within a Kubernetes cluster.
+    * **For EMT**: EMT should be built with the 6.11* Linux kernel.
+    * **For Ubuntu**: Ubuntu Server 24.04.2, equipped with the 6.11 kernel, has been validated with BMG GPU drivers. DEB packages are available as rolling updates, as described in the official documentation.
+
 * **NVIDIA**
   * A new extension will be created to configure and install the NVIDIA GPU Operator using its Helm chart. The NVIDIA GPU Operator automates the management of all NVIDIA software components required to provision GPUs in Kubernetes, including drivers, the Kubernetes device plugin, the NVIDIA Container Runtime, and monitoring tools.
   ![gpu-operator-extension](images/nvidia-gpu-operator-extension-package.png)
@@ -103,7 +184,10 @@ The implementation is planned in two phases:
 1. Does Ubuntu in-tree kernel 6.11++ support Battlemage B580 as well?

-   The last tested version was not working. Tweaks were required. Needs to be verified against latest version.
-2. No requirement for iGPU & dGPU together at the moment
+   Yes. Ubuntu Desktop 24.04.2 and 24.10 ship the 6.11* kernel, which supports the Battlemage B580.
+2. GPU drivers are not enabled in Ubuntu 24.04 Server edition with the 6.11* kernel.
+
+   DEB packages are included in the rolling updates outlined in the official graphics driver installation documentation.
+3. No requirement for iGPU & dGPU together at the moment
diff --git a/design-proposals/images/BMG-driver.png b/design-proposals/images/BMG-driver.png
new file mode 100644
index 000000000..da925d245
Binary files /dev/null and b/design-proposals/images/BMG-driver.png differ