A NodeWright package that extends the base tuned package with NVIDIA-specific performance profiles for GPU and DGX systems.
This package inherits from the base tuned package and adds pre-configured tuned profiles optimized for NVIDIA hardware. The profiles are organized by:
- Common base profiles: Foundational settings deployed to `/usr/lib/tuned/`
- OS-specific workload profiles: Profiles that may vary by OS version
- Service profiles: Service-specific settings (eks, GCP, etc.)
The configmap uses an intent-based model where you specify what you want (intent + accelerator) rather than a specific profile name. The profile name nvidia-{accelerator}-{intent} is constructed automatically. When accelerator=generic, the self-contained nvidia-generic profile is used instead, providing safe baseline tuning for any NVIDIA GPU without requiring accelerator-specific or intent-specific configuration.
This package requires tuned >= 2.19. The following operating systems are supported:
| OS | Version | Status | Notes |
|---|---|---|---|
| Ubuntu | 22.04 (Jammy) | ✅ Tested | Uses a mix of OS-specific and common profiles |
| Ubuntu | 24.04 (Noble) | ✅ Tested | Uses common profiles |
| Debian | 11 (Bullseye) | ❌ | Default tuned version is too old (2.15) |
| Debian | 12 (Bookworm) | | Uses common profiles |
| RHEL | 9 | | Uses common profiles |
| Other | Any | | Falls back to os/common/ profiles (untested, requires tuned >= 2.19) |
- Tested OS versions: These have been validated with the package and use OS-specific profile configurations
- Fallback behavior: For untested OS versions, the package automatically falls back to the `os/common/` profiles. This fallback is untested and requires the system to have tuned >= 2.19 installed
- Tuned version requirement: All systems must have tuned version 2.19 or later. Check your system's tuned version with `tuned --version`
- OS detection: The package automatically detects the OS from `/etc/os-release` and selects the appropriate profiles
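The version requirement can be checked numerically rather than by eye. The helper below is a minimal sketch and not part of the package: `version_at_least` is a hypothetical name, and it assumes both arguments are in `major.minor[.patch]` form.

```shell
#!/bin/sh
# Return 0 if version $1 is at least version $2.
# Only major and minor are compared; a patch component is ignored.
# Assumption: both arguments look like major.minor or major.minor.patch.
version_at_least() {
    have_major=${1%%.*}
    have_rest=${1#*.}
    have_minor=${have_rest%%.*}
    want_major=${2%%.*}
    want_rest=${2#*.}
    want_minor=${want_rest%%.*}

    # A higher major version always satisfies the requirement.
    [ "$have_major" -gt "$want_major" ] && return 0
    # Same major: compare minor numerically (so 2.9 < 2.19).
    [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]
}
```

For example, `version_at_least "$installed" 2.19` succeeds when `$installed` holds a version number such as the one reported by `tuned --version`. Comparing the fields as integers avoids the string-comparison trap where `2.9` would sort after `2.19`.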
```
profiles/
├── common/                      # Base profiles → /usr/lib/tuned/
│   ├── nvidia-base/
│   └── nvidia-acs-disable/
├── os/
│   ├── common/                  # Default workload profiles (fallback for untested OS)
│   │   ├── nvidia-generic/      # Self-contained baseline (accelerator=generic)
│   │   ├── nvidia-h100-performance/
│   │   ├── nvidia-h100-inference/
│   │   ├── nvidia-h100-multiNodeTraining/
│   │   ├── nvidia-gb200-performance/
│   │   ├── nvidia-gb200-inference/
│   │   └── nvidia-gb200-multiNodeTraining/
│   ├── ubuntu/
│   │   ├── 22.04/               # Mix of symlinks and OS-specific overrides
│   │   └── 24.04/               # Symlinks to os/common/ (override when needed)
│   ├── debian/
│   │   ├── 11/                  # Mix of symlinks and OS-specific overrides
│   │   └── 12/                  # Symlinks to os/common/ (override when needed)
│   └── rhel/
│       └── 9/                   # Symlinks to os/common/ (override when needed)
└── service/
    ├── common/                  # Shared helpers copied into every service's final profile dir
    │   ├── mac-address-policy.sh
    │   └── bootloader.sh
    ├── eks/
    │   ├── tuned.conf.template  # Service template (include= added dynamically)
    │   ├── script.sh            # Sources common/mac-address-policy.sh, invokes common/bootloader.sh
    │   ├── nvidia-h100-inference.conf  # AWS-compatible inference override
    │   └── nvidia-gb200-inference.conf
    └── aks/
        ├── tuned.conf.template
        ├── script.sh            # Sources common/mac-address-policy.sh, invokes common/bootloader.sh
        └── nvidia-h100-inference.conf  # AKS-compatible inference override (drops kernel-6.8 EEVDF sysctls)
```
Note: Profiles are stored in profiles/ (not root_dir/) to avoid polluting the host filesystem during package extraction. The prepare scripts explicitly copy profiles to the appropriate tuned directories.
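To illustrate how OS detection and the `os/common/` fallback might map onto this layout, here is a minimal sketch. It is not the package's actual prepare script: `resolve_os_profile_dir` is a hypothetical helper, and the retry with the major version only (for example `9.4` to `9`) is an assumption made so that a point release can match a directory such as `rhel/9/`.

```shell
#!/bin/sh
# Print the directory holding workload profiles for the detected OS.
# $1: path to the profiles/ root
# $2: os-release file to read (defaults to /etc/os-release)
resolve_os_profile_dir() {
    profiles_root=$1
    os_release=${2:-/etc/os-release}
    # os-release is a shell-compatible KEY=value file. Source it in a
    # subshell so its variables do not leak into the caller.
    (
        ID=
        VERSION_ID=
        [ -r "$os_release" ] && . "$os_release"
        if [ -n "$ID" ] && [ -n "$VERSION_ID" ]; then
            # Try the exact version first, then the major version only.
            for version in "$VERSION_ID" "${VERSION_ID%%.*}"; do
                if [ -d "$profiles_root/os/$ID/$version" ]; then
                    printf '%s\n' "$profiles_root/os/$ID/$version"
                    exit 0
                fi
            done
        fi
        # Unknown or untested OS: fall back to the common profiles.
        printf '%s\n' "$profiles_root/os/common"
    )
}
```

Calling `resolve_os_profile_dir /path/to/profiles` on an Ubuntu 22.04 host would print the `os/ubuntu/22.04` directory under that root, while an unlisted distribution would get `os/common`.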
- Prepare stage: `prepare_nvidia_profiles.sh` runs:
  - Reads `intent` and `accelerator` from the configmap
  - Constructs the profile name as `nvidia-{accelerator}-{intent}`
  - Deploys common base profiles to `/usr/lib/tuned/`
  - Detects OS from `/etc/os-release`
  - Copies the appropriate OS-specific workload profiles to `/etc/tuned/`
  - If a `service` is specified, creates service profile with dynamic `include=` pointing to the workload profile
- Config stage: The inherited `tuned` package applies the configured profile
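The dynamic `include=` step could look roughly like the following. This is an illustrative sketch only, since the logic inside `prepare_nvidia_profiles.sh` is not shown here. `render_service_conf` is a hypothetical helper, and it assumes the service template either has a `[main]` section without an `include=` line or has no `[main]` section at all.

```shell
#!/bin/sh
# Render a service profile's tuned.conf from a template, adding an
# include= line that points at the workload profile.
# $1: template file, $2: workload profile name, $3: output file
render_service_conf() {
    template=$1
    workload=$2
    output=$3
    if grep -q '^\[main\]' "$template"; then
        # Insert include= directly after the first [main] header.
        awk -v inc="include=$workload" '
            { print }
            /^\[main\]/ && !added { print inc; added = 1 }
        ' "$template" > "$output"
    else
        # No [main] section in the template: prepend one.
        {
            printf '[main]\ninclude=%s\n\n' "$workload"
            cat "$template"
        } > "$output"
    fi
}
```

A call such as `render_service_conf tuned.conf.template nvidia-h100-inference tuned.conf` would leave the rest of the template untouched and only add the one line that links the service profile to its workload profile.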
The profile name is built from the configmap fields:
nvidia-{accelerator}-{intent}
Examples:
| `accelerator` | `intent` | Constructed Profile |
|---|---|---|
| `generic` | (ignored) | `nvidia-generic` |
| `h100` | `performance` | `nvidia-h100-performance` |
| `h100` | `inference` | `nvidia-h100-inference` |
| `h100` | `multiNodeTraining` | `nvidia-h100-multiNodeTraining` |
| `gb200` | `performance` | `nvidia-gb200-performance` |
| `gb200` | `inference` | `nvidia-gb200-inference` |
| `gb200` | `multiNodeTraining` | `nvidia-gb200-multiNodeTraining` |
When accelerator=generic, the nvidia-generic profile is selected directly. The intent and service fields are ignored. This profile is self-contained (no include chain) and provides universally safe GPU tuning suitable for any NVIDIA GPU.
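The naming rule, including the `generic` special case and the documented `performance` default for `intent`, can be expressed in a few lines. This is a sketch of the rule as described, not the package's own code; `build_profile_name` is a hypothetical function name.

```shell
#!/bin/sh
# Build the tuned profile name from configmap values.
# $1: accelerator (required)
# $2: intent (optional, defaults to "performance")
build_profile_name() {
    accelerator=$1
    intent=${2:-performance}
    if [ -z "$accelerator" ]; then
        echo "accelerator is required" >&2
        return 1
    fi
    if [ "$accelerator" = "generic" ]; then
        # The generic profile is self-contained, so intent is ignored.
        printf '%s\n' "nvidia-generic"
    else
        printf '%s\n' "nvidia-$accelerator-$intent"
    fi
}
```

With this helper, `build_profile_name h100 inference` prints `nvidia-h100-inference`, and `build_profile_name generic inference` prints `nvidia-generic`.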
When you specify intent: inference, accelerator: h100, and service: eks:
```
eks (active profile)
└── includes: nvidia-h100-inference
    └── includes: nvidia-h100-performance
        └── includes: nvidia-acs-disable
            └── includes: nvidia-base
```
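To inspect such a chain on a node, one option is to follow the `include=` line of each profile's `tuned.conf`. The sketch below is a hypothetical helper (`print_include_chain` is not shipped with the package) and assumes each profile includes at most one other profile by plain name.

```shell
#!/bin/sh
# Print the include chain of a tuned profile, one profile per line.
# $1: starting profile name
# remaining arguments: profile directories to search, in order
print_include_chain() {
    profile=$1
    shift
    depth=0
    # The depth limit guards against accidental include loops.
    while [ -n "$profile" ] && [ "$depth" -lt 16 ]; do
        printf '%s\n' "$profile"
        conf=
        for dir in "$@"; do
            if [ -f "$dir/$profile/tuned.conf" ]; then
                conf="$dir/$profile/tuned.conf"
                break
            fi
        done
        # Stop when the profile cannot be found in any search directory.
        [ -n "$conf" ] || break
        # Take the first include= line and trim trailing whitespace.
        profile=$(sed -n 's/^[[:space:]]*include[[:space:]]*=[[:space:]]*//p' "$conf" \
            | head -n 1 | sed 's/[[:space:]]*$//')
        depth=$((depth + 1))
    done
}
```

Running `print_include_chain eks /etc/tuned /usr/lib/tuned` would list the active service profile first and the base profile last, provided the profiles are deployed to those two directories as described earlier.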
Generic tuning (any NVIDIA GPU, no accelerator-specific or intent-specific config):
```yaml
apiVersion: skyhook.nvidia.com/v1alpha1
kind: Skyhook
metadata:
  name: nvidia-tuned-generic
spec:
  nodeSelectors:
    matchLabels:
      nvidia.com/gpu.present: "true"
  packages:
    nvidia-tuned:
      image: ghcr.io/nvidia/skyhook-packages/nvidia-tuned
      version: 0.3.0
      interrupt:
        type: reboot
      env:
        - name: INTERRUPT
          value: "true"
      configMap:
        accelerator: generic
```

Accelerator-specific tuning (with intent and service):
```yaml
apiVersion: skyhook.nvidia.com/v1alpha1
kind: Skyhook
metadata:
  name: nvidia-tuned-eks
spec:
  nodeSelectors:
    matchLabels:
      nvidia.com/dgx: "true"
  packages:
    nvidia-tuned:
      image: ghcr.io/nvidia/skyhook-packages/nvidia-tuned
      version: 0.3.0
      interrupt:
        type: reboot
      configInterrupts:
        intent:
          type: reboot
      env:
        - name: INTERRUPT
          value: "true"
      configMap:
        intent: inference
        accelerator: h100
        service: eks
```

AKS tuning (H100 on Azure Kubernetes Service, Ubuntu 24.04):
```yaml
apiVersion: skyhook.nvidia.com/v1alpha1
kind: Skyhook
metadata:
  name: nvidia-tuned-aks
spec:
  nodeSelectors:
    matchLabels:
      nvidia.com/gpu.present: "true"
  packages:
    nvidia-tuned:
      image: ghcr.io/nvidia/skyhook-packages/nvidia-tuned
      version: 0.3.0
      interrupt:
        type: reboot
      configInterrupts:
        intent:
          type: reboot
      env:
        - name: INTERRUPT
          value: "true"
      configMap:
        intent: inference
        accelerator: h100
        service: aks
```

| Field | Required | Default | Description |
|---|---|---|---|
| `accelerator` | Yes | - | GPU/accelerator type (e.g., h100, gb200, generic). When set to generic, intent and service are ignored |
| `intent` | No | `performance` | Workload intent (e.g., inference, performance, multiNodeTraining). Ignored when accelerator=generic |
| `service` | No | - | Service name (e.g., eks). If specified, service profile wraps the workload profile. Ignored when accelerator=generic |

| Intent | Description |
|---|---|
| `performance` | General GPU performance optimization |
| `inference` | Optimized for inference workloads (CPU isolation, hugepages) |
| `multiNodeTraining` | Optimized for distributed training (network buffers, TCP tuning) |

| Accelerator | Description |
|---|---|
| `generic` | Baseline tuning for any NVIDIA GPU (self-contained, no intent/service required) |
| `h100` | NVIDIA H100 GPU |
| `gb200` | NVIDIA GB200 GPU |

| Service | Description |
|---|---|
| `eks` | EKS-specific settings (MAC address policy for CNI) |
| `aks` | AKS-specific settings (MAC address policy, grub.d bootloader workaround for Ubuntu) |

By default, OS version directories contain symlinks to os/common/. To add OS-specific settings:
- Remove the symlink: `rm profiles/os/ubuntu/24.04/nvidia-h100-inference`
- Create directory: `mkdir profiles/os/ubuntu/24.04/nvidia-h100-inference`
- Add custom `tuned.conf` with OS-specific settings
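These steps can also be scripted. The sketch below is one possible approach, not part of the package: `make_os_override` is a hypothetical helper, and seeding the new directory with a copy of the linked `os/common/` profile (instead of starting from an empty directory) is an added convenience so the existing `tuned.conf` can be edited in place.

```shell
#!/bin/sh
# Replace a symlinked profile with a real directory that starts as a
# copy of the profile the symlink pointed to.
# $1: OS version directory (e.g. profiles/os/ubuntu/24.04)
# $2: profile name (e.g. nvidia-h100-inference)
make_os_override() {
    target="$1/$2"
    if [ ! -L "$target" ]; then
        echo "$target is not a symlink; leaving it untouched" >&2
        return 1
    fi
    # Copy through the symlink first so nothing is lost if the copy fails.
    cp -R -L "$target" "$target.override" || return 1
    rm "$target"
    mv "$target.override" "$target"
}
```

After `make_os_override profiles/os/ubuntu/24.04 nvidia-h100-inference`, the profile is a regular directory whose `tuned.conf` can be changed without affecting other OS versions that still link to `os/common/`.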
After deployment, verify the profile is active:
```shell
# List available profiles (should include nvidia-* profiles)
tuned-adm list

# Check active profile
tuned-adm active

# Verify tuning is applied
tuned-adm verify
```

This package inherits all functionality from the base tuned package:
- Multi-distribution support (Ubuntu/Debian, CentOS/RHEL/Amazon Linux)
- Custom profile deployment via configmaps
- Script deployment for complex tuning logic
- Full lifecycle management (install, configure, uninstall)
See the tuned package README for complete documentation on all features.
- Package Version: 0.3.0
- Base Package: tuned (latest via preprocess.sh)
- Schema Version: v1