Commit 3a974e7

Merge pull request #30 from NVIDIA/feat/nvidia-tuning-gke
feat: add nvidia-tuning-gke to support GKE container optimized OS
2 parents 6418087 + 673626a commit 3a974e7

15 files changed: +633 -0 lines changed

README.md

Lines changed: 12 additions & 0 deletions
@@ -91,6 +91,18 @@ A package that applies the same node setup steps as the dgxcloud_aws_eks VMI for
 - ConfigMap: `service` and `accelerator` only; versions baked in `defaults/*.conf`
 - No OFI, hardening, or system-node-settings; see [nvidia-setup README](./nvidia-setup/README.md)
 
+### 6. NVIDIA Tuning GKE Package (`nvidia-tuning-gke/`)
+Extends the **tuning** package with baked-in H100 and GB200 configs for GKE Container-Optimized OS (COS). You supply only `accelerator` and `intent`; the package selects the matching sysctl settings (and optional containerd drop-in) and runs the base tuning apply. No GRUB changes are made, because GKE nodes do not use GRUB. Note: this is a limited subset of nvidia-tuned, because COS is mostly read-only. For non-COS GKE setups, consider updating nvidia-tuned to support GKE and using the base profiles instead.
+
+**Capabilities:**
+- Sysctl and service drop-ins derived from [nvidia-tuned](./nvidia-tuned/)
+- ConfigMap: `accelerator` (h100, gb200) and `intent` (inference, multiNodeTraining)
+- Baked-in profiles under `profiles/{accelerator}/{intent}/`
+
+**Key features:**
+- No manual sysctl.conf authoring; profile content is fixed in the image
+- See [nvidia-tuning-gke README](./nvidia-tuning-gke/README.md)
+
 ## Package Structure
 
 Each package follows the standard skyhook package structure:

nvidia-tuning-gke/Dockerfile

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Extends the tuning package with baked-in H100/GB200 configs for GKE.
+# Config step: prepare_nvidia_configs.sh (populate configmaps from profile)
+# then update_settings.sh (base tuning apply).
+
+ARG TUNING_VERSION=1.1.4
+FROM ghcr.io/nvidia/skyhook-packages/tuning:${TUNING_VERSION}
+
+COPY profiles/ /skyhook-package/profiles/
+COPY skyhook_dir/prepare_nvidia_configs.sh /skyhook-package/skyhook_dir/
+COPY skyhook_dir/prepare_nvidia_configs_check.sh /skyhook-package/skyhook_dir/
+COPY config.json /skyhook-package/
+
+RUN chmod +x /skyhook-package/skyhook_dir/prepare_nvidia_configs.sh \
+    /skyhook-package/skyhook_dir/prepare_nvidia_configs_check.sh \
+    /skyhook-package/skyhook_dir/*.sh

nvidia-tuning-gke/README.md

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+# NVIDIA Tuning GKE Package
+
+A Skyhook package that extends the base **tuning** package with baked-in H100 and GB200 tuning configs for GKE. It mirrors the sysctl settings (and optional containerd drop-in) from [nvidia-tuned](../nvidia-tuned/). **GRUB/kernel cmdline is not used**: GKE nodes do not use GRUB, so only sysctl and service drop-ins are applied. Use this package instead of nvidia-tuned on Container-Optimized OS, which does not include tuned and cannot have it installed.
+
+## Overview
+
+- **Inherits from:** [tuning](../tuning/) (same pattern as nvidia-tuned inheriting from tuned).
+- **ConfigMap:** You supply only `accelerator` and `intent`; the package fills in `sysctl.conf` (and, for GB200, `service_containerd.conf`) from baked-in profiles, then runs the base tuning package to apply them.
+
+## ConfigMap (required)
+
+| Key           | Values                           | Description           |
+|---------------|----------------------------------|-----------------------|
+| `accelerator` | `h100`, `gb200`                  | GPU/accelerator type. |
+| `intent`      | `inference`, `multiNodeTraining` | Workload intent.      |
+
+Profiles are selected by the pair `{accelerator}/{intent}` and live under `profiles/{accelerator}/{intent}/` (e.g. `profiles/h100/inference/`, `profiles/gb200/multiNodeTraining/`). The prepare step discovers available accelerators and intents from the filesystem, so new profiles can be added without changing the scripts.
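The filesystem-based lookup described above could be sketched roughly as follows. This is a minimal illustration, not the contents of `prepare_nvidia_configs.sh` (which is not shown in this excerpt); the function name `resolve_profile` and its arguments are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: resolve a profile directory from accelerator and intent.
# resolve_profile is an illustrative name, not taken from the package scripts.

resolve_profile() {
    local profiles_dir="$1" accelerator="$2" intent="$3"
    local dir="${profiles_dir}/${accelerator}/${intent}"

    # Assumption: a profile is usable only if it ships a sysctl.conf.
    if [ -f "${dir}/sysctl.conf" ]; then
        printf '%s\n' "${dir}"
        return 0
    fi

    # Otherwise list what the filesystem actually offers, then fail.
    echo "ERROR: no profile for ${accelerator}/${intent}. Available:" >&2
    local candidate
    for candidate in "${profiles_dir}"/*/*/; do
        [ -d "${candidate}" ] || continue
        candidate="${candidate#"${profiles_dir}/"}"
        echo "  ${candidate%/}" >&2
    done
    return 1
}
```

Because the list of valid pairs comes from the directory tree rather than a hard-coded table, adding `profiles/<new>/<intent>/sysctl.conf` is enough for the lookup to accept the new pair.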
+
+## Interrupts
+
+Use the **restart_all_services** interrupt so that the sysctl changes take effect. Do not use the reboot interrupt: Skyhook has to re-apply all changes after every reboot, so a reboot interrupt causes an infinite loop. Example:
+
+```yaml
+packages:
+  nvidia-tuning-gke:
+    image: ghcr.io/nvidia/skyhook-packages/nvidia-tuning-gke
+    version: 0.1.0
+    interrupt:
+      type: restart_all_services
+    configMap:
+      accelerator: gb200
+      intent: inference
+```
+
+## Baked-in profiles
+
+Profiles are grouped by accelerator, then intent: `profiles/{accelerator}/{intent}/`. Each profile directory contains `sysctl.conf` and, optionally, `service_containerd.conf`. There is no GRUB config (GKE does not use GRUB). Content matches the [tuning/examples/](../tuning/examples/) sysctl (and service_containerd for GB200):
+
+- **profiles/h100/inference/** – Base ARP + sched (sysctl).
+- **profiles/h100/multiNodeTraining/** – Base ARP + net/tcp/bbr/fq (sysctl).
+- **profiles/gb200/inference/** – Base + gb200-perf + sched (sysctl); containerd LimitSTACK.
+- **profiles/gb200/multiNodeTraining/** – Base + gb200-perf + net/tcp (sysctl); containerd LimitSTACK.
+
+Adding a new accelerator or intent is done by adding a new directory under `profiles/`; the prepare script discovers them at runtime.
+
+## What is not applied
+
+Because of Container-Optimized OS limitations, the following are not applied: CPU governor settings, kernel module loading, and dynamic `isolcpus` (add a concrete `isolcpus=` line to the profile and rebuild if needed).
+
+## Version
+
+- **Package version:** 0.1.0
+- **Base package:** tuning (1.1.4)
+- **Schema version:** v1

nvidia-tuning-gke/config.json

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
+{
+    "schema_version": "v1",
+    "package_name": "nvidia_tuning_gke",
+    "package_version": "0.1.0",
+    "expected_config_files": ["accelerator", "intent"],
+    "modes": {
+        "config": [
+            {
+                "name": "prepare",
+                "path": "prepare_nvidia_configs.sh",
+                "arguments": [],
+                "returncodes": [0],
+                "on_host": true,
+                "env": {},
+                "idempotence": false,
+                "upgrade_step": false
+            },
+            {
+                "name": "config",
+                "path": "update_settings.sh",
+                "arguments": [],
+                "returncodes": [0],
+                "on_host": true,
+                "env": {},
+                "idempotence": true,
+                "upgrade_step": false
+            }
+        ],
+        "config-check": [
+            {
+                "name": "prepare-check",
+                "path": "prepare_nvidia_configs_check.sh",
+                "arguments": [],
+                "returncodes": [0],
+                "on_host": true,
+                "env": {},
+                "idempotence": true,
+                "upgrade_step": false
+            },
+            {
+                "name": "config-check",
+                "path": "update_settings_check.sh",
+                "arguments": [],
+                "returncodes": [0],
+                "on_host": true,
+                "env": {},
+                "idempotence": true,
+                "upgrade_step": false
+            }
+        ],
+        "post-interrupt-check": [
+            {
+                "name": "prepare",
+                "path": "prepare_nvidia_configs.sh",
+                "arguments": [],
+                "returncodes": [0],
+                "on_host": true,
+                "env": {},
+                "idempotence": false,
+                "upgrade_step": false
+            },
+            {
+                "name": "post-interrupt-check",
+                "path": "update_settings_post_check.sh",
+                "arguments": [],
+                "returncodes": [0],
+                "on_host": true,
+                "env": {},
+                "idempotence": true,
+                "upgrade_step": false
+            }
+        ],
+        "uninstall": [
+            {
+                "name": "uninstall",
+                "path": "update_settings_uninstall.sh",
+                "arguments": [],
+                "returncodes": [0],
+                "on_host": true,
+                "env": {},
+                "idempotence": true,
+                "upgrade_step": false
+            }
+        ],
+        "uninstall-check": [
+            {
+                "name": "uninstall-check",
+                "path": "update_settings_uninstall_check.sh",
+                "arguments": [],
+                "returncodes": [0],
+                "on_host": true,
+                "env": {},
+                "idempotence": true,
+                "upgrade_step": false
+            }
+        ]
+    }
+}
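
To make the `config` mode above easier to read, here is a rough sketch of how an ordered step list with `returncodes` might be evaluated. It assumes that steps run in the order listed and that an exit status outside `returncodes` stops the mode; neither is confirmed by this commit, and `run_steps` is a hypothetical helper, not the Skyhook agent's implementation.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of ordered steps with allowed return codes.
# run_steps is an illustrative name; this is not how the Skyhook agent is written.

run_steps() {
    local allowed="$1"; shift   # space-separated allowed return codes, e.g. "0"
    local step rc
    for step in "$@"; do
        rc=0
        "${step}" || rc=$?
        case " ${allowed} " in
            *" ${rc} "*) echo "ok: ${step}" ;;
            *) echo "failed: ${step} (rc=${rc})"; return 1 ;;
        esac
    done
}

# Stand-ins for prepare_nvidia_configs.sh and update_settings.sh.
prepare() { return 0; }
apply_settings() { return 0; }

run_steps "0" prepare apply_settings
```

Under these assumptions, `update_settings.sh` only runs once the prepare step has populated `sysctl.conf` from the selected profile, which matches the order described in the Dockerfile comment.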

nvidia-tuning-gke/preprocess.sh

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+#!/bin/bash
+
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Preprocess script for the nvidia-tuning-gke package
+# Reads the tuning package version from the PACKAGE_VERSIONS JSON and outputs it as TUNING_VERSION
+#
+# This script outputs GitHub Actions environment variables in the format:
+#   BUILD_ARGS=TUNING_VERSION=<version>
+#
+# Usage: ./preprocess.sh
+# Environment variables:
+#   GITHUB_OUTPUT - If set, outputs are written to this file (GitHub Actions)
+
+set -e
+
+# Check if PACKAGE_VERSIONS is set
+if [ -z "${PACKAGE_VERSIONS:-}" ]; then
+    echo "ERROR: PACKAGE_VERSIONS environment variable is not set"
+    exit 1
+fi
+
+# Extract the tuning version from the JSON
+latest_version=$(jq -r '.tuning' <<< "${PACKAGE_VERSIONS}")
+
+# Check if the version was found
+if [ -z "${latest_version}" ] || [ "${latest_version}" = "null" ]; then
+    echo "ERROR: Could not find 'tuning' package version in PACKAGE_VERSIONS: ${PACKAGE_VERSIONS}"
+    exit 1
+fi
+
+echo "Found tuning version: ${latest_version}"
+
+# Output the build args
+# If running in GitHub Actions, write to GITHUB_OUTPUT
+if [ -n "${GITHUB_OUTPUT:-}" ]; then
+    echo "BUILD_ARGS=TUNING_VERSION=${latest_version}" >> "$GITHUB_OUTPUT"
+else
+    # For local testing, output to stdout
+    echo "BUILD_ARGS=TUNING_VERSION=${latest_version}"
+fi
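
As an illustration of how the `BUILD_ARGS` line could feed the `ARG TUNING_VERSION` in the Dockerfile, the sketch below turns it into a `docker build` flag. The CI workflow that consumes this output is not part of the excerpt, so `build_arg_flags` is a hypothetical helper and the sample output assumes a local run with `PACKAGE_VERSIONS='{"tuning":"1.1.4"}'`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: turn a "BUILD_ARGS=KEY=VALUE" line into docker build flags.
# build_arg_flags is an illustrative name; the real CI wiring is not shown here.

build_arg_flags() {
    local line="$1"
    case "${line}" in
        BUILD_ARGS=*=*) ;;
        *) echo "ERROR: not a BUILD_ARGS line: ${line}" >&2; return 1 ;;
    esac
    # Drop the "BUILD_ARGS=" prefix, leaving "KEY=VALUE".
    printf -- '--build-arg %s\n' "${line#BUILD_ARGS=}"
}

# Assumed local stdout of preprocess.sh when GITHUB_OUTPUT is unset.
output='Found tuning version: 1.1.4
BUILD_ARGS=TUNING_VERSION=1.1.4'

flags="$(build_arg_flags "$(grep '^BUILD_ARGS=' <<< "${output}")")"
echo "docker build ${flags} nvidia-tuning-gke/"
```

The command is only echoed, not executed, so the sketch can be tried without Docker installed.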
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+[Service]
+LimitSTACK=67108864
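
For orientation, systemd reads a bare `LimitSTACK=` number as bytes, so the value above works out to a 64 MiB stack limit for the service. A quick check of the arithmetic:

```bash
#!/usr/bin/env bash
# LimitSTACK is expressed in bytes; convert to MiB with integer arithmetic.
limit_stack_bytes=67108864
limit_stack_mib=$(( limit_stack_bytes / 1024 / 1024 ))
echo "LimitSTACK=${limit_stack_bytes} bytes = ${limit_stack_mib} MiB"
```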
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+# GB200 inference – sysctl.
+net.ipv4.conf.all.arp_announce = 2
+net.ipv4.conf.default.arp_announce = 2
+net.ipv4.conf.all.arp_ignore = 1
+net.ipv4.conf.default.arp_ignore = 1
+fs.inotify.max_user_instances=65535
+fs.inotify.max_user_watches=524288
+kernel.threads-max=16512444
+vm.max_map_count=262144
+vm.min_free_kbytes=65536
+vm.overcommit_memory=1
+vm.swappiness=1
+kernel.sched_latency_ns=1000000
+kernel.sched_min_granularity_ns=100000
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+[Service]
+LimitSTACK=67108864
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+# GB200 multiNodeTraining – sysctl.
+net.ipv4.conf.all.arp_announce = 2
+net.ipv4.conf.default.arp_announce = 2
+net.ipv4.conf.all.arp_ignore = 1
+net.ipv4.conf.default.arp_ignore = 1
+fs.inotify.max_user_instances=65535
+fs.inotify.max_user_watches=524288
+kernel.threads-max=16512444
+vm.max_map_count=262144
+vm.min_free_kbytes=65536
+vm.overcommit_memory=1
+net.core.rmem_max=536870912
+net.core.wmem_max=536870912
+net.core.rmem_default=134217728
+net.core.wmem_default=134217728
+net.ipv4.tcp_rmem=4096 87380 268435456
+net.ipv4.tcp_wmem=4096 65536 268435456
+net.core.netdev_max_backlog=10000
+net.ipv4.tcp_max_syn_backlog=8192
+net.ipv4.tcp_congestion_control=bbr
+net.core.default_qdisc=fq
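
One way to confirm that a profile like this has taken effect is to compare each key with its live value under `/proc/sys`. The sketch below is an illustration only, not one of the package's `*_check.sh` scripts (their contents are not shown here); `check_sysctl_profile` and its optional `proc_root` argument are hypothetical names, and the second argument exists so the function can be tried against a fake tree.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: report profile keys whose live value differs.
# check_sysctl_profile and proc_root are illustrative names, not package code.

check_sysctl_profile() {
    local profile="$1" proc_root="${2:-/proc/sys}"
    local line key want got path status=0

    while IFS= read -r line || [ -n "${line}" ]; do
        # Skip comments and blank lines.
        case "${line}" in ''|'#'*) continue ;; esac

        # Split on the first "=", then trim surrounding whitespace.
        key="$(echo "${line%%=*}" | xargs)"
        want="$(echo "${line#*=}" | xargs)"
        [ -n "${key}" ] || continue

        # Keys map to paths under /proc/sys: dots become slashes.
        path="${proc_root}/${key//.//}"
        if [ ! -f "${path}" ] || [ ! -r "${path}" ]; then
            echo "MISSING ${key}"; status=1; continue
        fi

        # Collapse tabs in multi-value entries (e.g. tcp_rmem) to single spaces.
        got="$(xargs < "${path}")"
        if [ "${got}" != "${want}" ]; then
            echo "DIFF ${key}: want '${want}', got '${got}'"; status=1
        fi
    done < "${profile}"
    return "${status}"
}
```

Both `key = value` and `key=value` spellings appear in the profiles, which is why the sketch trims whitespace around each side before comparing.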
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+# H100 inference – sysctl. Mirrors nvidia-base + nvidia-h100-inference [sysctl].
+net.ipv4.conf.all.arp_announce = 2
+net.ipv4.conf.default.arp_announce = 2
+net.ipv4.conf.all.arp_ignore = 1
+net.ipv4.conf.default.arp_ignore = 1
+vm.swappiness=1
+kernel.sched_latency_ns=1000000
+kernel.sched_min_granularity_ns=100000
