
Commit f0ddb2a

feat(nvidia-tuned): update service to be eks to match aicr
1 parent b70d7d2 commit f0ddb2a

8 files changed: +36 −36 lines changed


nvidia-tuned/README.md

Lines changed: 8 additions & 8 deletions
@@ -8,7 +8,7 @@ This package inherits from the base `tuned` package and adds pre-configured tune
 
 - **Common base profiles**: Foundational settings deployed to `/usr/lib/tuned/`
 - **OS-specific workload profiles**: Profiles that may vary by OS version
-- **Service profiles**: Service-specific settings (AWS, GCP, etc.)
+- **Service profiles**: Service-specific settings (eks, GCP, etc.)
 
 The configmap uses an **intent-based** model where you specify **what** you want (intent + accelerator) rather than a specific profile name. The profile name `nvidia-{accelerator}-{intent}` is constructed automatically.
 
@@ -56,7 +56,7 @@ profiles/
 │   └── rhel/
 │       └── 9/                       # Symlinks to os/common/ (override when needed)
 └── service/
-    └── aws/
+    └── eks/
         ├── tuned.conf.template      # Service template (include= added dynamically)
         └── script.sh
 ```
@@ -95,10 +95,10 @@ Examples:
 
 ### Inheritance Chain
 
-When you specify `intent: inference`, `accelerator: h100`, and `service: aws`:
+When you specify `intent: inference`, `accelerator: h100`, and `service: eks`:
 
 ```
-aws (active profile)
+eks (active profile)
 └── includes: nvidia-h100-inference
     └── includes: nvidia-h100-performance
         └── includes: nvidia-acs-disable
@@ -111,7 +111,7 @@ aws (active profile)
 apiVersion: skyhook.nvidia.com/v1alpha1
 kind: Skyhook
 metadata:
-  name: nvidia-tuned-aws
+  name: nvidia-tuned-eks
 spec:
   nodeSelectors:
     matchLabels:
@@ -131,7 +131,7 @@ spec:
     configMap:
       intent: inference
       accelerator: h100
-      service: aws
+      service: eks
 ```
 
 ### ConfigMap Fields
@@ -140,7 +140,7 @@ spec:
 |-------|----------|---------|-------------|
 | `accelerator` | Yes | — | GPU/accelerator type (e.g., `h100`) |
 | `intent` | No | `performance` | Workload intent (e.g., `inference`, `performance`, `multiNodeTraining`) |
-| `service` | No | — | Service name (e.g., `aws`). If specified, service profile wraps the workload profile |
+| `service` | No | — | Service name (e.g., `eks`). If specified, service profile wraps the workload profile |
 
 ## Available Profiles
 
@@ -163,7 +163,7 @@ spec:
 
 | Service | Description |
 |---------|-------------|
-| `aws` | AWS-specific settings (MAC address policy for CNI) |
+| `eks` | eks-specific settings (MAC address policy for CNI) |
 
 ## Adding OS-Specific Overrides
 
nvidia-tuned/config.json

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 {
   "schema_version": "v1",
   "package_name": "nvidia_tuned",
-  "package_version": "0.2.0",
+  "package_version": "0.2.3",
   "expected_config_files": ["accelerator"],
   "modes": {
     "uninstall": [
File renamed without changes.

nvidia-tuned/profiles/service/aws/nvidia-gb200-inference.conf renamed to nvidia-tuned/profiles/service/eks/nvidia-gb200-inference.conf

File renamed without changes.

nvidia-tuned/profiles/service/aws/nvidia-h100-inference.conf renamed to nvidia-tuned/profiles/service/eks/nvidia-h100-inference.conf

File renamed without changes.

nvidia-tuned/profiles/service/aws/script.sh renamed to nvidia-tuned/profiles/service/eks/script.sh

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ EXPECTED_NETWORK_CONTENT='[Link]
 MACAddressPolicy=none
 '
 
-# Profile dir (script is in e.g. /etc/tuned/aws-{accelerator}-{intent}/)
+# Profile dir (script is in e.g. /etc/tuned/eks-{accelerator}-{intent}/)
 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
 
 apply_network_dropin() {
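The `[Link]` / `MACAddressPolicy=none` content checked by this script is a systemd network drop-in. A minimal sketch of applying such a drop-in, staged into a temporary root rather than the live system: the drop-in path and file name here are assumptions for illustration, not the package's actual locations.

```python
import tempfile
from pathlib import Path

# Content matching EXPECTED_NETWORK_CONTENT in the diff above
EXPECTED_NETWORK_CONTENT = "[Link]\nMACAddressPolicy=none\n"

def apply_network_dropin(root: Path) -> Path:
    """Write the MAC-address-policy drop-in under an assumed systemd path."""
    dropin = root / "etc/systemd/network/99-default.link.d/10-mac-policy.conf"
    dropin.parent.mkdir(parents=True, exist_ok=True)
    dropin.write_text(EXPECTED_NETWORK_CONTENT)
    return dropin

# Stage into a temp directory so the sketch never touches the real /etc
path = apply_network_dropin(Path(tempfile.mkdtemp()))
print(path.read_text())
```

Disabling MAC address policy this way keeps systemd from rewriting interface MAC addresses, which the README notes matters for the CNI.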
File renamed without changes.

tests/integration/nvidia_tuned/test_prepare_nvidia_profiles.py

Lines changed: 26 additions & 26 deletions
@@ -7,7 +7,7 @@
 - prepare_nvidia_profiles does the right thing for all combinations of:
   - accelerator (h100, gb200)
   - intent (performance, inference, multiNodeTraining)
-  - service (aws, none)
+  - service (eks, none)
 - For AWS service, verifies grub config file is created correctly
 """
 
@@ -261,14 +261,14 @@ def test_prepare_nvidia_profiles_no_service(base_image, accelerator, intent):
 
 @pytest.mark.parametrize("accelerator", ["h100", "gb200"])
 @pytest.mark.parametrize("intent", ["performance", "inference", "multiNodeTraining"])
-def test_prepare_nvidia_profiles_with_aws_service(base_image, accelerator, intent):
-    """Test prepare_nvidia_profiles with AWS service for all combinations."""
+def test_prepare_nvidia_profiles_with_eks_service(base_image, accelerator, intent):
+    """Test prepare_nvidia_profiles with EKS service for all combinations."""
     runner = DockerTestRunner(package="nvidia-tuned", base_image=base_image)
     try:
         configmaps = {
             "accelerator": accelerator,
             "intent": intent,
-            "service": "aws",
+            "service": "eks",
         }
 
         # Create container by running script (this creates the container)
@@ -296,21 +296,21 @@ def test_prepare_nvidia_profiles_with_aws_service(base_image, accelerator, inten
 
         # Final profile name = {service}-{accelerator}-{intent}
         expected_workload_profile = f"nvidia-{accelerator}-{intent}"
-        expected_final_profile = f"aws-{accelerator}-{intent}"
-        assert_output_contains(result.stdout, "Requested service: aws")
+        expected_final_profile = f"eks-{accelerator}-{intent}"
+        assert_output_contains(result.stdout, "Requested service: eks")
         assert_output_contains(result.stdout, f"include={expected_workload_profile}")
         assert_output_contains(result.stdout, f"Final profile name: {expected_final_profile}")
 
-        # Verify service profile directory exists (final name = aws-{accelerator}-{intent})
+        # Verify service profile directory exists (final name = eks-{accelerator}-{intent})
         service_profile_exists = runner.file_exists(f"/etc/tuned/{expected_final_profile}/tuned.conf")
-        assert service_profile_exists, f"AWS service profile {expected_final_profile} was not deployed"
+        assert service_profile_exists, f"EKS service profile {expected_final_profile} was not deployed"
 
         # Verify service profile includes the workload profile
         service_profile_content = runner.get_file_contents(
             f"/etc/tuned/{expected_final_profile}/tuned.conf"
         )
         assert f"include={expected_workload_profile}" in service_profile_content, \
-            f"AWS profile does not include {expected_workload_profile}"
+            f"EKS profile does not include {expected_workload_profile}"
 
         # Verify tuned_profile file points to final profile ({service}-{accelerator}-{intent})
         tuned_profile_content = runner.get_file_contents(
@@ -319,28 +319,28 @@ def test_prepare_nvidia_profiles_with_aws_service(base_image, accelerator, inten
         assert tuned_profile_content.strip() == expected_final_profile, \
             f"tuned_profile should be '{expected_final_profile}', got: {tuned_profile_content!r}"
 
-        # For AWS, verify bootloader script exists in final profile dir
+        # For EKS, verify bootloader script exists in final profile dir
         bootloader_script_exists = runner.file_exists(
             f"/etc/tuned/{expected_final_profile}/bootloader.sh"
         )
-        assert bootloader_script_exists, "AWS bootloader.sh script was not deployed"
+        assert bootloader_script_exists, "EKS bootloader.sh script was not deployed"
 
         # Verify script.sh exists in final profile dir
         script_exists = runner.file_exists(f"/etc/tuned/{expected_final_profile}/script.sh")
-        assert script_exists, "AWS script.sh was not deployed"
+        assert script_exists, "EKS script.sh was not deployed"
 
     finally:
         runner.cleanup()
 
 
-def test_prepare_nvidia_profiles_aws_grub_config(base_image):
-    """Test that AWS service creates the correct grub config file."""
+def test_prepare_nvidia_profiles_eks_grub_config(base_image):
+    """Test that EKS service creates the correct grub config file."""
     runner = DockerTestRunner(package="nvidia-tuned", base_image=base_image)
     try:
         configmaps = {
             "accelerator": "h100",
             "intent": "inference",
-            "service": "aws",
+            "service": "eks",
         }
 
         # Create container directly
@@ -371,9 +371,9 @@ def test_prepare_nvidia_profiles_aws_grub_config(base_image):
         assert "TUNED_BOOT_CMDLINE=\"iommu=pt hugepages=8192\"" in bootcmdline_content, \
             "Bootcmdline file should contain the actual boot parameters (iommu=pt hugepages=8192)"
 
-        # Final profile name = aws-h100-inference for this test's configmaps
-        final_profile = "aws-h100-inference"
-        # Run the AWS bootloader script (skip update-grub if it fails)
+        # Final profile name = eks-h100-inference for this test's configmaps
+        final_profile = "eks-h100-inference"
+        # Run the EKS bootloader script (skip update-grub if it fails)
         bootloader_result = runner.container.exec_run(
             ["bash", "-c", f"/etc/tuned/{final_profile}/bootloader.sh || true"],
             workdir="/"
@@ -443,14 +443,14 @@ def test_prepare_nvidia_profiles_missing_accelerator(base_image):
         runner.cleanup()
 
 
-def test_prepare_nvidia_profiles_aws_service_specific_profile(base_image):
-    """Test that AWS service-specific inference profiles are used when available."""
+def test_prepare_nvidia_profiles_eks_service_specific_profile(base_image):
+    """Test that EKS service-specific inference profiles are used when available."""
     runner = DockerTestRunner(package="nvidia-tuned", base_image=base_image)
     try:
         configmaps = {
             "accelerator": "h100",
             "intent": "inference",
-            "service": "aws",
+            "service": "eks",
         }
 
         # Create container directly
@@ -464,29 +464,29 @@ def test_prepare_nvidia_profiles_aws_service_specific_profile(base_image):
 
         assert_exit_code(result, 0)
 
-        # Verify that AWS-specific inference profile was deployed
+        # Verify that EKS-specific inference profile was deployed
         # (it should overwrite the OS profile)
         inference_profile_content = runner.get_file_contents(
             "/etc/tuned/nvidia-h100-inference/tuned.conf"
        )
 
-        # AWS-specific profile should NOT have scheduler parameters set (they may be in comments)
+        # EKS-specific profile should NOT have scheduler parameters set (they may be in comments)
        # Check that they're not set as actual sysctl parameters (not commented out)
        import re
 
        # Check for uncommented kernel.sched_latency_ns= lines
        latency_pattern = r'^\s*kernel\.sched_latency_ns\s*='
        assert not re.search(latency_pattern, inference_profile_content, re.MULTILINE), \
-            "AWS-specific inference profile should not contain uncommented kernel.sched_latency_ns"
+            "EKS-specific inference profile should not contain uncommented kernel.sched_latency_ns"
 
        # Check for uncommented kernel.sched_min_granularity_ns= lines
        granularity_pattern = r'^\s*kernel\.sched_min_granularity_ns\s*='
        assert not re.search(granularity_pattern, inference_profile_content, re.MULTILINE), \
-            "AWS-specific inference profile should not contain uncommented kernel.sched_min_granularity_ns"
+            "EKS-specific inference profile should not contain uncommented kernel.sched_min_granularity_ns"
 
        # But should have vm.swappiness
        assert "vm.swappiness=1" in inference_profile_content, \
-            "AWS-specific inference profile should contain vm.swappiness=1"
+            "EKS-specific inference profile should contain vm.swappiness=1"
 
     finally:
         runner.cleanup()
