feat(gpu): add NVIDIA GRID v20 driver support for RTX PRO 6000 BSE v6 SKUs #8619
Open
ganeshkumarashok wants to merge 2 commits into
… SKUs
Select the new aks-gpu-grid-v20 image (NVIDIA GRID 595.x) for
NC_RTXPRO6000BSE_v6 SKUs. All existing GRID SKUs continue to use
aks-gpu-grid (570.x); CUDA path is untouched.
- components.json: add aks-gpu-grid-v20 GPUContainerImages entry.
- gpu_components.go: parse it into NvidiaGridV20DriverVersion /
AKSGPUGridV20VersionSuffix; refactor LoadConfig to match on the exact
repo name (fixes a latent substring collision between aks-gpu-grid and
aks-gpu-grid-v20); add RTXPro6000GPUDriverSizes.
- baker.go: add useGridV20Drivers(); branch GetGPUDriverVersion /
GetAKSGPUImageSHA / GetGPUDriverType on it (checked before grid),
driver type "grid-v20".
- renovate.json: add aks/aks-gpu-grid-v20 package rule.
- tests for the new selection paths.
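The substring collision mentioned for gpu_components.go can be illustrated with a minimal Go sketch. This is not the actual LoadConfig code: the helper names (repoName, classifyBySubstring, classifyExact) and the image reference format are assumptions made only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// repoName returns the last path segment of an image reference with any
// trailing tag removed. Hypothetical helper, not part of the PR.
func repoName(ref string) string {
	lastSlash := strings.LastIndex(ref, "/")
	if colon := strings.LastIndex(ref, ":"); colon > lastSlash {
		ref = ref[:colon] // drop the tag; a registry port sits before the last slash
	}
	return ref[lastSlash+1:]
}

// classifyBySubstring reproduces the collision: any reference containing
// "aks-gpu-grid-v20" also contains "aks-gpu-grid".
func classifyBySubstring(ref string) string {
	switch {
	case strings.Contains(ref, "aks-gpu-cuda"):
		return "cuda"
	case strings.Contains(ref, "aks-gpu-grid"):
		return "grid"
	case strings.Contains(ref, "aks-gpu-grid-v20"):
		return "grid-v20" // never taken: the case above already matched
	}
	return ""
}

// classifyExact compares the whole repo name, so the two GRID repos
// cannot be confused regardless of the order of the cases.
func classifyExact(ref string) string {
	switch repoName(ref) {
	case "aks-gpu-cuda":
		return "cuda"
	case "aks-gpu-grid":
		return "grid"
	case "aks-gpu-grid-v20":
		return "grid-v20"
	}
	return ""
}

func main() {
	// The reference format below is an assumption for illustration.
	for _, ref := range []string{
		"mcr.microsoft.com/aks/aks-gpu-grid:*",
		"mcr.microsoft.com/aks/aks-gpu-grid-v20:*",
	} {
		fmt.Printf("%s substring=%q exact=%q\n",
			ref, classifyBySubstring(ref), classifyExact(ref))
	}
}
```

With substring matching the result depends on the order of the checks; with exact matching it does not, which is presumably why the refactor is described as fixing a latent issue rather than an observed one.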
Scope is Ubuntu-only: RTX PRO 6000 BSE v6 runs on Ubuntu GPU nodes, which
build the driver image repo as aks-gpu-${GPU_DRIVER_TYPE}; non-Ubuntu
(Mariner/ACL) install paths do not use the container image and are
deliberately untouched.
NOTE (do not merge yet): aks-gpu-grid-v20 is not yet published to MCR, so
the version tag suffix in components.json is a placeholder and must be
replaced with the real published tag before merge.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds a new NVIDIA GRID v20 (595.x) driver selection path for RTX PRO 6000 Blackwell Server Edition v6 NC SKUs by introducing a new GPU driver container image (aks-gpu-grid-v20) and ensuring config parsing doesn’t confuse it with the existing aks-gpu-grid image.
Changes:
- Add `aks-gpu-grid-v20` to `GPUContainerImages` and parse it into new datamodel globals (version + suffix), using exact repo-name matching to avoid substring collisions.
- Add SKU-based routing so RTX PRO 6000 BSE v6 sizes use GRID v20 for `GetGPUDriverVersion`, `GetAKSGPUImageSHA`, and `GetGPUDriverType` (new type: `grid-v20`).
- Extend Renovate rules and unit tests to cover the new config fields and selection paths.
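The SKU-based routing in the list above can be sketched as follows. This is a hedged illustration, not the code in baker.go: the function name driverType, the lowercase lookup keys, the single stand-in GRID size, and the CUDA fallback are all assumptions. Only the three RTX PRO 6000 BSE v6 size names and the "grid-v20" type string come from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// rtxPro6000Sizes lists the three v6 sizes named in the PR description.
// Storing them lowercased is an assumption of this sketch.
var rtxPro6000Sizes = map[string]bool{
	"standard_nc128ds_xl_rtxpro6000bse_v6": true,
	"standard_nc256ds_xl_rtxpro6000bse_v6": true,
	"standard_nc320ds_xl_rtxpro6000bse_v6": true,
}

// gridSizes is a one-entry stand-in for the existing GRID size list
// (illustrative only; the real list is not shown in the PR text).
var gridSizes = map[string]bool{
	"standard_nv36ads_a10_v5": true,
}

// driverType checks the v20 sizes before the general GRID sizes, so a v20
// size resolves to "grid-v20" even if a broader GRID check also matched it.
func driverType(vmSize string) string {
	key := strings.ToLower(vmSize)
	switch {
	case rtxPro6000Sizes[key]:
		return "grid-v20"
	case gridSizes[key]:
		return "grid"
	default:
		return "cuda" // assumed fallback for this sketch
	}
}

func main() {
	for _, size := range []string{
		"Standard_NC128ds_xl_RTXPRO6000BSE_v6",
		"Standard_NV36ads_A10_v5",
		"Standard_NC24ads_A100_v4",
	} {
		fmt.Printf("%s -> %s\n", size, driverType(size))
	}
}
```

Putting the most specific check first keeps the selection stable if the size lists ever overlap.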
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `parts/common/components.json` | Adds new GPU container image entry for `aks-gpu-grid-v20` (tag currently a placeholder per PR description). |
| `pkg/agent/datamodel/gpu_components.go` | Adds v20 config globals and exact repo-name parsing; introduces RTX PRO 6000 BSE v6 SKU map. |
| `pkg/agent/datamodel/gpu_components_test.go` | Validates v20 config values are populated and correctly formatted. |
| `pkg/agent/baker.go` | Routes RTX PRO 6000 BSE v6 SKUs to GRID v20 driver/version/type before standard GRID selection. |
| `pkg/agent/baker_test.go` | Adds unit tests for GRID v20 selection in version/type/image suffix. |
| `.github/renovate.json` | Adds Renovate package rule for `aks/aks-gpu-grid-v20`. |
What

Adds NVIDIA GRID v20 (595.x) driver support, selecting the new `aks-gpu-grid-v20` container image for RTX PRO 6000 Blackwell Server Edition v6 SKUs:
- `Standard_NC128ds_xl_RTXPRO6000BSE_v6`
- `Standard_NC256ds_xl_RTXPRO6000BSE_v6`
- `Standard_NC320ds_xl_RTXPRO6000BSE_v6`

All existing GRID SKUs keep using `aks-gpu-grid` (570.x); the CUDA path is untouched.

Changes

- `parts/common/components.json`: add `aks-gpu-grid-v20` `GPUContainerImages` entry.
- `pkg/agent/datamodel/gpu_components.go`: parse it into `NvidiaGridV20DriverVersion` / `AKSGPUGridV20VersionSuffix`; refactor `LoadConfig` to match on the exact repo name (fixes a latent substring collision: `aks-gpu-grid-v20` contains `aks-gpu-grid`); add `RTXPro6000GPUDriverSizes`.
- `pkg/agent/baker.go`: add `useGridV20Drivers()`; branch `GetGPUDriverVersion` / `GetAKSGPUImageSHA` / `GetGPUDriverType` on it (checked before grid); driver type string `"grid-v20"`.
- `.github/renovate.json`: add `aks/aks-gpu-grid-v20` package rule.

Design notes

On Ubuntu the driver image repo is built as `mcr.microsoft.com/aks/aks-gpu-${GPU_DRIVER_TYPE}` (`cse_helpers.sh`), so setting the driver type to `grid-v20` resolves the new repo automatically.

Scope is Ubuntu-only by design. RTX PRO 6000 BSE v6 runs on Ubuntu GPU nodes. The non-Ubuntu install paths (Mariner RPM / ACL sysext) do not use the container image and have no v20 packages, so those CSE checks are deliberately left unchanged.

The new image comes from aks-gpu PR #158 (merged).

`aks-gpu-grid-v20` is not yet published to MCR (onboarding tracked separately). The version tag suffix in `components.json` (`595.58.03-20260101000000`) is a placeholder and must be replaced with the real published tag before merge. Until then nodes would attempt to pull a nonexistent tag.

`make generate` produces no testdata/manifest diff (no existing scenario uses these SKUs), so the placeholder does not leak into generated snapshots.

Testing

- `go build ./pkg/agent/...`
- `go test ./pkg/agent ./pkg/agent/datamodel`: pass
- `make validate-components`: pass
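
To make the image-name resolution from the design notes concrete, here is a minimal Go sketch of how a driver type and a tag could combine into a full image reference. The real composition happens in shell (`cse_helpers.sh`) and is not reproduced here. The helper names splitTag and imageRef, and the assumption that the tag splits at its first hyphen into driver version and suffix, are hypothetical. The tag value is the placeholder quoted in the PR description.

```go
package main

import (
	"fmt"
	"strings"
)

// splitTag splits a tag such as "595.58.03-20260101000000" at the first
// hyphen into a driver version and a version suffix. The split rule is an
// assumption of this sketch, not confirmed by the PR text.
func splitTag(tag string) (version, suffix string, ok bool) {
	version, suffix, ok = strings.Cut(tag, "-")
	if !ok || version == "" || suffix == "" {
		return "", "", false
	}
	return version, suffix, true
}

// imageRef mirrors the repo pattern quoted in the design notes:
// mcr.microsoft.com/aks/aks-gpu-${GPU_DRIVER_TYPE}, followed by the tag.
func imageRef(driverType, version, suffix string) string {
	return fmt.Sprintf("mcr.microsoft.com/aks/aks-gpu-%s:%s-%s", driverType, version, suffix)
}

func main() {
	// Placeholder tag from the PR description; not a published tag.
	version, suffix, ok := splitTag("595.58.03-20260101000000")
	if !ok {
		fmt.Println("malformed tag")
		return
	}
	fmt.Println(imageRef("grid-v20", version, suffix))
}
```

Because the repo name is derived from the driver type, returning "grid-v20" from the type selection is enough to point Ubuntu nodes at the new repo, and a wrong tag shows up only at pull time.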