feat: add Dockerfile and configuration for ROCm MPI training runtime …#650
Conversation
…with Python 3.12 and PyTorch 2.9.1
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis pull request adds a new ROCm-based MPI training container runtime targeting PyTorch 2.9 with OpenMPI 4.1 on Python 3.12. Four new files configure the runtime: a Dockerfile.konflux that builds from ROCm 6.4 UBI9 with SSH, OpenMPI wrappers, and dependency management via micropipenv; two shell entrypoint scripts (uid_entrypoint.sh and mpirun_wrapper.sh) that register the container's random UID in /etc/passwd at startup; and a pyproject.toml declaring project dependencies for ML workloads including PyTorch, HuggingFace models, and training utilities. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Security findingsCWE-20: Improper Input Validation — mpirun_wrapper.sh validates HOME for colon, newline, and carriage return, but does not reject other passwd-format-breaking characters (spaces, forward slash in the UID field context, or unescaped special characters). An attacker controlling HOME could inject additional passwd entries. Validate HOME against a strict allowlist: CWE-427: Uncontrolled Search Path Element — The Dockerfile propagates LD_LIBRARY_PATH for SSH and OpenMPI sessions. Confirm that:
CWE-269: Improper Access Control / GID 0 Handling — File permissions set with Missing entrypoint error handling — uid_entrypoint.sh exits non-zero if /etc/passwd is non-writable, but does not log the specific error to stderr before exit. Add explicit error messages to aid troubleshooting without silently failing container startup. 🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@images/runtime/training/py312-rocm64-torch29-openmpi41/Dockerfile.konflux`:
- Around line 3-96: Turn this single-stage Dockerfile into a multi-stage build:
create a builder stage (FROM ... AS builder) that runs all dnf installs,
micropipenv install (referencing requirements.txt and micropipenv install), pip
installs (s3fs==2026.1.0), creates /opt/app-root site-packages and the
/usr/lib64/openmpi symlinks and /var/run/sshd setup, then in the final minimal
runtime stage copy only the runtime artifacts needed (copy /opt/app-root/bin,
/opt/app-root/lib/python3.12/site-packages, the /usr/lib64/openmpi/bin and
/usr/lib64/openmpi/lib files or symlinks, and uid_entrypoint.sh) and re-create
minimal config (ENV PATH, ENV LD_LIBRARY_PATH, ENTRYPOINT
["/usr/local/bin/uid_entrypoint.sh"], USER 1001) while omitting package
managers, build caches, and dnf/pip tools to reduce attack surface; ensure files
referenced in scripts (uid_entrypoint.sh,
/home/mpiuser/.openmpi/mca-params.conf) are present in the final image and
preserve permissions (chmod +x uid_entrypoint.sh, chgrp/chmod on /home/mpiuser)
when copying.
- Around line 36-39: The Dockerfile currently disables SSH host verification by
setting StrictHostKeyChecking to no and UserKnownHostsFile to /dev/null in
/etc/ssh/ssh_config; revert this change and instead ensure host key verification
remains enabled (remove the sed and echo lines that set StrictHostKeyChecking
and UserKnownHostsFile), and implement a secure alternative: populate
/etc/ssh/ssh_known_hosts or the user's ~/.ssh/known_hosts with the cluster node
public host keys at build/runtime (or use StrictHostKeyChecking accept-new if
you must auto-add on first connect), keep the SSH_PORT echo line only if needed,
and do not suppress SendEnv unless intentional; reference the modified RUN
command that edits /etc/ssh/ssh_config and the SSH_PORT environment variable
when making these changes.
- Around line 83-84: Remove the RUN chmod g=u /etc/passwd line from the
Dockerfile.konflux to avoid making /etc/passwd writable; instead update the
uid_entrypoint.sh invoked at container start to create a temp passwd file (copy
/etc/passwd, append the generated UID entry there) and export NSS_WRAPPER_PASSWD
pointing to that temp file and NSS_WRAPPER_GROUP=/etc/group so identity
resolution uses the temp copy without altering the real /etc/passwd; ensure any
Dockerfile changes remove the chmod and that uid_entrypoint.sh performs the safe
copy/append and exports NSS_WRAPPER_* before launching the main process.
- Around line 67-76: The Dockerfile is copying a non-existent requirements.txt;
replace that COPY and the subsequent RUN which expects requirements.txt with
operations that use Pipfile.lock (the hermetic source for micropipenv).
Specifically, change the COPY from "requirements.txt" to "Pipfile.lock" and
update the RUN that calls micropipenv (in the block that currently runs
micropipenv install && rm -f ./requirements.txt && pip install ... s3fs) to
install from the copied Pipfile.lock (invoke micropipenv install against the
lock file or use the --deploy flag), remove the rm -f ./requirements.txt step,
and keep the pip install --no-cache-dir --no-deps s3fs==2026.1.0 line; reference
the COPY and RUN lines around micropipenv, Pipfile.lock, and the s3fs install to
locate where to change.
In `@images/runtime/training/py312-rocm64-torch29-openmpi41/Pipfile`:
- Around line 8-39: The build fails because Dockerfile.konflux expects a
requirements.txt (the Dockerfile.konflux line with COPY requirements.txt ./) but
none is committed; fix by either (A) generating a requirements.txt from the
committed Pipfile.lock (e.g., pipenv lock -r or micropipenv lock --requirements)
and commit that requirements.txt so COPY requirements.txt ./ succeeds, or (B)
modify Dockerfile.konflux to install from Pipfile.lock directly using
pipenv/micropipenv (e.g., micropipenv install --deploy --requirement
Pipfile.lock or pip install -r <(pipenv lock -r)) so it no longer references
requirements.txt; update CI/Tekton to use the same approach and ensure
Pipfile.lock remains the authoritative lockfile.
In `@images/runtime/training/py312-rocm64-torch29-openmpi41/uid_entrypoint.sh`:
- Around line 10-11: Sanitize the HOME value before writing /etc/passwd in
uid_entrypoint.sh: validate and reject or sanitize ${HOME} (the home_dir
variable) so it contains no colon ':' or newline characters (or other
passwd-delimiting bytes), falling back to a safe default (/home/mpiuser) if
validation fails; then use the sanitized variable when calling printf to append
the mpiuser entry (referencing home_dir and uid) so no untrusted delimiter or
newline can inject additional passwd records.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 8d074fcd-4519-4e13-8a39-71cce4d4dbd0
⛔ Files ignored due to path filters (1)
images/runtime/training/py312-rocm64-torch29-openmpi41/Pipfile.lockis excluded by!**/*.lock
📒 Files selected for processing (3)
images/runtime/training/py312-rocm64-torch29-openmpi41/Dockerfile.konfluximages/runtime/training/py312-rocm64-torch29-openmpi41/Pipfileimages/runtime/training/py312-rocm64-torch29-openmpi41/uid_entrypoint.sh
- Added `mpirun_wrapper.sh` to register OpenShift random UID in `/etc/passwd` for proper user identification. - Updated Dockerfile to include the `mpirun_wrapper.sh` and adjusted symlink for `mpirun`. - Changed installation method for Python dependencies from `requirements.txt` to `Pipfile.lock` for better dependency management. - Modified environment variables to prioritize `/usr/local/bin` in the PATH for the new wrapper script.
…ed-workloads into mpi-rocm-image
|
/lgtm |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
images/runtime/training/py312-rocm64-torch29-openmpi41/Dockerfile.konflux (1)
48-51: Redundantsshd_configentries — sed already sets values, then echo appends duplicates.Lines 48-49 use
sedto uncomment and setStrictModesandPort, then lines 50-51 append the same directives again. This creates duplicate config entries.🔧 Proposed fix
RUN sed -i "s/#\(StrictModes \).*/\1no/g" /etc/ssh/sshd_config \ && sed -i "s/#\(Port \).*/\1${SSH_PORT}/g" /etc/ssh/sshd_config \ - && echo "StrictModes no" >> /etc/ssh/sshd_config \ - && echo "Port ${SSH_PORT}" >> /etc/ssh/sshd_config + && echo "AcceptEnv PATH LD_LIBRARY_PATH" >> /etc/ssh/sshd_configNote: The
AcceptEnvdirective is needed to complementSendEnvfrom line 45, enabling environment propagation for MPI processes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@images/runtime/training/py312-rocm64-torch29-openmpi41/Dockerfile.konflux` around lines 48 - 51, The Dockerfile edits to /etc/ssh/sshd_config currently run sed to set/uncomment StrictModes and Port and then append the same directives, causing duplicates; remove the redundant echo lines that append "StrictModes no" and "Port ${SSH_PORT}" so only the sed changes remain, and ensure an AcceptEnv directive is present (e.g., add or uncomment "AcceptEnv ..." to complement the existing SendEnv) so MPI environment variables propagate for OpenMPI processes; locate occurrences referencing StrictModes, Port, sed, and sshd_config in the Dockerfile.konflux to update.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@images/runtime/training/py312-rocm64-torch29-openmpi41/Dockerfile.konflux`:
- Line 102: The ENV line sets LD_LIBRARY_PATH with a trailing colon when the
base LD_LIBRARY_PATH is unset, causing an unsafe addition of the current
directory; update the ENV assignment for LD_LIBRARY_PATH (the line with ENV
LD_LIBRARY_PATH) to append the existing value only when it is non-empty using
parameter expansion (e.g., use ${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}) so the
resulting value is /usr/lib64/openmpi/lib when unset and
/usr/lib64/openmpi/lib:<old> when set.
---
Nitpick comments:
In `@images/runtime/training/py312-rocm64-torch29-openmpi41/Dockerfile.konflux`:
- Around line 48-51: The Dockerfile edits to /etc/ssh/sshd_config currently run
sed to set/uncomment StrictModes and Port and then append the same directives,
causing duplicates; remove the redundant echo lines that append "StrictModes no"
and "Port ${SSH_PORT}" so only the sed changes remain, and ensure an AcceptEnv
directive is present (e.g., add or uncomment "AcceptEnv ..." to complement the
existing SendEnv) so MPI environment variables propagate for OpenMPI processes;
locate occurrences referencing StrictModes, Port, sed, and sshd_config in the
Dockerfile.konflux to update.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: f44cfaea-51fd-4d27-8d6b-04355eb792ff
⛔ Files ignored due to path filters (1)
images/runtime/training/py312-rocm64-torch29-openmpi41/Pipfile.lockis excluded by!**/*.lock
📒 Files selected for processing (3)
images/runtime/training/py312-rocm64-torch29-openmpi41/Dockerfile.konfluximages/runtime/training/py312-rocm64-torch29-openmpi41/mpirun_wrapper.shimages/runtime/training/py312-rocm64-torch29-openmpi41/uid_entrypoint.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- images/runtime/training/py312-rocm64-torch29-openmpi41/uid_entrypoint.sh
- Removed outdated comments from Dockerfile related to OpenMPI configuration. - Added new dependencies `causal-conv1d` and `mamba-ssm` to Pipfile for enhanced functionality.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
images/runtime/training/py312-rocm64-torch29-openmpi41/Dockerfile.konflux (3)
86-92:⚠️ Potential issue | 🟠 MajorKeep
/etc/passwdimmutable.
chmod g=u /etc/passwdturns every GID 0 process into an account-database writer (CWE-732). With the current UID-registration flow, a compromised training process can poison identity resolution for later SSH or Python helper calls.🔧 Remediation
-# Allow uid_entrypoint to add random UID to /etc/passwd at runtime -RUN chmod g=u /etc/passwd +RUN dnf install -y nss_wrapper && dnf clean all +ENV LD_PRELOAD=/usr/lib64/libnss_wrapper.soThen update both runtime UID-registration scripts to use a temporary passwd file instead of appending to
/etc/passwd:passwd_file="$(mktemp)" cp /etc/passwd "$passwd_file" printf 'mpiuser:x:%s:0:mpiuser:%s:/bin/sh\n' "$(id -u)" "${HOME:-/home/mpiuser}" >> "$passwd_file" export NSS_WRAPPER_PASSWD="$passwd_file" export NSS_WRAPPER_GROUP=/etc/groupRun this to confirm the current scripts still append directly to
/etc/passwdand do not useNSS_WRAPPER_*:#!/bin/bash set -euo pipefail fd -HI '^(uid_entrypoint|mpirun_wrapper)\.sh$' images/runtime/training \ -x sh -c ' echo "=== $1 ===" rg -n -C2 "/etc/passwd|NSS_WRAPPER" "$1" || true echo ' sh {}Expected result: the current scripts will show direct
/etc/passwdwrites and noNSS_WRAPPER_PASSWD/NSS_WRAPPER_GROUPexports.As per coding guidelines
**: REVIEW PRIORITIES: 1. Security vulnerabilities (provide severity, exploit scenario, and remediation code).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@images/runtime/training/py312-rocm64-torch29-openmpi41/Dockerfile.konflux` around lines 86 - 92, The Dockerfile currently makes /etc/passwd group-writable (RUN chmod g=u /etc/passwd) which is insecure; remove that RUN line and change the uid registration flow in uid_entrypoint.sh (and any mpirun_wrapper.sh) so they do not append to /etc/passwd but instead create a temporary passwd file (mktemp), copy /etc/passwd into it, append the runtime UID entry there, and export NSS_WRAPPER_PASSWD pointing to that temp file and NSS_WRAPPER_GROUP=/etc/group before starting the process; update references to uid_entrypoint.sh in the Dockerfile (COPY/CHMOD steps remain) and ensure no code appends directly to /etc/passwd anywhere in the image.
2-99: 🛠️ Refactor suggestion | 🟠 MajorSplit the builder from the runtime image.
This single-stage build leaves
dnf,pip, and build-time setup artifacts in the runtime layer, which expands the post-compromise surface and violates the repo’s Dockerfile hardening rule.🔧 Refactor skeleton
-FROM quay.io/aipcc/base-images/rocm-6.4-el9.6:3.4.0-ea.1-1772615238 +FROM quay.io/aipcc/base-images/rocm-6.4-el9.6:3.4.0-ea.1-1772615238 AS builder ... -RUN pip install --no-cache-dir -U "micropipenv[toml]" +RUN pip install --no-cache-dir -U "micropipenv[toml]" ... +FROM quay.io/aipcc/base-images/rocm-6.4-el9.6:3.4.0-ea.1-1772615238 AS runtime +COPY --from=builder /opt/app-root /opt/app-root +COPY --from=builder /usr/local/bin/mpirun /usr/local/bin/mpirun +COPY --from=builder /usr/local/bin/python /usr/local/bin/python +COPY --from=builder /usr/local/bin/uid_entrypoint.sh /usr/local/bin/uid_entrypoint.shAs per coding guidelines
**/Dockerfile*: DOCKERFILE SECURITY (CWE-798, CWE-250): 4. Use multi-stage builds.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@images/runtime/training/py312-rocm64-torch29-openmpi41/Dockerfile.konflux` around lines 2 - 99, The Dockerfile is a single-stage image that leaves dnf, pip and build-time artifacts in the runtime layer; convert it into a multi-stage build by adding a builder stage (e.g., FROM ... AS builder) that performs all RUN dnf install ..., pip install --no-cache-dir -U "micropipenv[toml]", micropipenv install, and any builds, and then copy only the runtime artifacts into a slim runtime stage (the original base image or a minimal UBI9) — specifically copy the installed Python site-packages (from /opt/app-root/lib/python3.12/site-packages), the mpirun_wrapper.sh (/usr/local/bin/mpirun), uid_entrypoint.sh (/usr/local/bin/uid_entrypoint.sh), the /usr/local/bin/python wrapper, and any required OpenMPI libs or symlinks into the final image; ensure final stage does not run dnf/pip, preserves ENTRYPOINT ["/usr/local/bin/uid_entrypoint.sh"], sets HOME/PATH/LD_LIBRARY_PATH and file permissions (chgrp/chmod for /home/mpiuser and /etc/passwd) as needed, and removes package manager artifacts so no build tools remain in the runtime layer.
41-44:⚠️ Potential issue | 🔴 CriticalRe-enable SSH host verification.
StrictHostKeyChecking noplusUserKnownHostsFile /dev/nullremoves peer authentication (CWE-295). On a shared cluster network, a spoofed launcher or worker can impersonate a peer and receive MPI-initiated remote commands.🔧 Remediation
-RUN sed -i "s/[ #]\(.*StrictHostKeyChecking \).*/ \1no/g" /etc/ssh/ssh_config \ - && echo " UserKnownHostsFile /dev/null" >> /etc/ssh/ssh_config \ +RUN install -d -m 700 /home/mpiuser/.ssh \ + && touch /home/mpiuser/.ssh/known_hosts \ + && chmod 600 /home/mpiuser/.ssh/known_hosts \ + && sed -i "s/[ #]\(.*StrictHostKeyChecking \).*/ \1accept-new/g" /etc/ssh/ssh_config \ + && echo " UserKnownHostsFile /home/mpiuser/.ssh/known_hosts" >> /etc/ssh/ssh_config \ && echo " Port ${SSH_PORT}" >> /etc/ssh/ssh_config \ && echo " SendEnv PATH LD_LIBRARY_PATH" >> /etc/ssh/ssh_configAs per coding guidelines
**: REVIEW PRIORITIES: 1. Security vulnerabilities (provide severity, exploit scenario, and remediation code).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@images/runtime/training/py312-rocm64-torch29-openmpi41/Dockerfile.konflux` around lines 41 - 44, The Dockerfile step is disabling SSH host verification by setting "StrictHostKeyChecking no" and "UserKnownHostsFile /dev/null" (and appending SSH config lines using SSH_PORT and SendEnv); revert this by removing or not writing those insecure settings and instead allow the default host verification (or use "StrictHostKeyChecking accept-new" if automated first-time acceptance is required), and ensure known_hosts is provisioned into the image at build or runtime; update the RUN block that edits /etc/ssh/ssh_config (the sed and echo lines referencing StrictHostKeyChecking, UserKnownHostsFile, SSH_PORT, and SendEnv) accordingly so peer authentication is preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@images/runtime/training/py312-rocm64-torch29-openmpi41/Dockerfile.konflux`:
- Around line 77-79: The Dockerfile currently installs s3fs with pip using
--no-deps which bypasses dependency resolution and prevents
aiobotocore/botocore/aioitertools from being locked; instead add s3fs==2026.1.0
to the project's Pipfile, remove the `pip install --no-cache-dir --no-deps
s3fs==2026.1.0` line from Dockerfile.konflux (the `pip install` invocation),
regenerate the lockfile with micropipenv/pipenv (e.g., run micropipenv lock or
the project's lock command) so Pipfile.lock contains s3fs and its transitive
deps, and then keep the Dockerfile step that runs `micropipenv install` so the
locked deps are installed.
---
Duplicate comments:
In `@images/runtime/training/py312-rocm64-torch29-openmpi41/Dockerfile.konflux`:
- Around line 86-92: The Dockerfile currently makes /etc/passwd group-writable
(RUN chmod g=u /etc/passwd) which is insecure; remove that RUN line and change
the uid registration flow in uid_entrypoint.sh (and any mpirun_wrapper.sh) so
they do not append to /etc/passwd but instead create a temporary passwd file
(mktemp), copy /etc/passwd into it, append the runtime UID entry there, and
export NSS_WRAPPER_PASSWD pointing to that temp file and
NSS_WRAPPER_GROUP=/etc/group before starting the process; update references to
uid_entrypoint.sh in the Dockerfile (COPY/CHMOD steps remain) and ensure no code
appends directly to /etc/passwd anywhere in the image.
- Around line 2-99: The Dockerfile is a single-stage image that leaves dnf, pip
and build-time artifacts in the runtime layer; convert it into a multi-stage
build by adding a builder stage (e.g., FROM ... AS builder) that performs all
RUN dnf install ..., pip install --no-cache-dir -U "micropipenv[toml]",
micropipenv install, and any builds, and then copy only the runtime artifacts
into a slim runtime stage (the original base image or a minimal UBI9) —
specifically copy the installed Python site-packages (from
/opt/app-root/lib/python3.12/site-packages), the mpirun_wrapper.sh
(/usr/local/bin/mpirun), uid_entrypoint.sh (/usr/local/bin/uid_entrypoint.sh),
the /usr/local/bin/python wrapper, and any required OpenMPI libs or symlinks
into the final image; ensure final stage does not run dnf/pip, preserves
ENTRYPOINT ["/usr/local/bin/uid_entrypoint.sh"], sets HOME/PATH/LD_LIBRARY_PATH
and file permissions (chgrp/chmod for /home/mpiuser and /etc/passwd) as needed,
and removes package manager artifacts so no build tools remain in the runtime
layer.
- Around line 41-44: The Dockerfile step is disabling SSH host verification by
setting "StrictHostKeyChecking no" and "UserKnownHostsFile /dev/null" (and
appending SSH config lines using SSH_PORT and SendEnv); revert this by removing
or not writing those insecure settings and instead allow the default host
verification (or use "StrictHostKeyChecking accept-new" if automated first-time
acceptance is required), and ensure known_hosts is provisioned into the image at
build or runtime; update the RUN block that edits /etc/ssh/ssh_config (the sed
and echo lines referencing StrictHostKeyChecking, UserKnownHostsFile, SSH_PORT,
and SendEnv) accordingly so peer authentication is preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 21bba76e-cbeb-4c92-8be4-56360fe8da24
📒 Files selected for processing (2)
images/runtime/training/py312-rocm64-torch29-openmpi41/Dockerfile.konfluximages/runtime/training/py312-rocm64-torch29-openmpi41/Pipfile
…MPI training runtime - Updated Dockerfile comments for clarity and accuracy regarding ROCm and OpenMPI configurations. - Replaced Pipfile with requirements.txt for Python dependencies, generated from pyproject.toml. - Added new pyproject.toml for better dependency management and streamlined installation process. - Removed outdated Pipfile and Pipfile.lock to eliminate redundancy.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
images/runtime/training/py312-rocm64-torch29-openmpi41/Dockerfile.konflux (1)
47-50:StrictModes noapplied twice — redundant.Line 47 uses
sedto setStrictModes no, and line 49 appendsStrictModes noagain. The echo is superfluous.Proposed fix
RUN sed -i "s/#\(StrictModes \).*/\1no/g" /etc/ssh/sshd_config \ && sed -i "s/#\(Port \).*/\1${SSH_PORT}/g" /etc/ssh/sshd_config \ - && echo "StrictModes no" >> /etc/ssh/sshd_config \ && echo "Port ${SSH_PORT}" >> /etc/ssh/sshd_config🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@images/runtime/training/py312-rocm64-torch29-openmpi41/Dockerfile.konflux` around lines 47 - 50, The Dockerfile sets StrictModes twice in /etc/ssh/sshd_config: the sed invocation sed -i "s/#\(StrictModes \).*/\1no/g" already enables StrictModes no, so remove the redundant echo "StrictModes no" append (and related duplicate echo of Port if present) to avoid duplicate entries; keep the sed lines (sed -i "s/#\(StrictModes \).*/\1no/g" and sed -i "s/#\(Port \).*/\1${SSH_PORT}/g") and only append values for keys that sed does not handle.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@images/runtime/training/py312-rocm64-torch29-openmpi41/pyproject.toml`:
- Line 61: pyproject.toml pins s3fs==2025.9.0 but Dockerfile.konflux separately
installs s3fs==2026.1.0 with pip install --no-deps, causing inconsistent
lockfile and missing transitive deps; fix by making versions consistent: either
update pyproject.toml to s3fs==2026.1.0 and regenerate requirements.txt
(preferred) or remove the explicit pip install --no-deps from Dockerfile.konflux
so the build uses the lockfile-managed s3fs, and ensure requirements.txt is
regenerated after any version change.
- Around line 80-81: The pyproject.toml currently sets index-url and also
configures extra-index-url pointing to https://pypi.org/simple which allows
fallback to an unvetted source; remove the extra-index-url entry (or replace it
with a secure internal fallback) so the installer uses only the vetted AIPCC
index, or if the fallback is required, add a documented security justification
and an approval tag in the file explaining why PyPI is permitted; locate the
extra-index-url key in pyproject.toml and either delete that line or replace it
with the documented justification and approval metadata.
---
Nitpick comments:
In `@images/runtime/training/py312-rocm64-torch29-openmpi41/Dockerfile.konflux`:
- Around line 47-50: The Dockerfile sets StrictModes twice in
/etc/ssh/sshd_config: the sed invocation sed -i "s/#\(StrictModes \).*/\1no/g"
already enables StrictModes no, so remove the redundant echo "StrictModes no"
append (and related duplicate echo of Port if present) to avoid duplicate
entries; keep the sed lines (sed -i "s/#\(StrictModes \).*/\1no/g" and sed -i
"s/#\(Port \).*/\1${SSH_PORT}/g") and only append values for keys that sed does
not handle.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: d2e1c3fb-4dd7-46c3-af60-aabfe126c250
📒 Files selected for processing (3)
images/runtime/training/py312-rocm64-torch29-openmpi41/Dockerfile.konfluximages/runtime/training/py312-rocm64-torch29-openmpi41/pyproject.tomlimages/runtime/training/py312-rocm64-torch29-openmpi41/requirements.txt
| "mlflow>=3.8.1", | ||
|
|
||
| # Storage | ||
| "s3fs==2025.9.0", |
There was a problem hiding this comment.
s3fs version mismatch: pyproject.toml declares 2025.9.0, Dockerfile installs 2026.1.0.
pyproject.toml specifies s3fs==2025.9.0 but Dockerfile.konflux line 85 installs s3fs==2026.1.0 via a separate pip install --no-deps call. This creates two problems:
- The lockfile (
requirements.txt) will contain the wrong version. --no-depsbypasses dependency resolution, risking missing transitive deps (aiobotocore, botocore, etc.).
Either align the version here and regenerate requirements.txt (preferred), or remove the separate pip install in the Dockerfile.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@images/runtime/training/py312-rocm64-torch29-openmpi41/pyproject.toml` at
line 61, pyproject.toml pins s3fs==2025.9.0 but Dockerfile.konflux separately
installs s3fs==2026.1.0 with pip install --no-deps, causing inconsistent
lockfile and missing transitive deps; fix by making versions consistent: either
update pyproject.toml to s3fs==2026.1.0 and regenerate requirements.txt
(preferred) or remove the explicit pip install --no-deps from Dockerfile.konflux
so the build uses the lockfile-managed s3fs, and ensure requirements.txt is
regenerated after any version change.
…toml for security and future package management considerations
…s.txt and pyproject.toml
…ments.txt Co-authored-by: Karel Suta <krlsuta@gmail.com>
…ct.toml Co-authored-by: Karel Suta <krlsuta@gmail.com>
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: sutaakar The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Description
How Has This Been Tested?
Merge criteria:
Summary by CodeRabbit
New Features
Chores