fix: install rocm280 flash-attn outside lockfile on main #936
Flash-attn build hooks can fail during micropipenv lockfile installation with torch import errors in this image path. Remove flash-attn from lockfile resolution and install it explicitly with no-build-isolation after base dependencies are installed. Co-authored-by: Cursor <cursoragent@cursor.com>
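The torch import errors described above are consistent with pip's default build isolation, which builds a package in a temporary environment that cannot see the already-installed torch. A minimal preflight sketch follows, assuming a POSIX shell and a `python3` interpreter on PATH. The function name `module_importable` and the `PYTHON_BIN` variable are hypothetical and not part of this PR.

```shell
#!/bin/sh
# Hypothetical preflight helper; not part of the PR.
# PYTHON_BIN is an assumed override for the interpreter to test against.
PYTHON_BIN="${PYTHON_BIN:-python3}"

# Return 0 if the named module can be imported by $PYTHON_BIN, non-zero otherwise.
module_importable() {
    "$PYTHON_BIN" -c 'import importlib, sys; importlib.import_module(sys.argv[1])' "$1" \
        >/dev/null 2>&1
}

# Possible gate before the explicit install step (left commented out because it
# needs network access and a torch installation):
#   module_importable torch || { echo "torch is not importable" >&2; exit 1; }
#   pip install --no-build-isolation --no-cache-dir --no-deps flash-attn==2.8.3
```

A gate like this would make the build fail early with a clear message if the base dependencies did not install torch, rather than failing inside the flash-attn build hooks.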
📝 Walkthrough
A new build step is added to
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Supply chain / security flags (no praise, just findings):
CWE-829 (Inclusion of Functionality from Untrusted Control Sphere) / CWE-494 (Download of Code Without Integrity Check)
Mitigation: add
Disabling build isolation (
No provenance / SBOM update visible
This change adds a package outside the lockfile resolution mechanism. Confirm whether the SBOM or lockfile for this image is updated elsewhere to track this addition; absent that, the image's software inventory is incomplete (relevant to NIST SSDF PW.4).
🚥 Pre-merge checks | ✅ 9 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (9 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@images/runtime/training/py312-rocm64-torch280/Dockerfile`:
- Around line 81-83: Move the flash-attn installation out of the final
root-owned stage and into a throwaway unprivileged builder stage, since the
current RUN in the Dockerfile executes third-party build hooks as USER 0. Build
the wheel in that stage using the existing flash-attn install flow, then copy
only the resulting wheel into the final image and install from it there; keep
the unique install step around the flash-attn==2.8.3 command and preserve the
fix-permissions behavior in the final stage.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: b011dcb7-a307-4c67-9504-12f9684078f8
⛔ Files ignored due to path filters (1)
images/runtime/training/py312-rocm64-torch280/Pipfile.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
images/runtime/training/py312-rocm64-torch280/Dockerfile
# Install flash-attn outside lockfile resolution; it needs torch visible at build time.
RUN pip install --no-build-isolation --no-cache-dir --no-deps flash-attn==2.8.3 && \
    fix-permissions /opt/app-root -P
🔒 Security & Privacy | 🟠 Major | 🏗️ Heavy lift
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '== Dockerfile context ==\n'
nl -ba images/runtime/training/py312-rocm64-torch280/Dockerfile | sed -n '1,140p'
printf '\n== README context around flash-attn ==\n'
nl -ba images/runtime/training/README.md | sed -n '40,110p'
printf '\n== Multi-stage indicators in this Dockerfile ==\n'
rg -n '^FROM |^USER |flash-attn|no-build-isolation|fix-permissions' images/runtime/training/py312-rocm64-torch280/Dockerfile
Repository: opendatahub-io/distributed-workloads
Length of output: 239
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '== Dockerfile context ==\n'
awk 'NR>=1 && NR<=140 { printf "%4d %s\n", NR, $0 }' images/runtime/training/py312-rocm64-torch280/Dockerfile
printf '\n== README context around flash-attn ==\n'
awk 'NR>=40 && NR<=110 { printf "%4d %s\n", NR, $0 }' images/runtime/training/README.md
printf '\n== Multi-stage indicators in this Dockerfile ==\n'
rg -n '^FROM |^USER |flash-attn|no-build-isolation|fix-permissions' images/runtime/training/py312-rocm64-torch280/Dockerfile
Repository: opendatahub-io/distributed-workloads
Length of output: 7024
Build flash-attn in a non-root builder stage.
pip install --no-build-isolation --no-deps flash-attn==2.8.3 runs third-party build hooks as USER 0, so a compromised sdist or index response becomes build-time root RCE and can taint the image (CWE-250, CWE-494). Move this install to a throwaway unprivileged stage and copy only the built wheel into the final image.
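The commands such a builder stage might run can be sketched as follows. This is an illustration under assumptions, not the change the PR makes: the function names `build_flash_attn_wheel` and `verify_sha256` are hypothetical, `sha256sum` is assumed to be available (GNU coreutils), and the digest check only confirms that the wheel installed in the final stage is the one the builder produced. It does not by itself verify the integrity of the downloaded sdist.

```shell
#!/bin/sh
# Sketch only; function names are hypothetical and not part of the PR.

# Build the wheel into the directory given as $1, refusing to run the
# third-party build hooks as root.
build_flash_attn_wheel() {
    if [ "$(id -u)" -eq 0 ]; then
        echo "refusing to run build hooks as root" >&2
        return 1
    fi
    pip wheel --no-build-isolation --no-deps --no-cache-dir \
        --wheel-dir "$1" flash-attn==2.8.3
}

# Return 0 only if the SHA-256 digest of file $1 equals the hex string $2.
verify_sha256() {
    vs_file="$1"
    vs_expected="$2"
    vs_actual="$(sha256sum "$vs_file" 2>/dev/null | cut -d ' ' -f 1)"
    if [ -n "$vs_actual" ] && [ "$vs_actual" = "$vs_expected" ]; then
        return 0
    fi
    echo "sha256 mismatch for $vs_file" >&2
    return 1
}

# Possible flow (commented out: needs network access, torch, and a ROCm toolchain):
#   builder stage, unprivileged user:
#     build_flash_attn_wheel /tmp/wheels
#     sha256sum /tmp/wheels/*.whl      # record the digest
#   final stage, after copying only the wheel:
#     verify_sha256 /tmp/wheels/<wheel file> <recorded digest>
#     pip install --no-index --no-deps --no-cache-dir /tmp/wheels/<wheel file>
#     fix-permissions /opt/app-root -P
```

Installing with `--no-index` in the final stage means the root-owned stage only unpacks a prebuilt wheel and does not execute the package's build hooks.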
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@images/runtime/training/py312-rocm64-torch280/Dockerfile` around lines 81 -
83, Move the flash-attn installation out of the final root-owned stage and into
a throwaway unprivileged builder stage, since the current RUN in the Dockerfile
executes third-party build hooks as USER 0. Build the wheel in that stage using
the existing flash-attn install flow, then copy only the resulting wheel into
the final image and install from it there; keep the unique install step around
the flash-attn==2.8.3 command and preserve the fix-permissions behavior in the
final stage.
Source: Path instructions
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: sutaakar
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Details
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@kapil27: The following test has Failed:
OCI Artifact Browser URL
Inspecting Test Artifacts Manually
To inspect your test artifacts manually, follow these steps:
mkdir -p oras-artifacts
cd oras-artifacts
oras pull quay.io/opendatahub/odh-ci-artifacts:odh-pr-test-distributed-workloads-kw9tr