CI: Add ROCm nightly docker workflow #3115
base: develop
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,30 @@ | ||
| ARG BASE_IMAGE=rocm/pytorch-autobuild:base-latest | ||
| FROM ${BASE_IMAGE} | ||
| WORKDIR /tmp | ||
| USER root | ||
|
|
||
| ENV CI=1 | ||
| ENV PYTORCH_TEST_WITH_ROCM=1 | ||
| ENV PYTORCH_TESTING_DEVICE_ONLY_FOR="cuda" | ||
| ENV USE_NVSHMEM=0 | ||
|
Collaborator
@leo-automation Please add a comment stating that this is a TODO and TEMPORARY, and give the reason why it's there. |
||
|
|
||
| RUN git clone https://github.com/pytorch/pytorch --recursive \ | ||
| && cd pytorch \ | ||
| # Bypass sccache on torch_rocshmem: its -fgpu-rdc + mixed xnack± offload-arch flags break sccache's argv parser. | ||
| && sed -i 's|set_target_properties(torch_rocshmem PROPERTIES LINKER_LANGUAGE HIP)|set_target_properties(torch_rocshmem PROPERTIES LINKER_LANGUAGE HIP CXX_COMPILER_LAUNCHER "" HIP_COMPILER_LAUNCHER "")|' caffe2/CMakeLists.txt \ | ||
|
Collaborator
@leo-automation Do we still need this if we have the |
||
| && pip install -r requirements.txt \ | ||
| && git config --local user.name "AMD AMD" \ | ||
| && git config --local user.email "amd@amd.com" \ | ||
| && git remote add rocm https://github.com/ROCm/pytorch.git \ | ||
| && git fetch rocm \ | ||
| && git cherry-pick 519160d466782f5a62365be051fcb3ef90fa0b00 \ | ||
|
Collaborator
@leo-automation Do we need this as well? |
||
| && (.ci/pytorch/build.sh > /tmp/build.log 2>&1 || (tail -300 /tmp/build.log; exit 1)) \ | ||
| && rm -rf /tmp/pytorch/.git | ||
| RUN git clone https://github.com/pytorch/vision \ | ||
|
leo-automation marked this conversation as resolved.
|
||
| && cd vision \ | ||
| && FORCE_CUDA=1 python setup.py install \ | ||
| && rm -rf /tmp/vision/.git | ||
| RUN git clone https://github.com/pytorch/audio \ | ||
| && cd audio \ | ||
| && python setup.py install \ | ||
| && rm -rf /tmp/audio/.git | ||
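
Note on the `build.sh` line in the Dockerfile above: it redirects all build output to `/tmp/build.log` and, only on failure, prints the last 300 lines before exiting non-zero. The following is a minimal sketch of that pattern, not the PR's code. `fake_build` is a hypothetical stand-in for `.ci/pytorch/build.sh`, and the 500-line output is made up for illustration.

```shell
# Hypothetical stand-in for .ci/pytorch/build.sh: emits 500 lines, then fails.
fake_build() {
  for i in $(seq 1 500); do
    echo "build line $i"
  done
  return 1
}

log_file="$(mktemp)"
status=0

# Same shape as the RUN step: capture all output in a log file; on failure,
# print only the last 300 lines and propagate a non-zero exit status.
shown="$( (fake_build > "$log_file" 2>&1 || (tail -n 300 "$log_file"; exit 1)) )" || status=$?

echo "exit status: $status"
echo "lines shown: $(printf '%s\n' "$shown" | wc -l | tr -d ' ')"
rm -f "$log_file"
```

On success nothing from the log is printed, which keeps the Docker build output short. On failure the tail is usually enough to locate the error, at the cost of hiding anything earlier in the log.
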
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,170 @@ | ||
| name: ROCm Nightly Build and Test | ||
|
|
||
| on: | ||
| schedule: | ||
| # Run nightly at 2 AM UTC | ||
| - cron: '0 2 * * *' | ||
| workflow_dispatch: | ||
| inputs: | ||
| rocm_version: | ||
| description: ROCm version to build | ||
| required: false | ||
| type: string | ||
| workflow_call: | ||
| inputs: | ||
| rocm_version: | ||
| required: false | ||
| type: string | ||
| push: | ||
| branches: | ||
| - rocm-nightly-gha | ||
|
|
||
| env: | ||
| ROCM_VERSION: '7.2.2' | ||
| PYTHON_VERSION: '3.10' | ||
| PYTORCH_ROCM_ARCH: 'gfx906;gfx908;gfx90a;gfx942;gfx950;gfx1030;gfx1100;gfx1101;gfx1102;gfx1150;gfx1151;gfx1200;gfx1201' | ||
|
jithunnair-amd marked this conversation as resolved.
|
||
| DOCKER_REGISTRY: rocm/pytorch-nightly | ||
|
|
||
| jobs: | ||
| build: | ||
| name: Build ROCm Nightly Image | ||
| runs-on: linux-pytorch-mi325-1 | ||
| timeout-minutes: 720 | ||
| outputs: | ||
| full-image: ${{ steps.meta.outputs.full-image }} | ||
| steps: | ||
| - name: Resolve ROCm version | ||
| if: ${{ inputs.rocm_version != '' }} | ||
| run: echo "ROCM_VERSION=${{ inputs.rocm_version }}" >> "$GITHUB_ENV" | ||
|
|
||
| - name: Checkout pytorch | ||
| uses: actions/checkout@v6 | ||
| with: | ||
| repository: pytorch/pytorch | ||
| ref: main | ||
|
|
||
| - name: Checkout nightly workflow files | ||
| uses: actions/checkout@v6 | ||
| with: | ||
| path: rocm-nightly-workflow | ||
|
|
||
| - name: Patch rocm-n build.sh version | ||
| run: | | ||
| sed -i '/pytorch-linux-jammy-rocm-n-py3 | pytorch-linux-jammy-rocm-n-py3-benchmarks | pytorch-linux-noble-rocm-n-py3)/,/;;/ s/ROCM_VERSION=7\.2/ROCM_VERSION=${{ env.ROCM_VERSION }}/' .ci/docker/build.sh | ||
| sed -n '/pytorch-linux-jammy-rocm-n-py3 | pytorch-linux-jammy-rocm-n-py3-benchmarks | pytorch-linux-noble-rocm-n-py3)/,/;;/p' .ci/docker/build.sh | ||
|
|
||
| - name: Generate image tag | ||
| id: meta | ||
| run: | | ||
| tag="$(date +%Y%m%d%H%M%S)-rocm${{ env.ROCM_VERSION }}" | ||
|
leo-automation marked this conversation as resolved.
Outdated
|
||
| echo "full-image=${{ env.DOCKER_REGISTRY }}:${tag}" >> "$GITHUB_OUTPUT" | ||
|
|
||
| - name: Build base image | ||
| working-directory: .ci/docker | ||
| run: | | ||
| export SKIP_SCCACHE_INSTALL=1 | ||
|
leo-automation marked this conversation as resolved.
Outdated
|
||
| export PYTORCH_ROCM_ARCH="${{ env.PYTORCH_ROCM_ARCH }}" | ||
| ./build.sh pytorch-linux-jammy-rocm-n-py3 \ | ||
| -t rocm/pytorch-autobuild:base-latest | ||
|
|
||
| - name: Build ROCm Nightly Image | ||
| env: | ||
| FULL_IMAGE: ${{ steps.meta.outputs.full-image }} | ||
| run: | | ||
| docker build \ | ||
| --build-arg BASE_IMAGE=rocm/pytorch-autobuild:base-latest \ | ||
| -t "$FULL_IMAGE" \ | ||
| - < rocm-nightly-workflow/.ci/docker/pytorch-nightly-docker.Dockerfile | ||
|
|
||
| - name: Save nightly image artifact | ||
| env: | ||
| FULL_IMAGE: ${{ steps.meta.outputs.full-image }} | ||
| run: | | ||
| docker save -o nightly-image.tar "$FULL_IMAGE" | ||
|
|
||
| - name: Upload nightly image artifact | ||
| uses: actions/upload-artifact@v7 | ||
| with: | ||
| name: rocm-nightly-image | ||
| path: nightly-image.tar | ||
| retention-days: 1 | ||
| compression-level: 0 | ||
|
|
||
| test-push: | ||
| name: ${{ matrix.target.name }} | ||
| needs: build | ||
| strategy: | ||
| fail-fast: false | ||
| matrix: | ||
| target: | ||
| - name: Test and Push ROCm Nightly Image on MI325 | ||
| runner: linux-pytorch-mi325-1 | ||
| push_image: true | ||
| - name: Test ROCm Nightly Image on MI250 | ||
| runner: linux-pytorch-mi250-1 | ||
| push_image: false | ||
| runs-on: ${{ matrix.target.runner }} | ||
| timeout-minutes: 720 | ||
| env: | ||
| NIGHTLY_IMAGE: ${{ needs.build.outputs.full-image }} | ||
| steps: | ||
| - name: Resolve ROCm version | ||
| if: ${{ inputs.rocm_version != '' }} | ||
| run: echo "ROCM_VERSION=${{ inputs.rocm_version }}" >> "$GITHUB_ENV" | ||
|
|
||
| - name: Docker cleanup | ||
| run: | | ||
| docker container prune -f | ||
| docker image prune -f | ||
|
|
||
| - name: Download nightly image artifact | ||
| uses: actions/download-artifact@v8 | ||
| with: | ||
| name: rocm-nightly-image | ||
| path: nightly-image-artifact | ||
|
|
||
| - name: Load nightly image | ||
| run: docker load -i nightly-image-artifact/nightly-image.tar | ||
|
|
||
| - name: Run unit tests | ||
| run: | | ||
| docker run --rm \ | ||
| --device=/dev/kfd \ | ||
| --device=/dev/dri \ | ||
| --group-add video \ | ||
| --network host \ | ||
| --cap-add=SYS_PTRACE \ | ||
| --security-opt seccomp=unconfined \ | ||
| "$NIGHTLY_IMAGE" \ | ||
| bash -c " | ||
| git clone https://github.com/ROCm/pytorch-micro-benchmarking.git /tmp/pytorch-micro-benchmarking | ||
| cd /tmp/pytorch-micro-benchmarking | ||
| python3 micro_benchmarking_pytorch.py --network resnet50 | ||
| " | ||
|
|
||
| - name: Scan image for vulnerabilities | ||
|
Collaborator
@ethanwee1 Can you please add this to our TheRock Docker image build workflow? cc @okakarpa |
||
| if: ${{ matrix.target.push_image }} | ||
| uses: aquasecurity/trivy-action@v0.36.0 | ||
| with: | ||
| image-ref: ${{ env.NIGHTLY_IMAGE }} | ||
| format: table | ||
| severity: CRITICAL | ||
| ignore-unfixed: true | ||
| exit-code: '1' | ||
|
|
||
| - name: Log in to Docker Hub | ||
| if: ${{ matrix.target.push_image }} | ||
| uses: docker/login-action@v4 | ||
| with: | ||
| username: ${{ secrets.DOCKER_USERNAME }} | ||
| password: ${{ secrets.DOCKER_PASSWORD }} | ||
|
leo-automation marked this conversation as resolved.
Outdated
leo-automation marked this conversation as resolved.
Outdated
|
||
|
|
||
| - name: Push validated image | ||
| if: ${{ matrix.target.push_image }} | ||
| env: | ||
| FINAL_IMAGE: ${{ needs.build.outputs.full-image }} | ||
| LATEST_IMAGE: ${{ env.DOCKER_REGISTRY }}:latest | ||
| run: | | ||
| docker tag "$FINAL_IMAGE" "$LATEST_IMAGE" | ||
| docker push "$FINAL_IMAGE" | ||
| docker push "$LATEST_IMAGE" | ||
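
Note on the "Patch rocm-n build.sh version" step: the `sed` call uses a range address (from the line matching the `case` label to the next `;;`), so the substitution applies only inside that one `case` arm. The following is a minimal sketch of the technique on a made-up file. The sample contents, the shortened label, and `new_version` are assumptions for illustration, not the real `.ci/docker/build.sh`.

```shell
# Hypothetical excerpt shaped like two case arms in a build script.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
  pytorch-linux-jammy-rocm-n-py3 | pytorch-linux-noble-rocm-n-py3)
    ROCM_VERSION=7.2
    ;;
  pytorch-linux-jammy-other)
    ROCM_VERSION=7.2
    ;;
EOF

new_version="7.2.2"  # stands in for ${{ env.ROCM_VERSION }}

# Range address: start at the matching case label, stop at the next ';;'.
# The s/// command runs only on lines inside that range.
patched="$(sed "/pytorch-linux-jammy-rocm-n-py3 | pytorch-linux-noble-rocm-n-py3)/,/;;/ s/ROCM_VERSION=7\.2/ROCM_VERSION=${new_version}/" "$sample")"

printf '%s\n' "$patched"
rm -f "$sample"
```

Only the first arm is rewritten and the second arm keeps `ROCM_VERSION=7.2`. Note that `sed` exits with status 0 even when nothing matches, so if the label or the pinned version changes upstream, the patch silently becomes a no-op. The follow-up `sed -n ... p` in the workflow prints the arm for inspection, and a `grep` on the expected version afterwards would be one way to turn that into a hard failure.
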