[CI][XPU] enable unit test for XPU device #2814

Closed
DiweiSun wants to merge 25 commits into pytorch:main from DiweiSun:molly/enable_xpu_ci

@DiweiSun
Contributor

This PR enables CI testing for torchao on the Intel XPU (GPU) platform to ensure functional correctness, performance consistency, and long-term compatibility as both torchao and XPU support evolve.

@pytorch-bot

pytorch-bot bot commented Aug 20, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2814

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 10 Cancelled Jobs

As of commit 030121f with merge base 2db4c76:

NEW FAILURE - The following job has failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Aug 20, 2025
Comment on lines +38 to +66
- name: Clean all stopped docker containers
  if: always()
  shell: bash
  run: |
    # Prune all stopped containers.
    # If other runner is pruning on this node, will skip.
    nprune=$(ps -ef | grep -c "docker container prune")
    if [[ $nprune -eq 1 ]]; then
      docker container prune -f
    fi

- name: Runner health check GPU count
  if: always()
  shell: bash
  run: |
    ngpu=$(timeout 30 clinfo -l | grep -c -E 'Device' || true)
    msg="Please file an issue on pytorch/ao reporting the faulty runner. Include a link to the runner logs so the runner can be identified"
    if [[ $ngpu -eq 0 ]]; then
      echo "Error: Failed to detect any GPUs on the runner"
      echo "$msg"
      exit 1
    fi

- name: Use following to pull public copy of the image
  id: print-ghcr-mirror
  shell: bash
  run: |
    echo "docker pull ${DOCKER_IMAGE}"
    docker pull ${DOCKER_IMAGE}
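
Both guard steps in this hunk reduce to counting matching lines. The prune guard counts `ps -ef` lines, and because the `grep` process itself normally appears in that listing, a count of 1 is treated as "nobody else is pruning". The health check counts `Device` lines in `clinfo -l` output. Below is a minimal sketch of the same counting logic as standalone functions that read from stdin, so it can be tried without Docker or a GPU. The function names are hypothetical and not part of this PR, and the `[d]` bracket pattern is a swapped-in idiom (not what the PR uses) that stops `grep` from matching its own command line, so "no other prune running" becomes a count of 0.

```shell
# Hypothetical helper: count "docker container prune" processes in
# `ps -ef`-style text read from stdin. The [d] bracket expression still
# matches "docker", but not the literal "[d]ocker" in grep's own arguments.
count_prune_procs() {
  # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
  # keeps the exit status clean for callers that run under "set -e".
  grep -c "[d]ocker container prune" || true
}

# Hypothetical helper: count lines mentioning "Device" in
# `clinfo -l`-style text read from stdin.
count_devices() {
  grep -c -E 'Device' || true
}

# Possible use on a runner (assumes ps, docker and clinfo are installed):
#   if [[ $(ps -ef | count_prune_procs) -eq 0 ]]; then docker container prune -f; fi
#   if [[ $(timeout 30 clinfo -l | count_devices) -eq 0 ]]; then exit 1; fi
```

Keeping the counting separate from the commands that produce the text makes the thresholds (`-eq 0` here, `-eq 1` in the PR) easier to reason about and to check against captured output.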
Contributor

Contributor Author

Porting is done. Please help review.

if-no-files-found: ignore
path: ./**/core.[1-9]*

- name: Teardown XPU
Contributor

Can reuse the action in pytorch directly

Contributor Author

yes, this is literally ported from pytorch

DiweiSun and others added 2 commits August 22, 2025 14:30
Co-authored-by: Wang, Chuanqi <chuanqi.wang@intel.com>
Co-authored-by: Wang, Chuanqi <chuanqi.wang@intel.com>
@liangan1 liangan1 mentioned this pull request Sep 2, 2025
9 tasks
@liangan1 liangan1 added the topic: for developers and ci labels Sep 4, 2025
DiweiSun and others added 2 commits September 4, 2025 14:43
Co-authored-by: Wang, Chuanqi <chuanqi.wang@intel.com>
@DiweiSun DiweiSun changed the title from "Molly/enable xpu ci" to "[CI][XPU] enable unit test for XPU device" Sep 8, 2025
      - ciflow/xpu/*
  pull_request:
    branches:
      - main
Contributor Author

Remove the pull_request trigger after review.

@chuanqi129
Contributor

@pytorchbot label "ciflow/xpu"

@pytorch-bot

pytorch-bot bot commented Sep 10, 2025

Didn't find following labels among repository labels: ciflow/xpu

@liangan1
Collaborator

@pytorchbot label "ciflow/xpu"

@pytorch-bot pytorch-bot bot added the ciflow/xpu label Sep 15, 2025
@pytorch-bot

pytorch-bot bot commented Sep 15, 2025

Unknown label ciflow/xpu.
Currently recognized labels are

  • ciflow/benchmark
  • ciflow/tutorials
  • ciflow/rocm
  • ciflow/4xh100

@pytorch-bot pytorch-bot bot removed the ciflow/xpu label Sep 16, 2025
@liangan1
Collaborator

@pytorchbot label "ciflow/xpu"

@pytorch-bot pytorch-bot bot added the ciflow/xpu label Sep 16, 2025
@pytorch-bot

pytorch-bot bot commented Sep 16, 2025

Warning: Unknown label ciflow/xpu.
Currently recognized labels are

  • ciflow/benchmark
  • ciflow/tutorials
  • ciflow/rocm
  • ciflow/4xh100

Please add the new label to .github/pytorch-probot.yml
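
The warning above names `.github/pytorch-probot.yml` as the place where recognized `ciflow/*` labels are registered. As a rough sketch, a contributor could check locally whether a label is already listed before asking the bot to apply it. The function name `label_is_registered` is hypothetical, and the assumption that labels are stored as plain YAML list items of the form `- ciflow/<name>` is not confirmed anywhere in this thread.

```shell
# Hypothetical helper: succeed if label $1 appears as a YAML list item in file $2.
# Assumes unquoted list items such as "- ciflow/rocm".
label_is_registered() {
  # Drop the leading "- " marker and surrounding whitespace, then require an
  # exact, literal whole-line match so "ciflow/xp" does not match "ciflow/xpu".
  sed -E 's/^[[:space:]]*-[[:space:]]+//; s/[[:space:]]+$//' "$2" | grep -qxF -- "$1"
}

# Possible use from the repository root:
#   label_is_registered "ciflow/xpu" .github/pytorch-probot.yml \
#     || echo "ciflow/xpu is not registered yet"
```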

@pytorch-bot pytorch-bot bot removed the ciflow/xpu label Sep 17, 2025
@liangan1 liangan1 added the ciflow/xpu label Sep 17, 2025
@pytorch-bot

pytorch-bot bot commented Sep 17, 2025

Unknown label ciflow/xpu.
Currently recognized labels are

  • ciflow/benchmark
  • ciflow/tutorials
  • ciflow/rocm
  • ciflow/4xh100

@DiweiSun
Contributor Author

@pytorchbot label "ciflow/xpu"

1 similar comment
@liangan1
Collaborator

@pytorchbot label "ciflow/xpu"

@liangan1 liangan1 removed the ciflow/xpu label Sep 17, 2025
@liangan1
Collaborator

@pytorchbot label "ciflow/xpu"

@pytorch-bot pytorch-bot bot added the ciflow/xpu label Sep 17, 2025
@pytorch-bot

pytorch-bot bot commented Sep 17, 2025

Unknown label ciflow/xpu.
Currently recognized labels are

  • ciflow/benchmark
  • ciflow/tutorials
  • ciflow/rocm
  • ciflow/4xh100

@DiweiSun DiweiSun closed this Sep 24, 2025
Labels

ci, ciflow/xpu, CLA Signed, topic: for developers

3 participants