Add Windows wheel release job to nightly-wheels CI #64
Pull request overview
This PR extends the nightly wheels CI workflow to build and publish Windows x64 wheels in parallel with the existing Linux wheel builds, and updates the README’s Windows instructions to match the intended Windows wheel availability and setup flow.
Changes:
- Adds a `build-wheels-windows` matrix job (cp310/cp311/cp312) to `.github/workflows/nightly-wheels.yml`, including XRT Windows SDK download/extract and wheel publishing to the shared `latest-wheels` release tag.
- Updates README Windows requirements and setup steps (Python version range, Quick Start script path, and mlir-air install instructions).
- Updates Windows “Known Limitations” documentation to reflect Python version constraints for Xilinx Windows wheels.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| README.md | Updates Windows requirements and build/setup instructions to align with Windows wheel support. |
| .github/workflows/nightly-wheels.yml | Adds a Windows wheel build + release job alongside the existing Linux workflow. |
Three Copilot review comments on PR #64:

1. env_setup.ps1 referenced non-existent hash files (mlir-aie-hash-windows.txt / llvm-aie-hash-windows.txt / mlir-air-hash-windows.txt). Rewrote to mirror env_setup.sh's pattern: use the existing mlir-aie-hash.txt and mlir-air-hash.txt, and pull llvm-aie latest nightly. Factored hash-field parsing into a Read-HashField helper. The script that the README's Quick Start instructs users to run now actually works.

2. env_setup.ps1 header said "Python 3.12 (required)" while the README now says "3.10, 3.11, or 3.12". Aligned the script header with the README.

3. Five Linux + three Windows matrix jobs all racing to update the same release tag risked partial uploads / overwritten artifacts. Removed the per-matrix-job Release wheels step and added a single publish-release job that:
   - needs: [build-wheels, build-wheels-windows]
   - downloads all triton-wheel-* artifacts via download-artifact with merge-multiple
   - calls ncipollo/release-action exactly once with the combined wheelhouse
   - tolerates partial build failures (publishes if at least one matrix succeeded)

The release body is also unified into a single description with both Linux and Windows install snippets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
14e7a42 to 0363c97
Adds a parallel build-wheels-windows job that builds and releases triton-xdna Windows x64 wheels alongside the existing Linux job. Wheels land on the same latest-wheels release tag.

The Python matrix is capped at 3.10/3.11/3.12 because Xilinx publishes mlir-air Windows wheels only for those versions (the Linux matrix runs 3.10-3.14).

The XRT Windows SDK is downloaded from the pinned 2.21.75 release and extracted to C:\Program Files\AMD\xrt where the existing Windows build infrastructure (utils/env_setup.ps1, setup.py) expects it.

Also fixes three stale README items uncovered while writing the workflow:
- Quick Start referenced .\utils\build_windows.ps1 which no longer exists; replaced with the env_setup.ps1 path that does
- Manual Build claimed mlir-air must be built from source; replaced with the mlir_air[aie] pip install command
- Python version requirement updated from "3.12+" to the accurate "3.10, 3.11, or 3.12" range

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The xrt_windows_sdk.zip top-level is xrt_sdk/xrt/ (not xrt/), so extracting directly to C:\Program Files\AMD\ produced C:\Program Files\AMD\xrt_sdk\xrt\... and the build couldn't find the headers at C:\Program Files\AMD\xrt\include\xrt\xrt_bo.h.

Extract to RUNNER_TEMP and move the inner xrt_sdk/xrt/ folder to the expected destination. Also corrects the README's manual-install instructions, which had the same wrong assumption.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
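The extract-then-move step described in this commit can be sketched in Python. This is only an illustration, not the workflow's actual step: the function name `install_xrt_sdk` and its parameters are hypothetical, and a temporary directory stands in for RUNNER_TEMP.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_xrt_sdk(zip_path: Path, dest: Path, inner: str = "xrt_sdk/xrt") -> Path:
    """Extract the archive to a scratch dir, then move its inner folder to dest.

    `inner` is the nested folder described in the commit message; the function
    name and signature are illustrative assumptions.
    """
    with tempfile.TemporaryDirectory() as scratch:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(scratch)
        source = Path(scratch) / inner
        if not source.is_dir():
            raise FileNotFoundError(f"{inner!r} not found inside {zip_path}")
        if dest.exists():
            shutil.rmtree(dest)  # replace any earlier install
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(dest))
    return dest
```

Moving only the inner folder means headers end up directly under `dest/include/...` rather than under an extra `xrt_sdk` level.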
Windows can't replace the running pip.exe wrapper while it holds the file open. The previous "pip install --upgrade pip" line failed with:

    ERROR: To modify pip, please run the following command:
    ...python.exe -m pip install --upgrade pip

Use "python -m pip install --upgrade pip" so the upgrade runs through the Python interpreter rather than the locked wrapper. Linux is unaffected and keeps its existing invocation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
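When the upgrade is driven from a Python script rather than a workflow step, the same idea applies. A minimal sketch, where the helper name `pip_upgrade_command` is hypothetical:

```python
import sys


def pip_upgrade_command(python: str = sys.executable) -> list[str]:
    """Build an argv that upgrades pip through the interpreter, not pip.exe."""
    return [python, "-m", "pip", "install", "--upgrade", "pip"]
```

The resulting list can be passed to `subprocess.run(..., check=True)`; because the interpreter is the running process, no pip.exe wrapper is locked during the upgrade.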
cibuildwheel does not call vcvars64.bat on Windows; it relies on the host having MSVC on PATH already. GitHub-hosted windows-latest images ship MSVC but leave the developer command prompt environment inactive, so cmake fails to find cl.exe / INCLUDE / LIB during configuration:

    CMake Error: Could not find compiler set in environment variable CXX: cl.exe.
    CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage

Add ilammy/msvc-dev-cmd@v1 (the standard third-party action that wraps vcvars64.bat) before the cibuildwheel step. It exports the resulting PATH/INCLUDE/LIB to GITHUB_ENV so cibuildwheel's spawned subprocess inherits a working MSVC toolchain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
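A quick pre-flight check can make this failure easier to diagnose before cmake runs. The sketch below is not part of the PR; `missing_msvc_settings` is a hypothetical helper that only looks for the three things the commit message names (cl.exe on PATH, INCLUDE, LIB).

```python
import os
import shutil


def missing_msvc_settings(env=None):
    """Return the names of MSVC environment pieces that appear to be absent."""
    env = os.environ if env is None else env
    missing = [name for name in ("INCLUDE", "LIB") if not env.get(name)]
    # shutil.which returns None when cl is not found on the given PATH
    if shutil.which("cl", path=env.get("PATH", "")) is None:
        missing.append("cl.exe on PATH")
    return missing
```

An empty list suggests the developer environment is active; anything else points at the vcvars setup rather than at the build itself.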
setup.py's download_llvm_for_triton_windows() unconditionally passed filter="data" to TarFile.extractall(). That kwarg was added in Python 3.12 (PEP 706) and backported to 3.10.12 / 3.11.4, but cibuildwheel's bundled nuget-cpython for cp310/cp311 isn't always a backport-bearing patch level. The Windows wheel build failed during LLVM extraction:

    TypeError: TarFile.extractall() got an unexpected keyword argument 'filter'

Guard the kwarg behind sys.version_info >= (3, 12). Keeps the PEP 706 security filter on 3.12+, falls back to the plain call on older Pythons regardless of patch level. Linux is unaffected because this code path is Windows-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
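The version guard described above might look roughly like the following. This is a sketch under stated assumptions, not the code in setup.py: `extract_kwargs` and `safe_extractall` are hypothetical names introduced for illustration.

```python
import sys
import tarfile


def extract_kwargs(version_info=sys.version_info) -> dict:
    """Pass filter="data" only where the interpreter is guaranteed to accept it."""
    return {"filter": "data"} if version_info >= (3, 12) else {}


def safe_extractall(archive_path: str, dest: str) -> None:
    """Extract a tar archive, using the PEP 706 data filter on 3.12+ only."""
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest, **extract_kwargs())
```

Keying on the minor version trades away the filter on backport-bearing 3.10/3.11 patch levels in exchange for never raising the TypeError; checking `hasattr(tarfile, "data_filter")` would be an alternative that keeps the filter wherever it exists.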
On Windows runners (and local Windows clones), git checkout converts text files to CRLF by default, but our patches in third_party/ were generated with LF endings. The mismatch in context lines makes git apply --check report "patch failed: X: patch does not apply" even when the actual change is fine.

apply_patches.py was bailing on first conflict, meaning that when the Sanitizer/SanitizerAttributes/CMakeLists.txt hunk in triton_shared.patch failed CRLF matching, the PtrAnalysis.cpp hunk (which contains the size_t = ~0ULL fix that prevents an MSVC narrowing-conversion error) never got applied either. The Windows wheel build then died at compile time:

    PtrAnalysis.cpp(980): error C2397: conversion from 'int' to 'size_t' requires a narrowing conversion

Add --ignore-whitespace to all three git apply invocations (check, reverse-check, real apply). git apply documents this flag as making context-line matching tolerant of whitespace differences; in practice CRLF vs LF falls under that tolerance. Linux and macOS behavior is unchanged because their checkouts already match the patch's LF endings.

Note: gitattributes can't fix this because they don't cross submodule boundaries: the affected files live inside third_party/triton_shared/, which is its own git repo with its own attribute scope. A workflow-only fix (core.autocrlf=false) would help CI but not local Windows developers; this fix helps both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
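The three invocations named in this commit can be pictured as follows. The exact arguments apply_patches.py uses are not shown in this PR, so the helper `git_apply_commands` and the way the flags are combined are assumptions; only `--check`, `--reverse`, and `--ignore-whitespace` themselves are standard git apply options.

```python
def git_apply_commands(patch: str) -> dict[str, list[str]]:
    """Build check, reverse-check, and apply argv lists sharing one flag set."""
    base = ["git", "apply", "--ignore-whitespace"]
    return {
        "check": base + ["--check", patch],
        "reverse_check": base + ["--reverse", "--check", patch],
        "apply": base + [patch],
    }
```

Building all three from one base list keeps the flag set identical across the dry runs and the real apply, so a patch that passes the check is applied under the same matching rules.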
windows-latest runners default to core.autocrlf=true, which rewrites text files to CRLF on checkout. third_party/triton_shared.patch's context lines are LF, so the rewrite makes git apply --check fail and apply_patches.py aborts. The build then dies at compile time:

    PtrAnalysis.cpp(980): error C2397: conversion from 'int' to 'size_t' requires a narrowing conversion

Ironically, this is the exact error the patch was supposed to fix.

The earlier --ignore-whitespace addition to apply_patches.py didn't help because git's whitespace-tolerance flags only ignore spaces and tabs in context-line matching, not CR characters.

Set core.autocrlf=false and core.eol=lf as the very first step (before checkout) so the runner skips the conversion entirely. This is a CI-only fix; local Windows developers still need to configure their global git appropriately, but apply_patches.py's --ignore-whitespace remains as a partial mitigation for them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
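For local Windows developers, one way to tell whether a checkout was converted is to scan the patched sources for CRLF endings. The sketch below is a diagnostic aid, not something this PR adds; `files_with_crlf` and its default pattern are hypothetical.

```python
from pathlib import Path


def files_with_crlf(root: Path, pattern: str = "*.cpp") -> list[Path]:
    """Return files under root matching pattern that contain CRLF line endings."""
    return sorted(
        path
        for path in root.rglob(pattern)
        if path.is_file() and b"\r\n" in path.read_bytes()
    )
```

A non-empty result for a directory such as third_party/triton_shared/ would suggest the clone was made with line-ending conversion enabled and that an LF-based patch may fail to match.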
The triton-xdna wheel inherits METADATA from the upstream triton / triton-windows wheel, so before this change the published wheels identified themselves with upstream values:

    Author: Philippe Tillet, Dian Wu
    Author-email: phil@openai.com, woctordho@outlook.com
    Home-page: https://github.com/woct0rdho/triton-windows

setup.py already rewrote Name and Version in the same loop; extend it to also rewrite Author, Author-email, and Home-page so the wheel self-identifies as the AMD project. Affects both the Linux and Windows wheel pipelines (same setup.py code path runs for both).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
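A header-rewriting loop of the kind described here could be sketched as below. This is not the setup.py implementation: `rewrite_metadata` is a hypothetical helper, the replacement values in the usage are placeholders, and multi-line (folded) header values are not handled.

```python
def rewrite_metadata(text: str, overrides: dict[str, str]) -> str:
    """Replace the value of each header named in overrides; leave the rest as-is."""
    out = []
    in_headers = True
    for line in text.splitlines():
        if in_headers and not line.strip():
            in_headers = False  # a blank line ends the headers; the description follows
        key, sep, _ = line.partition(":")
        if in_headers and sep and key in overrides:
            line = f"{key}: {overrides[key]}"
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, `rewrite_metadata(metadata_text, {"Author": "Example Author", "Home-page": "https://example.invalid/project"})` rewrites only those two headers and stops matching once the description body begins.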
@erwei-xilinx fwiw I installed the released wheel and did not encounter any problems with msvc version (I only have the Visual Studio 2026 Build Tools installed now). I did encounter one unexpected failure, but when I reran that specific test it passed, so it's probably just a bit flaky and not much of a concern.
…type

Replace post-bufferize linalg_promote (which leaks self-copies that crash transform.air.copy_to_dma) with pre-bufferize bufferize_to_allocation + promote_tensor for L1 staging, mirroring mlir-air xrt 43_triton_layernorm. Eliminates "expected to produce 1 results (actually produced 0)" stderr on aie2p reported in #64.
Thanks for the test run, @astrelsky! Two real bugs surfaced from this:

- weighted_rms_norm (1/16384 elements): Investigated and turned out to be a real systematic bug, not BF16 drift. Fixed by #66, which refactored the script to follow the mlir-air xrt 43_triton_layernorm prototype + a hybrid linalg_promote for the W operand.
- rms_norm stderr: Visible in your output but the test passed, so it wasn't on your bug list. Was a real latent bug though: linalg_promote emits self-copies that crash transform.air.copy_to_dma's 1-result contract. Fixed by #65 with the same refactor pattern.
- matmul_f32_padded_atransposed (109/109 zeros): Expected on Windows for now; that's a known limitation while full-ELF support on Windows is still WIP.
Summary
- Adds a parallel `build-wheels-windows` job to nightly-wheels.yml that builds and releases triton-xdna Windows x64 wheels alongside the existing Linux job. Both publish to the same `latest-wheels` release tag.
- The Python matrix is capped at 3.10/3.11/3.12 because Xilinx publishes `mlir-air` Windows wheels only for those three Pythons today.
- The XRT Windows SDK is downloaded from the pinned `Xilinx/XRT 2.21.75` release (`xrt_windows_sdk.zip`) and extracted to `C:\Program Files\AMD\xrt`, where the existing Windows build infrastructure (`utils/env_setup.ps1`, `setup.py`) expects it.
- Quick Start referenced `.\utils\build_windows.ps1`, which was removed during the windows-build-minimal PR cleanup; replaced with the `env_setup.ps1` path that does exist.
- Manual Build claimed mlir-air must be built from source; replaced with the `mlir_air[aie]` pip-install command (matches what `env_setup.ps1` actually does).

Caveats reviewers should know
- […] `windows-latest` from a Linux dev box. The first nightly is likely to need at least one fix. Common things to watch for: MSVC version mismatch with `triton-windows`'s expected toolchain, `delvewheel` repair failing on missing DLLs, or paths in `setup.py:_build_triton_windows` needing tweaks under cibuildwheel's environment.
- […] `repair-wheel-command` set under `[tool.cibuildwheel.windows]` in pyproject.toml. cibuildwheel runs `delvewheel repair` by default; if our wheel bundles XRT-linked binaries we may need exclusions analogous to the Linux `auditwheel` step. Worth watching the first run's logs.
- `setup_xrt_dev.ps1` is intentionally not invoked: that script is for stripping headers + generating `.lib` from a runtime-only `xrt_coreutil.dll` (the Ryzen AI SDK case). The full `xrt_windows_sdk.zip` already includes both, so it's not needed here.

Test plan
- […] `latest-wheels` release alongside Linux wheels

🤖 Generated with Claude Code