
Commit c76570b

[CI] Add manylinux auditwheel repair
[executor] Introduce a pass to expand unsupported Math operations

Adds pass `executor-expand-math-ops` to expand unsupported Math operations into compositions of supported operations. This pass utilizes the upstream Math dialect rewrite pattern sets (e.g. `math-expand-ops`, `math-polynomial-approx`, etc.).

GitOrigin-RevId: 5d1736f1a5a398ee37a003461ee2d32cd0a5c06e

[mlir-tensorrt] Add Stablehlo patch for various Stablehlo upstream pass issues

Add a Stablehlo patch for the following issues:
1. No Atan2 scalar float support in the StablehloToScalarOp template.
2. Fix a crash when converting `stablehlo.bitcast_convert` with complex type to arith::BitcastOp.
3. Fix StablehloRefineShapes accidentally erasing functions with side effects.
4. Fix a crash in StablehloAggressiveFolder when folding `stablehlo.compare` if the result type has static dimensions erased.

[compiler][emitc] Add support for embedding and emitting runtime files

This change enables the compiler to emit the required Standalone runtime sources and headers as artifacts when translating to EmitC, eliminating the need for users to manually locate and copy runtime files from the source tree.

The StandaloneCPP runtime files (*.cpp, *.h) are now embedded directly into the compiler binary at build time using a new CMake script (GenerateEmbeddedStandaloneCPP.cmake) that generates a translation unit containing the file contents as raw string literals.

A new pass, `EmitCppSupportFilesPass`, analyzes the EmitC module to determine which runtime components are required (Core, CUDA, TensorRT) and emits them as `executor.file_artifact` operations. The pass can also optionally generate an example CMakeLists.txt and a test driver source file.
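The embedding scheme described above (a CMake-generated translation unit holding file contents as C++ raw string literals) can be illustrated with a small generator sketch. The `kEmbedded_` naming and output layout here are assumptions for illustration only, not what GenerateEmbeddedStandaloneCPP.cmake actually emits:

```python
from pathlib import Path

def embed_files(paths, out_cpp):
    """Generate a C++ translation unit that embeds each input file as a raw
    string literal. A custom delimiter (`emb`) avoids escaping quotes and
    parentheses that appear in the embedded source text."""
    chunks = ["// Auto-generated: embedded runtime sources. Do not edit.\n"]
    for p in map(Path, paths):
        # Derive a C identifier from the file name, e.g. Runtime.h -> Runtime_h
        ident = p.name.replace(".", "_").replace("-", "_")
        chunks.append(
            f'const char *kEmbedded_{ident} = R"emb({p.read_text()})emb";\n'
        )
    Path(out_cpp).write_text("".join(chunks))
```

At compile time the generated .cpp is simply added to the compiler's sources, so the file contents become addressable data in the binary with no install-time file lookup.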
New compiler options control emission:
- `--emitc-emit-support-files`: Emit all support files (runtime, CMake, driver)
- `--emitc-emit-runtime-files`: Emit only the required runtime subset
- `--emitc-emit-cmake-file`: Emit an example CMake file
- `--emitc-emit-test-driver`: Emit a C++ test driver source file

The `-o` output path option is now scoped to global CLI parsing only, to avoid conflicts when parsing option strings programmatically.

GitOrigin-RevId: 330f1f17e78584131b1b4a482e7600b39d3dfb27

[compiler] NFC: Add missing `memref-to-cuda` test cases

[integrations/PJRT] Fix CMake configuration for PJRT library

- The previous change to the PJRT CMake config broke the build when BUILD_SHARED_LIBS is set to ON.
- The fix is simply to undo the change to the location of the CMAKE_CXX_VISIBILITY_PRESET setting.
- In addition, the visibility fix using the linker option is improved using the CMake `LINKER:` prefix.
- Finally, the linker option fixed the issue that was causing us to build an additional PJRT static library for unit tests, so we can eliminate the extra library and just build the one monolithic shared library for testing and deployment. This additionally seems to act as a check against LLVM/MLIR symbol visibility.

GitOrigin-RevId: 24c3090fca08602319466e2eefeb9f6fb0f68677

NFC: Consolidate CUDA integration tests and simplify test commands

GitOrigin-RevId: 94c9f43ca8c51c6f02edcc8f26725268a84210a1

[mlir-tensorrt] Integrate internal changes

---

[compiler] Add cuda.get_program_device op

Introduce `cuda.get_program_device` as a pure/speculatable way to map a program logical device id (i32) to a CUDA device ordinal (i32).

GitOrigin-RevId: 00512cc5a9e9c61023e1d9de734b2383da369bcf

---

[compiler] Refactor device management and stream creation utilities

This commit introduces a new device management model to support multi-device SPMD and MPMD programs and refactors stream creation to use reusable utility functions.
The primary motivation is to enable more flexible device assignment, where programs can be assigned to specific CUDA ordinals via logical device IDs, laying the groundwork for better multi-device support.

GitOrigin-RevId: 447b72743e64f394671f866fcdfdb0d6f0f3d579

---

[compiler|executor] Refactor plugin call stream handling

This change refactors how CUDA streams are handled for plugin calls in the executor dialect. Previously, when no stream was provided to a CallPluginOp, the lowering would create and use a global CUDA stream (stream0). This approach had several issues:
1. It tightly coupled the executor dialect to CUDA-specific stream creation.
2. It required maintaining global stream state across compilation.
3. It made the stream handling implicit and harder to reason about.

The new approach uses null streams (nullptr) when no explicit stream is provided. This is the standard CUDA convention, where a null stream represents the default stream.

The changes include:
- Modified `executor.call_plugin` to accept any type for the stream operand (not just `!executor.ptr<host>`), allowing frontend dialects to pass their own stream representations (e.g. `!cuda.stream`).
- Updated the assembly format to print the stream type for clarity.
- Removed the `getGlobalCudaStream` helper method from ConvertToExecutorPattern.
- Changed CallPluginConversionPattern to create a null pointer (inttoptr 0) when no stream is provided, instead of creating a global stream.
- Updated the StablehloToPlan conversion to use `cuda::getOrCreateDefaultStream0` to explicitly create CUDA streams when converting TVM FFI custom calls.
- Added a CUDADialect dependency to the StablehloToPlan pass and its CMakeLists.

This makes stream handling more explicit and flexible, allowing different frontend dialects to manage their own stream creation while falling back to null streams (the CUDA default stream) when appropriate.

GitOrigin-RevId: 764238bc58308d5d284f8e32da91c7e5f90fdf0c
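The null-stream fallback adopted by the plugin-call refactor above can be sketched as follows. The function and constant names are hypothetical, for illustration only; the real lowering materializes an `inttoptr 0` in executor IR rather than calling a helper:

```python
# Hypothetical sketch of the null-stream convention: when a plugin call has
# no explicit stream operand, lower it to a null pointer, which CUDA
# interprets as the default stream; otherwise pass the frontend's own
# stream value through unchanged.
NULL_STREAM = 0  # stands in for nullptr / inttoptr 0

def resolve_plugin_stream(explicit_stream=None):
    """Return the stream handle a lowered plugin call should receive."""
    if explicit_stream is not None:
        return explicit_stream  # e.g. a frontend !cuda.stream value
    return NULL_STREAM          # CUDA default stream; no global state needed
```

The design point is that the fallback requires no per-module global stream, so the executor dialect stays agnostic of CUDA stream creation.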
Parent: bce288b

File tree

131 files changed: +3595, -632 lines


.github/workflows/mlir-tensorrt-build-test.yml

Lines changed: 2 additions & 2 deletions

@@ -4,7 +4,7 @@ on:
   workflow_call:
     inputs:
       channel:
-        description: 'Channel, valid values are "nightly", "test", or "release"'
+        description: 'Channel, valid values are "nightly", "test", "release" or "pypi-release"'
         default: "test"
         type: string
       build-matrix:
@@ -26,7 +26,7 @@ jobs:
     env:
       # eg. TENSORRT_VERSION: 10.12 or 10.13
       MLIR_TRT_DOWNLOAD_TENSORRT_VERSION: ${{ matrix.trt }}
-      # eg. CHANNEL: nightly, test or release
+      # eg. CHANNEL: nightly, test, release or pypi-release
       CHANNEL: ${{ inputs.channel }}
       ARCH: ${{ matrix.arch }}
       CMAKE_PRESET: ${{ matrix.cmake_preset }}
Lines changed: 194 additions & 0 deletions (new file)

name: MLIR-TensorRT PyPI Release CI

on:
  workflow_dispatch:
    inputs:
      confirm_publish_to_pypi:
        description: 'I confirm that I have verified version to be published to pypi.org'
        required: true
        type: boolean
        default: false

defaults:
  run:
    shell: bash

jobs:
  generate-matrix:
    name: Generate Build Matrix (pypi-release)
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.generate-matrix.outputs.matrix }}
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 5
      - name: Generate Build Matrix
        id: generate-matrix
        run: |
          set -euo pipefail
          set -x
          MATRIX_BLOB="$(python3 ./.github/workflows/mlir-tensorrt/generate-matrix.py --channel pypi-release)"
          echo "${MATRIX_BLOB}"
          echo "matrix=${MATRIX_BLOB}" >> "${GITHUB_OUTPUT}"

  pypi-release-wheels-build:
    name: ${{ matrix.arch }} - Build PyPI Release Wheels
    needs:
      - generate-matrix
    permissions:
      id-token: write
      packages: write
      contents: read
    strategy:
      matrix: ${{ fromJSON(needs.generate-matrix.outputs.matrix) }}
    runs-on: ${{ matrix.github_runner }}
    env:
      MLIR_TRT_DOWNLOAD_TENSORRT_VERSION: ${{ matrix.trt }}
      CMAKE_PRESET: ${{ matrix.cmake_preset }}
      CCACHE_RESTORE_KEY: mlir-tensorrt-ccache-v1-${{ matrix.arch }}-${{ matrix.cmake_preset }}
      CPM_RESTORE_KEY: mlir-tensorrt-cpm-v1
    timeout-minutes: 120
    container:
      # pypi auditwheel repair requires rockylinux8
      image: ${{ matrix.docker_image }}
      options: >-
        --gpus all
        --shm-size=1g
    steps:
      - name: Checkout TensorRT-Incubator
        uses: actions/checkout@v6
        with:
          fetch-depth: 5

      - name: Create Cache Folders
        run: |
          set -euo pipefail
          set -x
          export CPM_SOURCE_CACHE=${GITHUB_WORKSPACE}/mlir-tensorrt/.cache.cpm
          export CCACHE_DIR=${GITHUB_WORKSPACE}/mlir-tensorrt/ccache

          echo "CPM_SOURCE_CACHE=$CPM_SOURCE_CACHE" >> "$GITHUB_ENV"
          echo "CCACHE_DIR=$CCACHE_DIR" >> "$GITHUB_ENV"

          mkdir -p ${CCACHE_DIR}
          mkdir -p ${CPM_SOURCE_CACHE}

      - name: Compute CCache Key
        id: ccache-key
        run: |
          hash=$( (find mlir-tensorrt/compiler \
                    mlir-tensorrt/common \
                    mlir-tensorrt/kernel \
                    mlir-tensorrt/tensorrt \
                    mlir-tensorrt/integrations \
                    mlir-tensorrt/executor \
                    -type f \( -name '*.cpp' -o -name '*.h' \) \
                    -exec sha256sum {} \; ; \
                  sha256sum mlir-tensorrt/DependencyProvider.cmake \
                    mlir-tensorrt/CMakeLists.txt) \
                 | sort | sha256sum | cut -d' ' -f1)
          echo "key=${{ env.CCACHE_RESTORE_KEY }}-${hash}" >> $GITHUB_OUTPUT

      - name: Compute CPM Key
        id: cpm-key
        run: |
          hash=$(sha256sum mlir-tensorrt/DependencyProvider.cmake | cut -d' ' -f1)
          echo "key=${{ env.CPM_RESTORE_KEY }}-${hash}" >> $GITHUB_OUTPUT

      - name: Restore CCache
        id: restore-ccache
        uses: actions/cache/restore@v4
        with:
          key: ${{ steps.ccache-key.outputs.key }}
          restore-keys: |
            ${{ env.CCACHE_RESTORE_KEY }}
          path: |
            ${{ env.CCACHE_DIR }}

      - name: Restore CPM cache
        id: restore-cpm
        uses: actions/cache/restore@v4
        with:
          key: ${{ steps.cpm-key.outputs.key }}
          enableCrossOsArchive: true
          restore-keys: |
            ${{ env.CPM_RESTORE_KEY }}
          path: |
            mlir-tensorrt/.cache.cpm/*
            !mlir-tensorrt/.cache.cpm/tensorrt
            !mlir-tensorrt/.cache.cpm/tensorrt/**

      - name: Build Wheels With CUDA:${{ matrix.cuda }} + TensorRT:${{ matrix.trt }}
        env:
          MLIR_TRT_DOWNLOAD_TENSORRT_VERSION: ${{ matrix.trt }}
          ARCH: ${{ matrix.arch }}
          CMAKE_PRESET: distribution-wheels
        run: |
          set -euo pipefail
          set -x
          cd mlir-tensorrt
          # Build only pjrt wheels for PyPI upload, with auditwheel repair
          MLIR_TRT_PYPI=1 PACKAGES="pjrt" ./build_tools/scripts/cicd-build-wheels.sh

      - name: Upload Wheels
        uses: actions/upload-artifact@v4
        with:
          name: release-wheels-${{ matrix.arch }}-cu${{ matrix.cuda }}-trt${{ matrix.trt }}
          path: mlir-tensorrt/dist
          if-no-files-found: error

  test-pypi-release-wheels-publish:
    name: Publish to TestPyPI
    needs: [pypi-release-wheels-build]
    runs-on: ubuntu-latest
    environment:
      name: testpypi
      url: https://test.pypi.org/project/mlir-tensorrt-jax/
    permissions:
      id-token: write
      packages: write
      contents: read
    steps:
      - name: Download built wheels
        uses: actions/download-artifact@v4
        with:
          pattern: release-wheels-*
          merge-multiple: true
          path: dist

      - name: Publish to TestPyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          skip-existing: true
          verbose: true
          repository-url: https://test.pypi.org/legacy/

  pypi-release-wheels-publish:
    name: Publish to PyPI
    if: ${{ inputs.confirm_publish_to_pypi }}
    needs: [pypi-release-wheels-build]
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      packages: write
      contents: read
    steps:
      - name: Download built wheels
        uses: actions/download-artifact@v4
        with:
          pattern: release-wheels-*
          merge-multiple: true
          path: dist

      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          verbose: true
          skip-existing: true
          user: __token__
          password: ${{ secrets.PYPI_API_TOKEN }}

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref_name }}-mlir-tensorrt-pypi
  cancel-in-progress: true
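The `Compute CCache Key` step above derives a cache key from a combined hash of the tracked sources. The same computation can be sketched in Python (the helper name is hypothetical); like the shell pipeline, it sorts the per-file digest lines so the result is independent of file enumeration order:

```python
import hashlib
from pathlib import Path

def content_hash(paths):
    """sha256 over the sorted per-file `sha256sum`-style lines, mirroring
    the `find ... -exec sha256sum | sort | sha256sum` pipeline in the
    workflow step."""
    lines = sorted(
        f"{hashlib.sha256(Path(p).read_bytes()).hexdigest()}  {p}"
        for p in paths
    )
    return hashlib.sha256("\n".join(lines).encode()).hexdigest()
```

Any change to a hashed file changes the key, so ccache is restored from the exact-match entry when sources are unchanged and falls back to the `CCACHE_RESTORE_KEY` prefix otherwise.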

.github/workflows/mlir-tensorrt-release.yml

Lines changed: 2 additions & 2 deletions

@@ -23,8 +23,8 @@ jobs:
       # eg. 10.12 or 10.13
       MLIR_TRT_DOWNLOAD_TENSORRT_VERSION: ${{ matrix.trt }}
       ARCH: ${{ matrix.arch }}
-      CMAKE_PRESET: distribution-wheels
-      CCACHE_RESTORE_KEY: mlir-tensorrt-ccache-v1-${{ matrix.arch }}-distribution-wheels
+      CMAKE_PRESET: ${{ matrix.cmake_preset }}
+      CCACHE_RESTORE_KEY: mlir-tensorrt-ccache-v1-${{ matrix.arch }}-${{ matrix.cmake_preset }}
       CPM_RESTORE_KEY: mlir-tensorrt-cpm-v1
     runs-on: ${{ matrix.github_runner }}
     timeout-minutes: 120

.github/workflows/mlir-tensorrt/generate-matrix.py

Lines changed: 14 additions & 5 deletions

@@ -31,12 +31,19 @@
             "trt": "10.13",
         },
     ],
+    "pypi-release": [
+        {
+            "cuda": "13.0",
+            "trt": "10.13",
+        },
+    ],
 }

 ARCH_LIST_DICT = {
     "test": ["x86_64"],
     "release": ["x86_64", "aarch64"],
     "nightly": ["x86_64", "aarch64"],
+    "pypi-release": ["x86_64", "aarch64"],
 }

 GH_RUNNER_DICT = {
@@ -47,9 +54,8 @@
 CMAKE_PRESET_DICT = {
     "nightly": "github-cicd",
     "test": "github-cicd",
-    # release should use the release wheel build preset
-    # TODO: add the release wheel build preset
-    "release": "github-cicd",
+    "release": "distribution-wheels",
+    "pypi-release": "distribution-wheels",
 }

 DOCKER_IMAGE_DICT = {
@@ -71,6 +77,9 @@
         "13.0": "ghcr.io/nvidia/tensorrt-incubator/mlir-tensorrt:cuda13.0-rockylinux9-0.1",
         },
     },
+    "pypi-release": {
+        "13.0": "ghcr.io/nvidia/tensorrt-incubator/mlir-tensorrt:cuda13.0-rockylinux8-0.1",
+    },
 }

@@ -84,9 +93,9 @@ def main(args: list[str]) -> None:
     )

     options = parser.parse_args(args)
-    if options.channel not in ("nightly", "test", "release"):
+    if options.channel not in ("nightly", "test", "release", "pypi-release"):
         raise Exception(
-            "--channel is invalid, please choose from nightly, test or release"
+            "--channel is invalid, please choose from nightly, test, release or pypi-release"
        )

     channel = options.channel
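As an aside, the manual channel check extended in the diff above could also be expressed with argparse's built-in `choices`, which rejects invalid values at parse time. A minimal standalone sketch, not the actual script:

```python
import argparse

# Minimal sketch: `choices` makes argparse itself reject invalid channels
# with a usage error, instead of validating after parse_args returns.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--channel",
    required=True,
    choices=["nightly", "test", "release", "pypi-release"],
)
args = parser.parse_args(["--channel", "pypi-release"])
```

With `choices`, an invalid value such as `--channel bogus` causes argparse to exit with a usage message listing the allowed values, so the error text and the accepted set can never drift apart.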
Lines changed: 88 additions & 0 deletions (new file)

{
  "name": "cuda12.9-rockylinux8-prebuilt",
  "image": "ghcr.io/nvidia/tensorrt-incubator/mlir-tensorrt:cuda12.9-rockylinux8-0.1",
  "remoteUser": "nvidia",
  "updateRemoteUserUID": true,
  "runArgs": [
    "--name",
    "cuda12.9-rockylinux8-prebuilt-${localEnv:USER:nvidia}-${devcontainerId}",
    "--cap-add=SYS_PTRACE",
    "--security-opt",
    "seccomp=unconfined",
    "--shm-size=1g",
    "--ulimit",
    "memlock=-1",
    "--network=host"
  ],
  "hostRequirements": {
    "gpu": "optional"
  },
  "workspaceMount": "source=${localWorkspaceFolder}/..,target=/workspaces/TensorRT-Incubator,type=bind,consistency=cached",
  "workspaceFolder": "/workspaces/TensorRT-Incubator/mlir-tensorrt",
  "customizations": {
    "vscode": {
      "extensions": [
        "llvm-vs-code-extensions.vscode-clangd",
        "llvm-vs-code-extensions.vscode-mlir",
        "eamodio.gitlens",
        "ms-python.black-formatter",
        "ms-python.python"
      ],
      "settings": {
        "[python]": {
          "editor.defaultFormatter": "ms-python.black-formatter"
        },
        "mlir.pdll_compilation_databases": [
          "build/pdll_compile_commands.yml"
        ],
        "mlir.server_path": "build/bin/mlir-tensorrt-lsp-server",
        "files.exclude": {
          "**/.git": true,
          "**/.cache": true,
          "**/.venv*": true
        },
        "files.watcherExclude": {
          "**/.git/objects/**": true,
          "**/.git/subtree-cache/**": true,
          "**/.private*": true,
          "**/.venv*/**": true,
          "**/build/**": true
        },
        "search.exclude": {
          "**/.private*": true,
          "**/.venv*": true,
          "**/build": true
        },
        "python.analysis.include": [
          "integrations/python",
          "integrations/python/internal"
        ],
        "python.analysis.typeCheckingMode": "basic",
        "python.analysis.extraPaths": [
          "build/python_packages/mlir_tensorrt_compiler",
          "build/python_packages/mlir_tensorrt_runtime",
          "build/python_packages/tools"
        ],
        "python.analysis.exclude": [
          "**/build/**",
          "**/.cache.cpm/**",
          "**/*bazel*/**",
          "**/build_tools/**",
          "third_party"
        ]
      }
    }
  },
  "features": {
    "ghcr.io/devcontainers/features/common-utils:2": {
      "installZsh": true,
      "installOhMyZsh": true,
      "configureZshAsDefaultShell": false,
      "upgradePackages": false,
      "username": "nvidia",
      "userUid": "automatic",
      "userGid": "automatic"
    },
    "ghcr.io/devcontainers/features/git:1": {}
  }
}
