Conversation

@lanluo-nvidia
Collaborator

No description provided.

@copy-pr-bot

copy-pr-bot bot commented Jan 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

[executor] Introduce a pass to expand unsupported Math operations

Adds a pass, `executor-expand-math-ops`, that expands unsupported Math
operations into compositions of supported operations. The pass uses the
upstream Math dialect rewrite pattern sets (e.g. `math-expand-ops` and
`math-polynomial-approx`).
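
As a minimal illustration (assuming `math.tanh` is among the ops the target
does not support), such a pass rewrites the op into exp/arith compositions:

```mlir
// Illustrative input: a single op the target cannot execute directly.
func.func @expand_tanh(%x: f32) -> f32 {
  %t = math.tanh %x : f32
  return %t : f32
}
// After expansion (conceptually): tanh(x) = (exp(2x) - 1) / (exp(2x) + 1),
// built only from ops the target supports.
```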

GitOrigin-RevId: 5d1736f1a5a398ee37a003461ee2d32cd0a5c06e

[mlir-tensorrt] Add Stablehlo patch for various Stablehlo upstream pass issues

Add a Stablehlo patch for the following issues:

1. Add missing Atan2 scalar float support to the StablehloToScalarOp
   template.
2. Fix a crash when converting `stablehlo.bitcast_convert` with a complex
   type to `arith::BitcastOp`.
3. Fix StablehloRefineShapes accidentally erasing functions with side
   effects.
4. Fix a crash in StablehloAggressiveFolder when folding
   `stablehlo.compare` if the result type has had its static dimensions
   erased.

[compiler][emitc] Add support for embedding and emitting runtime files

This change enables the compiler to emit the required Standalone runtime
sources and headers as artifacts when translating to EmitC, eliminating the
need for users to manually locate and copy runtime files from the source tree.

The StandaloneCPP runtime files (*.cpp, *.h) are now embedded directly into
the compiler binary at build time using a new CMake script
(GenerateEmbeddedStandaloneCPP.cmake) that generates a translation unit
containing the file contents as raw string literals.

A new pass `EmitCppSupportFilesPass` analyzes the EmitC module to determine
which runtime components are required (Core, CUDA, TensorRT) and emits them
as `executor.file_artifact` operations. The pass can also optionally generate
an example CMakeLists.txt and a test driver source file.
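
For intuition, a sketch of the kind of module the pass inspects (the header
names below are hypothetical, and the precise analysis the pass performs is
not detailed here):

```mlir
// Hypothetical EmitC module: from includes like these, the pass could
// infer that the Core and CUDA runtime components must be emitted as
// executor.file_artifact operations.
module {
  emitc.include "StandaloneCore.h"
  emitc.include "StandaloneCUDA.h"
}
```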

New compiler options control emission:
- `--emitc-emit-support-files`: Emit all support files (runtime, CMake, driver)
- `--emitc-emit-runtime-files`: Emit only the required runtime subset
- `--emitc-emit-cmake-file`: Emit an example CMake file
- `--emitc-emit-test-driver`: Emit a C++ test driver source file

The `-o` output path option is now scoped to global CLI parsing only
to avoid conflicts when parsing option strings programmatically.

GitOrigin-RevId: 330f1f17e78584131b1b4a482e7600b39d3dfb27

[compiler] NFC: Add missing `memref-to-cuda` test cases

[integrations/PJRT] Fix CMake configuration for PJRT library

- A previous change to the PJRT CMake config broke the build
  when BUILD_SHARED_LIBS is set to ON.
- The fix is simply to undo the change to the location
  of the CMAKE_CXX_VISIBILITY_PRESET setting.
- In addition, the visibility fix that uses a linker option
  is improved by using the CMake `LINKER:` prefix.
- Finally, the linker option fixed the issue that was forcing
  us to build an additional PJRT static library for unit tests,
  so we can eliminate the extra library and build one
  monolithic shared library for both testing and deployment. This
  additionally seems to act as a check on LLVM/MLIR symbol
  visibility.

GitOrigin-RevId: 24c3090fca08602319466e2eefeb9f6fb0f68677

NFC: Consolidate CUDA integration tests and simplify test commands

GitOrigin-RevId: 94c9f43ca8c51c6f02edcc8f26725268a84210a1

[mlir-tensorrt] Integrate internal changes

--- [compiler] Add cuda.get_program_device op

Introduce `cuda.get_program_device` as a pure/speculatable way to map
a program's logical device ID (i32) to a CUDA device ordinal (i32).
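
A minimal usage sketch (the op's exact assembly format is an assumption; the
types follow the description above):

```mlir
// Hypothetical usage: resolve logical device 1 to its CUDA ordinal.
// Being pure/speculatable, the op can be freely CSE'd and hoisted.
%logical = arith.constant 1 : i32
%ordinal = cuda.get_program_device %logical : (i32) -> i32
```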

GitOrigin-RevId: 00512cc5a9e9c61023e1d9de734b2383da369bcf

--- [compiler] Refactor device management and stream creation utilities

This commit introduces a new device management model to support
multi-device SPMD and MPMD programs and refactors stream creation to
use reusable utility functions. The primary motivation is to enable
more flexible device assignment where programs can be assigned to
specific CUDA ordinals via logical device IDs, laying the groundwork
for better multi-device support.

GitOrigin-RevId: 447b72743e64f394671f866fcdfdb0d6f0f3d579

--- [compiler|executor] Refactor plugin call stream handling

This change refactors how CUDA streams are handled for plugin calls in
the executor dialect. Previously, when no stream was provided to a
CallPluginOp, the lowering would create and use a global CUDA stream
(stream0). This approach had several issues:

1. It tightly coupled the executor dialect to CUDA-specific stream
   creation
2. It required maintaining global stream state across compilation
3. It made the stream handling implicit and harder to reason about

The new approach uses null streams (nullptr) when no explicit stream
is provided. This is the standard CUDA convention where a null stream
represents the default stream. The changes include:

- Modified `executor.call_plugin` op to accept any type for the stream
  operand (not just `!executor.ptr<host>`), allowing frontend dialects
  to pass their own stream representations (e.g. `!cuda.stream`)
- Updated the assembly format to print the stream type for clarity
- Removed `getGlobalCudaStream` helper method from
  ConvertToExecutorPattern
- Changed CallPluginConversionPattern to create a null pointer
  (inttoptr 0) when no stream is provided instead of creating a global
  stream
- Updated StablehloToPlan conversion to use
  `cuda::getOrCreateDefaultStream0` to explicitly create CUDA streams
  when converting TVM FFI custom calls
- Added CUDADialect dependency to StablehloToPlan pass and CMakeLists

This makes stream handling more explicit and flexible, allowing
different frontend dialects to manage their own stream creation while
falling back to null streams (CUDA default stream) when appropriate.
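
As a sketch of the fallback lowering (the op spellings here are assumptions;
only the null-stream idea is from this change):

```mlir
// Hypothetical lowered form: materialize a null pointer (inttoptr 0)
// and pass it as the plugin call's stream operand, which CUDA treats
// as the default stream.
%zero = executor.constant 0 : i64
%null_stream = executor.inttoptr %zero : (i64) -> !executor.ptr<host>
```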

GitOrigin-RevId: 764238bc58308d5d284f8e32da91c7e5f90fdf0c
@lanluo-nvidia force-pushed the lluo/build_wheel_for_pypi branch from d5534ca to c76570b on January 10, 2026 at 04:23