[CI] Add manylinux auditwheel repair #778
Open
lanluo-nvidia wants to merge 1 commit into main from lluo/build_wheel_for_pypi
Conversation
lanluo-nvidia force-pushed from 9785d0a to af3c90f (Compare)
[executor] Introduce a pass to expand unsupported Math operations

Adds a pass `executor-expand-math-ops` to expand unsupported Math operations into compositions of supported operations. This pass utilizes the upstream Math dialect rewrite pattern sets (e.g., `math-expand-ops`, `math-polynomial-approx`).

GitOrigin-RevId: 5d1736f1a5a398ee37a003461ee2d32cd0a5c06e
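For context, a minimal sketch of what a pass in this spirit can look like. This is an illustration, not the repo's implementation: the pass name `example-expand-math-ops` is hypothetical, and the `populateExpand*Pattern` / `populateMathPolynomialApproximationPatterns` helpers are long-standing upstream MLIR Math-dialect entry points whose exact names may vary across LLVM versions.

```cpp
// Sketch only: expand math ops a backend cannot lower directly (tanh,
// powf, ...) into compositions of supported ops via upstream patterns.
#include "mlir/Dialect/Math/Transforms/Passes.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

namespace {
struct ExampleExpandMathOps
    : mlir::PassWrapper<ExampleExpandMathOps,
                        mlir::OperationPass<mlir::ModuleOp>> {
  MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(ExampleExpandMathOps)

  llvm::StringRef getArgument() const override {
    return "example-expand-math-ops"; // hypothetical CLI name
  }

  void runOnOperation() override {
    mlir::RewritePatternSet patterns(&getContext());
    // Rewrite math.tanh and math.powf into compositions of simpler ops.
    mlir::populateExpandTanhPattern(patterns);
    mlir::populateExpandPowFPattern(patterns);
    // Polynomial approximations for transcendentals (exp, erf, log, ...).
    mlir::populateMathPolynomialApproximationPatterns(patterns);
    if (mlir::failed(mlir::applyPatternsAndFoldGreedily(
            getOperation(), std::move(patterns))))
      signalPassFailure();
  }
};
} // namespace
```

Running such a pass ahead of the executor lowering means the conversion patterns only ever see the small set of math ops the runtime actually supports.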
[mlir-tensorrt] Add Stablehlo patch for various Stablehlo upstream pass issues

Add a Stablehlo patch for the following issues:
1. No Atan2 scalar float support in the StablehloToScalarOp template.
2. Fix a crash when converting `stablehlo.bitcast_convert` with a complex type to arith::BitcastOp.
3. Fix StablehloRefineShapes accidentally erasing functions with side effects.
4. Fix a crash in StablehloAggressiveFolder when folding `stablehlo.compare` if the result type has static dimensions erased.

[compiler][emitc] Add support for embedding and emitting runtime files

This change enables the compiler to emit the required Standalone runtime sources and headers as artifacts when translating to EmitC, eliminating the need for users to manually locate and copy runtime files from the source tree.

The StandaloneCPP runtime files (*.cpp, *.h) are now embedded directly into the compiler binary at build time using a new CMake script (GenerateEmbeddedStandaloneCPP.cmake) that generates a translation unit containing the file contents as raw string literals. A new pass `EmitCppSupportFilesPass` analyzes the EmitC module to determine which runtime components are required (Core, CUDA, TensorRT) and emits them as `executor.file_artifact` operations. The pass can also optionally generate an example CMakeLists.txt and a test driver source file.

New compiler options control emission:
- `--emitc-emit-support-files`: Emit all support files (runtime, CMake, driver)
- `--emitc-emit-runtime-files`: Emit only the required runtime subset
- `--emitc-emit-cmake-file`: Emit an example CMake file
- `--emitc-emit-test-driver`: Emit a C++ test driver source file

The `-o` output path option is now scoped to global CLI parsing only to avoid conflicts when parsing option strings programmatically.

GitOrigin-RevId: 330f1f17e78584131b1b4a482e7600b39d3dfb27

[compiler] NFC: Add missing `memref-to-cuda` test cases

[integrations/PJRT] Fix CMake configuration for PJRT library

- The previous change to the PJRT CMake config broke the build when BUILD_SHARED_LIBS is set to ON.
- The fix is simply to undo the change to the location of the CMAKE_CXX_VISIBILITY_PRESET setting.
- In addition, the visibility fix using the linker option is improved using the CMake `LINKER:` prefix.
- Finally, the linker option fixed the issue that was causing us to build an additional PJRT static library for unit tests, so we can eliminate the extra library and just build the one monolithic shared library for testing and deployment. This additionally seems to act as a check against LLVM/MLIR symbol visibility.

GitOrigin-RevId: 24c3090fca08602319466e2eefeb9f6fb0f68677

NFC: Consolidate CUDA integration tests and simplify test commands

GitOrigin-RevId: 94c9f43ca8c51c6f02edcc8f26725268a84210a1

[mlir-tensorrt] Integrate internal changes

---

[compiler] Add cuda.get_program_device op

Introduce `cuda.get_program_device` as a pure/speculatable way to map a program logical device id (i32) to a CUDA device ordinal (i32).

GitOrigin-RevId: 00512cc5a9e9c61023e1d9de734b2383da369bcf

---

[compiler] Refactor device management and stream creation utilities

This commit introduces a new device management model to support multi-device SPMD and MPMD programs and refactors stream creation to use reusable utility functions. The primary motivation is to enable more flexible device assignment, where programs can be assigned to specific CUDA ordinals via logical device IDs, laying the groundwork for better multi-device support.

GitOrigin-RevId: 447b72743e64f394671f866fcdfdb0d6f0f3d579

---

[compiler|executor] Refactor plugin call stream handling

This change refactors how CUDA streams are handled for plugin calls in the executor dialect. Previously, when no stream was provided to a CallPluginOp, the lowering would create and use a global CUDA stream (stream0). This approach had several issues:
1. It tightly coupled the executor dialect to CUDA-specific stream creation.
2. It required maintaining global stream state across compilation.
3. It made the stream handling implicit and harder to reason about.

The new approach uses null streams (nullptr) when no explicit stream is provided. This is the standard CUDA convention, where a null stream represents the default stream. The changes include:
- Modified the `executor.call_plugin` op to accept any type for the stream operand (not just `!executor.ptr<host>`), allowing frontend dialects to pass their own stream representations (e.g. `!cuda.stream`)
- Updated the assembly format to print the stream type for clarity
- Removed the `getGlobalCudaStream` helper method from ConvertToExecutorPattern
- Changed CallPluginConversionPattern to create a null pointer (inttoptr 0) when no stream is provided, instead of creating a global stream (see the sketch below)
- Updated the StablehloToPlan conversion to use `cuda::getOrCreateDefaultStream0` to explicitly create CUDA streams when converting TVM FFI custom calls
- Added a CUDADialect dependency to the StablehloToPlan pass and its CMakeLists

This makes stream handling more explicit and flexible, allowing different frontend dialects to manage their own stream creation while falling back to null streams (the CUDA default stream) when appropriate.

GitOrigin-RevId: 764238bc58308d5d284f8e32da91c7e5f90fdf0c
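The last change is concrete enough to sketch the fallback logic. In the fragment below, `executor::IntToPtrOp`, `executor::PointerType`, and `executor::MemoryType` are hypothetical stand-ins inferred from the commit message (only `arith::ConstantIntOp` is verified upstream MLIR API); the repo's actual conversion pattern may differ:

```cpp
// Sketch: return the explicitly provided stream, or materialize
// inttoptr(0), i.e. the CUDA default (null) stream, when none was given.
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/IR/Builders.h"

static mlir::Value getStreamOrNullStream(mlir::OpBuilder &b,
                                         mlir::Location loc,
                                         mlir::Value stream) {
  if (stream)
    return stream; // Frontend supplied its own stream (e.g. !cuda.stream).
  // No stream operand: fall back to a null pointer instead of a global
  // stream, so no CUDA-specific state is threaded through the lowering.
  mlir::Value zero =
      b.create<mlir::arith::ConstantIntOp>(loc, /*value=*/0, /*width=*/64);
  return b.create<executor::IntToPtrOp>(          // hypothetical op
      loc,
      executor::PointerType::get(b.getContext(),  // hypothetical type
                                 executor::MemoryType::host),
      zero);
}
```

Falling back to the null stream keeps the executor dialect agnostic to CUDA: callers that care about ordering pass a real stream, and everyone else gets CUDA's default-stream semantics for free.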
lanluo-nvidia force-pushed from d5534ca to c76570b (Compare)
No description provided.