Conversation

@lanluo-nvidia
Collaborator

No description provided.

@copy-pr-bot

copy-pr-bot bot commented Jan 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

[executor] Introduce a pass to expand unsupported Math operations

Adds a pass, `executor-expand-math-ops`, that expands unsupported Math
operations into compositions of supported operations. The pass uses the
upstream Math dialect rewrite pattern sets (e.g. `math-expand-ops` and
`math-polynomial-approx`).
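
As a minimal illustration (assuming `math.tanh` is among the ops the target
does not support), such a pass rewrites the op into exp/arith compositions:

```mlir
// Illustrative input: a single op the target cannot execute directly.
func.func @expand_tanh(%x: f32) -> f32 {
  %t = math.tanh %x : f32
  return %t : f32
}
// After expansion (conceptually): tanh(x) = (exp(2x) - 1) / (exp(2x) + 1),
// built only from ops the target supports.
```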

GitOrigin-RevId: 5d1736f1a5a398ee37a003461ee2d32cd0a5c06e

[mlir-tensorrt] Add Stablehlo patch for various Stablehlo upstream pass issues

Add a Stablehlo patch for the following issues:

1. Add missing Atan2 scalar float support to the StablehloToScalarOp
   template.
2. Fix a crash when converting `stablehlo.bitcast_convert` with a complex
   type to `arith::BitcastOp`.
3. Fix StablehloRefineShapes accidentally erasing functions with side
   effects.
4. Fix a crash in StablehloAggressiveFolder when folding
   `stablehlo.compare` if the result type has had its static dimensions
   erased.

[compiler][emitc] Add support for embedding and emitting runtime files

This change enables the compiler to emit the required Standalone runtime
sources and headers as artifacts when translating to EmitC, eliminating the
need for users to manually locate and copy runtime files from the source tree.

The StandaloneCPP runtime files (*.cpp, *.h) are now embedded directly into
the compiler binary at build time using a new CMake script
(GenerateEmbeddedStandaloneCPP.cmake) that generates a translation unit
containing the file contents as raw string literals.

A new pass `EmitCppSupportFilesPass` analyzes the EmitC module to determine
which runtime components are required (Core, CUDA, TensorRT) and emits them
as `executor.file_artifact` operations. The pass can also optionally generate
an example CMakeLists.txt and a test driver source file.
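
For intuition, a sketch of the kind of module the pass inspects (the header
names below are hypothetical, and the precise analysis the pass performs is
not detailed here):

```mlir
// Hypothetical EmitC module: from includes like these, the pass could
// infer that the Core and CUDA runtime components must be emitted as
// executor.file_artifact operations.
module {
  emitc.include "StandaloneCore.h"
  emitc.include "StandaloneCUDA.h"
}
```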

New compiler options control emission:
- `--emitc-emit-support-files`: Emit all support files (runtime, CMake, driver)
- `--emitc-emit-runtime-files`: Emit only the required runtime subset
- `--emitc-emit-cmake-file`: Emit an example CMake file
- `--emitc-emit-test-driver`: Emit a C++ test driver source file

The `-o` output path option is now scoped to global CLI parsing only
to avoid conflicts when parsing option strings programmatically.

GitOrigin-RevId: 330f1f17e78584131b1b4a482e7600b39d3dfb27

[compiler] NFC: Add missing `memref-to-cuda` test cases

[integrations/PJRT] Fix CMake configuration for PJRT library

- A previous change to the PJRT CMake config broke the build
  when BUILD_SHARED_LIBS is set to ON.
- The fix is simply to undo the change to the location
  of the CMAKE_CXX_VISIBILITY_PRESET setting.
- In addition, the visibility fix that uses a linker option
  is improved by using the CMake `LINKER:` prefix.
- Finally, the linker option fixed the issue that was forcing
  us to build an additional PJRT static library for unit tests,
  so we can eliminate the extra library and build one
  monolithic shared library for both testing and deployment. This
  additionally seems to act as a check on LLVM/MLIR symbol
  visibility.

GitOrigin-RevId: 24c3090fca08602319466e2eefeb9f6fb0f68677

NFC: Consolidate CUDA integration tests and simplify test commands

GitOrigin-RevId: 94c9f43ca8c51c6f02edcc8f26725268a84210a1

[mlir-tensorrt] Integrate internal changes

--- [compiler] Add cuda.get_program_device op

Introduce `cuda.get_program_device` as a pure/speculatable way to map
a program's logical device ID (i32) to a CUDA device ordinal (i32).
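
A minimal usage sketch (the op's exact assembly format is an assumption; the
types follow the description above):

```mlir
// Hypothetical usage: resolve logical device 1 to its CUDA ordinal.
// Being pure/speculatable, the op can be freely CSE'd and hoisted.
%logical = arith.constant 1 : i32
%ordinal = cuda.get_program_device %logical : (i32) -> i32
```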

GitOrigin-RevId: 00512cc5a9e9c61023e1d9de734b2383da369bcf

--- [compiler] Refactor device management and stream creation utilities

This commit introduces a new device management model to support
multi-device SPMD and MPMD programs and refactors stream creation to
use reusable utility functions. The primary motivation is to enable
more flexible device assignment where programs can be assigned to
specific CUDA ordinals via logical device IDs, laying the groundwork
for better multi-device support.

GitOrigin-RevId: 447b72743e64f394671f866fcdfdb0d6f0f3d579

--- [compiler|executor] Refactor plugin call stream handling

This change refactors how CUDA streams are handled for plugin calls in
the executor dialect. Previously, when no stream was provided to a
CallPluginOp, the lowering would create and use a global CUDA stream
(stream0). This approach had several issues:

1. It tightly coupled the executor dialect to CUDA-specific stream
   creation
2. It required maintaining global stream state across compilation
3. It made the stream handling implicit and harder to reason about

The new approach uses null streams (nullptr) when no explicit stream
is provided. This is the standard CUDA convention where a null stream
represents the default stream. The changes include:

- Modified `executor.call_plugin` op to accept any type for the stream
  operand (not just `!executor.ptr<host>`), allowing frontend dialects
  to pass their own stream representations (e.g. `!cuda.stream`)
- Updated the assembly format to print the stream type for clarity
- Removed `getGlobalCudaStream` helper method from
  ConvertToExecutorPattern
- Changed CallPluginConversionPattern to create a null pointer
  (inttoptr 0) when no stream is provided instead of creating a global
  stream
- Updated StablehloToPlan conversion to use
  `cuda::getOrCreateDefaultStream0` to explicitly create CUDA streams
  when converting TVM FFI custom calls
- Added CUDADialect dependency to StablehloToPlan pass and CMakeLists

This makes stream handling more explicit and flexible, allowing
different frontend dialects to manage their own stream creation while
falling back to null streams (CUDA default stream) when appropriate.
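
As a sketch of the fallback lowering (the op spellings here are assumptions;
only the null-stream idea is from this change):

```mlir
// Hypothetical lowered form: materialize a null pointer (inttoptr 0)
// and pass it as the plugin call's stream operand, which CUDA treats
// as the default stream.
%zero = executor.constant 0 : i64
%null_stream = executor.inttoptr %zero : (i64) -> !executor.ptr<host>
```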

GitOrigin-RevId: 764238bc58308d5d284f8e32da91c7e5f90fdf0c
@lanluo-nvidia force-pushed the lluo/build_wheel_for_pypi branch from d5534ca to c76570b on January 10, 2026 at 04:23