Commit 26c59c5
Rebase/upstream sync (apache#8)
* [FFI] Structural equal and hash based on reflection (apache#18156)
This PR adds initial support for structural equal and hash
via the new reflection mechanism.
It will help us streamline structural equality/hash
with broader support and clean error reports via AccessPath.
It also gives us the ability to unify all struct equal/hash
registration into the extra metadata in reflection registration.
* [Fix][Relax] Fix potential out-of-bounds access in `TupleRewriterNode` (apache#18120)
* Root cause
* Update
* [BugFix] Fix NCCL build with GlobalDef registration (apache#18158)
This PR fixes a build failure in nccl.cc due to the recent switch of
global function registration.
* [FFI][REFACTOR] Introduce TypeAttr in reflection (apache#18160)
This PR introduces TypeAttr to reflection to bring extra
optional attribute registration that can be used to extend behaviors such
as structural equality. Also renames TypeExtraInfo to TypeMetadata for better clarity.
* [TIR] Fix trivial index map [] -> [0] (apache#18154)
fix trivial index map [] -> [0]
Co-authored-by: wrongtest <[email protected]>
* [Relax] Fix issue in fuse concat ops by pattern (apache#18163)
* [Relax] Fix issue in fuse concat ops by pattern
* fix lint
* [FFI][REFACTOR] Enable custom s_hash/equal (apache#18165)
This PR enables custom s_hash/equal via TypeAttr,
also enhances the Var comparison by checking content
so we can precheck type signatures.
* [FFI][REFACTOR] Migrate StructuralEqual/Hash to new reflection (apache#18166)
This PR migrates StructuralEqual/Hash to the new reflection-based approach.
The original mechanisms are still kept around and we will phase them out
in follow-up PRs.
The new mechanism unifies the structural equal/hash registration with
the normal reflection registration and also brings a cleaner implementation
for mismatch detection.
* [Web] Fix incompatible part after FFI updates (apache#18168)
fix: incompatible parts with the current core
* [REFACTOR][FFI] Phase out SEqualReduce/SHashReduce (apache#18172)
This PR phases out the old SEqualReduce/SHashReduce mechanism
in favor of the new reflection mechanism via ffi/reflection.
It helps us reduce the number of places where we need to register
reflection-related information.
See the current IR examples for upgrading to the new mechanism.
* [FFI] Improve string equal/hash handling (apache#18176)
This PR improves the string equal/hash handling
by making some of it more efficient.
* [FFI][REFACTOR] Isolate out extra API (apache#18177)
This PR formalizes the extra API in FFI. The extra APIs are a minimal set of
APIs that are not required by the core mechanism but are still helpful.
Moves structural equal/hash to the extra API.
* [Misc][BugFix] Fix missing PadAttrs register in op_attrs.py (apache#18174)
* [COMMUNITY] Add new key for release signing
* [Misc][BugFix] Fix missing PadAttrs register in op_attrs.py
Fix missing PadAttrs register in op_attrs.py
* [FFI] Fix SmallMapInit with duplicated keys (apache#18178)
This PR fixes SmallMap init when there are duplicated keys.
* [Fix] Fix the wrong check for tuple node in apache#18163 (apache#18170)
* [Fix] Fix the wrong check for tuple node in apache#18163
* [REFACTOR] Upgrade NestedMsg<T> to use new ffi::Any mechanism (apache#18181)
This PR upgrades NestedMsg<T> to use the new ffi::Any mechanism,
which gives us better support and enables NestedMsg
for POD types.
* [FFI][REFACTOR] Cleanup to align to latest ffi (apache#18183)
This PR modernizes legacy usage to align with the latest FFI.
- Use Any to represent general values instead of ObjectRef
- Use Optional<T>.has_value() instead of defined()
* [FFI][REFACTOR] Hide StringObj/BytesObj into details (apache#18184)
This PR hides StringObj/BytesObj as implementation details and brings
the implementations to focus directly on String/Bytes.
This change will prepare us for future changes such as SmallStr support.
Also moves more ObjectRef into Any in RPC.
* [FFI] Introduce small string/bytes (apache#18185)
* [Relax] Fix Relax Operator PReLU (apache#18179)
* [FFI] Lightweight json parser/writer (apache#18186)
This PR adds a lightweight JSON parser/writer to the extra component.
* [FFI][EXTRA] Serialization To/From JSONGraph (apache#18187)
* [FFI][REFACTOR] Migrate the Save/Load JSON to the new reflection (apache#18188)
This PR migrates the Save/Load JSON to the new reflection based mechanism.
This is a breaking change that updates the JSON format
and uses ffi/extra/serialization to handle the serialization;
see the JSON graph schema comment in ffi/extra/serialization.h
for the format, which roughly aligns with the old style.
After this change, we no longer need node/reflection and reflection vtable.
We can also phase out TVM_REGISTER_NODE and TVM_REGISTER_OBJECT to have a single
place that defines the reflection.
* [FFI][REFACTOR] Phase out getattr based attribute handling (apache#18189)
[REFACTOR] Phase out getattr based attribute handling
This PR phases out getattr-based attribute handling, as it is slower
and introduces extra code paths.
This does mean that if an Object is not explicitly registered
on the Python side, we will no longer be able to access its fields by name.
This is likely also desirable, as we would like to enable faster usage that
updates the Python end and does not rely on this behavior.
* [FFI][REFACTOR] Refactor AccessPath to enable full tree repr (apache#18191)
This PR refactors AccessPath so it can be used to represent a full tree with compact memory.
Also fixes a bug in the Cython method export.
* [FFI] Phase out ObjectPath in favor of AccessPath (apache#18192)
This PR phases out ObjectPath in favor of AccessPath
* Bump cutlass_fpA_intB_gemm to latest commit (apache#18193)
* [REFACTOR] Update data type rewriter to enable recursive rewrite in Any (apache#18197)
This PR updates the data type rewriter so recursive rewrite happens
in Any instead of ObjectRef.
* [Relax][ONNX] Parse ONNX Upsample to Relax resize2d (apache#18180)
* [TIR] Fix host/device function check for build (apache#18199)
This PR fixes a bug in deciding whether a function is a host
or a device function in the TIR build.
Previously the decision was made by checking whether `"cpu"`
is a substring of the target string. This check fails
for the ROCm target, which usually comes with an `"mcpu"` attribute
that also contains `"cpu"`.
This PR fixes this by checking the target kind. Targets with kind `"llvm"`
or `"c"` are treated as host functions.
* [FFI][REFACTOR] Move Downcast out of ffi for now (apache#18198)
Downcast was added for backward compatibility reasons and it duplicates
features of Any.cast. This PR moves it out of ffi
into node for now so the ffi part contains a minimal set of implementations.
* [FFI][REFACTOR] Update Map ABI to enable flexible smallMap switch (apache#18200)
This PR updates the Map ABI to use the MSB in slots_ to indicate SmallMap.
The change opens the door for future changes to the small-map boundary
switch.
* [Fix] Codegen fix for relax cutlass (apache#18190)
* Codegen fix
---------
Co-authored-by: Tianqi Chen <[email protected]>
* [LLVM][CPPTEST] Small fixes for LLVM >= 20 (apache#18202)
This PR updates cpptest to run on LLVM >= 20
* [LLVM] Fixes up to the latest LLVM21 (apache#18204)
This PR fixes TVM for use with the latest LLVM version 21.
- At this time LLVM21 is available as a release candidate.
- Double-checks backward compatibility down to LLVM10
* [FFI][REFACTOR] Cleanup entry function to redirect (apache#18205)
This PR updates the entry function mechanism to create a stub that redirects to the real function.
This new behavior helps simplify the runtime logic supporting the entry function.
Also updates the name to `__tvm_ffi_main__`.
* [CODEGEN][REFACTOR] tir.call_llvm_intrin to remove nargs (apache#18206)
This PR refactors tir.call_llvm_intrin to omit the leading nargs argument.
The nargs argument was originally introduced when prefetch had a different signature.
That reason no longer stands, and it is less intuitive to attach nargs
to call_llvm_intrin when nargs is already implied by the number of arguments.
After the update, tir.call_llvm_intrin can directly take the arguments as they are.
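A hedged before/after sketch of the calling convention (the intrinsic, dtype, and argument are illustrative placeholders, not taken from the PR):
```python
# Illustrative only: the intrinsic, dtype, and argument are placeholders.
# before this change (explicit nargs constant passed first):
#   T.call_llvm_intrin("float32", "llvm.sqrt.f32", T.uint32(1), x)
# after this change (arguments passed directly):
#   T.call_llvm_intrin("float32", "llvm.sqrt.f32", x)
```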
* [FFI][Fix] Update datatype registry calls to the new paths (apache#18208)
* [TARGET] Add target for nvidia rtx 5060ti (apache#18211)
* [Bug] Fix core dump in InferLayoutRMSNorm and fix typo (apache#18210)
Fix core dump in InferLayoutRMSNorm and fix typo
* [FFI] Make JSON Parser/Write fastmath safe (apache#18212)
This PR adds fallbacks for nan and inf
detection/creation under fastmath mode.
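A small Python model of the bit-pattern fallback idea (the actual fix is in the C++ JSON parser/writer; this just illustrates checking IEEE-754 bits instead of relying on isnan/isinf, which fast-math may optimize away):
```python
import struct

def _double_bits(x: float) -> int:
    # Reinterpret the double as its raw 64-bit pattern.
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def fastmath_safe_isnan(x: float) -> bool:
    bits = _double_bits(x)
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    return exponent == 0x7FF and mantissa != 0  # all-ones exponent, non-zero mantissa

def fastmath_safe_isinf(x: float) -> bool:
    bits = _double_bits(x)
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    return exponent == 0x7FF and mantissa == 0  # all-ones exponent, zero mantissa

assert fastmath_safe_isnan(float("nan"))
assert fastmath_safe_isinf(float("inf")) and fastmath_safe_isinf(float("-inf"))
```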
* [ONNX][FRONTEND][Fix] Update Resize to accept ShapeExpr (apache#18209)
[ONNX][FRONTEND] Update Resize to accept ShapeExpr
* [FFI] Formalize ffi.Module (apache#18213)
This PR formalizes the original runtime::Module into ffi
as ffi.Module and cleans up the APIs around it.
The goal is to stabilize the Module API as an extra API that can benefit the overall
ffi interactions. We also refactor the C++ code that depends on the Module.
* [Fix][ONNX] No precision widening for numpy binary operations (apache#18207)
* [FFI][REFACTOR] Establish ffi.Module in python (apache#18214)
This PR refactors and establishes ffi.Module under the Python tvm ffi API.
Also moves the export_library method to the executable so it aligns more with
a compiled artifact.
* [FFI][REFACTOR] Establish Stream Context in ffi (apache#18216)
This PR sets up the stream context in ffi and migrates
the existing per-device-API stream context management
to the ffi env API. The new API will help us streamline
stream-related integration for most libraries.
* [FFI] AutoDLPack compatible with torch stream context (apache#18217)
This PR updates the autodlpack path to automatically update
the env stream to be consistent with the torch stream context.
The change helps make FFI functions
compatible with stream-based execution.
We leverage torch cpp_extension load_inline to create
an efficient query function; the first-time loading
might take more time to build the JIT module, and
things should be fast after the torch JIT module is cached.
* [FFI][REFACTOR] Cleanup API locations (apache#18218)
This PR cleans up the env API and moves it to extra.
* [Fix] Resolve deadlock in PopenPoolExecutor and LocalBuilder (apache#18219)
- Add explicit shutdown flag in PopenPoolExecutor
- Replace del with explicit shutdown() calls in LocalBuilder
* [Thrust] Fix getting CUDA stream (apache#18220)
This PR updates the `GetCUDAStream` in CUDA thrust integration
to the latest `TVMFFIEnvGetCurrentStream` interface.
* [NVSHMEM] Fix compatibility with CUDA code without nvshmem use (apache#18222)
This PR fixes two bugs that prevented normal TIR functions (ones that
don't use any NVSHMEM API) from compiling and running,
in cases where `set(USE_NVSHMEM xxx)` is enabled.
Co-authored-by: Bohan Hou <[email protected]>
* [FFI] Fix JSON parser/writer for the fast-math flag (apache#18221)
This PR fixes the JSON parser and writer for the support of the
fast-math flag.
Prior to this PR, there would be the following error when compiling
TVM with fast-math. This PR fixes this issue.
```
/home/ruihangl/Workspace/tvm/ffi/src/ffi/extra/json_writer.cc: In static member function ‘static bool tvm::ffi::json::JSONWriter::FastMathSafeIsNaN(double)’:
/home/ruihangl/Workspace/tvm/ffi/src/ffi/extra/json_writer.cc:69:22: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
69 | uint64_t bits = *reinterpret_cast<const uint64_t*>(&x);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/ruihangl/Workspace/tvm/ffi/src/ffi/extra/json_writer.cc: In static member function ‘static bool tvm::ffi::json::JSONWriter::FastMathSafeIsInf(double)’:
/home/ruihangl/Workspace/tvm/ffi/src/ffi/extra/json_writer.cc:84:22: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
84 | uint64_t bits = *reinterpret_cast<const uint64_t*>(&x);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
* [ROCm] Minor fixes for latest refactor (apache#18225)
This PR fixes a few ROCm and hipBLAS build issues after recent
refactors.
* [CI] Exit the build for AbortException (apache#18227)
[CI] Exit the build when an AbortException is met
* [FFI][REFACTOR] Establish tvm_ffi python module (apache#18226)
* [FFI][REFACTOR] Establish tvm_ffi as a standalone python module
This PR establishes tvm_ffi as a standalone Python module.
The ffi is structured as a minimal pip module that can be
directly installed by path or URL.
examples/get_started provides a minimal example.
This is a major change: as we decouple tvm_ffi into a
separate package, users need to install tvm_ffi separately.
Thanks to its minimal dependencies, tvm_ffi can be easily installed
even just from source via pip install ./ffi.
This change enables future improvements for library plugins
to have lightweight dependencies by working directly on top of
tvm_ffi, while the main compiler toolchain and runtime
can be layered on top.
* [FFI] Improve traceback setups
This PR improves traceback related setups
* [FFI] Robustify the pyproject setup (apache#18233)
This PR robustifies the pyproject setup to enable compatibility with cibuildwheel.
* [LLVM][Fix] Do not emit debuginfo on vscale or other unknown types (apache#18232)
* [FFI] Misc fixup for windows (apache#18234)
This PR cleans up the ffi module to make it compatible with Windows.
* [FFI][BUGFIX] Fix type_traits on DataType after SmallStr update (apache#18237)
This PR fixes the type_traits on DataType after the SmallStr update.
We need to explicitly zero out the FFIAny data structure to allow
fast comparison of FFIAny based on byte values.
* [CUTLASS] Fix CUTLASS kernel compilation (apache#18238)
This PR fixes a few places in the current CUTLASS kernel AOT
compilation.
* [Relax] ONNX frontend using relax softplus operator (apache#18242)
Use relax softplus operator in onnx frontend
* [Relax] Add Python function support and BasePyModule for PyTorch integration (apache#18229)
### **Overview**
This PR implements native Python function support in TVM Relax through
the `@I.pyfunc` decorator and `BasePyModule`, which enable seamless
integration between TVM's compilation pipeline and Python/PyTorch runtime
environments. This enhancement allows users to write Python functions
directly in TVMScript that can interoperate with Relax and TIR functions.
This provides enhanced debugging capabilities and leverages existing
PyTorch operator libraries.
### **Key Features**
**TVMScript Parser Enhancement**
- `@I.pyfunc` decorator: Marks Python functions for integration into IRModules
- Dual storage format: Stores both raw string representation (for TVMScript
printing) and captured PackedFunc (for runtime execution)
- ExternFunc representation: Each Python function is represented as an
ExternFunc node with attributes storing source code and runtime wrapper
**Complete BasePyModule Implementation**
- DLPack-based tensor conversion: Seamless conversion between PyTorch
tensors and TVM NDArrays
- Cross-function interoperability: Python functions can call Relax/TIR
functions and vice versa
- JIT compilation: Delays compilation until module instantiation for flexible
late-stage modifications
- Dynamic function registration: Supports runtime addition of Python functions
### Future Work
- TVMScript printer for IRModules with Python functions: Print IRModules
in proper format with high-level operator mapping from Relax ops to PyTorch
ops, handling symbolic shapes
- R.call_py_func primitive: Introduce Relax primitive to invoke corresponding
PackedFunc of specified Python functions at runtime
* [Fix] Update FlashInfer JIT header lookup (apache#18244)
This PR fixes the tvm/dlpack/dmlc header lookup in the FlashInfer
kernel JIT compilation.
Prior to this fix, the JIT compilation assumes the environment
variable `TVM_SOURCE_DIR` is always defined, which is not always
true. This PR fixes the behavior and considers multiple cases,
including TVM source builds and pip-installed packages.
* [LLVM][MSWIN][CI] Fix LLVM module build with latest CI update (apache#18245)
* [FFI][CMAKE] Add missing download path for libbacktrace (apache#18246)
* [Build] Migrate Python packaging to pyproject.toml with scikit-build-core (apache#18239)
This PR migrates the TVM Python packaging system from the setup.py
flow to the modern, PEP 517/518 compliant pyproject.toml standard,
which allows us to produce a single, Python-version-agnostic wheel.
This change streamlines the process for both developers and users.
For local development, you can now set up a fully-functional editable
environment with a single command: `pip install -e .`.
To create the distributable package for release, simply run
`pip wheel -w dist .` , which will produce a universal wheel in the `dist/`
folder. This ensures that end-users can reliably install TVM with a standard
pip install tvm, regardless of their specific Python 3 version.
* [FFI][CMAKE] Revert cmake libbacktrace URL and update submodule (apache#18249)
* Revert the libbacktrace URL out of cmake
* Switch the git submodule to upstream HEAD instead
As discussed in apache#18246 (comment),
this reverts in favour of the git submodule approach.
As found in the same discussion, the upstream [already](https://github.com/ianlancetaylor/libbacktrace/blob/793921876c981ce49759114d7bb89bb89b2d3a2d/macho.c#L1273-L1275) incorporates [the one patch](ianlancetaylor/libbacktrace@master...tlc-pack:libbacktrace:master) used, and macOS works fine.
* [Python] Update version.py to bump pyproject.toml automatically (apache#18248)
This PR updates `version.py` so that every time this file runs,
it also bumps the version number in `pyproject.toml`
automatically.
* [Python] Complete Python packaging with scikit-build-core (apache#18251)
Following apache#18239, this PR fixes a few issues we ran into during
testing the packaging flow through scikit-build-core.
* upgrade cutlass v4.2.0 supporting cuda 13 (apache#18236)
* upgrade cutlass v4.2.0 supporting cuda 13
* upgrade cutlass v4.2.0 supporting cuda 13
* [FFI][ABI] ABI updates for future metadata and complex ordering (apache#18254)
This PR updates the ABI to enable the potential future need
of getting metadata from a dynamically loaded module.
It orders the current static objects into simple objects that have a C
ABI and more complex ones that may need C++.
These changes make the ABI future-compatible before we freeze it.
* [FFI][DOCS] Wheel Packaging (apache#18256)
[FFI] Wheel packaging example
This PR adds an example of wheel packaging.
Also fixes various minor source packaging nits.
* [FFI] fix two seeming migration issues (apache#18258)
* [FFI][ABI] Introduce weak rc support (apache#18259)
This PR adds weak reference counter (rc) support to the FFI ABI.
Weak rc is useful when we want to break cyclic dependencies.
- When the strong rc goes to zero, we call the destructor of the object but do not free the memory
- When both the strong and weak rc go to zero, we perform the memory free operation
The weak rc mechanism is useful when we want to break cyclic dependencies between objects, where the
weak rc can keep the memory alive even after the destructor has been called.
As of now, because we deliberately avoid cycles in the codebase, we do not have a strong use-case for weak rc.
However, weak rc is common practice in shared_ptr and Rust Rc, and is also used in torch's c10::intrusive_ptr,
so it is better to make sure the ABI is future-compatible with such use-cases before we freeze.
This PR implements the weak rc as a u32 counter and the strong rc as a u64 counter, with the following
design considerations:
- Weak rc is very rarely used and u32 is sufficient.
- Keeping the weak rc in u32 allows us to keep the object header size at 24 bytes,
saving an extra 8 bytes (considering alignment).
We also need to update the deleter to take flags that cover both weak and
strong deletion events. The implementation optimizes the common case where
both the strong and weak rc go to 0 at the same time, calling the deleter once
with both flags set.
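A rough pure-Python model of the strong/weak protocol described above (an illustration of the semantics only, not the actual C ABI or its data layout):
```python
class RcHeader:
    """Models an object header with a strong (u64) and a weak (u32) counter."""

    def __init__(self, call_dtor, free_memory):
        self.strong = 1          # one strong reference at creation
        self.weak = 1            # strong refs collectively hold one weak ref (model assumption)
        self.call_dtor = call_dtor
        self.free_memory = free_memory

    def dec_strong(self):
        self.strong -= 1
        if self.strong == 0:
            self.call_dtor()     # destructor runs when the strong rc hits zero
            self.dec_weak()      # ...but the memory stays alive while weak refs remain

    def dec_weak(self):
        self.weak -= 1
        if self.weak == 0:
            self.free_memory()   # memory is freed only when both counters are gone

hdr = RcHeader(call_dtor=lambda: print("dtor"), free_memory=lambda: print("free"))
hdr.dec_strong()  # common case: prints "dtor" then "free" in one go
```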
* [FFI][DOCS] Add missing files in packaging example (apache#18261)
This PR adds the missing files in the packaging example
and also renames get_started to quick_start.
* [BugFix][NNAPI] Use kind() instead of type_key() after FFI refactor (apache#18262)
[BugFix][NNAPI] Use kind() after FFI refactor
This commit updates nnapi_runtime.cc to override kind() instead of type_key(),
aligning NNAPI with the new FFI interface. Behavior is consistent with
other runtimes that were updated in commit b8eb80b.
* [FFI][DOCS] Initial docs scaffolding (apache#18263)
* [DOCS] Misc docs fix (apache#18264)
This PR provides misc docs fixes: updates the requirements of the ffi docs,
removes stale webpages from the header, and updates the embedding script to allow a path.
* [Build] Complete TVM wheel building migration (apache#18252)
* finish1
* finish2
* finish3
* update
* update2
* update3
* update4
* update4
* update6
* Rename build step and update installation commandFix
* fix
* fix2
* fix3
* [Relax] Building TVMScript printer for IRModules with Python functions (apache#18253)
This PR implements TVMScript printer to format IRModules
containing `@I.pyfunc` decorated Python functions.
Example:
```
@I.ir_module
class MyModule(BasePyModule):
@I.pyfunc
def python_func(self, x, y):
x_tvm = self._convert_pytorch_to_tvm(x)
y_tvm = self._convert_pytorch_to_tvm(y)
result = self.call_tir(self.add_tir, [x_tvm, y_tvm],
out_sinfo=R.Tensor((5,), "float32"))
return self._convert_tvm_to_pytorch(result)
@T.prim_func
def add_tir(a: T.handle, b: T.handle, c: T.handle):
A = T.match_buffer(a, (5,), "float32")
B = T.match_buffer(b, (5,), "float32")
C = T.match_buffer(c, (5,), "float32")
for i in range(5):
C[i] = A[i] + B[i]
# Usage:
print(MyModule.script()) # Print formatted TVMScript
MyModule.show() # Display formatted output
```
* [FFI] Update torch stream getter to use native torch c api (apache#18266)
This PR updates the torch stream getter to use _cuda_getCurrentRawStream
from the torch C API, which is also used by dynamo, saving us from having to
load_inline a custom module.
* [FFI] Support Opaque PyObject (apache#18270)
* [FFI] Support Opaque PyObject
This PR adds support for Opaque PyObject.
When a type in Python is not natively supported by ffi, it will
now be converted to an Opaque PyObject on the backend. Such opaque
objects retain their lifecycle automatically and can still be
used in registered Python callbacks, stored in containers,
and returned to the frontend.
* Round of grammar polishing
* [FFI] Support inline module (apache#18271)
This PR adds initial support for load_inline in tvm_ffi
* [FFI] Construct NDArray.strides by default (apache#18272)
This PR updates NDArray.strides to construct strides by default
* [FFI][ABI] Append symbol prefix for ffi exported functions (apache#18273)
Previously we simply took the raw symbol for DSO libraries.
This can cause symbol conflicts between functions that follow the ffi calling convention
and those that do not.
This PR updates the convention to ask the LLVM and library modules to always
append a __tvm_ffi_ prefix to function symbols; this way we will no longer
have conflicts with the TVM_FFI_EXPORT_DLL_TYPED macro.
* [FFI] Update the interface of `ffi.load_inline` to match torch (apache#18274)
This PR updates the interface of ffi.load_inline to match torch.utils.cpp_extension.load_inline:
- Rename cpp_source to cpp_sources, cuda_source to cuda_sources.
- Unify the cpp_functions and cuda_functions into functions.
- Add build_directory to allow the user to specify the build directory directly.
* [FFI][REFACTOR][ABI] Rename NDArray to Tensor (apache#18275)
This PR updates NDArray => Tensor.
Both tensor and ndarray are commonly used terms.
Because the term Tensor is getting more common in the context of ML,
we do the rename to stay more aligned with torch.Tensor and DLTensor.
* [FFI] Add ffi::Tensor.strides() (apache#18276)
* ffi::Tensor strides
* [FFI][REFACTOR] Cleanup tvm_ffi python API and types (apache#18277)
This PR cleans up the Python API to make things more consistent
with the existing Python array API and torch (see the sketch after this list).
Device updates:
- device_id => index, to be consistent with torch
- device_type => dlpack_device_type(), which returns an int
- added a type property, same as torch.device
API updates:
- Move convenience methods like cpu() out into the tvm runtime to keep the device class minimal
- tvm_ffi._init_api => tvm_ffi.init_ffi_api
- tvm_ffi.register_func => tvm_ffi.register_global_func
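A hedged sketch of the registration rename (only the function names come from this commit message; decorator-style usage is assumed):
```python
import tvm_ffi

# was: @tvm_ffi.register_func("demo.add_one")
@tvm_ffi.register_global_func("demo.add_one")
def add_one(x: int) -> int:
    return x + 1
```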
* [FFI] Temp skip load_inline tests nonlinux (apache#18278)
This PR temporarily skips load_inline tests on non-Linux platforms before we
enhance and improve support for other platforms.
* [LLVM][METASCHEDULE] Add RISCV V-extension v1.0 kernels to metaschedule (apache#18243)
- Enables high-performance kernels covering the majority of usual ML datatype inputs
- It is currently compliant with RVV specs version v1.0 (does not work with the older v0.7.1)
- The TIR kernels implemented here use the recently added VLA extension support
* [FFI][DOCS] Initial bringup of cpp docs (apache#18279)
This PR brings up initial version of cpp api docs.
* [FFI][Bugfix] Fix bug of `ffi.cpp.load_inline` on Windows (apache#18281)
This PR enables load_inline on the Windows platform.
* [FFI][REFACTOR] Cleanup namespace (apache#18280)
* [FFI][REFACTOR] Cleanup namespace
This PR cleans up the namespace to ensure all ffi classes
are accessed through the ffi:: namespace.
It will help to clean up the ffi package before isolation.
* fix hexagon
* [FFI] Relax default alignment and contiguous requirement (apache#18282)
This PR relaxes the default alignment and contiguity requirements in DLPack import.
This allows the ffi to be useful in most settings.
We also provide a utility for users to check these requirements themselves.
* [Fix][Metal] Fix type for device array in Metal API (apache#18283)
This PR fixes a typo in the previous ffi namespace cleanup.
* [Relax] Add Relax to Python Function Converter (apache#18269)
### Overview
This PR implements a Relax to Python Function Converter that transforms
Relax functions into executable Python functions using PyTorch operations.
This enables seamless conversion between TVM's Relax IR and Python/PyTorch
environments, which provides enhanced debugging capabilities and leverages
existing PyTorch operator libraries for testing and deployment purposes.
### Key Features
- **High-level operator mapping**: Maps 60+ Relax operators to
corresponding PyTorch APIs
- **Special operation handling**: Supports `call_tir`, `call_dps_packed`,
and Relax function calls with DLPack integration
- **Symbolic shape support**: Handles symbolic shapes and dynamic tensor
operations
### **Example**
```python
from tvm.relax.relax_to_pyfunc_converter import RelaxToPyFuncConverter
# Convert Relax functions to Python functions
converter = RelaxToPyFuncConverter(ir_module)
converted_ir_mod = converter.convert("my_function")
# Execute converted function with PyTorch tensors
result = converted_ir_mod.pyfuncs['my_function'](input_tensor)
```
* [FFI][REFACTOR] Introduce UnsafeInit and enhance ObjectRef null safety (apache#18284)
This PR enhances the nullptr handling and general type safety of ObjectRef types.
Previously ObjectRef relied on the constructor from ObjectPtr<Object> for casting
and initializing from nullptr.
We introduce a tag, ffi::UnsafeInit, which explicitly states the intent
that the initialization is unsafe and may initialize a non-nullable Ref to null.
Such a tag should only be used in controlled scenarios.
Now the general RefType(ObjectPtr<Object>) is removed.
We still keep RefType(ObjectPtr<ContainerType>) for nullable objects,
but remove the default definition from non-nullable types, knowing that
users can always explicitly add it to the class implementation (ensuring null checking).
* [Hotfix] Fix the conflicts about ffi-related updated names (apache#18287)
* Change registration of mock softmax function
* Update check_asf_header.sh
Remove unnecessary blank line in check_asf_header.sh
* Update check_asf_header.sh
* fix
* [FFI][Bugfix] Enable `load_inline` on macos (apache#18285)
This PR fixes the bug to enable `tvm_ffi.cpp.load_inline` on macOS.
We need to link `libtvm_ffi.dylib` into the custom module.
* [Metal] Fix MetalModuleCreate (apache#18290)
This PR fixes a type mismatch in MetalModuleCreate when initializing
a MetalModule. The error did not show up until the recent ObjectRef
null safety changes.
* [3rdparty] Bump cutlass_fpA_intB_gemm to fix SM90 build (apache#18291)
This PR fixes a SM90 build issue when CUTLASS is enabled.
The issue is because a source file included a CUTLASS header file
that has been removed since CUTLASS 4. Simply removing the header
fixes the build issue.
* [FFI][REFACTOR] Streamline Object Declare Macros (apache#18289)
* [Fix] Set DRefObj and CUDAIPCMemoryObj as mutable (apache#18294)
This PR marks `DRefObj` and `CUDAIPCMemoryObj` as mutable
object classes. The flags were missed during the previous macro refactor.
* [FFI][ABI] Introduce generic stream exchange protocol (apache#18295)
This PR adds a __tvm_ffi_env_stream__ protocol for generic
tensors to exchange the env stream with tvm ffi.
Also renames TVMFFIEnvSetStream to TVMFFIEnvSetCurrentStream.
* [FFI] Temp skip windows tests (apache#18297)
* [Fix] Add libxml2 dependency to fix Windows CI build failure (apache#18296)
* [FFI] Fix system library symbol lookup (apache#18298)
* [Relax] Add symbolic shape support to BasePyModule for dynamic tensor operations (apache#18288)
This PR adds symbolic shape support to `BasePyModule`, which enables
dynamic tensor operations with runtime shape inference.
This allows users to use Relax's symbolic shape functionality in Python
function calls through BasePyModule, with dimensions automatically
resolved at execution time based on input tensor shapes.
## Usage Example
```python
import tvm
from tvm.script import ir as I, relax as R
from tvm.relax.base_py_module import BasePyModule
import numpy as np
@I.ir_module
class VectorAddModule(BasePyModule):
@R.function
def add(x: R.Tensor(("n",), "float32"),
y: R.Tensor(("n",), "float32")) -> R.Tensor(("n",), "float32"):
return R.add(x, y)
module = VectorAddModule(device=tvm.cpu(0), target="llvm")
a = np.array([1.0, 2.0, 3.0], dtype="float32")
b = np.array([4.0, 5.0, 6.0], dtype="float32")
result = module.add(a, b) # Result: [5.0, 7.0, 9.0]
```
* [CUDA] Support NVTX in CUDA 13 (apache#18300)
This PR adds support for NVTX in CUDA 13. The change is needed because,
starting with CUDA 13, the nvtx functions have moved to the library
`libnvtx3interop.so`, and the previous nvToolsExt library no
longer exists.
To ensure compatibility with both CUDA 12 and 13, we add
libnvtx3interop.so to the library lookup list.
* [Python] Fix runtime tensor import (apache#18299)
This PR fixes a few places where the python import of runtime tensor
is incorrect. The error wasn't revealed in the previous
NDArray->Tensor rename PR since these imports are not at the top
level.
* [FFI][REFACTOR] Refactor python ffi call mechanism for perf (apache#18302)
This PR refactors the Python ffi call mechanism. Previously the argument setting could
become an issue, as it was sensitive to the order of if checks. This PR refactors
the calling to leverage a C++-based dispatcher where each dispatch functor can be
registered from Cython.
* Clear ext_lib_dll_names for macOS platform (apache#18304)
Removed external library DLL names for macOS.
Found during tile-ai/tilelang#799.
cc @LeiWang1999
* [Relax] Fix RelaxToPyFuncConverter compatibility and improve fallback handling (apache#18301)
This PR fixes multiple compatibility issues in `RelaxToPyFuncConverter`
caused by recent TVM API changes and improves the robustness of fallback
tensor handling.
* [FFI][ABI][REFACTOR] Enhance DLPack Exchange Speed and Behavior (apache#18306)
This PR enhances DLPack exchange by introducing DLPackPyObjectExporter,
DLPackPyObjectImporter and DLPackTensorAllocator.
These three function pointers help us speed up import/export
with DLPack and also streamline the rare (but still sometimes useful)
allocation inside the FFI.
They can significantly speed up the autodlpack import. They also
enable us to query the allocator from the env and return ffi::Tensor
back to the caller environment (experimental): when a function takes torch.Tensor
as an argument, returned Tensor values will be converted to torch.Tensor.
Also renames SetCurrentStream => SetStream to align with the style of the CUDA API.
Finally, we add an option to select whether we release the GIL.
We release the GIL by default, like ctypes; however, for short-running functions
it may be helpful to set func.release_gil = False.
* [FFI] Update `load_inline` interface (apache#18307)
update load_inline interface
* [FFI][ABI] Refactor the naming of DLPack speed converter (apache#18308)
Update the name to avoid potential confusion
* [FFI][ABI] Better String and Nested Container handling (apache#18311)
[FFI][ABI][REFACTOR] Better String and nested container handling
This PR improves the overall String/Bytes and nested container handling.
It also fixes a bug in temp object recycling.
- Introduce a formal API for string/bytes creation
- Update the tuple/dict conversion to also preserve the torch stream
- So if a function takes a list of torch.Tensor, the torch stream will be set up in the context
- Optimize recursive argument conversion by moving most logic into C++
* [FFI][REFACTOR] Update TVM_FFI_STATIC_INIT_BLOCK to fn style (apache#18312)
This PR updates TVM_FFI_STATIC_INIT_BLOCK to function style.
Now we write the code as follows, which is generally cleaner
and also helps error reporting locate the right place.
```
TVM_FFI_STATIC_INIT_BLOCK() {
RegisterStaffs();
}
```
* [REFACTOR][FFI] Split tvm-ffi into a separate repo (apache#18314)
This PR updates the code so we split tvm-ffi into a separate repo
* [FlashInfer] Update include path and interface (apache#18317)
This PR updates the include path for FlashInfer JIT compilation,
and also updates the plan function interface for attention prefill
computation, to align with recent interface change in
flashinfer-ai/flashinfer#1661.
* [3rdparty] Remove dlpack/libbacktrace from 3rdparty (apache#18318)
[3rdparty] Remove dlpack/libbacktrace from 3rdparty
This PR removes the TVM dependency on dlpack and libbacktrace,
as tvm-ffi has been separated into https://github.com/apache/tvm-ffi.
* [TVMScript] Support continue and break in tvmscript (apache#17804)
* support continue and break in tvmscript
* fix black format
* fix pylint issue
* Update tests/python/tvmscript/test_tvmscript_syntax_sugar.py
Co-authored-by: Copilot <[email protected]>
* add printer/parser test, fix lint
* Fit to latest ffi update
* Skip i386 numpy-related test
* Introduce AnnotateIrregularLoop before any lowering loop expansions.
---------
Co-authored-by: Copilot <[email protected]>
* [Relax] Introduce R.call_py_func operator for calling Python functions from Relax IR (apache#18313)
This PR allows calling Python functions directly from Relax IR,
enabling integration between Relax computations and Python/PyTorch
operations.
### Usage Example
```python
@I.ir_module
class MyModule(BasePyModule):
@I.pyfunc
def pytorch_add(self, x, y):
return x + y
@R.function
def compute(x: R.Tensor((5,), "float32"), y: R.Tensor((5,), "float32")) -> R.Tensor((5,), "float32"):
result = R.call_py_func("pytorch_add", (x, y), out_sinfo=R.Tensor((5,), "float32"))
return result
```
* [TIR][CUDA] Preserve float precision in codegen with hexfloat output (apache#18320)
Previously, `float` constants in codegen were always emitted in **scientific decimal format**, e.g.:
```cpp
bfloat16_t(3.487723e-05f);
```
This could introduce slight **rounding differences** compared to the actual binary representation, since the constant is printed and then re-parsed in decimal. We now emit the value in **hexadecimal floating-point format** (`std::hexfloat`) to preserve the exact binary value, and additionally include the decimal form as a comment for readability:
```cpp
bfloat16_t(0x1.2492492492492p-15f /*3.487723e-05*/)
```
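A small Python illustration of why the hexfloat form round-trips exactly (independent of the CUDA codegen itself):
```python
x = 3.487723e-05
print(x.hex())                      # hexadecimal floating-point form, e.g. '0x1.249...p-15'
assert float.fromhex(x.hex()) == x  # the exact binary value is preserved
```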
* [BugFix] Fixing binding for bert (apache#18324)
* Fixing binding for bert
* Fixing names
* [TIR] Add support for conditional expressions in TVMScript (apache#18323)
Add support for conditional expressions in TVMScript
This PR adds support for conditional expressions in the TVMScript parser,
allowing developers to use Python-style conditional expressions:
```python
@T.prim_func
def func(A: T.Buffer((128, 128), "float32")):
for i, j in T.grid(128, 128):
A[i, j] = i if i < j else j
@T.prim_func
def expected(A: T.Buffer((128, 128), "float32")):
for i, j in T.grid(128, 128):
A[i, j] = T.if_then_else(i < j, i, j)
```
* Fixing datatype error for gpt-2 (apache#18328)
* [CMake][Web] Install `web/` directory in cmake for Python package (apache#18327)
This PR updates the CMakeLists to install the web subdirectory
when building the Python package, so that people do not need to clone
the TVM source code to build the web package.
* [Relax][Backend] Implement R.call_py_func operator for calling Python functions from compiled TVM (apache#18326)
This PR implements the `R.call_py_func` operator that allows compiled
TVM Relax modules to call Python functions at runtime. This enables
integration between TVM's compiled code and Python through a
robust VM backend implementation.
#### Simple Usage with BasePyModule
```python
@I.ir_module
class MyModule(BasePyModule):
@I.pyfunc
def torch_relu(self, x):
return torch.relu(x)
@R.function
def forward(x: R.Tensor((10,), "float32")) -> R.Tensor((10,), "float32"):
return R.call_py_func("torch_relu", (x,), out_sinfo=R.Tensor((10,), "float32"))
```
#### Direct VM Backend Usage (Manual)
```python
# Manually register Python function with VM backend
register_func = tvm.get_global_func("vm.builtin.register_py_func")
register_func("my_func", my_python_function)
# Use in Relax function (compiled to VM backend)
@R.function
def test(x: R.Tensor((5,), "float32")) -> R.Tensor((5,), "float32"):
return R.call_py_func("my_func", (x,), out_sinfo=R.Tensor((5,), "float32"))
# Manual cleanup (required for direct VM backend usage)
clear_func = tvm.get_global_func("vm.builtin.clear_py_func_registry")
clear_func()
```
* [flashinfer] Support directing JIT to FlashInfer GroupedGemm kernels (apache#18325)
In tvm/python/tvm/relax/backend/cuda/flashinfer.py, added a
`gen_grouped_gemm_module`.
In tvm/tests/python/relax/test_group_gemm_flashinfer.py, added
tests for different combinations of:
- input and output types: ("float8_e4m3fn", "float8_e4m3fn", "bfloat16"),
("float8_e4m3fn", "float8_e4m3fn", "float16"),
- scale granularity of m, n, k: (1, 128, 128),
- scale major mode: "MN", "K"
- mma_sm: 1, 2
- different batch sizes and m_sizes
* [Relax][Frontend][ONNX] Error converting operator Expand: TVMError: broadcast_to expects the input tensor shape is broadcastable to the target shape (apache#18329)
* [Relax] Operator and RoPE support for Llama4 (apache#18336)
Added Llama4 implementation and a new RoPE implementation.
* Fix conflict parameter name promote_dtye in FP8ComputeLegalize (apache#18334)
* [FFI][ABI] Bump version ffi to latest (apache#18332)
This PR bumps the version of tvm-ffi to latest, which involves an ABI change.
* [Relax][PyTorch] Support MatrixMultiply op for ExportedProgram importer (apache#18343)
This PR supports `mm.default` for the ExportedProgram importer.
Resolves issue apache#18339.
* [Relax] Update BasePyModule with faster DLPack converter for tensor conversion (apache#18331)
This PR enhances `BasePyModule` by integrating a faster DLPack
converter for efficient tensor conversion between TVM and PyTorch
following apache#18306.
* [TIR] Support sequence comparisons in TVMScript (apache#18341)
Implement proper parsing and evaluation of chained comparison operators
(e.g., `0 < i < 128`) in TVMScript. The sequence comparisons are now
correctly expanded to their logical equivalents (e.g., `(0 < i and i < 128)`).
Changes:
- Updated expression evaluator to handle sequence comparisons correctly
- Added test case to verify sequence comparison functionality
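A minimal hedged TVMScript sketch of the expansion (mirroring the example style used elsewhere in this log; shapes and values are illustrative):
```python
@T.prim_func
def func(A: T.Buffer((128,), "int32")):
    for i in range(128):
        # `0 < i < 64` is now parsed as `(0 < i) and (i < 64)`
        A[i] = T.if_then_else(0 < i < 64, 1, 0)
```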
* [FFI][ABI] Bump tvm-ffi version to reflect RC ABI Update (apache#18345)
This PR bumps tvm-ffi version. The latest version contains a change
to the RC ABI that also needs web runtime update.
* [Python] Add library lookup path for tvm installed as a package (apache#18348)
[Python] Add library lookup path when tvm installed as a package
* [FFI][ABI] Bump tvm-ffi to latest (apache#18349)
This PR bumps tvm-ffi to latest, which introduces ShapeView and minimizes the TensorObj ABI.
* [Relax][Frontend][Torch] Fix parsing error when input dimension of unbind is 1 (apache#18351)
* [Relax][Frontend][Torch] Fix parsing error when input dimension of unbind is 1
* reformat code
* [Fix] Update ShapeView use in nccl.cc (apache#18352)
This PR fixes the use of ShapeView in nccl.cc, which was using
`Shape()->Product()`. This has been changed to `Shape().Product()`
with the introduction of ShapeView.
* [Relax][PyTorch] Support lstm op for ExportedProgram importer (apache#18346)
This PR supports `lstm.input` for the ExportedProgram importer.
This links to issue apache#18340.
* [Relax][ONNX] Support AllClassNMS Operator for ONNX Frontend (apache#18321)
Following apache#18175, this PR supports the AllClassNMS operator for the ONNX frontend.
* [FFI][ABI] Bump tvm-ffi to latest (apache#18354)
This PR bumps the tvm-ffi module to latest.
* [CUDA] Update FlashInfer JIT integration (apache#18353)
Following recent JIT refactor in FlashInfer that uses TVM FFI as
the JIT interface, this PR updates the JIT integration of FlashInfer
in TVM.
Major changes:
* we leverage FlashInfer's `JitSpec.build_and_load` to compile all
the JIT-generated source files, and remove the compilation logic
in TVM.
* for efficient tensor buffer management and efficient pointer
calculation, we enforced all `byte_offset` fields of auxiliary tensors
in KV cache to be zeros. The byte offset is now directly applied to
the data pointers.
* we also add a new parameter to FlashInfer JIT that controls whether
returning a linked shared library, or a list of compiled object paths.
For unit tests, returning a shared library is convenient and preferred,
while for cases such as MLC model compilation, object files are needed
to serialize the compiled model.
* rebase && update tvm-ffi
* fix(kvcache): restore GPU support lost during rebase
---------
Co-authored-by: Tianqi Chen <[email protected]>
Co-authored-by: Phoslight <[email protected]>
Co-authored-by: Ruihang Lai <[email protected]>
Co-authored-by: wrongtest <[email protected]>
Co-authored-by: wrongtest <[email protected]>
Co-authored-by: chenxinli <[email protected]>
Co-authored-by: Park Woorak <[email protected]>
Co-authored-by: ysh329 <[email protected]>
Co-authored-by: Yong Wu <[email protected]>
Co-authored-by: Chenfan <[email protected]>
Co-authored-by: Balint Cristian <[email protected]>
Co-authored-by: zyl_keep_moving <[email protected]>
Co-authored-by: Wei Wang <[email protected]>
Co-authored-by: Bohan Hou <[email protected]>
Co-authored-by: Marcel Dudek <[email protected]>
Co-authored-by: Shushi Hong <[email protected]>
Co-authored-by: Johnny <[email protected]>
Co-authored-by: Henry Hsieh <[email protected]>
Co-authored-by: Yaoyao Ding <[email protected]>
Co-authored-by: Yaxing Cai <[email protected]>
Co-authored-by: Yichen Yan <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Lei Wang <[email protected]>
Co-authored-by: Thais Camacho <[email protected]>
Co-authored-by: Siyuan Feng <[email protected]>
Co-authored-by: Anrui(Henry) Liu <[email protected]>
Co-authored-by: Neo Chien <[email protected]>
Co-authored-by: Pranav Venkatram <[email protected]>
Co-authored-by: Qingchao Shen <[email protected]>
Co-authored-by: Masahiro Hiramori <[email protected]>
Co-authored-by: Ruxiao Yin <[email protected]>
Co-authored-by: cwx <[email protected]>
1,780 files changed (+38,352 −54,612 lines)