[FFI][REFACTOR] Establish tvm_ffi python module #18226
force-pushed the branch from 1f824de to ffcd998
This PR establishes tvm_ffi as a standalone Python module. The FFI is structured as a minimal pip package that can be installed directly by path or URL, and examples/get_started provides a minimal example.

This is a major change: we are decoupling tvm_ffi into a separate package, so users need to install tvm_ffi separately. Thanks to its minimal dependencies, tvm_ffi can be installed straight from source with `pip install ./ffi`.

This change enables future improvements where library plugins only need lightweight dependencies by building on top of tvm_ffi, while the main compiler toolchain and runtime are layered on top.

This PR also improves the traceback-related setup.
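For a concrete picture of the intended workflow, here is a minimal sketch of installing the package from source and calling a function through it. It assumes registration and lookup helpers along the lines of `tvm_ffi.register_global_func` and `tvm_ffi.get_global_func`, mirroring the existing `tvm.ffi` API; the exact names may differ in this revision.

```python
# Minimal usage sketch, assuming the package was installed from source with
#   pip install ./ffi
# The helper names below mirror the existing tvm.ffi API and are assumptions
# about this revision's interface.
import tvm_ffi


def add_one(x: int) -> int:
    # A plain Python callback exposed through the FFI's global function registry.
    return x + 1


tvm_ffi.register_global_func("demo.add_one", add_one)

# Look the callback up again by name and call it through the FFI.
f = tvm_ffi.get_global_func("demo.add_one")
assert f(41) == 42
```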
In #17920 we introduced the new TVM FFI system. After various evolutions and refactoring, the FFI component has reached a state where it is ready to be isolated out. This marks a major milestone of our FFI refactoring. Summary of the change: tvm_ffi is established as a standalone, separately installable Python package, as described above.

We will continue to evolve tvm-ffi a bit, but it should be at a reasonably stable state. The change paves the way for us to further modularize the stable components that enable broad interactions, while we continue to improve the compiler.
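As an illustration of the layering described above, the sketch below shows a lightweight plugin that depends only on tvm_ffi: it loads a prebuilt shared library and forwards calls to a function exported through the FFI calling convention. The loader name (`tvm_ffi.load_module`), the library path, and the item-style function lookup are assumptions for illustration, not a fixed interface.

```python
# Sketch of a plugin package whose only runtime dependency is tvm_ffi.
# The library path and exported symbol below are hypothetical.
from pathlib import Path

import tvm_ffi

# Prebuilt shared library shipped alongside the plugin's Python code.
_LIB_PATH = Path(__file__).parent / "libmy_plugin.so"

# Load the library as an FFI module (loader name assumed).
_mod = tvm_ffi.load_module(str(_LIB_PATH))


def vector_add(a, b):
    # Forward to the function exported by the compiled library; tensors are
    # exchanged through DLPack, so no dependency on the full TVM runtime.
    return _mod["vector_add"](a, b)
```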