Merged
* [Bugfix] Fix the jit_kernel issue
* Update README.md

---------

Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
…ethod (#1359) This commit refines the Fragment creation process in the InferLayout method of ParallelOpNode. It removes the unnecessary forward_index array and utilizes default fragment indexing for consistency with other operations. Additionally, it binds the thread range to enhance comparability across different operations.
* [Analysis] Enhance NestedLoopChecker with tile op cases
* fix tileop issue
* [misc] add a cpp side wrapper for gemm_sp_py
* [misc] typing
* [IR] bind GemmSPWarpPolicy
* [chore] add wrapper code
* [IR] fix GemmSPWarpPolicy
* [codegen] apply ptxas instructions
* [intrinsic] add typical (unused) mma layout
* [template] add uint16 debug func
* [intrinsic] add b matrix layout
* [gemm_sp] enable fp16/bf16 on sm8x
* [layout] refactor fp16/bf16 layout
* [gemm_sp] enable int8
* [chore] update test case dtype
* [gemm_sp] enable fp32
* [layout] refactor layouts
* [intrinsic] enable ldmatrix for mat A
* [layout] enable ldsm for matrix b
* [layout] add ldmatrix for fp32 and fp8
* [chore] refine
* [chore] refactor
* [chore] add fp8 efactor
* [chore] refactor
* [chore] add remove negative zero util
* [example] add a custom compress kernel
* [chore] minor update
* [test] refactor gemm_sp test
* [refactor] make metadata layout func
* [example] add option for using cutlass layout
* [doc] add a gemm_sp doc
* [doc] minor polish
* [chore] remove unused
* [bugfix] fix non replicate b case
* [test] refactor
* [chore] add a check
* [bugfix] fix util bug
* [wip] init a new test case for v2
* [chore] minor refactor
* [chore] minor update
* [bugfix] enable 16bit rs
* [language] enable rs
* [language] enable gemm_sp_sr
* [language] enable gemm_sp_rr
* [test] enable more tests
* [tvm] update ffi binding
* [chore] remove print
* [chore] fix benchmark script
* [lint] precommit lint
* [chore] apply feedback
* [test] use arch 8.0
* [chore] rollback ::ordered_metadata for backward compatibility
* [bugfix] fix captialized
* [example] keep gemm_sp on hopper
* [test] fix no fp8 normal kernel
* [test] reduce matmul size to satisfy accum error
* [test] use cal_diff for assertion
* [bugfix] expand float8 type
* [lib] add make_int4 for short type
* [language] add transpose E
* [bugfix] fix wrong var
* [format] format
* [chore] refactor binding
* [chore] fix wrong passing var
…#1360)
* [Enhancement] Implement dynamic unroll factor in CUDA code generation

  This commit introduces support for specifying a dynamic unroll factor in the CUDA code generation. The `unroll_factor` map is added to store unroll factors for loop variables, allowing for more flexible and optimized loop unrolling. Additionally, the `unroll` function is integrated into the loop language, enabling users to define unroll factors directly in their code. This enhancement improves performance by allowing tailored unrolling strategies based on specific loop characteristics.
* lint fix
* [Bugfix] Correct initialization of non-zero counters in custom compress kernel and update TIR registration for gemm_sp_py to use the correct tile operation
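To make the unroll-factor idea concrete, here is a minimal sketch of how a code generator could consult a per-loop-variable `unroll_factor` map and emit a CUDA `#pragma unroll N` directive ahead of a loop. The helper `emit_for_loop` and its parameters are hypothetical and written only for illustration; they are not tilelang's actual code generator or API.

```python
# Hypothetical sketch: not tilelang's implementation.
def emit_for_loop(var, extent, body_lines, unroll_factor=None, step=1):
    """Return CUDA-like source lines for one counted loop."""
    lines = []
    if unroll_factor is not None:
        if unroll_factor < 1:
            raise ValueError("unroll_factor must be a positive integer")
        # `#pragma unroll N` asks the CUDA compiler to unroll the loop N times.
        lines.append(f"#pragma unroll {unroll_factor}")
    lines.append(f"for (int {var} = 0; {var} < {extent}; {var} += {step}) {{")
    lines.extend("  " + line for line in body_lines)
    lines.append("}")
    return lines


# Map from loop variable name to requested factor (assumed shape of the map).
unroll_factor = {"k": 4}

src = emit_for_loop("k", 64, ["acc += a[k] * b[k];"], unroll_factor.get("k"))
plain = emit_for_loop("i", 8, ["out[i] = 0;"], unroll_factor.get("i"))
print("\n".join(src))
```

Loops without an entry in the map are emitted with no pragma, which leaves the unrolling decision to the compiler.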
updates:
- [github.com/pre-commit/mirrors-clang-format: v21.1.2 → v21.1.6](pre-commit/mirrors-clang-format@v21.1.2...v21.1.6)
- [github.com/astral-sh/ruff-pre-commit: v0.14.3 → v0.14.7](astral-sh/ruff-pre-commit@v0.14.3...v0.14.7)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…implify cached library path handling in sparse.py (#1365)
* [Refactor] Update condition for benchmarking in example_gemv.py and simplify cached library path handling in sparse.py
* [Enhancement] Extend support for float8 data types in GEMM operations
  - Updated GEMM operations to recognize additional float8 data types: `float8_e4m3fn` and `float8_e5m2fnuz`.
  - Refactored condition checks in `checkWgmma` methods to simplify float8 type handling.
  - Adjusted test cases to ensure compatibility with the new float8 types in tile language examples.
* lint fix
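The float8 refactor above amounts to replacing chains of per-type equality checks with a single membership test. A rough Python sketch of that pattern follows. The helper name `is_float8` is hypothetical, and the set contents are an assumption: `float8_e4m3fn` and `float8_e5m2fnuz` come from the commit message, while `float8_e4m3fnuz` and `float8_e5m2` are added here because they are the other float8 dtype names PyTorch defines.

```python
# Hypothetical helper: the names in this set are an assumption, see the lead-in.
FLOAT8_DTYPES = frozenset({
    "float8_e4m3fn",
    "float8_e4m3fnuz",
    "float8_e5m2",
    "float8_e5m2fnuz",
})


def is_float8(dtype) -> bool:
    """True if `dtype` (a string or an object whose str() is like 'torch.float8_e5m2') is float8."""
    # Drop any module prefix such as "torch." before the lookup.
    name = str(dtype).rsplit(".", 1)[-1]
    return name in FLOAT8_DTYPES


checks = {name: is_float8(name) for name in ("float8_e4m3fn", "torch.float8_e5m2fnuz", "float16")}
print(checks)
```

With one predicate, supporting another float8 variant is a one-line change to the set rather than an edit to every condition.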
Pull request overview
This PR updates the tilelang library with significant enhancements to sparse tensor operations, loop control features, and infrastructure improvements. Key changes include adding support for sparse GEMM operations (gemm_sp_v2), introducing unroll loop functionality, renaming tile operation namespaces for consistency, and improving cache control mechanisms.
- Added comprehensive sparse tensor core support with gemm_sp_v2 API and custom compression utilities
- Introduced T.unroll loop functionality with step and unroll_factor parameters
- Renamed tile operation namespace from "tl." to "tl.tileop." for better organization
- Enhanced cache control with separate environment variables for global and autotuning cache management
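For readers unfamiliar with the compression step behind sparse GEMM, the sketch below shows the general idea in plain Python, assuming the 2:4 pattern used by NVIDIA sparse tensor cores for 16-bit types (at most two non-zeros in every group of four). The functions `compress_2to4` and `decompress_2to4` are hypothetical illustrations. They are not the PR's compression utilities, and they ignore the bit packing and hardware-specific metadata layouts that a real kernel needs.

```python
# Hypothetical sketch of 2:4 compression: not tilelang's utilities.
def compress_2to4(row):
    """Split a 2:4-sparse row into kept values and their in-group indices."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values, meta = [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        keep = [i for i, v in enumerate(group) if v != 0]
        if len(keep) > 2:
            raise ValueError("group has more than two non-zeros")
        # Pad with unused zero positions so every group stores exactly two entries.
        for i in range(4):
            if len(keep) == 2:
                break
            if i not in keep:
                keep.append(i)
        keep.sort()
        values.extend(group[i] for i in keep)
        meta.extend(keep)
    return values, meta


def decompress_2to4(values, meta):
    """Rebuild the dense row from kept values and in-group indices."""
    row = [0] * (len(values) * 2)
    for j, (v, i) in enumerate(zip(values, meta)):
        row[(j // 2) * 4 + i] = v  # two stored values per group of four
    return row


dense = [0, 5, 0, -3, 7, 0, 0, 0]
values, meta = compress_2to4(dense)
restored = decompress_2to4(values, meta)
```

The compressed operand is half the width of the dense one, and the index list plays the role of the metadata matrix (often called E) that the sparse instruction consumes.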
Reviewed changes
Copilot reviewed 70 out of 71 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tilelang/utils/tensor.py | Added float8 dtype detection and negative zero removal utilities |
| tilelang/utils/sparse.py | Enhanced compression support and added randint_semi_sparse generator |
| tilelang/tileop/gemm_sp/*.py | New sparse GEMM implementation with MMA support |
| tilelang/language/loop.py | Added unroll loop with step and factor support |
| tilelang/language/experimental/gemm_sp.py | Added gemm_sp_v2 API for sparse matrix multiplication |
| tilelang/layout/gemm_sp.py | Refactored metadata layout functions with clearer naming |
| tilelang/env.py | Enhanced cache control with separate disable flags |
| src/op/*.cc | Updated tile operation registration to use "tl.tileop" namespace |
| src/target/codegen_cuda.cc | Added pragma unroll factor code generation support |
| testing/python/**/*.py | Added comprehensive test coverage for new features |
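The negative-zero removal listed for `tilelang/utils/tensor.py` can be pictured with a small sketch. The function below is a hypothetical stand-in, not the PR's utility. The motivation given in the comment (that `-0.0` has its sign bit set, so a bit-level zero test would not treat it as zero) is an assumption about why such a helper is useful for sparse data, not something the PR states.

```python
import math


# Hypothetical stand-in for a negative-zero removal utility.
def remove_negative_zero(values):
    """Return a copy of `values` with every -0.0 replaced by +0.0."""
    # -0.0 == 0.0 is True under IEEE 754, so this comparison matches both zeros,
    # but only +0.0 has an all-zero bit pattern (assumed reason for normalizing).
    return [0.0 if v == 0.0 else v for v in values]


cleaned = remove_negative_zero([-0.0, 1.5, 0.0, -2.0])
# copysign exposes the sign bit, which a plain == comparison cannot.
signs = [math.copysign(1.0, v) for v in cleaned]
```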
Comments suppressed due to low confidence (3)
tilelang/language/loop.py:1
- The error message contradicts the check condition. If `unroll_factor` is specified, `pragma_unroll_explicit` should be False (implicit unroll with factor), but the check raises an error when it's True. The message states "must be True" but should state "must be False".

tilelang/layout/gemm_sp.py:1
- Variable name inconsistency: the code uses lowercase `block_k` but the function parameter is uppercase `BlockK`. This will cause a NameError at runtime.

tilelang/layout/gemm_sp.py:1
- The function signature indicates it should return `int`, but line 132 returns a tuple `(offset // k, offset % k)`. The return type should be `tuple[int, int]` to match the actual implementation.
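The return-type point in the last suppressed comment can be illustrated with a small sketch. The function name `split_offset` is hypothetical and the body is reduced to the expression quoted in the comment. It only shows what an annotation matching `(offset // k, offset % k)` looks like, not the actual function in `tilelang/layout/gemm_sp.py`.

```python
# Hypothetical reduction of the function the comment refers to.
def split_offset(offset: int, k: int) -> tuple[int, int]:
    """Map a flat offset to (row, column) for a row length of k."""
    # Annotating the return as `int` would be wrong: two values come back.
    return offset // k, offset % k


row, col = split_offset(13, 4)
```

For integers, the same pair is what the built-in `divmod(offset, k)` returns.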
No description provided.