v0.6

Antonyvance released this on 04 Nov 00:48

Commit d2292f0

What's New in SYCL*TLA 0.6

SYCL*TLA 0.6 (2025-11-03)

Major Architecture Changes

Flash Attention Reimplementation (#547): Complete rewrite of Flash Attention using the new Xe atoms
- Enhanced performance with optimized memory access patterns
- Better integration with Intel Xe hardware capabilities
CUTLASS Library Generation (#578): Full support for CUTLASS library generation and operations
- New Xe architecture support in library generation pipeline
- Automated kernel instantiation and compilation support
Python package distribution via PyPI
- Installable with pip (package name: sycl-tla)
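
Flash Attention avoids materializing the full attention-score matrix by streaming over key/value blocks with an "online" softmax that keeps a running maximum and a running normalizer. The sketch below is a scalar, single-query reference of that recurrence in plain host C++; it is not SYCL and not the SYCL*TLA kernel or its Xe atoms. The names attention_reference, attention_online, Matrix and the block parameter are illustrative assumptions. It compares a block-wise pass against the ordinary two-pass softmax.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // one row per key (or value)

// Two-pass reference: out = softmax(scale * q . K^T) * V for a single query row.
std::vector<float> attention_reference(const std::vector<float>& q, const Matrix& K,
                                       const Matrix& V, float scale) {
  const std::size_t n = K.size(), dv = V[0].size();
  std::vector<float> s(n);
  float m = -std::numeric_limits<float>::infinity();
  for (std::size_t j = 0; j < n; ++j) {
    float dot = 0.f;
    for (std::size_t d = 0; d < q.size(); ++d) dot += q[d] * K[j][d];
    s[j] = dot * scale;
    m = std::max(m, s[j]);
  }
  float l = 0.f;
  std::vector<float> out(dv, 0.f);
  for (std::size_t j = 0; j < n; ++j) {
    const float p = std::exp(s[j] - m);
    l += p;
    for (std::size_t d = 0; d < dv; ++d) out[d] += p * V[j][d];
  }
  for (float& o : out) o /= l;
  return out;
}

// Flash-Attention-style single pass: visit keys in blocks, keeping a running
// max `m`, a running normalizer `l`, and an unnormalized accumulator `acc`.
std::vector<float> attention_online(const std::vector<float>& q, const Matrix& K,
                                    const Matrix& V, float scale, std::size_t block) {
  assert(block > 0);
  const std::size_t n = K.size(), dv = V[0].size();
  float m = -std::numeric_limits<float>::infinity();
  float l = 0.f;
  std::vector<float> acc(dv, 0.f);
  for (std::size_t start = 0; start < n; start += block) {
    const std::size_t end = std::min(n, start + block);
    std::vector<float> s(end - start);
    float m_blk = -std::numeric_limits<float>::infinity();
    for (std::size_t j = start; j < end; ++j) {
      float dot = 0.f;
      for (std::size_t d = 0; d < q.size(); ++d) dot += q[d] * K[j][d];
      s[j - start] = dot * scale;
      m_blk = std::max(m_blk, s[j - start]);
    }
    const float m_new = std::max(m, m_blk);
    const float correction = std::exp(m - m_new);  // 0 on the first block (m = -inf)
    l *= correction;
    for (float& a : acc) a *= correction;
    for (std::size_t j = start; j < end; ++j) {
      const float p = std::exp(s[j - start] - m_new);
      l += p;
      for (std::size_t d = 0; d < dv; ++d) acc[d] += p * V[j][d];
    }
    m = m_new;
  }
  for (float& a : acc) a /= l;
  return acc;
}
```

Because each new block only rescales the existing state by exp(m_old - m_new), the single-pass result matches the two-pass one up to floating-point rounding; a GPU Flash Attention kernel typically performs this bookkeeping per tile rather than per scalar row.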

Enhancements

Python Operations Support (#595): Enhanced Python bindings with comprehensive test coverage
- Improved Python API stability and usability
- Enhanced test framework for Python operations
CuTe Subgroup Extensions: New subgroup-scope operations for Intel Xe
- Subgroup broadcast and reduction operations (#9a6aa27)
- make_subgroup_tensor helpers for improved tensor manipulation (#21fb89a)
Enhanced 2D Copy Operations: Extended block 2D copy functionality
- New make_block_2d_copy_{C,D} variants with subtiling support (#48d82e8)
- Support for size-1 fragments in block 2D copies (#2212f1b)
4-bit VNNI Reorders (#593): New reorder operations from 4-bit unit-stride layouts to the VNNI layout
Batch GEMM with new APIs (#540): Batch GEMM built on the new streamlined APIs
Grouped GEMM with new APIs (#574): Grouped GEMM built on the new streamlined APIs
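
The Batch GEMM and Grouped GEMM entries differ mainly in how problems are described. In CUTLASS terminology, a batched GEMM runs L problems that share one (M, N, K) shape with operands at fixed strides, while a grouped GEMM runs a list of problems that may each have their own shape and pointers. The host-only C++ reference below sketches those semantics under that assumption (row-major storage, alpha and beta shared by all problems); gemm_ref, batched_gemm_ref, grouped_gemm_ref and GemmProblem are hypothetical names, not the SYCL*TLA APIs from #540 or #574.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major reference: C[m][n] = alpha * sum_k A[m][k] * B[k][n] + beta * C[m][n].
void gemm_ref(int M, int N, int K, float alpha, const float* A, const float* B,
              float beta, float* C) {
  for (int m = 0; m < M; ++m)
    for (int n = 0; n < N; ++n) {
      float acc = 0.f;
      for (int k = 0; k < K; ++k) acc += A[m * K + k] * B[k * N + n];
      C[m * N + n] = alpha * acc + beta * C[m * N + n];
    }
}

// Batched GEMM: L problems with one shared (M, N, K); operands are stored back
// to back, so problem l starts at a fixed stride from each base pointer.
void batched_gemm_ref(int L, int M, int N, int K, float alpha, const float* A,
                      const float* B, float beta, float* C) {
  for (int l = 0; l < L; ++l)
    gemm_ref(M, N, K, alpha, A + std::size_t(l) * M * K, B + std::size_t(l) * K * N,
             beta, C + std::size_t(l) * M * N);
}

// Grouped GEMM: every problem carries its own shape and operand pointers.
struct GemmProblem {
  int M, N, K;
  const float* A;
  const float* B;
  float* C;
};

void grouped_gemm_ref(const std::vector<GemmProblem>& groups, float alpha, float beta) {
  for (const GemmProblem& p : groups) {
    assert(p.A && p.B && p.C);
    gemm_ref(p.M, p.N, p.K, alpha, p.A, p.B, beta, p.C);
  }
}
```

A GPU implementation would launch all problems in one kernel instead of looping on the host; the loop here only fixes the expected results.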

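The CuTe Subgroup Extensions entry adds subgroup broadcast and reduction operations. On Intel Xe GPUs a SYCL subgroup is a set of work-items (typically 16 or 32) that execute together and can exchange register values directly. The sketch below simulates the semantics of a broadcast and a butterfly (xor-shuffle) reduction on the host, with each array element standing in for one lane; the width of 16 and the names subgroup_broadcast, subgroup_reduce and Lanes are assumptions for illustration, not the CuTe extensions added in this release.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical subgroup width; each array slot models one work-item (lane).
constexpr std::size_t kSubgroupSize = 16;
using Lanes = std::array<float, kSubgroupSize>;

// Broadcast: every lane receives the value held by lane `src`.
Lanes subgroup_broadcast(const Lanes& v, std::size_t src) {
  assert(src < kSubgroupSize);
  Lanes out;
  out.fill(v[src]);
  return out;
}

// Butterfly reduction: at each step a lane combines its value with the value of
// lane (lane ^ offset); after log2(16) = 4 steps every lane holds the full result.
// `op` must be associative and commutative (e.g. sum or max).
Lanes subgroup_reduce(Lanes v, const std::function<float(float, float)>& op) {
  static_assert((kSubgroupSize & (kSubgroupSize - 1)) == 0, "width must be a power of two");
  for (std::size_t offset = kSubgroupSize / 2; offset > 0; offset /= 2) {
    Lanes partner;
    for (std::size_t lane = 0; lane < kSubgroupSize; ++lane)
      partner[lane] = v[lane ^ offset];  // value an xor-shuffle would deliver
    for (std::size_t lane = 0; lane < kSubgroupSize; ++lane)
      v[lane] = op(v[lane], partner[lane]);
  }
  return v;
}
```

SYCL 2020 itself provides comparable group algorithms such as sycl::group_broadcast and sycl::reduce_over_group; these notes do not say how the new CuTe helpers map onto them.
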
See the CHANGELOG for details of all past releases and updates.

SYCL is a trademark of the Khronos Group Inc. Other names and brands may be claimed as the property of others.
