What's New in SYCL*TLA 0.6
SYCL*TLA 0.6 (2025-11-03)
Major Architecture Changes
- Flash Attention Reimplementation (#547): Complete rewrite of Flash Attention using new Xe atoms
- Enhanced performance with optimized memory access patterns
- Better integration with Intel Xe hardware capabilities
- CUTLASS Library Generation (#578): Full support for CUTLASS library generation and operations
- New Xe architecture support in library generation pipeline
- Automated kernel instantiation and compilation support
- Python package distributed via PyPI
- Installable with pip from the sycl-tla package
Enhancements
- Python Operations Support (#595): Enhanced Python bindings with comprehensive test coverage
- Improved Python API stability and usability
- Enhanced test framework for Python operations
- CuTe Subgroup Extensions: New subgroup-scope operations for Intel Xe
- Enhanced 2D Copy Operations: Extended block 2D copy functionality
- 4-bit VNNI Reorders (#593): New 4-bit unit stride to VNNI reorder operations
- Batch GEMM with new APIs (#540): Enhanced Batch GEMM with new streamlined APIs
- Grouped GEMM with new APIs (#574): Enhanced grouped GEMM with new streamlined APIs
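For readers unfamiliar with the distinction between the Batch GEMM and Grouped GEMM items above, the following is a minimal pure-Python reference sketch of the two semantics. It is not the SYCL*TLA API: the function names `matmul`, `batched_gemm`, and `grouped_gemm` are hypothetical and exist only for illustration. It assumes the usual convention that a batched GEMM runs many problems of one shared shape, while a grouped GEMM allows each problem its own shape.

```python
def matmul(a, b):
    """Reference product of two row-major matrices given as lists of lists."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(len(b[0]))]
            for row in a]


def batched_gemm(a_batch, b_batch):
    """Batched GEMM (illustrative): all problems share one (M, N, K) shape."""
    shapes = {(len(a), len(b[0]), len(b)) for a, b in zip(a_batch, b_batch)}
    assert len(shapes) <= 1, "batched GEMM expects a single problem shape"
    return [matmul(a, b) for a, b in zip(a_batch, b_batch)]


def grouped_gemm(problems):
    """Grouped GEMM (illustrative): each (A, B) pair may have its own shape."""
    return [matmul(a, b) for a, b in problems]


# Two 1x2 by 2x1 products share a shape, so they form a valid batch.
batch_out = batched_gemm([[[1, 2]], [[3, 4]]], [[[1], [1]], [[2], [2]]])

# A 1x2 by 2x1 product and a 2x1 by 1x2 product differ in shape,
# so they can only be expressed as a group.
group_out = grouped_gemm([([[1, 2]], [[1], [1]]), ([[1], [2]], [[3, 4]])])
```

In this sketch the only difference is the shape check. In a real kernel library, the shared shape typically lets a batched kernel use a single tiling configuration, whereas a grouped kernel has to schedule problems of differing sizes.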
See the CHANGELOG for details of all past releases and updates.
SYCL is a trademark of the Khronos Group Inc. Other names and brands may be claimed as the property of others.