v0.6

Antonyvance released this 04 Nov 00:48 (commit d2292f0)

What's New in SYCL*TLA 0.6

SYCL*TLA 0.6 (2025-11-03)

Major Architecture Changes

  • Flash Attention Reimplementation (#547): Complete rewrite of Flash Attention using new Xe atoms
    • Enhanced performance with optimized memory access patterns
    • Better integration with Intel Xe hardware capabilities
  • CUTLASS Library Generation (#578): Full support for CUTLASS library generation and operations
    • New Xe architecture support in library generation pipeline
    • Automated kernel instantiation and compilation support
  • Python package distribution via PyPI

Enhancements

  • Python Operations Support (#595): Enhanced Python bindings with comprehensive test coverage

    • Improved Python API stability and usability
    • Enhanced test framework for Python operations
  • CuTe Subgroup Extensions: New subgroup-scope operations for Intel Xe

    • Subgroup broadcast and reduction operations (#9a6aa27)
    • make_subgroup_tensor helpers for improved tensor manipulation (#21fb89a)
  • Enhanced 2D Copy Operations: Extended block 2D copy functionality

    • New make_block_2d_copy_{C,D} variants with subtiling support (#48d82e8)
    • Support for size-1 fragments in block 2D copies (#2212f1b)
  • 4-bit VNNI Reorders (#593): New 4-bit unit stride to VNNI reorder operations

  • Batch GEMM with new APIs (#540): Enhanced Batch GEMM with new streamlined APIs

  • Grouped GEMM with new APIs (#574): Enhanced grouped GEMM with new streamlined APIs
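The batch and grouped GEMM entries above differ mainly in problem shape: in the usual CUTLASS sense, a batched GEMM runs many products that share one (M, N, K) shape, while a grouped GEMM lets each problem have its own shape. The following is a minimal pure-Python reference sketch of that distinction only. The function names `gemm`, `batched_gemm`, and `grouped_gemm` are hypothetical and are not part of the SYCL*TLA API, which these notes do not describe.

```python
def gemm(a, b):
    """Reference product of row-major nested lists: (M x K) @ (K x N)."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def batched_gemm(a_batch, b_batch):
    """Batched GEMM (hypothetical helper): all problems share one (M, N, K)."""
    shapes = {(len(a), len(b[0]), len(b)) for a, b in zip(a_batch, b_batch)}
    assert len(shapes) <= 1, "batched GEMM expects identical problem shapes"
    return [gemm(a, b) for a, b in zip(a_batch, b_batch)]


def grouped_gemm(problems):
    """Grouped GEMM (hypothetical helper): each (A, B) pair has its own shape."""
    return [gemm(a, b) for a, b in problems]


# Two 2x2 problems with the same shape: a valid batch.
batch_out = batched_gemm(
    [[[1, 2], [3, 4]], [[0, 1], [1, 0]]],
    [[[5, 6], [7, 8]], [[5, 6], [7, 8]]],
)

# A 1x3 @ 3x1 problem next to a 2x2 @ 2x2 problem: only valid as a group.
group_out = grouped_gemm([
    ([[1, 2, 3]], [[1], [1], [1]]),
    ([[2, 0], [0, 2]], [[1, 2], [3, 4]]),
])
```

On real hardware both variants are launched as a single kernel instead of a Python loop; the sketch only pins down what each variant is expected to compute.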

See the CHANGELOG for details of all past releases and updates.

SYCL is a trademark of the Khronos Group Inc. Other names and brands may be claimed as the property of others.