v1.12.1 release

@Anerudhan released this 09 Jul 00:43
f937055

This release builds on top of the 1.12.0 release.

Bug Fix

  • Fixed an issue where d=256 was incorrectly marked as not supported on Hopper.

Minor Enhancements

  • Addressed several comments from code review.
  • Improved the CMake workflow (see PR 125).

Benchmark Results

  • Published results comparing the cuDNN backend for the default torch.sdpa op against other backends. See Llama-3.2-1B-Training for reference.
  • Published results comparing sdpa() against other backends. See sdpa_benchmark_bf16_training.