v0.2.0

@oulgen oulgen released this 20 Oct 20:54
· 63 commits to main since this release
3a0e975

What's Changed

  • Verify compiled kernels in subprocess by @jansel in #914
  • Auto-shrink autotune_precompile_jobs based on free memory by @jansel in #940
  • Make HELION_FORCE_AUTOTUNE or kernel.autotune() skip the cache by @jansel in #930
  • Support warp specialization on B200 by @oulgen in #935
  • Update README.md by @oulgen in #943
  • Register tile symbol origin, to support tile + offset use case in blackwell attention by @yf225 in #939
  • [CI] Print failed tests by @oulgen in #942
  • Update examples to use run_example by @jansel in #941
  • blackwell attn with triton attr set by @v0i0 in #918
  • Set static_shapes=True by @oulgen in #937
  • run.py env var to skip exception logging by @v0i0 in #946
  • Fix bug with unit sized dims and block_sizes by @jansel in #932
  • Update static_shapes docs by @jansel in #951
  • Add tile.count by @oulgen in #955
  • Auto detect low vram by @oulgen in #956
  • [CI] Use official PyTorch 2.9 by @oulgen in #962
  • Use interleaved_bench for run_example by @jansel in #945
  • Generalize tile_with_offset pass by @jansel in #949
  • Docstring updates by @jansel in #952
  • Import updates by @jansel in #953
  • Add missing environment variables to docs by @jansel in #957
  • Print out errors vs timeouts in autotuning status by @jansel in #960
  • Add HELION_AUTOTUNE_IGNORE_ERRORS by @jansel in #961
  • Exit autotuning faster on KeyboardInterrupt by @jansel in #963
  • Remove default settings by @jansel in #964
  • Add missing settings environment variables by @jansel in #965
  • Skip test_differential_evolution_search due to slowness by @jansel in #968
  • [Benchmark CI] Give nightly job permissions by @oulgen in #970
  • [Benchmark CI] Allow kicking off workflow dispatch by @oulgen in #971
  • [Benchmark CI] Allow specifying custom env vars via UI by @yf225 in #972
  • [blackwell attn example] qk scale as param by @v0i0 in #969
  • [Benchmark CI] Allow specifying custom args to benchmark runner via UI by @yf225 in #974
  • Add initial backwards compatibility tests by @oulgen in #958
  • Remove unrolling + warp spec by @PaulZhang12 in #967
  • [Benchmark CI] Set atol and rtol to 1e-2 by @yf225 in #976
  • [Benchmark] Fix tritonbench auto-installation by @yf225 in #980
  • [Autotuner] Fix fork-based autotuner to avoid re-initializing CUDA context in subprocess by @yf225 in #981
  • Make fork default precompilation strategy by @oulgen in #979
  • [benchmarks] change tritonbench path by @xuzhao9 in #966
  • Add skipIfA10G decorator by @yf225 in #982
  • Suggest HELION_AUTOTUNE_PRECOMPILE=spawn when IMA happens by @jansel in #984
  • Layer Norm bwd kernel to support large B*M case used by internal by @yf225 in #973
  • Fix timeouts in autotuning by @jansel in #985
  • Log generated triton code at the DEBUG level rather than INFO by @jansel in #986
  • Remove extra debug log for timeouts by @jansel in #987
  • Add squeeze_and_excitation_net kernel by @mengluy0125 in #870
  • Generalize test cases to support XPU by @EikanWang in #983
  • Updated README with News section of upcoming events. Added link to GPU mode talk. by @choijon5 in #991
  • Update README.md by @oulgen in #992
  • Update README.md by @oulgen in #993
  • Mamba2 Chunk Scan & State by @v0i0 in #950
  • Remove unrolling with tma + pipelining by @PaulZhang12 in #994
  • Add provenance annotations to output code by @jansel in #988

Full Changelog: v0.1.8...v0.2.0