v0.3.0

Released by @github-actions on 05 May 08:05 (commit b73ce48).

cuTile v0.3.0

Diff since v0.2.2

Breaking changes: ct.launch has been replaced by @cuda backend=cuTile.

Merged pull requests:

  • Include (c - a*b) pattern in FMA rewrite pipeline (#190) (@maleadt)
  • Support for array slicing (#191) (@maleadt)
  • Add generic dataflow framework; port constant & alias analyses (#192) (@maleadt)
  • Add kernel and host RNG (#193) (@maleadt)
  • Stop special-casing TileArray in codegen; add permutedims/transpose/reshape (#194) (@maleadt)
  • Document intrinsics (#198) (@maleadt)
  • Add transform-side control-flow helpers (#199) (@maleadt)
  • Add KernelState plumbing for per-launch ambient state (#200) (@maleadt)
  • Various small fixes (#201) (@maleadt)
  • Test BFloat16 broadcast subtraction (#202) (@maleadt)
  • Minor fixes given the Tile IR spec (#203) (@maleadt)
  • Improve Random.jl coverage: randn and randexp (#204) (@maleadt)
  • Extend assumption analysis: divisibility, bounds, no-wrap (#205) (@maleadt)
  • Add lightweight CSE on StructuredIRCode (#207) (@maleadt)
  • Update benchmarks (#208) (@maleadt)
  • Benchmark harness improvements (#210) (@maleadt)
  • Recompute IR flags from efunc when the rewriter changes opcodes (#211) (@maleadt)
  • Fold contiguous-axis stride in gather/scatter offset chains (#212) (@maleadt)
  • Add rewrite rule to drop contiguous-axis stride in scatter/gather offsets + unified AssumeOp injection (#213) (@maleadt)
  • Integrate with CUDA.jl + reduce launch overhead (#214) (@maleadt)
  • Fix UndefVarError in README Quick Start (#215) (@AntonOresten)
  • TTFX improvements (#216) (@maleadt)
  • Backport fixes from cuTile Python (#218) (@maleadt)
  • Suppress divby assume on tile-of-pointers from offset (#219) (@maleadt)
  • Run normalization rewrites to fixpoint before FMA fusion (#220) (@maleadt)
  • Update benchmark timings (#221) (@maleadt)

Closed issues:

  • Random Number Generation (#189)
  • Consider CSE/LICM for TileView construction (#195)
  • Make assumption emission into a pass (#196)
  • Emit DivBy assumptions (#197)