v0.1.0
What's Changed
- Port the batched matmul example. by @maleadt in #8
- IRStructurizer: Switch to `code_ircode` by @maleadt in #6
- Add and clean up intrinsics by @maleadt in #12
- Lay out SSAArray as a StructOfArrays by @vchuravy in #5
- Deduce view index type from array fields. by @maleadt in #17
- Permute during reshape to support column-major storage. by @maleadt in #18
- Simplify type extraction for tile-only intrinsics. by @maleadt in #19
- Validate tile sizes to be pow2. by @maleadt in #20
- Fix index calculation in 2D gather/scatter by @AntonOresten in #23
- Add initial CI (only codegen tests) by @maleadt in #28
- Simplify constant emission. by @maleadt in #29
- Fix FFT example and add benchmark. by @maleadt in #30
- Expose entry hints through `launch` by @AntonOresten in #27
- Remove bad matmul-related overrides. by @maleadt in #33
- Expose load/store optimization hints by @AntonOresten in #32
- Refactor examples by @maleadt in #35
- Add UInt8 support to julia_to_tile_dtype by @arhik in #38
- Allow redefinition of kernel methods by @AntonOresten in #31
- Remove Int64 from `encode_signed_varint!` signature by @arhik in #42
- Replace unsigned-only and float-only varint calls with `encode_varint!` by @arhik in #43
- IRStructurizer: handle merge phis for if-then regions by @AntonOresten in #53
- Support BFloat16 by @AntonOresten in #34
- Require terminators + validate them by @maleadt in #54
- Move IRStructurizer and FileCheck subpackages out of the repo. by @maleadt in #55
- Integrate with CompilerCaching.jl by @maleadt in #46
- Remove unnecessary rtol specifications. by @maleadt in #56
- Add more broadcastable operators. by @maleadt in #57
- Add and shorten some tests by @maleadt in #58
- Support Float8 types by @AntonOresten in #36
- feat: Add integer reduction support for reduce ops by @arhik in #37
- Add support for `PermutedDimsArray` by @AntonOresten in #48
- Add scan (prefix sum) operations support by @arhik in #39
- Switch to an idiomatic reduce/scan API by @maleadt in #60
- Add some more reduction-like operators by @maleadt in #61
- Support `map` and generalize `broadcast` by @maleadt in #62
- Compiler simplifications. by @maleadt in #63
- Use Base.transpose. by @maleadt in #64
- Fix and add reflection macros. by @maleadt in #65
- Replace ct.permute with permutedims by @maleadt in #67
- Encode ArraySpec fields as typevars. by @maleadt in #69
- Fix benchmark scripts. by @maleadt in #70
- Make sure constructed ghosts yield SSA values. by @maleadt in #73
- Add `order` kwarg to `load`/`store` for dimension reordering. by @maleadt in #72
- Fix constants and switch intrinsics to constant value inputs + tfuncs by @maleadt in #79
- Auto-match tile rank in load/store by @AntonOresten in #74
- Emit GlobalRef constants eagerly (#77) by @maleadt in #80
- Add scalar ops on Tile and TileArray. by @maleadt in #76
- Switch to `Base.size` for `TileArray` by @AntonOresten in #75
- Replace astype by idiomatic convert/broadcast. by @maleadt in #81
- Add support for assertions. by @maleadt in #83
- Split tests for better parallelism. by @maleadt in #84
- Rework intrinsics by @maleadt in #85
- Sanitize kernel names. by @maleadt in #86
- Support broadcasting unsafe_trunc and trunc by @maleadt in #88
- Pass constants as scalars, infer as constants. by @maleadt in #87
New Contributors
- @maleadt made their first contribution in #8
- @vchuravy made their first contribution in #5
- @AntonOresten made their first contribution in #23
- @arhik made their first contribution in #38
Full Changelog: https://github.com/JuliaGPU/cuTile.jl/commits/v0.1.0