v0.8.4rc1
What's Changed
- [Kernel][VIP] support cuda merge_attn_states kernel, up to ~3x faster by @DefTruth in #18
- [Kernel][VIP] support cuda merge_attn_states kernel by @DefTruth in #19
- [Kernel][VIP] dispatch merge_attn_states cuda kernel for half and bf16 by @DefTruth in #20
- [Misc][VIP] Revert to original Triton merge_attn_states kernel by @DefTruth in #21
Full Changelog: v0.8.4rc0...v0.8.4rc1