v0.4.5

@mobicham released this 06 May 07:40 · 351 commits to master since this release · 5f88f7b
  • Update caches for 48 GB GPUs (Qwen2 VL / Llama3 8B)
  • Add CPU-side packing
  • Relax minimum size to 32
  • Fix fp16 accumulation
  • Add persistent SPLIT_K version
  • Fix tl.contiguous hint
  • Make m, n block sizes safe
  • Add BitNet support in helper
  • Add custom load_state_dict to allow weight serialization
  • Update swizzle
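
For readers unfamiliar with the "CPU-side packing" item, the general idea of bit packing can be sketched as follows. This is only an illustration of the technique under stated assumptions, not this release's implementation: the function names `pack_4bit` and `unpack_4bit`, the 4-bit width, the 32-bit word size, and the low-nibble-first ordering are all hypothetical choices made for the example.

```python
# Illustrative sketch only: generic 4-bit packing into 32-bit words.
# The names, bit width, and nibble order are assumptions for this example
# and do not describe the layout used by the release.

VALUES_PER_WORD = 8  # eight 4-bit values fit into one 32-bit word


def pack_4bit(values):
    """Pack integers in the range 0..15 into 32-bit words, low nibble first."""
    if len(values) % VALUES_PER_WORD != 0:
        raise ValueError("length must be a multiple of 8")
    words = []
    for start in range(0, len(values), VALUES_PER_WORD):
        word = 0
        for j, v in enumerate(values[start:start + VALUES_PER_WORD]):
            if not 0 <= v <= 15:
                raise ValueError("value does not fit in 4 bits")
            # Place value j into bits [4*j, 4*j + 3] of the word.
            word |= v << (4 * j)
        words.append(word)
    return words


def unpack_4bit(words):
    """Inverse of pack_4bit: recover the 4-bit values in their original order."""
    return [(w >> (4 * j)) & 0xF for w in words for j in range(VALUES_PER_WORD)]


packed = pack_4bit([1, 2, 3, 4, 5, 6, 7, 8])
restored = unpack_4bit(packed)
```

Doing this step on the CPU with plain integer operations keeps the packed weights independent of any GPU kernel, at the cost of a slower one-time conversion.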