Description
A new SOTA bitnet model, Bonsai 0.5B, has come out. It seems to outperform larger bitnet models such as Falcon 1B/3B and TriLM 700M, and it looks like they are going to release a whole line of bitnet models, which is really exciting.
Support is needed for these models. They adopt channel-wise scaling factors rather than the tensor-level ones used by existing bitnet models. Maybe a separate kernel could be built to apply the scales outside of the matmul kernels? That would probably yield similar inference speeds. Note that the Hugging Face implementation does include a custom Q-linear layer that applies the scales.
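To illustrate the idea, here is a minimal numpy sketch (not the Bonsai or HF implementation; names and shapes are illustrative assumptions) showing that per-output-channel scales can be applied as a broadcasted elementwise multiply after the ternary matmul, and that this is mathematically equivalent to folding the scales into the weights:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)                 # activations
w_ternary = rng.integers(-1, 2, size=(64, 128)).astype(np.float32)  # {-1, 0, 1} weights

# Tensor-level scaling: one scalar for the whole weight tensor.
tensor_scale = np.float32(0.02)
y_tensor = (x @ w_ternary) * tensor_scale

# Channel-wise scaling: one scale per output channel,
# applied outside the matmul as a separate elementwise step.
channel_scales = rng.uniform(0.01, 0.03, size=(128,)).astype(np.float32)
y_channel = (x @ w_ternary) * channel_scales  # broadcasts over the batch dim

# Equivalent to pre-scaling the weight columns, so the matmul
# kernel itself can stay scale-free.
y_folded = x @ (w_ternary * channel_scales)
assert np.allclose(y_channel, y_folded, atol=1e-5)
```

Since the scale application is just a cheap elementwise pass over the output, keeping it in a separate kernel should add little overhead relative to the matmul itself.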
HF: https://huggingface.co/deepgrove/Bonsai
Seems super promising.
pinging @Eddie-Wang1120 + other kernel writers
Other posts and information:
https://www.reddit.com/r/LocalLLaMA/comments/1jgkqio/new_bitnet_model_from_deepgrove/
https://x.com/deepgrove_ai/status/1903103798735761518