question regarding adding vision capabilities #565
Great questions. I've been working on something closely related.

Q1 (training): Yes.

Q2 (deployment swap): Mathematically safe, yes, since both variants represent the same weights. In practice, watch out for one thing: the bf16 and 1.58-bit variants may have subtly different numerical ranges in intermediate activations, especially at the adapter boundary, where the projection MLP output feeds into the first decoder layer. It is worth running a quick cosine similarity check on a few outputs before and after the swap to confirm they align.

For reference: we built a working SigLIP → projector → KV injection pipeline on top of BitNet b1.58-2B-4T in a from-scratch C engine. The architecture is: image → SigLIP ViT encoder → 2-layer MLP projector → soft-prefix tokens injected into the KV cache before the text decoder runs. The pipeline is implemented and functional, though full image understanding requires a multimodal-trained backbone (BitNet as-is is text-only, as you noted).

The implementation is at https://github.com/shifulegend/project-zero if the code is useful to reference.
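A minimal sketch of the cosine similarity check mentioned above, in plain Python. It assumes you have already collected one output vector per test prompt (for example, the final-position logits) from the bf16 decoder and from the packed decoder. The function names, the toy numbers, and the 0.99 threshold are illustrative assumptions, not values from the BitNet repo or from project-zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def check_swap(bf16_outputs, packed_outputs, threshold=0.99):
    """Compare paired output vectors from the two decoder variants.

    Returns (all_ok, sims), where sims holds one similarity per prompt.
    The threshold is an assumed cutoff; tune it for your own setup.
    """
    if len(bf16_outputs) != len(packed_outputs):
        raise ValueError("need the same number of outputs from both variants")
    sims = [cosine_similarity(a, b) for a, b in zip(bf16_outputs, packed_outputs)]
    return all(s >= threshold for s in sims), sims


# Toy stand-ins for real decoder outputs (hypothetical values).
bf16_outputs = [[0.12, -1.30, 2.05, 0.40], [1.00, 0.25, -0.75, 0.10]]
packed_outputs = [[0.11, -1.28, 2.07, 0.41], [0.98, 0.27, -0.74, 0.12]]

ok, sims = check_swap(bf16_outputs, packed_outputs)
print("aligned:", ok)
print("similarities:", [round(s, 4) for s in sims])
```

If any similarity falls well below the others, that prompt is a good place to start comparing activations at the adapter boundary.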
Hi,
I am building a Vision-Language Model (VLM) using BitNet b1.58 as the frozen text decoder, with a lightweight adapter module connecting a vision encoder to the decoder.
I have two questions regarding the choice of model variant:
For training: I am currently using microsoft/bitnet-b1.58-2B-4T-bf16 as the decoder backbone, keeping its weights frozen and only training the adapter. Is this the correct variant for this use case?
For deployment: Once the adapter is trained, would it be safe to swap the bf16 decoder for microsoft/bitnet-b1.58-2B-4T (the packed 1.58-bit variant) without retraining, given that both variants represent the same underlying model mathematically?
Thank you for your time and for open-sourcing this work.