🚀 The feature, motivation and pitch
In the transformers implementation of llama, there are optional bias tensors for the LlamaMLP and LlamaAttention modules. Several additional models (specifically Granite Code 3B and 8B) use the llama architecture and include these bias tensors.
The proposal here is to add the ability to indicate the presence of bias tensors in TransformerArgs and then support loading them in Attention and FeedForward.
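A minimal sketch of what this could look like on the model side. The field names (attention_bias, feed_forward_bias), the projection names (wq/wk/wv/wo, w1/w2/w3), and the default dimensions are illustrative assumptions, not torchchat's actual definitions:

```python
from dataclasses import dataclass

import torch.nn as nn


@dataclass
class TransformerArgs:
    dim: int = 4096
    n_heads: int = 32
    n_local_heads: int = 32   # kv heads (GQA)
    head_dim: int = 128
    hidden_dim: int = 11008
    # New flags: indicate that the checkpoint carries bias tensors.
    attention_bias: bool = False
    feed_forward_bias: bool = False


class Attention(nn.Module):
    def __init__(self, config: TransformerArgs):
        super().__init__()
        q_dim = config.n_heads * config.head_dim
        kv_dim = config.n_local_heads * config.head_dim
        # bias=True creates wq.bias / wk.bias / ... entries in the state_dict,
        # so checkpoints that carry bias tensors can be loaded directly.
        self.wq = nn.Linear(config.dim, q_dim, bias=config.attention_bias)
        self.wk = nn.Linear(config.dim, kv_dim, bias=config.attention_bias)
        self.wv = nn.Linear(config.dim, kv_dim, bias=config.attention_bias)
        self.wo = nn.Linear(q_dim, config.dim, bias=config.attention_bias)


class FeedForward(nn.Module):
    def __init__(self, config: TransformerArgs):
        super().__init__()
        self.w1 = nn.Linear(config.dim, config.hidden_dim, bias=config.feed_forward_bias)
        self.w3 = nn.Linear(config.dim, config.hidden_dim, bias=config.feed_forward_bias)
        self.w2 = nn.Linear(config.hidden_dim, config.dim, bias=config.feed_forward_bias)
```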
Alternatives
If this project is designed to be limited to official Llama models, these bias tensors are not needed.
Additional context
This issue is a piece of the puzzle for adding support for Granite Code 3B/8B, which use the llama architecture in transformers but take advantage of several pieces of the architecture that are not currently supported by torchchat. The work-in-progress for Granite Code can be found on my fork: https://github.com/gabe-l-hart/torchchat/tree/GraniteCodeSupport
RFC (Optional)
I have a working implementation to support these optional bias tensors that I plan to submit as a PR. The changes are along the following lines (a rough sketch of the conversion-side pieces follows the list):
Add new parameters to TransformerArgs for attention and ffn bias
Set the bias value based on these parameters in both the Attention and FeedForward modules
Support mapping .bias tensor names in convert_hf_checkpoint
Support permuting .bias tensors in convert_hf_checkpoint
Support loading permuted .bias tensors in model.py
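A rough sketch of the conversion-side pieces. The weight_map_bias entries and permute_weight/permute_bias below are illustrative assumptions in the gpt-fast style; torchchat's actual name mapping and q/k permute may differ. permute_bias is simply the 1-D analog of the 2-D weight permute used for the rotary layout:

```python
import torch

# Hypothetical additions to the HF -> torchchat name mapping: reuse the
# existing weight entries for the corresponding ".bias" tensors.
weight_map_bias = {
    "model.layers.{}.self_attn.q_proj.bias": "layers.{}.attention.wq.bias",
    "model.layers.{}.self_attn.k_proj.bias": "layers.{}.attention.wk.bias",
    "model.layers.{}.self_attn.v_proj.bias": "layers.{}.attention.wv.bias",
    "model.layers.{}.self_attn.o_proj.bias": "layers.{}.attention.wo.bias",
    "model.layers.{}.mlp.gate_proj.bias": "layers.{}.feed_forward.w1.bias",
    "model.layers.{}.mlp.up_proj.bias": "layers.{}.feed_forward.w3.bias",
    "model.layers.{}.mlp.down_proj.bias": "layers.{}.feed_forward.w2.bias",
}


def permute_weight(w: torch.Tensor, n_heads: int, head_dim: int, dim: int) -> torch.Tensor:
    # 2-D permute applied to q_proj/k_proj weights to match the rotary layout.
    return (
        w.view(n_heads, 2, head_dim // 2, dim)
        .transpose(1, 2)
        .reshape(n_heads * head_dim, dim)
    )


def permute_bias(b: torch.Tensor, n_heads: int, head_dim: int) -> torch.Tensor:
    # Same head-wise reordering, applied to the 1-D bias vector.
    return (
        b.view(n_heads, 2, head_dim // 2)
        .transpose(1, 2)
        .reshape(n_heads * head_dim)
    )
```

The last item (loading the permuted bias tensors in model.py) would then mirror whatever handling model.py already applies to the q/k weight tensors, extended to the corresponding bias entries.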
Currently, I have all of my branches in sequence in order to avoid merge conflicts since many of them touch similar portions of the code (particularly around TransformerArgs).