Add support for separate bias tensors #1250

Open · gabe-l-hart opened this issue Oct 1, 2024 · 2 comments · May be fixed by #1259

@gabe-l-hart (Contributor)

🚀 The feature, motivation and pitch

In the transformers implementation of llama, there are optional bias tensors for the LlamaMLP and LlamaAttention modules. Several additional models (specifically Granite Code 3B and 8B) use the llama architecture and have these separate bias tensors.
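For reference, this is roughly how those optional biases surface in the transformers API (a minimal sketch; `attention_bias` and `mlp_bias` are the `LlamaConfig` flags in recent releases, so availability depends on the installed version):

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny config so the example is cheap to instantiate; both bias flags
# default to False for the official Llama checkpoints.
config = LlamaConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    attention_bias=True,
    mlp_bias=True,
)
model = LlamaForCausalLM(config)

# The projections are now created with bias terms.
print(model.model.layers[0].self_attn.q_proj.bias is not None)  # True
print(model.model.layers[0].mlp.gate_proj.bias is not None)     # True
```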

The proposal here is to add the ability to indicate the presence of bias tensors in TransformerArgs and then support loading them in Attention and FeedForward.
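A minimal sketch of what this could look like on the torchchat side (the `attention_bias` / `feed_forward_bias` field names are hypothetical placeholders, not the actual API):

```python
from dataclasses import dataclass

import torch.nn as nn


@dataclass
class TransformerArgs:
    dim: int = 4096
    n_heads: int = 32
    hidden_dim: int = 11008
    # Hypothetical new fields: defaulting to False preserves current
    # behavior for official Llama checkpoints, which carry no biases.
    attention_bias: bool = False
    feed_forward_bias: bool = False


class FeedForward(nn.Module):
    def __init__(self, args: TransformerArgs) -> None:
        super().__init__()
        bias = args.feed_forward_bias
        self.w1 = nn.Linear(args.dim, args.hidden_dim, bias=bias)
        self.w2 = nn.Linear(args.hidden_dim, args.dim, bias=bias)
        self.w3 = nn.Linear(args.dim, args.hidden_dim, bias=bias)
```

Attention would thread `args.attention_bias` through its `wq`/`wk`/`wv`/`wo` projections the same way.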

Alternatives

If this project is designed to be limited to official Llama models, these bias tensors are not needed.

Additional context

This issue is a piece of the puzzle for adding support for Granite Code 3B/8B, which use the llama architecture in transformers but take advantage of several pieces of the architecture that are not currently supported by torchchat. The work-in-progress for Granite Code can be found on my fork: https://github.com/gabe-l-hart/torchchat/tree/GraniteCodeSupport

RFC (Optional)

I have a working implementation to support these optional bias tensors that I plan to submit as a PR. The changes are along the following lines (see the sketch after the list):

  • Add new parameters to TransformerArgs for attention and ffn bias
  • Set the bias value based on these parameters in both the Attention and FeedForward modules
  • Support mapping .bias tensor names in convert_hf_checkpoint
  • Support permuting .bias tensors in convert_hf_checkpoint
  • Support loading permuted .bias tensors in model.py
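A sketch of the conversion side, assuming the existing `weight_map`/permute conventions in `convert_hf_checkpoint` (the entries below are abridged, and `permute_bias` is an illustrative name):

```python
import torch

# New .bias entries alongside the existing .weight mappings (abridged).
weight_map = {
    "model.layers.{}.self_attn.q_proj.bias": "layers.{}.attention.wq.bias",
    "model.layers.{}.self_attn.k_proj.bias": "layers.{}.attention.wk.bias",
    "model.layers.{}.self_attn.v_proj.bias": "layers.{}.attention.wv.bias",
    "model.layers.{}.mlp.gate_proj.bias": "layers.{}.feed_forward.w1.bias",
}


def permute_bias(b: torch.Tensor, n_heads: int, head_dim: int) -> torch.Tensor:
    """Apply the same rotary-embedding reordering used for the q/k
    projection weights, adapted to a 1-D bias vector (no input dim)."""
    return (
        b.view(n_heads, 2, head_dim // 2)
        .transpose(1, 2)
        .reshape(n_heads * head_dim)
    )
```

Only the `q_proj` and `k_proj` biases would need the permute (mirroring the weight permutation for rotary embeddings); `v_proj` and MLP biases map straight through.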
@Jack-Khuu (Contributor)

Love it! If you want to spin up a PR, I'll gladly take a look.

gabe-l-hart linked a pull request Oct 3, 2024 that will close this issue
@gabe-l-hart (Contributor, Author)

Draft PR up: #1259

Currently, I have all of my branches stacked in sequence to avoid merge conflicts, since many of them touch similar portions of the code (particularly around TransformerArgs).
