Fix Llama4ForConditionalGeneration vision tower template. #560

Merged
merged 1 commit into from
Apr 20, 2025

perinmclaughlin
Contributor

Fix merging of Llama 4 models by correcting the tensor names for the weights and biases of the vision tower.
The Hugging Face safetensors use 'model.layers.${layer_index}.self_attn', while the template specified 'model.layers.${layer_index}.attention'. Because mergekit cannot find tensors under the misnamed keys, merging fails with an error.
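
To illustrate the mismatch, here is a minimal sketch in plain Python. It does not use mergekit's own code or API: the helper names (`expand`, `missing_tensors`), the `q_proj.weight` suffix, and the two-layer checkpoint listing are all assumptions made up for the example. It only shows how a template name that does not match the checkpoint's key leaves every expanded tensor name unresolved.

```python
from string import Template

def expand(template_names, num_layers):
    """Substitute ${layer_index} in each template name for every layer."""
    return [
        Template(name).substitute(layer_index=i)
        for i in range(num_layers)
        for name in template_names
    ]

def missing_tensors(template_names, checkpoint_names, num_layers):
    """Return expanded template names that are absent from the checkpoint."""
    available = set(checkpoint_names)
    return [n for n in expand(template_names, num_layers) if n not in available]

# Hypothetical checkpoint listing; real tensor names may have other suffixes.
checkpoint = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.1.self_attn.q_proj.weight",
]

# Naming before and after the fix described above (suffix is illustrative).
old_template = ["model.layers.${layer_index}.attention.q_proj.weight"]
new_template = ["model.layers.${layer_index}.self_attn.q_proj.weight"]

missing_old = missing_tensors(old_template, checkpoint, num_layers=2)
missing_new = missing_tensors(new_template, checkpoint, num_layers=2)

print("missing with old template:", missing_old)
print("missing with new template:", missing_new)
```

With the old naming, every expanded name is reported missing; with the corrected naming, none are.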

github-actions bot commented Apr 19, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@perinmclaughlin
Contributor Author

I have read the CLA Document and I hereby sign the CLA

@cg123
Collaborator

cg123 commented Apr 20, 2025

Good catch, thank you for the PR!

@cg123 cg123 merged commit 8440b18 into arcee-ai:main Apr 20, 2025
5 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Apr 20, 2025
2 participants