
Conversation

@vue1999
Collaborator

@vue1999 vue1999 commented Nov 3, 2025

This PR adds frozen transfer learning functionality for fine-tuning foundation models.
This implementation is based on the original weight-freezing work by @7radians, who also authored the paper, and on the partial reimplementation by @SunZichen-2004.

If you use this functionality in your work, please cite the corresponding paper: https://arxiv.org/abs/2502.15582.
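Not part of this PR, just a minimal sketch of the general weight-freezing pattern that frozen transfer learning relies on, assuming a standard PyTorch setup; the model and the choice of which layer to unfreeze are hypothetical placeholders, not the actual MACE architecture:

```python
import torch.nn as nn

# Hypothetical stand-in for a pretrained foundation model:
# index 0 plays the role of pretrained layers, index 1 the task head.
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.Linear(16, 1),
)

# Freeze everything, then unfreeze only the head for fine-tuning.
for p in model.parameters():
    p.requires_grad = False
for p in model[1].parameters():
    p.requires_grad = True

# Only the head's parameters will receive gradients during training.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
```

The optimizer should then be built only from the parameters with `requires_grad=True`, so the frozen layers are never updated.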

@ilyes319
Contributor

ilyes319 commented Nov 3, 2025

@vue1999 Thanks Eszter!!

@ilyes319
Contributor

ilyes319 commented Nov 6, 2025

@vue1999 Did you test that on multi-GPU? I am a bit worried about preserve_grad_state causing synchronisation problems.

@vue1999
Collaborator Author

vue1999 commented Nov 6, 2025

I'm not sure, but I will check that!

@7radians

7radians commented Nov 6, 2025

@ilyes319 @vue1999 @SunZichen-2004 thank you for working on this! The preserve_grad_state is inherited from the original code; there were no problems with multi-GPU previously, but it's good to check.

@vue1999
Collaborator Author

vue1999 commented Nov 28, 2025

@ilyes319 Yes, it ran fine on multi-GPU. (Also, I think requires_grad is a local attribute in each process and is not part of the distributed state.)
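For illustration only, a minimal sketch of what a preserve-grad-state pattern typically looks like, assuming it snapshots and restores per-parameter `requires_grad` flags (this is my reading of the name, not the PR's actual implementation). The flags live on each process's local copy of the module, so no `torch.distributed` collective is involved:

```python
import torch.nn as nn

# Hypothetical frozen module on one process.
model = nn.Linear(4, 4)
for p in model.parameters():
    p.requires_grad = False

# Snapshot the per-parameter requires_grad flags into a plain local dict.
saved = {name: p.requires_grad for name, p in model.named_parameters()}

# Temporarily enable gradients (e.g. for some intermediate step)...
for p in model.parameters():
    p.requires_grad = True

# ...then restore the saved state, leaving the module frozen again.
for name, p in model.named_parameters():
    p.requires_grad = saved[name]
```

Since the snapshot and restore touch only local Python attributes, each DDP rank performs them independently without any cross-process synchronisation.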
