With the current setup, the same learning rate is applied to all non-gain/bias parameters of both the text and image encoders. It would be nice to have the flexibility to set these separately. For instance, the SigLIP paper reaches peak performance with pretrained image encoders by disabling weight decay on the image encoder (though I'm not sure whether that means the trunk, the head, or both). Here's the figure from the paper for reference:
[figure from the SigLIP paper: weight-decay ablation for pretrained image encoders; image not included in this export]
I'm not sure what the best mechanism to accommodate the various use cases would be. Another useful fine-tuning setup I can imagine is setting differential learning rates for different parts of the network.
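For context, one possible mechanism is plain optimizer parameter groups: partition the model's named parameters by tower (image vs. text) and by whether they are gains/biases, then give each group its own `lr` and `weight_decay`. The sketch below is illustrative only; the `visual.` prefix for image-encoder parameters and the `ndim < 2` proxy for gains/biases are assumptions, not the actual repo conventions. For simplicity it operates on `(name, ndim)` pairs; with a real model you would iterate `model.named_parameters()` and collect the tensors instead of the names.

```python
def make_param_groups(named_params, image_lr, text_lr, image_wd, text_wd):
    """Partition parameters into four optimizer groups:
    (image vs. text tower) x (decayed vs. non-decayed).

    `named_params` yields (name, ndim) pairs here for illustration.
    Gains and biases (ndim < 2) never receive weight decay.
    """
    groups = {
        ("image", True):  {"params": [], "lr": image_lr, "weight_decay": image_wd},
        ("image", False): {"params": [], "lr": image_lr, "weight_decay": 0.0},
        ("text", True):   {"params": [], "lr": text_lr, "weight_decay": text_wd},
        ("text", False):  {"params": [], "lr": text_lr, "weight_decay": 0.0},
    }
    for name, ndim in named_params:
        # Hypothetical convention: image-encoder params live under "visual."
        tower = "image" if name.startswith("visual.") else "text"
        decayed = ndim >= 2  # matrices/conv kernels decay; gains/biases don't
        groups[(tower, decayed)]["params"].append(name)
    return [g for g in groups.values() if g["params"]]

params = [
    ("visual.conv1.weight", 4),
    ("visual.ln_post.weight", 1),          # gain: never decayed
    ("transformer.block0.attn.weight", 2),
    ("transformer.block0.attn.bias", 1),   # bias: never decayed
]
# SigLIP-style: no weight decay at all on the image encoder,
# plus a smaller learning rate for the pretrained image tower.
groups = make_param_groups(params, image_lr=1e-5, text_lr=1e-4,
                           image_wd=0.0, text_wd=0.2)
```

The resulting list is exactly the per-group dict format that `torch.optim.AdamW(groups)` accepts once `"params"` holds real tensors, so no optimizer changes are needed; only the grouping logic (and how it's exposed via CLI args) would have to live in the training script.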