With the current setup, the same learning rate is applied to all non-gain/bias parameters of both the text and image encoders. It would be nice to have the flexibility to set these separately. For instance, the SigLIP paper reaches peak performance with pretrained image encoders by disabling weight decay on the image encoder (though I'm not sure whether that means the trunk, the head, or both). Here's the figure from the paper for reference:
[figure from the SigLIP paper: weight-decay ablation for pretrained image encoders; image not included in this export]
I'm not sure what the best mechanism to accommodate the various use cases would be. Another useful fine-tuning setup I can imagine is setting differential learning rates for different parts of the network.
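For context, one possible mechanism is plain optimizer parameter groups: partition the model's named parameters by tower (image vs. text) and by whether they are gains/biases, then give each group its own `lr` and `weight_decay`. The sketch below is illustrative only; the `visual.` prefix for image-encoder parameters and the `ndim < 2` proxy for gains/biases are assumptions, not the actual repo conventions. For simplicity it operates on `(name, ndim)` pairs; with a real model you would iterate `model.named_parameters()` and collect the tensors instead of the names.

```python
def make_param_groups(named_params, image_lr, text_lr, image_wd, text_wd):
    """Partition parameters into four optimizer groups:
    (image vs. text tower) x (decayed vs. non-decayed).

    `named_params` yields (name, ndim) pairs here for illustration.
    Gains and biases (ndim < 2) never receive weight decay.
    """
    groups = {
        ("image", True):  {"params": [], "lr": image_lr, "weight_decay": image_wd},
        ("image", False): {"params": [], "lr": image_lr, "weight_decay": 0.0},
        ("text", True):   {"params": [], "lr": text_lr, "weight_decay": text_wd},
        ("text", False):  {"params": [], "lr": text_lr, "weight_decay": 0.0},
    }
    for name, ndim in named_params:
        # Hypothetical convention: image-encoder params live under "visual."
        tower = "image" if name.startswith("visual.") else "text"
        decayed = ndim >= 2  # matrices/conv kernels decay; gains/biases don't
        groups[(tower, decayed)]["params"].append(name)
    return [g for g in groups.values() if g["params"]]

params = [
    ("visual.conv1.weight", 4),
    ("visual.ln_post.weight", 1),          # gain: never decayed
    ("transformer.block0.attn.weight", 2),
    ("transformer.block0.attn.bias", 1),   # bias: never decayed
]
# SigLIP-style: no weight decay at all on the image encoder,
# plus a smaller learning rate for the pretrained image tower.
groups = make_param_groups(params, image_lr=1e-5, text_lr=1e-4,
                           image_wd=0.0, text_wd=0.2)
```

The resulting list is exactly the per-group dict format that `torch.optim.AdamW(groups)` accepts once `"params"` holds real tensors, so no optimizer changes are needed; only the grouping logic (and how it's exposed via CLI args) would have to live in the training script.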