Description
Investigate the effect of differential learning rates for LoRA matrices A and B during multimodal adapter training. Maya found that using equal learning rates for LoRA A and B at 8B scale led to LoRA failure, while Aya Vision used differential rates successfully. Whether this failure mode reproduces at 3.35B is an open question.
This ablation should run early in Phase 2 alongside the merge ratio sweeps (Issue #18) so that findings can inform the remaining training runs. At minimum, compare equal vs. differential LoRA learning rates (e.g., lr_A != lr_B, following Aya Vision's configuration) and measure the effect on training stability and downstream performance.
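The differential-rate configuration amounts to placing the LoRA A and B matrices in separate optimizer parameter groups. A minimal sketch in PyTorch, assuming PEFT-style `lora_A`/`lora_B` parameter naming (the layer, LR values, and A/B ratio below are illustrative, not this project's confirmed recipe):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Toy LoRA layer: frozen base weight plus a low-rank A/B update."""
    def __init__(self, d_in, d_out, rank=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)  # base stays frozen
        self.lora_A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(d_out, rank))  # B starts at zero

    def forward(self, x):
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T

model = LoRALinear(16, 16)

# Differential learning rates via optimizer parameter groups:
# one LR for all lora_A parameters, another for all lora_B parameters.
# The 10x ratio here is a placeholder for the ablation sweep.
lr_A, lr_B = 1e-4, 1e-3
param_groups = [
    {"params": [p for n, p in model.named_parameters()
                if "lora_A" in n and p.requires_grad], "lr": lr_A},
    {"params": [p for n, p in model.named_parameters()
                if "lora_B" in n and p.requires_grad], "lr": lr_B},
]
optimizer = torch.optim.AdamW(param_groups)
```

The equal-rate baseline is the same setup with `lr_A == lr_B`, so both configurations differ only in the param-group learning rates.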
Context
- Maya (arXiv:2412.07112) reported LoRA instability at 8B with equal A/B learning rates.
- Aya Vision applied differential LoRA learning rates as part of their training recipe.
- At 3.35B, LoRA dynamics may differ due to the smaller model having different parameter redundancy characteristics.
- This directly feeds into the Phase 3 LoRA rank sweep (Gap 1).
Dependencies
- Phase 1 infrastructure and training pipeline.
- Should run concurrently with, or immediately before, Issue #18 (Implement Cross-Modal Merging and Ablate Merge Ratios, alpha = 0.3--0.7), since the best LoRA configuration should be used for the merge sweeps.
Acceptance Criteria
- At least two LoRA learning rate configurations compared: (1) equal A/B rates, (2) differential A/B rates (following Aya Vision's recipe or a principled variant).
- Training stability documented: loss curves, gradient norms, and any divergence/instability observed under equal rates.
- Both configurations evaluated on at least one benchmark to assess downstream impact.
- A clear recommendation is recorded for which LoRA LR configuration to use in subsequent Phase 2 and Phase 3 training.
- Results and rationale documented in a brief write-up or experiment log.
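For the stability documentation above, logging a global gradient norm per parameter group alongside the loss is usually enough to spot divergence under equal rates. A minimal helper, assuming standard PyTorch training (the function name is hypothetical):

```python
import torch

def grad_norm(params):
    """Global L2 norm over the gradients of a list of parameters.

    Call after loss.backward() and before optimizer.step(), once per
    param group, to track A-group vs. B-group gradient magnitudes.
    """
    norms = [p.grad.detach().norm() for p in params if p.grad is not None]
    if not norms:
        return 0.0
    return torch.norm(torch.stack(norms)).item()
```

Logged per step for each configuration, these curves (plus the loss) give the divergence/instability evidence the acceptance criteria ask for.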
Estimated Effort
1--2 days (2 training runs + comparison)