Implement Cross-Modal Merging and Ablate Merge Ratios (alpha = 0.3--0.7) #18

@engichang1467

Description

Implement the cross-modal weight merging strategy from Aya Vision: linearly interpolate between the multimodal-trained weights and the original text-only Tiny Aya Base weights at a tunable ratio alpha. This is the primary mechanism for recovering the text-only capabilities that degrade during multimodal training (catastrophic forgetting).

Sweep alpha across at least five values in the range 0.3 to 0.7 and evaluate each on both visual grounding and text retention benchmarks to find the optimal balance.

Context

  • Aya Vision validated cross-modal merging at 8B and 32B scale. Whether it works at 3.35B, where models have less parameter redundancy, is an open question (Gap 3).
  • The merge formula is: W_merged = alpha * W_multimodal + (1 - alpha) * W_text_only.
  • A baseline without merging (alpha = 1.0, pure multimodal weights) should also be evaluated to isolate the merging contribution.
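
The merge formula above amounts to a per-parameter linear interpolation of two checkpoints. A minimal sketch of the merging step (function and variable names here are illustrative, not from the repo; in practice the dict values would be checkpoint tensors rather than scalars):

```python
def merge_weights(multimodal, text_only, alpha):
    """Linear interpolation of two checkpoints, parameter by parameter:
    W_merged = alpha * W_multimodal + (1 - alpha) * W_text_only.

    alpha = 1.0 returns the pure multimodal weights (no-merge baseline);
    alpha = 0.0 would return the text-only base unchanged.
    """
    if multimodal.keys() != text_only.keys():
        raise ValueError("checkpoints must share the same parameter names")
    return {
        name: alpha * multimodal[name] + (1 - alpha) * text_only[name]
        for name in multimodal
    }

# Toy example with scalar "parameters":
merged = merge_weights({"w": 1.0}, {"w": 0.0}, alpha=0.5)
# merged["w"] == 0.5
```

Both checkpoints must come from the same architecture (identical parameter names and shapes), which holds here since the multimodal model is initialized from the text-only base.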

Dependencies

Acceptance Criteria

  • Merging script is implemented and tested: given a multimodal checkpoint and the text-only base, produces a merged checkpoint at a specified alpha.
  • At least 5 merge ratios evaluated: alpha in {0.3, 0.4, 0.5, 0.6, 0.7}, plus the no-merge baseline (alpha = 1.0).
  • Each merged checkpoint is evaluated on at least one visual grounding benchmark (CVQA or xMMMU) and one text retention benchmark (m-ArenaHard or GlobalMGSM).
  • Results are recorded in a table: alpha vs. visual grounding score vs. text retention score.
  • Optimal alpha (or alpha range) is identified and documented, with a brief note on the tradeoff curve.
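
The sweep itself can be a thin loop over the merge step; a sketch under the assumption that the two benchmark harnesses are exposed as callables (`eval_visual`, `eval_text`, and `run_sweep` are hypothetical names for illustration):

```python
# Five merge ratios from the acceptance criteria, plus the alpha = 1.0
# no-merge baseline (pure multimodal weights).
ALPHAS = [0.3, 0.4, 0.5, 0.6, 0.7, 1.0]

def run_sweep(multimodal, text_only, eval_visual, eval_text):
    """Merge at each alpha and score both benchmarks.

    Returns rows of (alpha, visual grounding score, text retention score),
    i.e. the results table from the acceptance criteria.
    """
    rows = []
    for alpha in ALPHAS:
        merged = {
            name: alpha * multimodal[name] + (1 - alpha) * text_only[name]
            for name in multimodal
        }
        rows.append((alpha, eval_visual(merged), eval_text(merged)))
    return rows
```

Each evaluation is independent, so the six runs can be launched in parallel if compute allows; only the final (alpha, score, score) rows need to be collected into the table.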

Estimated Effort

2--3 days (merging implementation + 6 evaluation runs)
