text capability recovery: weight merging #16

Open
sprnjt wants to merge 2 commits into Cohere-Labs-Community:main from sprnjt:sarkar-experimental


@sprnjt sprnjt commented Mar 8, 2026

Problem:
Adding visual capabilities to Tiny Aya consistently degrades its multilingual text performance on benchmarks like m-ArenaHard and GlobalMGSM. With only 3.35B parameters, Tiny Aya has less capacity redundancy than the 8B/32B models where cross-modal merging has previously been validated.

It is too early to merge this into main, since the project is still in Phase 1.

Solution:
Implement a weight-merging script that linearly interpolates the fine-tuned VLM's LLM backbone with the original text-only Tiny Aya Base weights.

This led to the idea of Pareto merging (which I used earlier in a research project): sweeping the merge ratio to find the sweet spot.

merged = (1 - α) × original_weight  +  α × finetuned_weight

The merge ratio α ∈ {0.3, 0.4, 0.5, 0.6, 0.7} is a CLI parameter, enabling a sweep to find the Pareto-optimal point between visual grounding and text task recovery.
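As a quick sanity check on the formula above, here is a minimal sketch of the interpolation for a single weight. The function name `interpolate` and the scalar values are illustrative assumptions, not taken from the PR's script. It shows that α = 0 recovers the original weight and α = 1 recovers the fine-tuned weight.

```python
def interpolate(original_weight, finetuned_weight, alpha):
    """merged = (1 - alpha) * original_weight + alpha * finetuned_weight"""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError(f"alpha must be in [0, 1], got {alpha}")
    return (1.0 - alpha) * original_weight + alpha * finetuned_weight


# Toy scalar weights (illustrative values, not taken from any checkpoint).
original, finetuned = 2.0, 4.0
for alpha in (0.0, 0.5, 1.0):
    print(alpha, interpolate(original, finetuned, alpha))
# prints:
# 0.0 2.0
# 0.5 3.0
# 1.0 4.0
```

The same expression works elementwise on tensors, so a higher α keeps more of the fine-tuned (vision-adapted) weights and a lower α keeps more of the original text-only weights.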

What the script does:

Loads the original Tiny Aya base model and the fine-tuned VLM checkpoint

Very important: interpolates only language_model.* parameters; the multi_modal_projector and vision_encoder pass through untouched from the fine-tuned checkpoint

Validates key and shape compatibility before merging
Saves merged_state.pt and, optionally, a full HF model directory (--save-hf)
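The steps above can be sketched roughly as follows. This is not the code in `scripts/merge_weights.py`: the function name `merge_state_dicts` is hypothetical, and the sketch assumes (unconfirmed by the PR) that the base checkpoint stores the same parameters without the `language_model.` prefix. It relies only on arithmetic and an optional `.shape` attribute, so it should apply to tensor state dicts, but the demo uses plain floats to stay dependency-free.

```python
PREFIX = "language_model."  # prefix named in the PR description


def merge_state_dicts(original, finetuned, alpha, prefix=PREFIX):
    """Interpolate only `prefix` parameters; copy the rest from `finetuned`.

    Assumption: the key in `original` is the fine-tuned key minus `prefix`.
    """
    # Pass 1: validate key and shape compatibility before merging anything.
    for name, ft_value in finetuned.items():
        if not name.startswith(prefix):
            continue
        base_name = name[len(prefix):]
        if base_name not in original:
            raise KeyError(f"{base_name!r} missing from original checkpoint")
        base_shape = getattr(original[base_name], "shape", None)
        ft_shape = getattr(ft_value, "shape", None)
        if base_shape != ft_shape:
            raise ValueError(f"shape mismatch for {name}: {base_shape} vs {ft_shape}")

    # Pass 2: interpolate the backbone, pass everything else through untouched.
    merged = {}
    for name, ft_value in finetuned.items():
        if name.startswith(prefix):
            base_value = original[name[len(prefix):]]
            merged[name] = (1.0 - alpha) * base_value + alpha * ft_value
        else:
            merged[name] = ft_value
    return merged


# Toy stand-ins for checkpoints (made-up names and values).
original = {"layers.0.weight": 2.0}
finetuned = {
    "language_model.layers.0.weight": 4.0,
    "multi_modal_projector.weight": 7.0,
    "vision_encoder.weight": 9.0,
}
merged = merge_state_dicts(original, finetuned, alpha=0.5)
print(merged["language_model.layers.0.weight"])  # 3.0
print(merged["multi_modal_projector.weight"])    # 7.0
```

Validating in a separate pass means a mismatch is reported before any merged values are produced, which matches the "validate before merging" step described above.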

Usage:

# Single merge
python scripts/merge_weights.py \
  --original  CohereLabs/tiny-aya-base \
  --finetuned ./checkpoints/tiny-aya-vision-ft \
  --alpha     0.5 \
  --output    ./merged/alpha_0.5

# Sweep (bash)
for alpha in 0.3 0.4 0.5 0.6 0.7; do
  python scripts/merge_weights.py \
    --original CohereLabs/tiny-aya-base \
    --finetuned ./checkpoints/tiny-aya-vision-ft \
    --alpha $alpha \
    --output ./merged/alpha_$alpha
done
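Once each merged checkpoint from the sweep has been evaluated, the Pareto-optimal ratios could be picked out along these lines. This is a sketch only: the function `pareto_front`, the `text` and `vision` field names, and every score below are made-up placeholders used to exercise the function, not measured results from this PR.

```python
def pareto_front(results):
    """Return entries not dominated on (text, vision); higher is better.

    An entry is dominated if another entry is at least as good on both
    scores and strictly better on at least one.
    """
    front = []
    for cand in results:
        dominated = any(
            other["text"] >= cand["text"]
            and other["vision"] >= cand["vision"]
            and (other["text"] > cand["text"] or other["vision"] > cand["vision"])
            for other in results
        )
        if not dominated:
            front.append(cand)
    return front


# Placeholder scores only (NOT real benchmark numbers).
sweep = [
    {"alpha": 0.3, "text": 0.9, "vision": 0.2},
    {"alpha": 0.5, "text": 0.6, "vision": 0.6},
    {"alpha": 0.6, "text": 0.5, "vision": 0.5},  # dominated by alpha=0.5
    {"alpha": 0.7, "text": 0.3, "vision": 0.8},
]
print([r["alpha"] for r in pareto_front(sweep)])  # [0.3, 0.5, 0.7]
```

If text scores fall and vision scores rise monotonically with α, every ratio lands on the front, and the choice comes down to how much text regression is acceptable.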

@sanggusti
Member

#11

Do you save any logs of the run? What do they look like?

@sprnjt
Author

sprnjt commented Mar 9, 2026

The merge would happen after fine-tuning. I have figured out the merging process and will need to update the code to match those changes.

@engichang1467 engichang1467 requested a review from sanggusti March 9, 2026 18:22
@engichang1467 engichang1467 linked an issue Mar 9, 2026 that may be closed by this pull request
@engichang1467 engichang1467 self-requested a review March 9, 2026 18:37
Development

Successfully merging this pull request may close these issues.

[Feature]: Text Capability Recovery

2 participants