text capability recovery: weight merging by sprnjt · Pull Request #16 · Cohere-Labs-Community/expedition-tayavision

sprnjt · 2026-03-08T20:10:46Z

Problem:
Adding visual capabilities to Tiny Aya consistently degrades its multilingual text performance on benchmarks like m-ArenaHard and GlobalMGSM. With only 3.35B parameters, Tiny Aya has less capacity redundancy than the 8B/32B models where cross-modal merging has previously been validated.

It is too early to merge this into main, since the project is still in Phase 1.

Solution:
Implement a weight-merging script that linearly interpolates the fine-tuned VLM's LLM backbone with the original text-only Tiny Aya Base weights.

This led to the idea of Pareto merging (which I used earlier in a research project): sweep the merge ratio and pick the sweet spot.

merged = (1 - α) × original_weight  +  α × finetuned_weight

The merge ratio α ∈ {0.3, 0.4, 0.5, 0.6, 0.7} is a CLI parameter, enabling a sweep to find the Pareto-optimal point between visual grounding and text task recovery.
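Once each merged checkpoint has been evaluated, picking the Pareto-optimal α values could look roughly like the sketch below. This is not part of the PR: the function name `pareto_front` and the `results` layout (α mapped to a `(text_score, vision_score)` pair, higher is better) are assumptions for illustration, and the scores would have to come from real benchmark runs.

```python
def pareto_front(results):
    """Return the alphas that no other alpha dominates.

    `results` maps alpha -> (text_score, vision_score); this layout is a
    hypothetical convention, with higher scores assumed to be better.
    """
    front = []
    for alpha, (text, vision) in results.items():
        # An alpha is dominated if some other alpha is at least as good on
        # both metrics and strictly better on at least one of them.
        dominated = any(
            (t >= text and v >= vision) and (t > text or v > vision)
            for other, (t, v) in results.items()
            if other != alpha
        )
        if not dominated:
            front.append(alpha)
    return sorted(front)
```

If several α values survive, the final choice among them is still a judgment call about how much visual grounding to trade for text recovery.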

What the script does:

Loads the original Tiny Aya base model and the fine-tuned VLM checkpoint

Very important: Interpolates only language_model.* parameters — the multi_modal_projector and vision_encoder pass through untouched from the fine-tuned checkpoint

Validates key and shape compatibility before merging
Saves merged_state.pt and optionally a full HF model dir (--save-hf)
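The steps above can be sketched as follows. This is a minimal illustration, not the code in `scripts/merge_weights.py`: the function name `merge_state_dicts` is hypothetical, and it assumes that a fine-tuned key `language_model.X` corresponds to key `X` in the text-only checkpoint, which may not match the real key layout. Values only need to support scalar multiplication and addition, so `torch` tensors from a `state_dict()` would work.

```python
def merge_state_dicts(original, finetuned, alpha, prefix="language_model."):
    """Interpolate backbone weights; copy all other weights from `finetuned`.

    Assumption: `prefix + X` in the fine-tuned VLM maps to `X` in the
    original text-only model.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError(f"alpha must be in [0, 1], got {alpha}")

    merged = {}
    for key, ft_value in finetuned.items():
        if not key.startswith(prefix):
            # Projector and vision encoder pass through untouched.
            merged[key] = ft_value
            continue

        base_key = key[len(prefix):]
        if base_key not in original:
            raise KeyError(f"{base_key!r} is missing from the original weights")
        base_value = original[base_key]

        # Validate shape compatibility before merging (plain scalars have
        # no .shape attribute and compare as None == None).
        if getattr(base_value, "shape", None) != getattr(ft_value, "shape", None):
            raise ValueError(f"shape mismatch for {key!r}")

        # merged = (1 - alpha) * original_weight + alpha * finetuned_weight
        merged[key] = (1 - alpha) * base_value + alpha * ft_value
    return merged
```

With α = 0 the backbone reverts to the original text-only weights, and with α = 1 it keeps the fine-tuned weights unchanged.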

Usage:

# Single merge
python scripts/merge_weights.py \
  --original  CohereLabs/tiny-aya-base \
  --finetuned ./checkpoints/tiny-aya-vision-ft \
  --alpha     0.5 \
  --output    ./merged/alpha_0.5

# Sweep (bash)
for alpha in 0.3 0.4 0.5 0.6 0.7; do
  python scripts/merge_weights.py \
    --original CohereLabs/tiny-aya-base \
    --finetuned ./checkpoints/tiny-aya-vision-ft \
    --alpha $alpha \
    --output ./merged/alpha_$alpha
done

sanggusti · 2026-03-09T01:59:46Z

#11

Do you save any logs of the run? What do they look like?

sprnjt · 2026-03-09T03:03:44Z

The merge will happen after fine-tuning. I have figured out the merging process and will need to update the code to match the changes.

text capability recovery: weight merging

8f3700e

engichang1467 requested a review from sanggusti March 9, 2026 18:22

engichang1467 assigned sprnjt Mar 9, 2026

engichang1467 linked an issue Mar 9, 2026 that may be closed by this pull request

[Feature]: Text Capability Recovery #11

Open

engichang1467 self-requested a review March 9, 2026 18:37

Merge remote-tracking branch 'upstream/HEAD' into sarkar-experimental

d51ea6a

sprnjt wants to merge 2 commits into Cohere-Labs-Community:main from sprnjt:sarkar-experimental
