Description
Problem Statement
Adding visual capabilities to LLMs consistently degrades performance on base text tasks such as translation and mathematical reasoning. This is especially critical for Tiny Aya, whose smaller parameter count leaves less redundancy to absorb the interference.
Proposed Solution
Build a merging script that interpolates the multimodal fine-tuned weights with the original text-only Tiny Aya weights. The script must accept a tunable merge ratio parameter controlling how far the merged checkpoint leans toward either endpoint.
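A minimal sketch of the interpolation step, assuming a simple linear merge. The function name and the plain-dict weight representation are hypothetical stand-ins; a real implementation would operate on the models' actual state dicts (e.g. torch tensors) and handle dtype/device details.

```python
# Hypothetical sketch: linearly interpolate two checkpoints.
# merged = (1 - alpha) * text_only + alpha * multimodal
# alpha = 0.0 recovers the original text-only weights;
# alpha = 1.0 keeps the multimodal fine-tune unchanged.

def merge_checkpoints(text_only, multimodal, alpha=0.5):
    """Merge two checkpoints, given as {param_name: list_of_floats}."""
    if text_only.keys() != multimodal.keys():
        raise ValueError("checkpoints must share the same parameter names")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("merge ratio alpha must lie in [0, 1]")
    return {
        name: [(1.0 - alpha) * t + alpha * m
               for t, m in zip(text_only[name], multimodal[name])]
        for name in text_only
    }

# Toy usage with two-parameter "checkpoints":
merged = merge_checkpoints(
    {"w": [1.0, 2.0]},
    {"w": [3.0, 4.0]},
    alpha=0.5,
)
# → {"w": [2.0, 3.0]}
```

Sweeping alpha and evaluating each merged checkpoint on the text benchmarks would then locate the ratio that best trades off text recovery against visual grounding.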
Use Case
This Phase 2/3 task is required to finalize the model weights, aiming to recover text performance on benchmarks like m-ArenaHard and GlobalMGSM while maintaining the newly acquired visual grounding.
Alternatives Considered
- RMAdapter: Training with a dual-branch adapter that uses separate discrimination and reconstruction paths to enforce consistency and prevent forgetting natively.
Additional Context
Validating cross-modal merging at the 3.35B scale is a core novelty gap for the project, as previous literature has only proven its efficacy at 8B and 32B scales.