[Feature]: Text Capability Recovery #11

@sanggusti

Description

Problem Statement

Adding visual capabilities to LLMs consistently degrades performance on base text tasks such as translation and mathematical reasoning. This is especially critical for Tiny Aya, whose smaller parameter count leaves less redundancy to absorb the interference.

Proposed Solution

Build a merging script that linearly interpolates the multimodal fine-tuned weights with the original text-only Tiny Aya weights. The script must accept a tunable merge ratio parameter ($\alpha \in [0.3, 0.7]$) so we can sweep for the optimal configuration.
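A minimal sketch of such a merging script, assuming both checkpoints expose identically named parameters (function names, the simplified state-dict representation, and the default sweep grid are illustrative, not part of the actual codebase):

```python
# Linear weight interpolation between a text-only and a multimodal checkpoint:
#   theta_merged = alpha * theta_text + (1 - alpha) * theta_multimodal
# State dicts are modeled as {param_name: flat list of floats} for clarity;
# a real implementation would operate on framework tensors instead.
from typing import Dict, Iterable, Iterator, List, Tuple

StateDict = Dict[str, List[float]]


def merge_state_dicts(text_sd: StateDict, mm_sd: StateDict, alpha: float) -> StateDict:
    """Interpolate matching parameters from the two checkpoints."""
    if text_sd.keys() != mm_sd.keys():
        raise ValueError("checkpoints must share the same parameter names")
    return {
        name: [alpha * t + (1.0 - alpha) * m
               for t, m in zip(text_sd[name], mm_sd[name])]
        for name in text_sd
    }


def sweep(text_sd: StateDict, mm_sd: StateDict,
          alphas: Iterable[float] = (0.3, 0.4, 0.5, 0.6, 0.7)
          ) -> Iterator[Tuple[float, StateDict]]:
    """Yield (alpha, merged weights) pairs for benchmark evaluation."""
    for alpha in alphas:
        yield alpha, merge_state_dicts(text_sd, mm_sd, alpha)
```

Each merged checkpoint from the sweep would then be evaluated on both the text benchmarks and the visual-grounding suite to pick the best trade-off point.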

Use Case

This Phase 2/3 task is required to finalize the model weights. The goal is to recover text performance on benchmarks such as m-ArenaHard and GlobalMGSM while preserving the newly acquired visual grounding.

Alternatives Considered

  • RMAdapter: Training with a dual-branch adapter that uses separate discrimination and reconstruction paths to enforce consistency and prevent forgetting natively.

Additional Context

Validating cross-modal merging at the 3.35B scale is a core novelty gap for the project, as previous literature has only demonstrated its efficacy at the 8B and 32B scales.

Metadata

Labels

enhancement (New feature or request)
