Problem Statement
Many VQA benchmarks are partially solvable without images, so we must avoid conflating language-model priors with genuine multimodal capability.
Proposed Solution
Evaluate Tiny Aya Base (text-only, no image input) on all primary multimodal benchmarks (CVQA, XMMMU, Kaleidoscope, MaXM, MTVQA) before any training begins, to establish the vision-independent performance floor for each benchmark.
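A minimal sketch of that blind pass, assuming each benchmark exposes (question, answer) pairs; `load_benchmark`, `score`, and `answer_fn` are hypothetical names standing in for whatever loader, metric, and text-only inference call the real harness uses:

```python
# Sketch of the blind (text-only) baseline pass. Helper names below are
# assumptions, not the actual harness API.
from typing import Callable, Iterable, Tuple

BENCHMARKS = ["CVQA", "XMMMU", "Kaleidoscope", "MaXM", "MTVQA"]

def blind_baseline(
    answer_fn: Callable[[str], str],  # text-only model call; no image is passed
    load_benchmark: Callable[[str], Iterable[Tuple[str, str]]],  # hypothetical loader
    score: Callable[[str, str], float],  # per-item metric, e.g. exact match in [0, 1]
) -> dict:
    """Run the text-only model over every benchmark, questions only."""
    floors = {}
    for name in BENCHMARKS:
        items = list(load_benchmark(name))
        # Only the question text is shown to the model; the image is withheld,
        # so any score here reflects language priors alone.
        per_item = [score(answer_fn(question), gold) for question, gold in items]
        floors[name] = sum(per_item) / len(per_item)
    return floors
```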
Use Case
This is a critical Phase 1 testing step. We will calculate and report the vision gain ($A_{\text{vision}}$) per benchmark by subtracting this blind baseline score from the final Tiny Aya Vision score.
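Written out per benchmark $b$, with $S^{(b)}_{\text{blind}}$ and $S^{(b)}_{\text{vision}}$ as shorthand for the two scores (symbols introduced here for clarity only):

$$A_{\text{vision}}^{(b)} = S_{\text{vision}}^{(b)} - S_{\text{blind}}^{(b)}$$

A large positive gain indicates genuine use of the image, while a gain near zero flags a benchmark the model can largely solve blind.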
Alternatives Considered
Skipping blind baselines and assuming that standard benchmarks perfectly isolate visual capability; recent literature on MMMU and VQAv2 shows this assumption does not hold.
Additional Context
This directly addresses Gap 9 from our literature review, concerning the blind solvability of multilingual multimodal benchmarks.