XM3600 Ablation: Train Connector Without XM3600 to Isolate Visual Diversity Contribution #21

@engichang1467

Description

Train a matched connector (same architecture, hyperparameters, and data volume) without XM3600 in the alignment mix, then compare CVQA scores against the XM3600-augmented model from Issue #17. This isolates whether culturally diverse image-caption pairs (visual domain diversity) improve performance on culturally grounded benchmarks, independent of multilingual text coverage.
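One way to keep data volume matched when XM3600 is dropped is to rescale the remaining sources proportionally so the total sample count stays the same. The sketch below is only an illustration of that idea: the function name `drop_source_keep_volume`, the source keys, and the sample counts are hypothetical and are not taken from the project's actual alignment mix. Proportional rescaling also means some examples from the remaining sources may be repeated, which is an assumption about how "same data volume" is meant.

```python
from typing import Dict


def drop_source_keep_volume(mix: Dict[str, int], drop: str) -> Dict[str, int]:
    """Remove one source and rescale the others so the total sample count is unchanged.

    Uses integer largest-remainder rounding so the new counts sum exactly
    to the original total.
    """
    if drop not in mix:
        raise KeyError(f"source {drop!r} is not in the mix")
    total = sum(mix.values())
    kept = {name: count for name, count in mix.items() if name != drop}
    kept_total = sum(kept.values())
    if kept_total == 0:
        raise ValueError("no samples remain after dropping the source")

    # Integer division gives the floor of each rescaled count plus a remainder.
    scaled = {}
    remainders = {}
    for name, count in kept.items():
        scaled[name], remainders[name] = divmod(count * total, kept_total)

    # Hand the leftover samples to the sources with the largest remainders.
    shortfall = total - sum(scaled.values())
    by_remainder = sorted(remainders, key=lambda name: remainders[name], reverse=True)
    for name in by_remainder[:shortfall]:
        scaled[name] += 1
    return scaled


# Hypothetical sample counts, for illustration only.
full_mix = {"cc3m": 600, "coco": 300, "xm3600": 100}
ablation_mix = drop_source_keep_volume(full_mix, "xm3600")
print(ablation_mix)
```

If the project instead matches volume by fixing total training steps (as the acceptance criteria suggest), the same counts can be read as sampling weights rather than dataset sizes.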

Context

  • The bulk of visual training data (CC3M, COCO, LLaVA-Pretrain) is Western-centric. Translation diversifies text but not images, creating a structural mismatch with benchmarks like CVQA that evaluate on culturally diverse content from 31 language communities.
  • XM3600 (Crossmodal-3600) provides image-caption pairs across 36 languages with geographically diverse imagery.
  • This ablation addresses Gap 10 (visual domain diversity in alignment data). Results will be reported as a known limitation analysis rather than a claimed solution.
  • The key comparison is the per-language CVQA breakdown, which shows which languages benefit most from XM3600 augmentation.

Dependencies

Acceptance Criteria

  • A matched connector is trained on the alignment mix minus XM3600, with all other variables held constant (architecture, hyperparameters, total training steps, random seed if feasible).
  • Both models (with and without XM3600) are evaluated on CVQA.
  • Per-language CVQA scores are reported for both models, highlighting which languages show the largest delta.
  • A brief analysis is written: does visual diversity help, and for which language/cultural clusters?
  • Results are framed as a limitation analysis (known gap, not a solution).
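The per-language comparison in the criteria above could be sketched as follows. This is a minimal illustration, assuming each model's CVQA results are already available as a mapping from language to accuracy; the function name `per_language_delta`, the language keys, and the numbers are placeholders, not measured results.

```python
from typing import Dict, List, Tuple

# One row per language: (language, score with XM3600, score without, delta).
Row = Tuple[str, float, float, float]


def per_language_delta(with_xm: Dict[str, float], without_xm: Dict[str, float]) -> List[Row]:
    """Return per-language rows sorted by delta (with minus without), largest first.

    Only languages present in both result sets are compared.
    """
    shared = sorted(set(with_xm) & set(without_xm))
    rows = [
        (lang, with_xm[lang], without_xm[lang], with_xm[lang] - without_xm[lang])
        for lang in shared
    ]
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Placeholder accuracies, for illustration only.
scores_with = {"lang_a": 0.50, "lang_b": 0.40, "lang_c": 0.30}
scores_without = {"lang_a": 0.45, "lang_b": 0.42}

rows = per_language_delta(scores_with, scores_without)
for lang, s_with, s_without, delta in rows:
    print(f"{lang}: with={s_with:.3f} without={s_without:.3f} delta={delta:+.3f}")
```

Sorting by signed delta keeps regressions visible at the bottom of the table, which fits the limitation-analysis framing better than reporting only the largest gains.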

Estimated Effort

2-3 days (one additional training run + CVQA evaluation + comparison write-up)
