Description
Train a matched connector (same architecture, hyperparameters, and data volume) without XM3600 in the alignment mix, then compare CVQA scores against the XM3600-augmented model from Issue #17. This isolates whether culturally diverse image-caption pairs (visual domain diversity) improve performance on culturally grounded benchmarks, independent of multilingual text coverage.
Context
- The bulk of visual training data (CC3M, COCO, LLaVA-Pretrain) is Western-centric. Translation diversifies text but not images, creating a structural mismatch with benchmarks like CVQA that evaluate on culturally diverse content from 31 language communities.
- XM3600 (Crossmodal-3600) provides image-caption pairs across 36 languages with geographically diverse imagery.
- This ablation addresses Gap 10 (visual domain diversity in alignment data). Results will be reported as a known limitation analysis rather than a claimed solution.
- The key comparison is the per-language CVQA breakdown, which shows which languages benefit most from XM3600 augmentation.
Dependencies
- Completed training from Issue #17 (Train Adapter and Projection Layers on XM3600-Augmented Alignment Mix); the XM3600-augmented model serves as the treatment.
- Same training pipeline and hyperparameters, just with XM3600 removed from the data mix.
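One way to derive the control mix from the treatment mix is sketched below. It is only an illustration: the `drop_source` helper, the dictionary layout (source name mapped to sample count), and the sample counts are all hypothetical and not taken from the project's pipeline. The `keep_total` flag reflects one possible reading of "same data volume", namely rescaling the remaining sources proportionally so the total sample count is unchanged; whether to rescale or simply train on fewer samples is a decision the issue leaves open.

```python
def drop_source(mix, name, keep_total=True):
    """Return a copy of `mix` without `name`.

    `mix` maps a data-source name to its sample count (hypothetical layout).
    With keep_total=True the remaining sources are scaled up proportionally
    so the total sample count matches the original mix.
    """
    if name not in mix:
        raise KeyError(f"source {name!r} is not in the mix")

    rest = {k: v for k, v in mix.items() if k != name}
    if not keep_total:
        return rest

    total = sum(mix.values())
    rest_total = sum(rest.values())
    if rest_total == 0:
        raise ValueError("cannot rescale: no samples left after removal")

    # Integer scaling rounds down, so a few samples may be left over.
    scaled = {k: v * total // rest_total for k, v in rest.items()}

    # Assign the rounding remainder to the largest source to keep the total exact.
    remainder = total - sum(scaled.values())
    largest = max(scaled, key=scaled.get)
    scaled[largest] += remainder
    return scaled


# Placeholder counts for illustration only, not the real alignment mix.
treatment_mix = {"cc3m": 600, "coco": 300, "xm3600": 100}
control_mix = drop_source(treatment_mix, "xm3600")
```

Rescaling keeps the number of training samples matched, at the cost of repeating or oversampling the remaining sources; passing `keep_total=False` keeps each source's count fixed instead.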
Acceptance Criteria
- A matched connector is trained on the alignment mix minus XM3600, with all other variables held constant (architecture, hyperparameters, total training steps, random seed if feasible).
- Both models (with and without XM3600) are evaluated on CVQA.
- Per-language CVQA scores are reported for both models, highlighting which languages show the largest delta.
- A brief analysis is written: does visual diversity help, and for which language/cultural clusters?
- Results are framed as a limitation analysis (known gap, not a solution).
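The per-language comparison in the criteria above can be sketched as follows. This is a minimal illustration, assuming each model's CVQA results are available as a dictionary mapping a language identifier to an accuracy; the `per_language_delta` function, the language keys, and the scores are hypothetical placeholders, not results from either model.

```python
def per_language_delta(treatment, control):
    """Return (language, delta) pairs sorted by largest absolute delta first.

    `treatment` and `control` map a language identifier to an accuracy.
    delta = treatment score - control score, so a positive delta means
    the XM3600-augmented model scored higher on that language.
    """
    if set(treatment) != set(control):
        missing = sorted(set(treatment) ^ set(control))
        raise ValueError(f"languages not scored by both models: {missing}")

    deltas = [(lang, treatment[lang] - control[lang]) for lang in treatment]
    # Sort by magnitude so the languages most affected appear first;
    # ties are broken alphabetically for a stable report order.
    return sorted(deltas, key=lambda item: (-abs(item[1]), item[0]))


# Placeholder scores for illustration only.
with_xm3600 = {"lang_a": 0.5, "lang_b": 0.75}
without_xm3600 = {"lang_a": 0.25, "lang_b": 0.875}
ranked = per_language_delta(with_xm3600, without_xm3600)
```

Sorting by absolute delta surfaces regressions as well as gains, which fits the limitation-analysis framing. With a single training run per condition, small deltas may fall within seed-to-seed noise and should be read with that in mind.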
Estimated Effort
2-3 days (one additional training run + CVQA evaluation + comparison write-up)