kaleidoscope, maxm, mtvqa, xmmmu evals added #25

Open
Rahul007007 wants to merge 1 commit into Cohere-Labs-Community:main from Rahul007007:trishanu/evals

@Rahul007007
Collaborator

This PR implements Issue #12.

It evaluates Tiny Aya Base (text-only) on:

  • XMMMU
  • Kaleidoscope
  • MaXM
  • MTVQA

The evaluation runs without image inputs to establish a vision-independent performance floor.
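
As an illustration of what a blind baseline looks like, here is a minimal sketch, not the code from this PR. It assumes a hypothetical multiple-choice sample layout (`question`, `options`, `answer`, `image`) and a hypothetical `generate` callable standing in for the text-only model. The image field is simply never read. Open-ended benchmarks such as MaXM and MTVQA would need a different scoring rule than the first-letter match used here.

```python
from typing import Callable, Dict, List

LETTERS = "ABCDEFGH"


def build_blind_prompt(sample: Dict) -> str:
    """Format a multiple-choice sample as text only.

    The sample's "image" field (hypothetical name) is deliberately ignored,
    so the model sees the question and options but no visual input.
    """
    lines = [sample["question"]]
    for letter, option in zip(LETTERS, sample["options"]):
        lines.append(f"{letter}. {option}")
    lines.append("Answer:")
    return "\n".join(lines)


def blind_accuracy(samples: List[Dict], generate: Callable[[str], str]) -> float:
    """Score a text-only model by comparing the first letter of its output
    with the gold answer letter. Returns 0.0 for an empty sample list."""
    if not samples:
        return 0.0
    correct = 0
    for sample in samples:
        prediction = generate(build_blind_prompt(sample)).strip().upper()[:1]
        if prediction == sample["answer"]:
            correct += 1
    return correct / len(samples)


# Toy data and a stub "model" that always answers "A" (both hypothetical).
demo_samples = [
    {"question": "What color is the sky in the picture?",
     "options": ["Blue", "Green"], "answer": "A", "image": "img_001.png"},
    {"question": "How many cats are shown?",
     "options": ["One", "Two"], "answer": "B", "image": "img_002.png"},
]


def always_a(prompt: str) -> str:
    return "A"


demo_score = blind_accuracy(demo_samples, always_a)
```

Any accuracy a model reaches this way comes from the question text, answer priors, or world knowledge rather than the image, which is why it serves as a floor for the vision-enabled scores.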

Results are included in the results/ directory.

Fixes #12

Development

Successfully merging this pull request may close these issues.

[Feature]: Evaluate Vision-Independent Performance Floor (Blind Baselines)
