ViVerBench

ViVerBench evaluates whether multimodal models can verify that generated visual outputs satisfy the constraints stated in their prompts.

Overview

  • 3,594 examples across 16 task categories
  • Binary verification target (true / false)
  • Each input contains 1, 2, or 8 images
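
Because the target is binary, a natural way to score a model is plain accuracy over parsed true / false answers. The sketch below illustrates that idea only; it is not taken from the ViVerBench task code, and the helper names (parse_verdict, accuracy), the accepted answer words, and the choice to count unparseable responses as wrong are all assumptions.

```python
import re
from typing import List, Optional


def parse_verdict(response: str) -> Optional[bool]:
    """Map a free-text response to True/False, or None if it is unclear.

    Hypothetical helper: only the first alphabetic word is inspected.
    """
    match = re.match(r"\s*([a-z]+)", response.lower())
    word = match.group(1) if match else ""
    if word in {"true", "yes"}:
        return True
    if word in {"false", "no"}:
        return False
    return None


def accuracy(responses: List[str], labels: List[bool]) -> float:
    """Fraction of responses whose parsed verdict equals the label.

    Assumption: responses that cannot be parsed count as incorrect.
    """
    if len(responses) != len(labels):
        raise ValueError("responses and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(parse_verdict(r) == y for r, y in zip(responses, labels))
    return correct / len(labels)


if __name__ == "__main__":
    demo_responses = ["True", "false.", "maybe", "Yes, it does"]
    demo_labels = [True, False, True, False]
    print(accuracy(demo_responses, demo_labels))
```

Counting unparseable responses as incorrect keeps the denominator fixed at the number of examples, so scores stay comparable across models that differ in how often they follow the answer format.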

Usage

Run the task through lmms_eval. The --limit 8 flag caps the run at 8 examples per task for a quick smoke test; omit it to evaluate the full benchmark.

python -m lmms_eval \
  --model <model_name> \
  --tasks viverbench \
  --batch_size 1 \
  --limit 8