What happens when the input is messy—blurred labels, typos, occlusions, or color shifts? 🤔 CHAOS (CHart Analysis with Outlier Samples) is the first benchmark purposely designed to stress‑test MLLMs under realistic noise. We:
- evaluate 10 visual and 5 textual perturbations, each at three increasing severity levels (easy → mid → hard);
- span 112,500 perturbed charts (2,500 per perturbation × 3 severity levels × 15 perturbation types);
- introduce a Robustness Score that unifies vision‑ and text‑side degradations for apples‑to‑apples model comparison.
Our goal is simple: to measure how gracefully MLLMs fail (and, ideally, still succeed) when reality gets noisy, and to understand why.
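For intuition, a common way to build such a score is to average, over all perturbations and severity levels, the fraction of clean-chart accuracy a model retains. The sketch below is an illustration only, not the formula from the CHAOS paper; the function and its inputs (`clean_acc`, `perturbed_acc`) are hypothetical:

```python
def robustness_score(clean_acc: float, perturbed_acc: dict[str, list[float]]) -> float:
    """Average fraction of clean accuracy retained under perturbation.

    perturbed_acc maps each perturbation name to its accuracies at the
    three severity levels (easy, mid, hard). Higher is more robust;
    the score stays in [0, 1] as long as perturbations only hurt.
    """
    retained = [
        acc / clean_acc
        for levels in perturbed_acc.values()
        for acc in levels
    ]
    return sum(retained) / len(retained)

# Toy numbers: two perturbations, three severity levels each.
score = robustness_score(
    clean_acc=0.80,
    perturbed_acc={"blur": [0.75, 0.66, 0.52], "typos": [0.78, 0.71, 0.60]},
)
print(f"Robustness score: {score:.3f}")  # about 0.84 for these toy numbers
```

Normalizing by the clean accuracy keeps the score comparable across models with different clean baselines.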
Clone the repo with submodules:

```bash
git clone --recurse-submodules https://github.com/moured/CHAOS
cd CHAOS
```

Create the environment (Python 3.10 recommended):
```bash
conda create -n chaos python=3.10
conda activate chaos
```

Install dependencies (you can use a different torch version; in our case we experimented with torch==2.6.0):
```bash
cd VLMEvalKit
pip install -e .
pip install accelerate qwen-vl-utils
pip install flash-attn --no-build-isolation
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
```

Copy the custom CHAOS dataset files:
```bash
cp ../custom_files/* ./vlmeval/dataset/
```
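Optionally, a quick sanity check that the files landed where VLMEvalKit expects them (a minimal sketch using only the standard library; it assumes you are still inside VLMEvalKit/ and that the copied file names contain "chaos"):

```python
# List the CHAOS files that were copied into VLMEvalKit's dataset package.
from pathlib import Path

dataset_dir = Path("vlmeval/dataset")  # relative to the VLMEvalKit directory
chaos_files = sorted(p.name for p in dataset_dir.iterdir() if "chaos" in p.name.lower())
print(chaos_files or "No CHAOS files found - re-check the copy step.")
```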
Run with a single GPU:

```bash
python run.py --data CHAOS_text --model Qwen2.5-VL-7B-Instruct --verbose
```
Run with multiple GPUs:
```bash
torchrun --nproc-per-node=4 run.py --data CHAOS_text --model Qwen2.5-VL-7B-Instruct --verbose
```

You can experiment with different models; please check the VLMEvalKit repository for the list of supported models.
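To see which model names you can pass to `--model`, recent versions of VLMEvalKit expose a registry dict in `vlmeval.config`; if your installed version differs, consult the VLMEvalKit README instead:

```python
# Print the model identifiers registered in the installed VLMEvalKit.
from vlmeval.config import supported_VLM

print(f"{len(supported_VLM)} models available")
for name in sorted(supported_VLM):
    print(name)
```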
TBD
TBD
