Hatevolution: What Static Benchmarks Don't Tell Us

TL;DR: We design two novel time-sensitive experiments and associated metrics to empirically evaluate 20 state-of-the-art language models on evolving hate speech, and we show that static and time-sensitive evaluations are temporally misaligned.

Authors: Chiara Di Bonaventura, Barbara McGillivray, Yulan He, Albert Meroño-Peñuela

Contact: [email protected]

📁 Repo Structure

├── 01_Data/          # datasets for Experiment 1, Experiment 2, and the static benchmarks
├── 02_Inference/     # scripts for zero-shot prompting
├── 03_Eval/          # scripts for evaluating the models
└── README.md

📎 Citation

@inproceedings{dibonaventura2025hatevolution,
  title={Hatevolution: What Static Benchmarks Don't Tell Us},
  author={Chiara Di Bonaventura and Barbara McGillivray and Yulan He and Albert Meroño-Peñuela},
  booktitle={To appear in the Proceedings of ACL 2025 Findings.},
  year={2025}
}

📌 License

This project is licensed under the CC-BY-4.0 License.
