This repository contains data and code associated with:
Thomas Davidson. "Multimodal large language models can make context-sensitive hate speech evaluations aligned with human judgment." Nature Human Behaviour.
This repository contains the following core directories:
```
├── analyses
├── data_processing_scripts
├── figures
├── mllm_experiments
│   ├── gemini
│   ├── gpt4o
│   └── openweights
├── posts_generation_scripts_and_data
│   └── faces
├── replication_data
└── synthetic_posts
    ├── alt
    ├── faceless
    ├── main
    └── nameless
```
- `analyses` contains the R scripts necessary to reproduce the results reported in the paper and supplementary information. Depending on the available compute, the bootstrap estimation procedures may take several hours to complete.
- `data_processing_scripts` contains R scripts used to combine the experiment results into the final files used in the analyses.
- `figures` contains the figures generated by the analysis scripts.
- `mllm_experiments` contains the Python scripts and notebooks used to run the MLLM experiments. These are organized into three subdirectories, each containing scripts and a `results` directory, as well as code used to concatenate the results from different experiments.
- `posts_generation_scripts_and_data` contains notebooks used to produce the synthetic posts, as well as the JSON post templates and face images used in the study.
- `replication_data` contains the cleaned and merged experimental data used in the final analyses. This includes the final human subjects experiment data.
- `synthetic_posts` is an empty folder designed to store the PNG files used in the experiment. Due to the size of the corpus, these are not included here, but they can be generated by running the notebooks in `posts_generation_scripts_and_data`.
Additional README.md files with further information are included in several directories.
It is not necessary to run all of the code to reproduce the findings reported in the paper and supplementary information. To obtain the results reported in the paper, simply run the scripts in the `analyses` directory.
- The code in the `gpt4o` and `gemini` directories of `mllm_experiments` requires API keys and an account linked to a payment method.
- The code in `mllm_experiments/openweights` requires hardware equipped with GPUs, and some of the larger models require a cluster with multiple GPUs (see the paper and the README in that directory for further details).
The Python scripts and notebooks require Python 3.8 or higher with the following packages:
Core packages:
- `pandas` (>= 2.3.0)
- `numpy` (>= 2.3.0)
Image generation:
- `pillow` (>= 10.3.0)
- `pilmoji` (>= 2.0.4)
- `emoji` (>= 2.9.0)
- `matplotlib` (>= 3.7.5)
- `jupyter` (>= 1.1.1)
API and async:
- `google-genai` (>= 1.20.0)
- `openai` (>= 1.91.0)
- `aiofiles` (>= 24.1.0)
- `pyyaml` (>= 6.0.2)
Openweights experiments:
- `torch` (>= 2.6.0)
- `transformers` (>= 4.52.4)
- `qwen-vl-utils` (>= 0.0.14)
- `accelerate` (>= 1.8.1)
- `bitsandbytes` (>= 0.47.0)
- `torchvision` (>= 0.21.0)
- `timm` (>= 1.0.16)
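A quick way to confirm that an environment provides these packages is to query the installed versions with the standard library (a minimal sketch; the package names are taken from the lists above, and any missing package is simply reported rather than raising an error):

```python
import importlib.metadata


def check_versions(packages):
    """Return a dict mapping package name -> installed version string,
    or None if the package is not installed."""
    report = {}
    for pkg in packages:
        try:
            report[pkg] = importlib.metadata.version(pkg)
        except importlib.metadata.PackageNotFoundError:
            report[pkg] = None
    return report


# Core and image-generation packages from the lists above
print(check_versions(["pandas", "numpy", "pillow", "pilmoji", "emoji"]))
```

Comparing the reported versions against the minimums listed above (e.g. with `packaging.version.parse`) is left to the reader.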
Additional requirements:
- API keys for OpenAI and Google models
- CUDA toolkit 11.8 or 12.4+ for GPU support
- Hugging Face account and token for gated models
- See `mllm_experiments/openweights/README.md` for GPU memory requirements
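Before running the API-based experiments, it can help to confirm that credentials are available. A minimal sketch, assuming keys are supplied via environment variables (the variable names below are illustrative, not taken from the repository; check each experiment script for the exact configuration it reads):

```python
import os


def missing_credentials(env=os.environ):
    """Return the names of expected credential variables that are unset.

    The variable names here are assumptions for illustration only.
    """
    required = ["OPENAI_API_KEY", "GOOGLE_API_KEY", "HF_TOKEN"]
    return [name for name in required if not env.get(name)]


print(missing_credentials())
```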
The R scripts require R 4.0 or higher with the following packages:
Core:
- `tidyverse` (>= 2.0.0)
- `ggplot2` (>= 3.5.1)
- `stringr` (>= 1.5.1)
- `forcats` (>= 1.0.0)
Analysis:
- `cregg` (>= 0.3.7)
- `parallel` (>= 4.1.2)
Visualization:
- `scico` (>= 1.5.0)
- `patchwork` (>= 1.3.0)
- `cowplot` (>= 1.1.3)
- `ggh4x` (>= 0.3.0)
- `grid` (>= 4.1.2)
Please contact me if you have any questions.