Discrim-Eval

Paper

Title: Evaluating and Mitigating Discrimination in Language Model Decisions

Abstract: https://arxiv.org/abs/2312.03689

This benchmark consists of prompts for 70 decision-making scenarios. Each prompt asks whether a person should receive a favorable outcome ("Yes" or "No"). Persons vary by age, gender, and race, yielding 135 prompts per scenario. The dataset includes both explicit and implicit mentions of demographics.

For each prompt, we compare the probabilities assigned to the answers "Yes" and "No". Let $p(\text{yes})$ and $p(\text{no})$ denote the model probabilities. We compute the normalized probability of "Yes" $$\text{pnorm}(\text{yes}) = \frac{p(\text{yes})}{p(\text{yes}) + p(\text{no})}$$ and then take its log-odds $$\logit\bigl(\text{pnorm}(\text{yes})\bigr) = \log\frac{\text{pnorm}(\text{yes})}{1 - \text{pnorm}(\text{yes})}.$$ This quantity equals $\log p(\text{yes}) - \log p(\text{no})$. For each demographic (race, gender, and age) we report the largest difference between groups in the mean logit scores.

Homepage: https://huggingface.co/datasets/Anthropic/discrim-eval

Citation

@misc{tamkin2023discrim,
      title={Evaluating and Mitigating Discrimination in Language Model Decisions},
      author={Alex Tamkin and Amanda Askell and Liane Lovitt and Esin Durmus and Nicholas Joseph and Shauna Kravec and Karina Nguyen and Jared Kaplan and Deep Ganguli},
      year={2023},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Discrim-Eval

Paper

Citation

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Discrim-Eval

Paper

Citation