This repository contains the code and data referenced in the paper "Role-Playing Evaluation for Large Language Models".
Large Language Models (LLMs) demonstrate a notable capacity for adopting personas and engaging in role-playing. However, evaluating this ability presents significant challenges, as human assessments are resource-intensive and automated evaluations can be biased. To address this, we introduce Role-Playing Eval (RPEval), a novel benchmark designed to assess LLM role-playing capabilities across four key dimensions: emotional understanding, decision-making, moral alignment, and in-character consistency.
Clone the repository and install the dependencies:
git clone https://github.com/yelboudouri/RPEval.git
cd RPEval
pip install -r requirements.txt

To reproduce the evaluation results from the paper:
python eval.py --responses-file=data/responses_gpt_4o_2024_08_06.jsonl

To test other models, change the --responses-file argument to the appropriate file under the data/ directory.
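For example, to list the bundled responses files you can pass to the command above (assuming they all sit under data/ with a .jsonl extension, like the file shown):

ls data/*.jsonl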
To run RPEval on a different model:
python eval.py --provider="<provider_name>" --model="<model_name>"

RPEval uses SwitchAI under the hood. Ensure your API key is properly configured and that the target model is supported.
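As a sketch, assuming the chosen provider reads its credentials from an environment variable (for instance OPENAI_API_KEY, the usual convention for OpenAI models; the exact variable, provider name, and model name below are illustrative and depend on your setup):

export OPENAI_API_KEY="<your_api_key>"
python eval.py --provider="openai" --model="gpt-4o"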
If you use this code in your research, please cite the following paper:
@misc{boudouri2025roleplayingevaluationlargelanguage,
title={Role-Playing Evaluation for Large Language Models},
author={Yassine El Boudouri and Walter Nuninger and Julian Alvarez and Yvan Peter},
year={2025},
eprint={2505.13157},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.13157},
}