
# RPEval: Role-Playing Evaluation for Large Language Models

HuggingFace Leaderboard

This repository contains code and data referenced in: "Role-Playing Evaluation for Large Language Models".

Large Language Models (LLMs) demonstrate a notable capacity for adopting personas and engaging in role-playing. However, evaluating this ability presents significant challenges, as human assessments are resource-intensive and automated evaluations can be biased. To address this, we introduce Role-Playing Eval (RPEval), a novel benchmark designed to assess LLM role-playing capabilities across four key dimensions: emotional understanding, decision-making, moral alignment, and in-character consistency.

## Getting Started

Clone the repository and install the dependencies:

```shell
git clone https://github.com/yelboudouri/RPEval.git
cd RPEval
pip install -r requirements.txt
```

## Reproducing Paper Results

To reproduce the evaluation results from the paper:

```shell
python eval.py --responses-file=data/responses_gpt_4o_2024_08_06.jsonl
```

To evaluate other models, change the --responses-file argument to point to the corresponding file under the data/ directory.
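The responses files are JSON Lines (`.jsonl`), one JSON object per line. As a minimal sketch for inspecting one (the field names inside each record depend on the file, so the commented usage below is only illustrative):

```python
import json


def load_jsonl(path):
    """Load a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Illustrative usage (path taken from the command above):
# records = load_jsonl("data/responses_gpt_4o_2024_08_06.jsonl")
# print(len(records), records[0].keys())
```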

## Evaluating a New Model

To run RPEval on a different model:

```shell
python eval.py --provider="<provider_name>" --model="<model_name>"
```

RPEval uses SwitchAI under the hood. Ensure your API key is properly configured and the target model is supported.
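API clients typically pick up provider keys from environment variables. A hedged sketch, assuming the OpenAI provider and its conventional `OPENAI_API_KEY` variable (check the SwitchAI documentation for your provider's exact variable name and supported models):

```shell
# Assumption: the OpenAI provider reads its key from OPENAI_API_KEY.
# Other providers use their own environment variables.
export OPENAI_API_KEY="sk-..."
python eval.py --provider="openai" --model="gpt-4o"
```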

## Reference

If you use this code in your research, please cite the following paper:

```bibtex
@misc{boudouri2025roleplayingevaluationlargelanguage,
      title={Role-Playing Evaluation for Large Language Models},
      author={Yassine El Boudouri and Walter Nuninger and Julian Alvarez and Yvan Peter},
      year={2025},
      eprint={2505.13157},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.13157},
}
```
