[Dataset] Empathetic Dialogues

A collection of open-domain conversations annotated with emotion labels, designed for studying empathetic response generation. The corpus contains about 25k conversations (about 81k utterances), each between two crowdworkers: one speaker describes a personal situation grounded in a given emotion, and the other responds.

Towards Empathetic Open-domain Conversation Models: a New Benchmark and Dataset
Hannah Rashkin, Eric Michael Smith, Margaret Li, Y-Lan Boureau. ACL 2019.
[Paper Link](https://arxiv.org/abs/1811.00207)

## Dataset details

- Number of Speakers: 796
- Number of Utterances: 88500
- Number of Conversations: 23063

### Speaker-level information
Speakers in this dataset are the two dialogue participants in each EmpatheticDialogues conversation.  
- In the original dataset they are indexed as `"0"` and `"1"` (per conversation).  
- These indices are used as the speaker IDs in the ConvoKit corpus.  

### Utterance-level information
Each conversational turn is represented as an utterance. For each utterance, we provide:

- **id**: unique identifier for the utterance 
- **speaker**: the speaker who produced the utterance  
- **conversation_id**: the identifier of the conversation this utterance belongs to  
- **reply_to**: ID of the preceding utterance in the conversation (`None` for the first utterance)
- **timestamp**: always `None` (`null` in the serialized corpus files), since EmpatheticDialogues provides no timestamps
- **text**: textual content of the utterance
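The mapping from the original per-turn rows to these fields can be sketched in plain Python, without ConvoKit itself. This is a minimal sketch, not the actual conversion script: the row keys `conv_id`, `utterance_idx`, `speaker`, and `utterance`, the helper name `build_utterances`, and the `<conv_id>_<utterance_idx>` ID scheme are all assumptions for illustration. It shows how `reply_to` links each turn to the one before it.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def build_utterances(rows: List[Dict[str, str]]) -> List[Dict[str, Optional[str]]]:
    """Turn flat per-turn rows into ConvoKit-style utterance records.

    Assumes hypothetical row keys 'conv_id', 'utterance_idx', 'speaker'
    and 'utterance'; the utterance ID scheme below is also hypothetical.
    """
    # Group turns by conversation so each one can be ordered independently.
    by_conv: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in rows:
        by_conv[row["conv_id"]].append(row)

    utterances = []
    for conv_id, turns in by_conv.items():
        # Sort numerically: as strings, turn "10" would come before turn "2".
        turns.sort(key=lambda r: int(r["utterance_idx"]))
        previous_id: Optional[str] = None
        for turn in turns:
            utt_id = f"{conv_id}_{turn['utterance_idx']}"  # hypothetical ID scheme
            utterances.append({
                "id": utt_id,
                "speaker": turn["speaker"],
                "conversation_id": conv_id,
                "reply_to": previous_id,  # None for the first turn
                "timestamp": None,        # EmpatheticDialogues has no timestamps
                "text": turn["utterance"],
            })
            previous_id = utt_id
    return utterances
```

Rows may arrive in any order; sorting by the numeric turn index before chaining is what keeps `reply_to` pointing at the true preceding turn.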

Metadata for each utterance includes:

- **selfeval**: self-evaluation string with numerical ratings from the dataset  
- **tags**: extra tags if present (usually empty) 
- **utterance_idx**: the turn index of the utterance within its conversation 
- **parsed**: parsed version of the utterance text, represented as a spaCy `Doc`
- **split**: dataset partition the utterance came from (`train`, `valid`, or `test`)  
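The `selfeval` string is stored as-is, but its numbers are easy to extract for analysis, and `split` makes it simple to select one partition. A minimal sketch, assuming utterance records are dicts with a `meta` dict holding the fields above; the helper names `parse_ratings` and `utterances_in_split` are hypothetical, and the regex collects every run of digits rather than relying on a specific delimiter layout, which this description does not specify.

```python
import re
from typing import Dict, List, Optional

VALID_SPLITS = {"train", "valid", "test"}


def parse_ratings(selfeval: Optional[str]) -> List[int]:
    """Return every integer embedded in a selfeval string, in order.

    Assumption: ratings appear as runs of digits; delimiters are ignored.
    """
    return [int(tok) for tok in re.findall(r"\d+", selfeval or "")]


def utterances_in_split(utterances: List[Dict], split: str) -> List[Dict]:
    """Keep only utterances whose meta['split'] matches the requested partition."""
    if split not in VALID_SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {sorted(VALID_SPLITS)}")
    # Records without a 'meta' dict are treated as belonging to no split.
    return [u for u in utterances if u.get("meta", {}).get("split") == split]
```

Rejecting unknown split names up front turns a typo such as `"dev"` into an immediate error instead of a silently empty result.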

### Conversation-level information
Each conversation corresponds to one dialogue in EmpatheticDialogues.

- **id**: identical to the original `conv_id` field
- **meta**:
    - **label**: the emotion label for the conversation (from the `context` column in the CSV)
    - **situation**: the one-sentence scenario written by the Speaker, the participant who describes the situation (from the `prompt` column in the CSV)
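Because the emotion label and situation describe the whole dialogue, they can be collected once per conversation. A minimal sketch, assuming the rows are dicts keyed by the CSV column names `conv_id`, `context`, and `prompt`, and that every row of a conversation repeats the same `context` and `prompt` values; `build_conversation_meta` is a hypothetical helper.

```python
from typing import Dict, List


def build_conversation_meta(rows: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Map each conv_id to {'label': context, 'situation': prompt}.

    Assumes every row of a conversation carries the same context and prompt;
    a row that disagrees with the first one seen raises ValueError.
    """
    meta: Dict[str, Dict[str, str]] = {}
    for row in rows:
        entry = {"label": row["context"], "situation": row["prompt"]}
        # setdefault stores the first entry seen and returns it on later rows.
        seen = meta.setdefault(row["conv_id"], entry)
        if seen != entry:
            raise ValueError(f"inconsistent metadata in conversation {row['conv_id']!r}")
    return meta
```

Raising on a mismatch, rather than keeping the first value silently, surfaces any conversation whose rows disagree about its label or situation.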

## Contact Information
 Ann-Kareen Gedeus, ag2637@cornell.edu

[convokit_data.zip](https://github.com/user-attachments/files/22307680/convokit_data.zip)