Description
A collection of open-domain conversations annotated with emotion labels, designed to study empathetic response generation. The corpus contains 25k conversations (about 81k utterances) between two crowdworkers, where one speaker describes a personal situation grounded in a specific emotion and the other responds.
Towards Empathetic Open-domain Conversation Models: a New Benchmark and Dataset
Hannah Rashkin, Eric Michael Smith, Margaret Li, Y-Lan Boureau. ACL 2019.
Paper Link
Dataset details
- Number of Speakers: 796
- Number of Utterances: 88500
- Number of Conversations: 23063
Speaker-level information
Speakers in this dataset are the two dialogue participants in each EmpatheticDialogues conversation.
- In the original dataset they are indexed as "0" and "1" (per conversation).
- These indices are used as the speaker IDs in the ConvoKit corpus.
Utterance-level information
Each conversational turn is represented as an utterance. For each utterance, we provide:
- id: unique identifier for the utterance
- speaker: the speaker who produced the utterance
- conversation_id: the identifier of the conversation this utterance belongs to
- reply_to: id of the preceding utterance in the conversation (or None if it is the first utterance)
- timestamp: always None or null (no timestamps are provided in EmpatheticDialogues)
- text: textual content of the utterance
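Because every utterance except the first points at its predecessor through reply_to, the turn order of a conversation can be recovered from these fields alone. The sketch below is a minimal, standard-library illustration of that idea: the `Utterance` dataclass and `thread_order` function are hypothetical stand-ins, not ConvoKit's own classes, and the example ids ("c1_0", ...) and texts are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Utterance:
    """Simplified stand-in for one utterance record (hypothetical, not ConvoKit's class)."""
    id: str
    speaker: str
    conversation_id: str
    reply_to: Optional[str]  # None for the first utterance of a conversation
    text: str
    timestamp: Optional[float] = None  # always None in EmpatheticDialogues


def thread_order(utterances: List[Utterance]) -> List[Utterance]:
    """Order one conversation's utterances by following reply_to links from the root."""
    child_of: Dict[Optional[str], Utterance] = {}
    for utt in utterances:
        if utt.reply_to in child_of:
            raise ValueError(f"more than one reply to {utt.reply_to!r}; not a linear thread")
        child_of[utt.reply_to] = utt

    ordered: List[Utterance] = []
    current = child_of.get(None)  # the root utterance has reply_to=None
    while current is not None:
        if len(ordered) == len(utterances):
            raise ValueError("reply_to links contain a cycle")
        ordered.append(current)
        current = child_of.get(current.id)

    if len(ordered) != len(utterances):
        raise ValueError("reply_to links do not form a single chain from the root")
    return ordered


# Hypothetical conversation, deliberately shuffled.
utts = [
    Utterance("c1_2", "0", "c1", "c1_1", "I got too nervous during parallel parking."),
    Utterance("c1_0", "0", "c1", None, "I failed my driving test today."),
    Utterance("c1_1", "1", "c1", "c1_0", "Oh no, what happened?"),
]
print([u.id for u in thread_order(utts)])  # ['c1_0', 'c1_1', 'c1_2']
```

The function refuses branching or broken threads rather than guessing, since EmpatheticDialogues conversations are described here as simple two-party exchanges where each turn replies to the previous one.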
Metadata for each utterance includes:
- selfeval: self-evaluation string with numerical ratings from the dataset
- tags: extra tags if present (usually empty)
- utterance_idx: the turn index of the utterance within its conversation
- parsed: parsed version of the utterance text, represented as a spaCy Doc
- split: dataset partition the utterance came from (train, valid, or test)
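Since split is stored per utterance, a common first step is to collect the conversation ids belonging to each partition. The sketch below assumes utterances are available as plain dicts with a "conversation_id" key and a "meta" dict (a hypothetical layout standing in for ConvoKit objects), and it assumes all utterances of one conversation share the same split; the hypothetical `conversations_by_split` function raises if that assumption is violated instead of silently picking one.

```python
from collections import defaultdict
from typing import Dict, List, Set

VALID_SPLITS = ("train", "valid", "test")


def conversations_by_split(utterances: List[dict]) -> Dict[str, Set[str]]:
    """Group conversation ids by the `split` metadata of their utterances.

    Each utterance is a plain dict with a "conversation_id" key and a "meta"
    dict; this layout is a hypothetical stand-in for a ConvoKit Utterance.
    Raises ValueError if a conversation's utterances disagree on the split.
    """
    split_of: Dict[str, str] = {}
    groups: Dict[str, Set[str]] = defaultdict(set)
    for utt in utterances:
        conv_id = utt["conversation_id"]
        split = utt["meta"]["split"]
        if split not in VALID_SPLITS:
            raise ValueError(f"unexpected split {split!r} in {conv_id!r}")
        first_seen = split_of.setdefault(conv_id, split)
        if first_seen != split:
            raise ValueError(f"{conv_id!r} spans splits {first_seen!r} and {split!r}")
        groups[split].add(conv_id)
    return dict(groups)


# Hypothetical utterance records (metadata trimmed to the field used here).
records = [
    {"conversation_id": "c1", "meta": {"split": "train"}},
    {"conversation_id": "c1", "meta": {"split": "train"}},
    {"conversation_id": "c2", "meta": {"split": "test"}},
]
print(conversations_by_split(records))  # {'train': {'c1'}, 'test': {'c2'}}
```

Grouping by conversation rather than by utterance keeps whole dialogues together, which avoids leaking turns of a test conversation into training data.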
Conversation-level information
Each conversation corresponds to one dialogue in EmpatheticDialogues.
- id: identical to the original conv_id field in the csv
- meta:
  - label: the emotion label for the conversation (from context in the csv)
  - situation: the one-sentence scenario provided by the Speaker (from prompt in the csv)
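Because the emotion label and situation live in conversation-level metadata, simple corpus statistics only need those two fields. The following minimal sketch uses plain dicts as hypothetical stand-ins for conversation objects; the helper names `label_counts` and `situations_for` and all example records are invented for illustration, though "afraid" and "proud" are used as example emotion labels.

```python
from collections import Counter
from typing import List


def label_counts(conversations: List[dict]) -> Counter:
    """Count conversations per emotion label stored in meta["label"]."""
    return Counter(conv["meta"]["label"] for conv in conversations)


def situations_for(conversations: List[dict], label: str) -> List[str]:
    """Return the situation prompts of all conversations with the given label."""
    return [conv["meta"]["situation"] for conv in conversations
            if conv["meta"]["label"] == label]


# Hypothetical conversation records mirroring the conversation-level fields above.
convs = [
    {"id": "c1", "meta": {"label": "afraid", "situation": "I heard a noise downstairs at night."}},
    {"id": "c2", "meta": {"label": "proud", "situation": "My sister graduated from college."}},
    {"id": "c3", "meta": {"label": "afraid", "situation": "My flight hit heavy turbulence."}},
]
print(label_counts(convs).most_common(1))  # [('afraid', 2)]
print(situations_for(convs, "proud"))  # ['My sister graduated from college.']
```

Counting per conversation (not per utterance) matches how the label is assigned: one emotion per dialogue, shared by all of its turns.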
Contact Information
Ann-Kareen Gedeus, [email protected]