[Dataset] Empathetic Dialogues #317

@anngedeus

A collection of open-domain conversations annotated with emotion labels, designed to study empathetic response generation. The corpus contains 25k conversations (about 81k utterances) between two crowdworkers, where one speaker describes a personal situation grounded in a specific emotion and the other responds.

Towards Empathetic Open-domain Conversation Models: a New Benchmark and Dataset
Hannah Rashkin, Eric Michael Smith, Margaret Li, Y-Lan Boureau. ACL 2019.
Paper Link

Dataset details

  • Number of Speakers: 796
  • Number of Utterances: 88500
  • Number of Conversations: 23063

Speaker-level information

Speakers in this dataset are the two dialogue participants in each EmpatheticDialogues conversation.

  • In the original dataset they are indexed as "0" and "1" (per conversation).
  • These indices are used as the speaker IDs in the ConvoKit corpus.

Utterance-level information

Each conversational turn is represented as an utterance. For each utterance, we provide:

  • id: unique identifier for the utterance
  • speaker: the speaker who produced the utterance
  • conversation_id: the identifier of the conversation this utterance belongs to
  • reply_to: id of the preceding utterance in the conversation (or None if it is the first utterance)
  • timestamp: always None (EmpatheticDialogues provides no timestamps)
  • text: textual content of the utterance
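
The reply_to chain described above can be illustrated with a minimal sketch in plain Python. It assumes the turns of one conversation are already sorted by turn index. The `Turn` record, the `link_turns` helper, the row keys, the `<conversation_id>_<utterance_idx>` ID scheme, and the example sentences are all hypothetical and are not taken from the script used to build this corpus.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """Hypothetical stand-in for one utterance record."""
    id: str
    speaker: str
    conversation_id: str
    text: str
    reply_to: Optional[str] = None   # None for the first turn
    timestamp: Optional[int] = None  # EmpatheticDialogues has no timestamps


def link_turns(conversation_id: str, rows: List[dict]) -> List[Turn]:
    """Build utterance records for one conversation.

    `rows` is assumed to be sorted by turn index; each row needs the
    illustrative keys 'utterance_idx', 'speaker', and 'text'.
    """
    turns: List[Turn] = []
    previous_id: Optional[str] = None
    for row in rows:
        # Assumed ID scheme: conversation ID plus turn index.
        utt_id = f"{conversation_id}_{row['utterance_idx']}"
        turns.append(
            Turn(
                id=utt_id,
                speaker=row["speaker"],
                conversation_id=conversation_id,
                text=row["text"],
                reply_to=previous_id,
            )
        )
        previous_id = utt_id  # the next turn replies to this one
    return turns


# Made-up rows, for illustration only.
example_rows = [
    {"utterance_idx": 1, "speaker": "0", "text": "I finally passed my driving test."},
    {"utterance_idx": 2, "speaker": "1", "text": "Congratulations!"},
]
example_turns = link_turns("conv_example", example_rows)
```

Each turn points at the one immediately before it, so a conversation forms a single linear chain rather than a branching reply tree.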

Metadata for each utterance includes:

  • selfeval: self-evaluation string with numerical ratings from the dataset
  • tags: extra tags if present (usually empty)
  • utterance_idx: the turn index of the utterance within its conversation
  • parsed: parsed version of the utterance text, represented as a spaCy Doc
  • split: dataset partition the utterance came from (train, valid, or test)

Conversation-level information

Each conversation corresponds to one dialogue in EmpatheticDialogues.

  • id: the conversation identifier, identical to the corresponding field in the original dataset
  • meta:
    • label: the emotion label for the conversation (from the context column of the CSV)
    • situation: the one-sentence scenario provided by the Speaker (from the prompt column of the CSV)
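
As a rough illustration of the mapping above, the sketch below collects one label and one situation per conversation from CSV rows, using only the Python standard library. The `context` and `prompt` column names come from the description; the `conv_id` column name, the `conversation_meta` helper, and the sample rows are assumptions made for this example, not the actual conversion code.

```python
import csv
import io
from typing import Dict

# Made-up sample in the assumed column layout, for illustration only.
SAMPLE_CSV = """conv_id,utterance_idx,context,prompt,utterance
conv_example,1,proud,I passed my driving test.,I finally passed my driving test.
conv_example,2,proud,I passed my driving test.,Congratulations!
"""


def conversation_meta(csv_text: str) -> Dict[str, Dict[str, str]]:
    """Map each conversation ID to its label and situation.

    The context and prompt values repeat on every row of a conversation,
    so only the first row seen for each conversation is kept.
    """
    meta: Dict[str, Dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        meta.setdefault(
            row["conv_id"],  # assumed name of the conversation ID column
            {"label": row["context"], "situation": row["prompt"]},
        )
    return meta


example_meta = conversation_meta(SAMPLE_CSV)
```

Here `example_meta["conv_example"]` holds the `label` and `situation` values that would end up in the conversation's meta.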

Contact Information

Ann-Kareen Gedeus, [email protected]

convokit_data.zip
