Question Evasion Dataset #308

@BolinSong833

Add "Question Evasion" dataset as a ConvoKit-formatted corpus

Summary

Add a ConvoKit-formatted version of the Question Evasion dataset, which contains question-answer pairs from political interviews, annotated for response clarity and evasion. The corpus supports studies of how clearly an answer addresses its question, and it can be used to test feature extractors and classifiers that rely on conversational context.

Source and citation

Thomas, Filandrianos, Lymperaiou, Zerva, Stamou (2024). "I Never Said That": A dataset, taxonomy and baselines on response clarity classification. arXiv:2409.13879.

ConvoKit mapping and files

Unit of analysis

  • One conversation per sub-question.
  • Each conversation has two utterances: the interviewer's question and the president's answer.

IDs

  • conversation_id equals the first utterance id (the question).
  • The answer’s reply_to points to the question id.

Conversation metadata

  • title, date, url, president, question_order

Utterance metadata

  • Question: type="question", question_order, interview_question_raw, gpt35_summary, title, date, url
  • Answer: type="answer", question_order, label, annotator_id, inaudible, multiple_questions, affirmative_questions, interview_answer_raw, gpt35_prediction
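
As a rough illustration of the mapping above, the following sketch builds the two utterance records for one sub-question. The ID strings, speaker names, example texts, and the `make_pair` helper are hypothetical and not taken from the dataset, and only a subset of the metadata fields is shown. The `reply-to` key follows my understanding of ConvoKit's `utterances.jsonl` layout and should be treated as an assumption.

```python
import json


def make_pair(question_id, answer_id, question_text, answer_text,
              interviewer, president, question_order, label):
    """Build the question and answer records for one sub-question.

    Hypothetical helper: field names mirror the mapping described above.
    """
    question = {
        "id": question_id,
        # The conversation id equals the id of the first utterance (the question).
        "conversation_id": question_id,
        "speaker": interviewer,
        "text": question_text,
        "reply-to": None,  # the question starts the conversation
        "meta": {"type": "question", "question_order": question_order},
    }
    answer = {
        "id": answer_id,
        "conversation_id": question_id,
        "speaker": president,
        "text": answer_text,
        "reply-to": question_id,  # the answer points back to the question
        "meta": {
            "type": "answer",
            "question_order": question_order,
            "label": label,
        },
    }
    return question, answer


# Made-up example values, for illustration only.
q, a = make_pair(
    "q_0", "a_0",
    "Will you sign the bill?", "We are reviewing it.",
    "interviewer", "president_x", 1, "Dodging",
)
print(json.dumps(a, indent=2))
```

Keeping `conversation_id` equal to the question id means each two-utterance conversation can be looked up directly from its question.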

Included files

  • utterances.jsonl
  • speakers.json
  • conversations.json
  • corpus.json
  • index.json
  • README.md
  • conversion_and_analysis.ipynb
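
For a quick sanity check of the files listed above without installing anything, a standard-library sketch like the one below can tally answer labels straight from `utterances.jsonl`. It assumes one JSON object per line with a `meta` dictionary holding `type` and `label`, as described in the utterance metadata; the function name `count_answer_labels` and the local directory path are hypothetical. In normal use, loading the directory through ConvoKit's `Corpus(filename=...)` would be the more natural route.

```python
import json
from collections import Counter
from pathlib import Path


def count_answer_labels(corpus_dir):
    """Count the `label` values of answer utterances in utterances.jsonl."""
    counts = Counter()
    path = Path(corpus_dir) / "utterances.jsonl"
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            utt = json.loads(line)
            meta = utt.get("meta") or {}
            # Only answers carry a clarity/evasion label in this mapping.
            if meta.get("type") == "answer":
                counts[meta.get("label")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical local path to the downloaded corpus folder.
    corpus_dir = Path("question_evasion_corpus")
    if corpus_dir.exists():
        for label, n in count_answer_labels(corpus_dir).most_common():
            print(label, n)
```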

Basic stats for the converted corpus

  • Conversations: 3,448
  • Utterances: 6,896
  • Speakers: 5

Label counts for answers

  • Explicit: 1,052
  • Dodging: 706
  • Implicit: 488
  • General: 386
  • Deflection: 381
  • Declining to answer: 145
  • Claims ignorance: 119
  • Clarification: 92
  • Partial or half answer: 79
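
The figures above can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers reported in this issue: it confirms that the label counts sum to the number of conversations (one labeled answer per conversation) and that there are two utterances per conversation, then prints each label's share.

```python
# Counts copied from the lists above.
label_counts = {
    "Explicit": 1052,
    "Dodging": 706,
    "Implicit": 488,
    "General": 386,
    "Deflection": 381,
    "Declining to answer": 145,
    "Claims ignorance": 119,
    "Clarification": 92,
    "Partial or half answer": 79,
}
n_conversations = 3448
n_utterances = 6896

total_labels = sum(label_counts.values())

# One labeled answer per conversation, two utterances per conversation.
assert total_labels == n_conversations
assert n_utterances == 2 * n_conversations

# Share of each label among all answers, largest first.
shares = {label: n / total_labels for label, n in label_counts.items()}
for label, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: {label_counts[label]} ({share:.1%})")
```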

Maintainers for this contribution

Data access

Dataset with Demo: https://drive.google.com/drive/folders/15EtaQTX3m1gwvQZxTV8PjedfIN67PNTB?usp=sharing

Slides Summary

https://docs.google.com/presentation/d/1pMLcdCikCMH5zTzD8tbBcuB50sbin8jPsXhz-uVUEQs/edit?usp=sharing

Labels

dataset
