[Dataset] MeetingBank Dataset #312

@ddeepak95

MeetingBank Corpus

A collection of transcribed city council meetings from 6 major U.S. cities, converted to ConvoKit format for conversational AI research and meeting summarization tasks. The corpus comprises 1,366 meetings totaling more than 3,579 hours of video, and supports the study of political discourse, meeting dynamics, and automated summarization.

Attribution: Yebowen Hu, Tim Ganter, Hanieh Deilamsalehy, Franck Dernoncourt, Hassan Foroosh, Fei Liu, "MeetingBank: A Benchmark Dataset for Meeting Summarization," in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL), Toronto, Canada, July 2023, pp. 4512-4522. [Online]. Available: https://zenodo.org/records/7989108

Dataset details

Speaker-level information

Speakers in the dataset are participants in city council meetings, including council members, city officials, and public speakers. Each speaker is identified by a unique identifier that combines the meeting name with their speaker number (e.g., "SeattleCityCouncil_12142015_speaker_0").
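As a minimal sketch of how such an identifier can be taken apart, the helper below splits a speaker ID back into its meeting name and speaker number. The function name `split_speaker_id` is hypothetical, and the code assumes every ID follows the `<meeting_name>_speaker_<number>` pattern shown in the example above.

```python
from typing import Tuple


def split_speaker_id(speaker_id: str) -> Tuple[str, int]:
    """Split '<meeting_name>_speaker_<number>' into (meeting_name, number).

    Assumption: all speaker IDs follow the pattern of the example in the
    dataset description; the corpus itself may contain exceptions.
    """
    # rpartition splits on the LAST occurrence, so meeting names that
    # themselves contain underscores are left intact.
    meeting_name, separator, number = speaker_id.rpartition("_speaker_")
    if not separator or not number.isdigit():
        raise ValueError(f"Unexpected speaker ID format: {speaker_id!r}")
    return meeting_name, int(number)


meeting, speaker_number = split_speaker_id("SeattleCityCouncil_12142015_speaker_0")
print(meeting, speaker_number)
```

Because the meeting name is part of the ID, the same person appearing in two meetings would get two distinct speaker IDs under this scheme.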

Speaker metadata includes:

  • city: The city where the meeting took place
  • meeting_name: The specific meeting identifier
  • utterance_count: Total number of utterances contributed by this speaker

Utterance-level information

For each utterance (speech segment), we provide:

  • id: An identifier for the utterance (composed of the meeting ID concatenated with the utterance's index in the meeting)
  • conversation_id: An identifier for the meeting/conversation to which the utterance belongs
  • reply_to: ID of the previous utterance in the conversation (None if it's the first utterance)
  • speaker: The speaker who delivered the utterance
  • timestamp: Time offset of the utterance within the meeting (in microseconds)
  • text: Transcribed textual content of the utterance

Utterance metadata:

  • city: The city where the meeting took place
  • meeting_name: The specific meeting identifier
  • duration: Duration of the speech segment (in microseconds)
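Since both timestamp and duration are given in microseconds, a small conversion step is usually needed before the values are readable. The sketch below is illustrative only: `describe_segment` is a hypothetical helper and the sample values are made up, not taken from the corpus.

```python
from datetime import timedelta

MICROSECONDS_PER_SECOND = 1_000_000


def describe_segment(timestamp_us: int, duration_us: int) -> str:
    """Format a segment's start, end, and length from microsecond values."""
    start = timedelta(microseconds=timestamp_us)
    end = start + timedelta(microseconds=duration_us)
    seconds = duration_us / MICROSECONDS_PER_SECOND
    return f"{start} -> {end} ({seconds:.2f} s)"


# Hypothetical segment starting 90.5 s into a meeting and lasting 4.25 s.
print(describe_segment(90_500_000, 4_250_000))
# prints: 0:01:30.500000 -> 0:01:34.750000 (4.25 s)
```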

Conversational-level information

Each conversation represents a complete city council meeting. The conversation structure follows a linear progression where each utterance replies to the previous one, creating a chronological chain of the meeting proceedings. Conversations are organized by city and meeting date, with each meeting containing multiple agenda items and discussion segments.
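The linear reply structure described above can be sketched in plain Python as follows. This is not the conversion script used for the corpus: `build_linear_chain` is a hypothetical helper, the underscore between meeting ID and index is an assumed separator, and using the meeting ID as `conversation_id` is likewise an assumption.

```python
from typing import Dict, List, Optional


def build_linear_chain(meeting_id: str, texts: List[str]) -> List[Dict[str, Optional[str]]]:
    """Create utterance records where each one replies to the previous one."""
    records: List[Dict[str, Optional[str]]] = []
    previous_id: Optional[str] = None  # the first utterance replies to nothing
    for index, text in enumerate(texts):
        utterance_id = f"{meeting_id}_{index}"  # assumed ID separator
        records.append(
            {
                "id": utterance_id,
                "conversation_id": meeting_id,  # assumed: one conversation per meeting
                "reply_to": previous_id,
                "text": text,
            }
        )
        previous_id = utterance_id
    return records


chain = build_linear_chain(
    "SeattleCityCouncil_12142015",
    ["Good afternoon.", "Please call the roll.", "Here."],
)
for record in chain:
    print(record["id"], "<-", record["reply_to"])
```

A consequence of this design is that `reply_to` encodes chronological order only; it does not indicate who is actually responding to whom.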

Meeting metadata includes:

  • city: The city where the meeting took place
  • meeting_name: The specific meeting identifier
  • num_speakers: Total number of unique speakers in the meeting
  • total_utterances: Total number of speech segments in the meeting
  • total_duration: Total duration of the meeting (in microseconds)
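To illustrate how the three aggregate fields relate to the utterances, the sketch below derives them from a list of utterance records. `summarize_meeting` is a hypothetical helper and the sample records are made up. It also assumes that total_duration is the sum of segment durations; the description above does not say whether silence between segments is included, so the real values may be computed differently.

```python
from typing import Dict, List, Union

Record = Dict[str, Union[str, int]]


def summarize_meeting(utterances: List[Record]) -> Dict[str, int]:
    """Derive meeting-level aggregates from utterance records."""
    unique_speakers = {u["speaker"] for u in utterances}
    return {
        "num_speakers": len(unique_speakers),
        "total_utterances": len(utterances),
        # Assumption: total duration = sum of segment durations (microseconds).
        "total_duration": sum(int(u["duration"]) for u in utterances),
    }


# Made-up records for illustration only.
sample = [
    {"speaker": "Meeting_speaker_0", "duration": 2_000_000},
    {"speaker": "Meeting_speaker_1", "duration": 1_500_000},
    {"speaker": "Meeting_speaker_0", "duration": 500_000},
]
summary = summarize_meeting(sample)
print(summary)
```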

Quick stats

Number of conversations in the dataset = 1366
Number of speakers in the dataset = 12272
Number of utterances in the dataset = 1011870

=== CITY BREAKDOWN ===
Alameda: 164 transcripts
Boston: 32 transcripts
Denver: 401 transcripts
KingCounty: 132 transcripts
LongBeach: 310 transcripts
Seattle: 327 transcripts
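As a quick consistency check, the per-city counts above can be summed and compared against the reported number of conversations. The snippet uses only the figures listed in this section; the variable names are illustrative.

```python
# Per-city transcript counts as listed in the city breakdown.
city_counts = {
    "Alameda": 164,
    "Boston": 32,
    "Denver": 401,
    "KingCounty": 132,
    "LongBeach": 310,
    "Seattle": 327,
}

total_meetings = sum(city_counts.values())
assert total_meetings == 1366  # matches the reported number of conversations

# Share of the corpus contributed by each city, largest first.
for city, count in sorted(city_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{city:<12}{count:>5}  {count / total_meetings:6.1%}")

# Mean utterances per meeting, from the reported utterance total.
utterances_per_meeting = 1_011_870 / total_meetings
print(f"Mean utterances per meeting: {utterances_per_meeting:.1f}")
```

The counts are uneven across cities (Denver alone contributes 401 of the 1,366 meetings), which is worth keeping in mind when sampling or splitting by city.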

Contact

Please email any questions to: [email protected]

Dataset Link

https://drive.google.com/drive/u/1/folders/15OXtWuMj2GYBAeYGo1EzJlcIzSco6Z1Q
