[Dataset] MeetingBank Dataset

# MeetingBank Corpus
A collection of transcribed city council meetings from six major U.S. cities, converted to ConvoKit format for conversational AI research and meeting summarization tasks. The corpus comprises 1,366 meetings with over 3,579 hours of video content and supports the study of political discourse, meeting dynamics, and automated summarization.

**Attribution:** Yebowen Hu, Tim Ganter, Hanieh Deilamsalehy, Franck Dernoncourt, Hassan Foroosh, Fei Liu, "MeetingBank: A Benchmark Dataset for Meeting Summarization," in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL), Toronto, Canada, July 2023, pp. 4512-4522. [Online]. Available: https://zenodo.org/records/7989108

## Dataset details

### Speaker-level information
Speakers in the dataset are participants in city council meetings, including council members, city officials, and public speakers. Each speaker is identified by a unique identifier that combines the meeting name with their speaker number (e.g., "SeattleCityCouncil_12142015_speaker_0").

Speaker metadata includes:
* city: The city where the meeting took place
* meeting_name: The specific meeting identifier
* utterance_count: Total number of utterances contributed by this speaker
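
As an illustration of the ID scheme described above, here is a minimal sketch that builds and splits speaker IDs. The helper names `make_speaker_id` and `split_speaker_id` are hypothetical and are not part of ConvoKit or of the conversion script; the `_speaker_` separator is taken from the example ID above.

```python
def make_speaker_id(meeting_name: str, speaker_number: int) -> str:
    """Join a meeting name and a per-meeting speaker number into one ID."""
    return f"{meeting_name}_speaker_{speaker_number}"


def split_speaker_id(speaker_id: str) -> tuple[str, int]:
    """Recover (meeting_name, speaker_number) from an ID built above."""
    # rpartition splits on the last "_speaker_", so meeting names that
    # contain underscores (e.g. a date suffix) are left intact.
    meeting_name, _, number = speaker_id.rpartition("_speaker_")
    return meeting_name, int(number)


speaker_id = make_speaker_id("SeattleCityCouncil_12142015", 0)
print(speaker_id)
print(split_speaker_id(speaker_id))
```

Because the meeting name is part of the ID, the same person appearing in two meetings would get two distinct speaker IDs under this scheme.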

### Utterance-level information
For each utterance (speech segment), we provide:
* id: An identifier for the utterance (composed of the meeting ID concatenated with the utterance's index in the meeting)
* conversation_id: An identifier for the meeting/conversation to which the utterance belongs
* reply_to: ID of the previous utterance in the conversation (None if it's the first utterance)
* speaker: The speaker who delivered the utterance
* timestamp: Time offset of the utterance within the meeting (in microseconds)
* text: Transcribed textual content of the utterance

Utterance metadata:
* city: The city where the meeting took place
* meeting_name: The specific meeting identifier
* duration: Duration of the speech segment (in microseconds)
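
The fields above could be assembled roughly as follows. This is a sketch in plain Python, not the actual conversion code: the function `build_utterances`, the underscore between meeting ID and index, the assumption that the meeting ID and `meeting_name` coincide, and the sample segments are all illustrative assumptions.

```python
def build_utterances(meeting_id, city, segments):
    """Turn ordered speech segments into utterance records.

    `segments` is a list of (speaker_number, start_us, duration_us, text)
    tuples, already sorted by start time. Times are in microseconds.
    """
    utterances = []
    previous_id = None  # the first utterance replies to nothing
    for index, (speaker_number, start_us, duration_us, text) in enumerate(segments):
        utt_id = f"{meeting_id}_{index}"  # assumed separator between ID and index
        utterances.append({
            "id": utt_id,
            "conversation_id": meeting_id,
            "reply_to": previous_id,
            "speaker": f"{meeting_id}_speaker_{speaker_number}",
            "timestamp": start_us,
            "text": text,
            "meta": {
                "city": city,
                "meeting_name": meeting_id,  # assumed equal to the meeting ID
                "duration": duration_us,
            },
        })
        previous_id = utt_id
    return utterances


# Made-up placeholder segments, not real transcript text.
sample = build_utterances(
    "SeattleCityCouncil_12142015",
    "Seattle",
    [(0, 0, 4_000_000, "first segment"), (1, 4_000_000, 2_500_000, "second segment")],
)
for utt in sample:
    print(utt["id"], "->", utt["reply_to"])
```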

### Conversational-level information
Each conversation represents a complete city council meeting. The conversation structure follows a linear progression where each utterance replies to the previous one, creating a chronological chain of the meeting proceedings. Conversations are organized by city and meeting date, with each meeting containing multiple agenda items and discussion segments.

Meeting metadata includes:
* city: The city where the meeting took place
* meeting_name: The specific meeting identifier
* num_speakers: Total number of unique speakers in the meeting
* total_utterances: Total number of speech segments in the meeting
* total_duration: Total duration of the meeting (in microseconds)
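
To make the relationship between utterance-level and meeting-level fields concrete, the sketch below derives the three aggregate fields from a list of utterances. It assumes `total_duration` is the sum of utterance durations, which this description does not confirm (it could also be the end time of the last utterance); `summarize_meeting` and the sample records are hypothetical.

```python
MICROSECONDS_PER_HOUR = 3_600_000_000


def summarize_meeting(utterances):
    """Compute meeting-level metadata from utterance records.

    Each record needs a "speaker" ID and a "duration" in microseconds.
    """
    speakers = {u["speaker"] for u in utterances}
    return {
        "num_speakers": len(speakers),
        "total_utterances": len(utterances),
        # Assumption: total duration = sum of speech-segment durations.
        "total_duration": sum(u["duration"] for u in utterances),
    }


# Made-up records for illustration only.
records = [
    {"speaker": "m_speaker_0", "duration": 4_000_000},
    {"speaker": "m_speaker_1", "duration": 2_500_000},
    {"speaker": "m_speaker_0", "duration": 1_500_000},
]
summary = summarize_meeting(records)
print(summary)
print(summary["total_duration"] / MICROSECONDS_PER_HOUR, "hours")
```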

### Quick stats
```
Number of conversations in the dataset = 1366
Number of speakers in the dataset = 12272
Number of utterances in the dataset = 1011870

=== CITY BREAKDOWN ===
Alameda: 164 transcripts
Boston: 32 transcripts
Denver: 401 transcripts
KingCounty: 132 transcripts
LongBeach: 310 transcripts
Seattle: 327 transcripts
```
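
As a quick consistency check on the figures above, the per-city counts can be summed and compared against the conversation total. The snippet only restates the numbers from the stats block; the per-meeting averages are derived here and are not reported in the original stats.

```python
city_transcripts = {
    "Alameda": 164,
    "Boston": 32,
    "Denver": 401,
    "KingCounty": 132,
    "LongBeach": 310,
    "Seattle": 327,
}
num_conversations = 1366
num_speakers = 12272
num_utterances = 1011870

# The city breakdown should account for every conversation.
total = sum(city_transcripts.values())
assert total == num_conversations

# Derived averages (speaker IDs are per meeting, so this is speakers per meeting).
utterances_per_meeting = num_utterances / num_conversations
speakers_per_meeting = num_speakers / num_conversations
print(f"{utterances_per_meeting:.1f} utterances per meeting")
print(f"{speakers_per_meeting:.1f} speakers per meeting")
```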

### Contact
Please email any questions to: dv292@cornell.edu

### Dataset Link
https://drive.google.com/drive/u/1/folders/15OXtWuMj2GYBAeYGo1EzJlcIzSco6Z1Q
