[Dataset] Ubuntu Chat Logs

# Ubuntu Chat Logs Misalignment Corpus
Each conversation in this corpus, drawn from Ubuntu chat logs, features a pair of speakers in which one speaker helps the other solve a technical problem. Human-annotated friction points are included, along with friction points identified by GPT-4o, GPT-4o mini, Llama 70B, and Llama 8B.

Understanding Common Ground Misalignment in Goal-Oriented Dialog: A Case-Study with Ubuntu Chat Logs. Rupak Sarkar, Neha Srikanth, Taylor Hudson, Rachel Rudinger, Claire Bonial, Philip Resnik. arXiv preprint arXiv:2503.12370 (2025)

## Dataset Details
### Speaker-level information
Speakers in this dataset troubleshoot problems together, always in pairs: usually one person has a problem and the other helps them. Role A denotes the person seeking assistance, and role B the person assisting. As speakers can take part in multiple conversations, we track the following metadata:
- role_A_count: number of conversations where the speaker served in role A
- role_B_count: number of conversations where the speaker served in role B
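
The two counts above can be derived from the per-conversation `role_A` and `role_B` fields. The following is a minimal sketch in plain Python, assuming each conversation is represented as a dict with those two keys; the helper name `count_roles` and the toy speaker ids are hypothetical and not part of the corpus.

```python
from collections import Counter


def count_roles(conversations):
    """Tally how often each speaker appears in role A and in role B.

    `conversations` is an iterable of dicts with `role_A` and `role_B`
    keys holding speaker ids (a stand-in for the conversation metadata).
    """
    role_a, role_b = Counter(), Counter()
    for convo in conversations:
        role_a[convo["role_A"]] += 1
        role_b[convo["role_B"]] += 1
    speakers = set(role_a) | set(role_b)
    # Counter returns 0 for speakers who never served in a given role.
    return {
        s: {"role_A_count": role_a[s], "role_B_count": role_b[s]}
        for s in sorted(speakers)
    }


# Toy data for illustration only, not taken from the corpus.
toy_conversations = [
    {"role_A": "alice", "role_B": "bob"},
    {"role_A": "carol", "role_B": "bob"},
    {"role_A": "bob", "role_B": "alice"},
]
role_counts = count_roles(toy_conversations)
```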
### Utterance-level Information
The following data is provided:

- id: unique id of the utterance
- speaker: the speaker who authored the utterance
- conversation_id: unique id of the conversation
- reply_to: index of the utterance to which this is a reply (None if the utterance is not a reply)
- timestamp: the index of the utterance in the conversation
- text: textual content of the utterance
Metadata for utterances includes:
- time_elapsed: number of minutes elapsed since the start of the conversation
- gpt_explanation: an explanation of the utterance, generated by ChatGPT
- conversational_friction: conversational friction scores, generated by the original authors of the paper
- explanation: human-generated explanation of the utterance
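
The `reply_to` field makes it possible to trace an utterance back to the message that started its thread. Below is a minimal sketch in plain Python, assuming utterances are available as dicts with `id` and `reply_to` keys and that `reply_to` holds the id of the parent utterance; the helper name `reply_chain` and the toy ids are hypothetical.

```python
def reply_chain(utterances, utt_id):
    """Return utterance ids from the root of the thread down to `utt_id`."""
    by_id = {u["id"]: u for u in utterances}
    chain = []
    seen = set()  # guards against accidental cycles in the data
    current = utt_id
    while current is not None and current in by_id and current not in seen:
        seen.add(current)
        chain.append(current)
        current = by_id[current]["reply_to"]
    return chain[::-1]


# Toy data for illustration only, not taken from the corpus.
toy_utterances = [
    {"id": "u0", "reply_to": None, "text": "my wifi stopped working"},
    {"id": "u1", "reply_to": "u0", "text": "which card do you have?"},
    {"id": "u2", "reply_to": "u1", "text": "how do I check that?"},
    {"id": "u3", "reply_to": None, "text": "unrelated message"},
]
chain = reply_chain(toy_utterances, "u2")
```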
### Conversation-level Information
For each conversation we provide:
- id: a unique index of the conversation
Metadata for conversations includes:
- batch: the batch into which the conversation is sorted
- duration: number of minutes elapsed since the start of the conversation
- role_A: speaker id for the one serving in role A for this conversation
- role_B: speaker id for the one serving in role B for this conversation
- ending: type of ending the conversation had (natural end, abrupt, or ran out of time)
- conversational_success: how well the conversation resolved the role A speaker's question (success, some progress, or no progress)
For the human annotators and for each model, the following metadata is provided:
- conversational_friction_present_[model]: whether friction is detected anywhere in the conversation by [model]
- friction_count_[model]: number of instances of conversational friction detected by [model]
- friction_index_list_[model]: list of instances of conversational friction within this conversation detected by [model]
- explanation_list_[model]: list of explanations for each friction instance generated by [model]
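
The per-annotator fields above lend themselves to simple comparisons between the human labels and each model. The following is a minimal sketch in plain Python, assuming conversation metadata is available as dicts and that the `conversational_friction_present_[model]` values are truthy when friction is detected. The suffixes `human` and `gpt4o` are placeholders, since the exact suffix strings are not listed here, and both helper names are hypothetical.

```python
def friction_summary(conversation_metas, sources):
    """Count flagged conversations and total friction instances per source."""
    summary = {}
    for source in sources:
        present_key = f"conversational_friction_present_{source}"
        count_key = f"friction_count_{source}"
        summary[source] = {
            "conversations_flagged": sum(
                1 for m in conversation_metas if m[present_key]
            ),
            "total_instances": sum(m[count_key] for m in conversation_metas),
        }
    return summary


def presence_agreement(conversation_metas, source_a, source_b):
    """Fraction of conversations where two sources agree on friction presence."""
    key_a = f"conversational_friction_present_{source_a}"
    key_b = f"conversational_friction_present_{source_b}"
    matches = sum(
        1 for m in conversation_metas if bool(m[key_a]) == bool(m[key_b])
    )
    return matches / len(conversation_metas)


# Toy metadata for illustration only; suffixes "human" and "gpt4o" are assumed.
toy_metas = [
    {"conversational_friction_present_human": True, "friction_count_human": 2,
     "conversational_friction_present_gpt4o": True, "friction_count_gpt4o": 1},
    {"conversational_friction_present_human": False, "friction_count_human": 0,
     "conversational_friction_present_gpt4o": True, "friction_count_gpt4o": 2},
    {"conversational_friction_present_human": True, "friction_count_human": 1,
     "conversational_friction_present_gpt4o": False, "friction_count_gpt4o": 0},
    {"conversational_friction_present_human": True, "friction_count_human": 3,
     "conversational_friction_present_gpt4o": True, "friction_count_gpt4o": 4},
]
summary = friction_summary(toy_metas, ["human", "gpt4o"])
agreement = presence_agreement(toy_metas, "human", "gpt4o")
```

Presence-level agreement is a coarse measure; comparing the `friction_index_list_[model]` entries would show whether two sources flag the same utterances.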


## Basic Stats: ubuntu-chat-logs
- Number of utterances: 7950
- Number of conversations: 200
- Number of speakers: 361
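
A quick back-of-the-envelope reading of these figures, as a small sketch. It assumes exactly two speaker slots per conversation, which follows from speakers always being in pairs; the variable names are illustrative only.

```python
num_utterances = 7950
num_conversations = 200
num_speakers = 361

# Average conversation length in utterances.
avg_utterances_per_conversation = num_utterances / num_conversations

# With two roles per conversation there are 400 speaker slots in total.
speaker_slots = 2 * num_conversations

# Slots beyond the first appearance of each unique speaker, i.e. how many
# slots are filled by someone who already appears in another conversation.
repeat_appearances = speaker_slots - num_speakers
```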

## Contact
The ConvoKit-formatted corpus was created by Axel Bax (adb333@cornell.edu) from the dataset created by Sarkar et al.

Corresponding Author: Rupak Sarkar (rupak@umd.edu)

## Data Access

Find the zipped ConvoKit-formatted corpus here: [ubuntu-chat-logs.zip](https://github.com/user-attachments/files/22937522/ubuntu-chat-logs.zip)
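
Once the archive is downloaded, it can be unpacked and loaded with ConvoKit. The following is a minimal sketch, assuming ConvoKit is installed (`pip install convokit`) and the zip file has been saved locally. The helper names `extract_corpus` and `load_corpus`, the local paths, and the name of the folder inside the archive are assumptions, not details confirmed by this page.

```python
import zipfile
from pathlib import Path


def extract_corpus(zip_path, dest_dir):
    """Unpack the downloaded archive and return the destination directory."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return dest


def load_corpus(corpus_dir):
    """Load a ConvoKit corpus from a directory on disk."""
    # Imported lazily so that extraction works even without ConvoKit installed.
    from convokit import Corpus

    return Corpus(filename=str(corpus_dir))


# Example usage (paths and folder name are assumptions):
# corpus_root = extract_corpus("ubuntu-chat-logs.zip", "data")
# corpus = load_corpus(corpus_root / "ubuntu-chat-logs")
# corpus.print_summary_stats()
```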
