Skip to content

Conversation

@wasihussain914
Copy link
Contributor

Description

Edited corpus functions for random speaker/conversation / utterance functions.
Added

Edited corpus functions to add random speaker, conversation, and utterance selection functionality. These functions enable random selection using reservoir sampling with optional filtering criteria (selector).

feature

Motivation and Context

This change was introduced to enhance the usability of the corpus by allowing users to select random speakers, conversations, and utterances with customizable filtering conditions. It improves functionality and aligns with existing corpus methods like iter_utterances.

How has this been tested?

Imported convokit locally in machine to test new functions. Tested all three functions -- speaker, utterance and conversation.

Other information

@leojqian leojqian self-requested a review December 4, 2025 00:53
Copy link
Collaborator

@leojqian leojqian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary
This solution adds the optional selector parameters to “random_speaker”, “random_conversation”, and “random_utterance” methods using standard reservoir sampling algorithm. After review and testing on both simulated and real reddit corpus, the core logic is correct and backward-compatible.

Testing
created a synthetic corpus as well as a subreddit corpus (r/Cornell, 7568 speakers, 74467 utterances, 10744 conversations) to test on and wrote comprehensive full test suite incl:

  • backward compatibility
  • multiple match selector
  • single-match selectors
  • no-match selectors
    All tests passed on both scenarios.

Correctness
Reservoir sampling is implemented correctly by the user, this results in a uniform selection from filtered items that match the selector. Behavior for selectors, empty selections, and default behavior all match the expectations with the solution including cases without selector and cases with no matches.

Suggestions
Documentation is clear and understandable, only suggestion would be a grammatical correction on line 468: “takes an Speaker” -> “takes a speaker”

Conclusion
Following review and testing this solution should be approved for merging.

@cristiandnm
Copy link
Contributor

@leojqian can you make the grammatical correction, and then @seanzhangkx8 can merge

@seanzhangkx8
Copy link
Collaborator

Suggestions Documentation is clear and understandable, only suggestion would be a grammatical correction on line 468: “takes an Speaker” -> “takes a speaker”

Great find! I guess this one comes from random_speaker. In that case, the phrase is probably referring to “the function takes a Speaker object as input,” so we do want to keep the capitalization. But you’re absolutely right that it should be “takes a Speaker”

@leojqian
Copy link
Collaborator

leojqian commented Dec 4, 2025

Suggestions Documentation is clear and understandable, only suggestion would be a grammatical correction on line 468: “takes an Speaker” -> “takes a speaker”

Great find! I guess this one comes from random_speaker. In that case, the phrase is probably referring to “the function takes a Speaker object as input,” so we do want to keep the capitalization. But you’re absolutely right that it should be “takes a Speaker”

I just pushed a version of the file with the corrections. Should be good to go.

@seanzhangkx8
Copy link
Collaborator

you need to push to this branch so the change is reflected in this PR.

@leojqian
Copy link
Collaborator

leojqian commented Dec 4, 2025

you need to push to this branch so the change is reflected in this PR.

Hey Sean, it says im on the pr-268 branch when I pushed, is there something I'm missing?

@seanzhangkx8
Copy link
Collaborator

Hey Leo, your changes doesn't seem to appear in this PR right now, can you check the commit history?

@leojqian
Copy link
Collaborator

leojqian commented Dec 4, 2025

Hey Leo, your changes doesn't seem to appear in this PR right now, can you check the commit history?

I dont think I have permission to push to his branch

@seanzhangkx8 seanzhangkx8 merged commit fedb7a5 into CornellNLP:master Dec 4, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants