Added selection filter for random speaker/conversation/utterance corpus functions #268

wasihussain914 · 2025-01-02T23:39:39Z

Description

Edited the corpus functions for random speaker, conversation, and utterance selection. These functions now pick a random item using reservoir sampling, with an optional filtering criterion (selector).
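For readers unfamiliar with the technique, here is a minimal standalone sketch of single-item reservoir sampling with an optional filter. It is not the code in this PR: `reservoir_pick` and its parameters are hypothetical names, and returning `None` when nothing matches is an assumption rather than confirmed ConvoKit behavior.

```python
import random
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def reservoir_pick(
    items: Iterable[T],
    selector: Callable[[T], bool] = lambda item: True,
    rng: Optional[random.Random] = None,
) -> Optional[T]:
    """Return one item chosen uniformly among those accepted by `selector`.

    Hypothetical helper for illustration only. Returns None if nothing
    matches (an assumption, not necessarily what ConvoKit does).
    """
    if rng is None:
        rng = random.Random()
    chosen = None
    matches_seen = 0
    for item in items:
        if not selector(item):
            continue
        matches_seen += 1
        # Replace the current pick with probability 1/k on the k-th match,
        # which leaves every match equally likely once the stream ends.
        if rng.randrange(matches_seen) == 0:
            chosen = item
    return chosen
```

The appeal of this approach is that it needs a single pass over an iterator and never materializes the filtered list, which fits generator-style methods such as iter_utterances.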

feature

Motivation and Context

This change was introduced to enhance the usability of the corpus by allowing users to select random speakers, conversations, and utterances with customizable filtering conditions. It improves functionality and aligns with existing corpus methods like iter_utterances.

How has this been tested?

Imported convokit locally on my machine to test the new functions. Tested all three functions: speaker, utterance, and conversation.

Other information

leojqian

Summary
This solution adds an optional selector parameter to the “random_speaker”, “random_conversation”, and “random_utterance” methods, using the standard reservoir sampling algorithm. After review and testing on both a simulated corpus and a real Reddit corpus, the core logic is correct and backward-compatible.

Testing
Created a synthetic corpus as well as a subreddit corpus (r/Cornell: 7568 speakers, 74467 utterances, 10744 conversations) to test on, and wrote a comprehensive test suite including:

backward compatibility
multiple-match selectors
single-match selectors
no-match selectors
All tests passed on both corpora.

Correctness
Reservoir sampling is implemented correctly, which yields a uniform selection from the items that match the selector. Behavior with a selector, with no matching items, and with no selector (the default) all matches expectations.
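One way to sanity-check the uniformity claim empirically is sketched below on plain integers rather than a ConvoKit corpus. The `pick` helper is a hypothetical stand-in for the PR's methods, and the trial count and 20% tolerance are arbitrary choices, not values from the review.

```python
import random
from collections import Counter


def pick(items, selector=lambda item: True, rng=random):
    """Hypothetical stand-in: one-item reservoir sample over matching items."""
    chosen, seen = None, 0
    for item in items:
        if selector(item):
            seen += 1
            # Keep the k-th match with probability 1/k.
            if rng.randrange(seen) == 0:
                chosen = item
    return chosen


def selection_counts(items, selector, trials, seed=0):
    """Count how often each item is returned over repeated draws."""
    rng = random.Random(seed)
    return Counter(pick(items, selector, rng) for _ in range(trials))


trials = 20_000
counts = selection_counts(range(20), lambda x: x % 5 == 0, trials)
expected = trials / 4  # four matching items: 0, 5, 10, 15

# Only matching items should ever be drawn, each close to its expected share.
assert set(counts) == {0, 5, 10, 15}
assert all(abs(c - expected) / expected < 0.2 for c in counts.values())
```

A frequency check like this cannot prove uniformity, but it does catch common mistakes such as counting non-matching items toward the replacement probability.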

Suggestions
Documentation is clear and understandable. The only suggestion is a grammatical correction on line 468: “takes an Speaker” -> “takes a speaker”

Conclusion
Following review and testing, this solution should be approved for merging.

cristiandnm · 2025-12-04T13:46:46Z

@leojqian can you make the grammatical correction, and then @seanzhangkx8 can merge

seanzhangkx8 · 2025-12-04T16:06:17Z

Suggestions Documentation is clear and understandable, only suggestion would be a grammatical correction on line 468: “takes an Speaker” -> “takes a speaker”

Great find! I guess this one comes from random_speaker. In that case, the phrase is probably referring to “the function takes a Speaker object as input,” so we do want to keep the capitalization. But you’re absolutely right that it should be “takes a Speaker”

leojqian · 2025-12-04T18:50:27Z

I just pushed a version of the file with the corrections. Should be good to go.

seanzhangkx8 · 2025-12-04T18:53:05Z

you need to push to this branch so the change is reflected in this PR.

leojqian · 2025-12-04T18:56:34Z

Hey Sean, it says I'm on the pr-268 branch when I pushed, is there something I'm missing?

seanzhangkx8 · 2025-12-04T18:59:00Z

Hey Leo, your changes don't seem to appear in this PR right now, can you check the commit history?

leojqian · 2025-12-04T19:06:14Z

I don't think I have permission to push to his branch

wasihussain914 and others added 15 commits January 2, 2025 16:37

Edited functions of corpus module-- random speaker, random_conversation, random_utterance to add filtering

65d5986

run formatter

946eb8b

rerun formatter

5510adf

Added check in random object for no selector

27eaac5

Merge branch 'feature/randomSelection' of https://github.com/soulking990/ConvoKit into feature/randomSelection

b4baa3d

Black formatting

97abf9a

update random obj functions

05c43db

rerun formatter

4a46429

Edited docs for itterance functions.

a877d4b

Merge branch 'feature/randomSelection' of https://github.com/soulking990/ConvoKit into feature/randomSelection

1dc6384

Final edit of docs after running black

a351a5e

final commit with black run

a64a5e5

increment version number to 3.4.0

d299551

Merge branch 'CornellNLP:master' into master

39f228d

Merge branch 'master' of https://github.com/seanzhangkx8/ConvoKit

733abf7

leojqian self-requested a review December 4, 2025 00:53

leojqian reviewed Dec 4, 2025

seanzhangkx8 added 4 commits December 4, 2025 14:18

fix typo

df30cde

Merge remote-tracking branch 'origin/master' into pr/268

058bb14

Merge branch 'master' into feature/randomSelection

9520994

clean up setup and workflow

d0b923b

seanzhangkx8 merged commit fedb7a5 into CornellNLP:master Dec 4, 2025
4 checks passed
