-
Notifications
You must be signed in to change notification settings - Fork 137
Added selection filter for random speaker/conversation/utterance corpus functions #268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added selection filter for random speaker/conversation/utterance corpus functions #268
Conversation
…on, random_utterance to add filtering
…990/ConvoKit into feature/randomSelection
…990/ConvoKit into feature/randomSelection
leojqian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary
This solution adds the optional selector parameters to “random_speaker”, “random_conversation”, and “random_utterance” methods using standard reservoir sampling algorithm. After review and testing on both simulated and real reddit corpus, the core logic is correct and backward-compatible.
Testing
created a synthetic corpus as well as a subreddit corpus (r/Cornell, 7568 speakers, 74467 utterances, 10744 conversations) to test on and wrote comprehensive full test suite incl:
- backward compatibility
- multiple match selector
- single-match selectors
- no-match selectors
All tests passed on both scenarios.
Correctness
Reservoir sampling is implemented correctly by the user, this results in a uniform selection from filtered items that match the selector. Behavior for selectors, empty selections, and default behavior all match the expectations with the solution including cases without selector and cases with no matches.
Suggestions
Documentation is clear and understandable, only suggestion would be a grammatical correction on line 468: “takes an Speaker” -> “takes a speaker”
Conclusion
Following review and testing this solution should be approved for merging.
|
@leojqian can you make the grammatical correction, and then @seanzhangkx8 can merge |
Great find! I guess this one comes from random_speaker. In that case, the phrase is probably referring to “the function takes a Speaker object as input,” so we do want to keep the capitalization. But you’re absolutely right that it should be “takes a Speaker” |
I just pushed a version of the file with the corrections. Should be good to go. |
|
you need to push to this branch so the change is reflected in this PR. |
Hey Sean, it says im on the pr-268 branch when I pushed, is there something I'm missing? |
|
Hey Leo, your changes doesn't seem to appear in this PR right now, can you check the commit history? |
I dont think I have permission to push to his branch |
Description
Edited corpus functions for random speaker/conversation / utterance functions.
Added
Edited corpus functions to add random speaker, conversation, and utterance selection functionality. These functions enable random selection using reservoir sampling with optional filtering criteria (selector).
feature
Motivation and Context
This change was introduced to enhance the usability of the corpus by allowing users to select random speakers, conversations, and utterances with customizable filtering conditions. It improves functionality and aligns with existing corpus methods like iter_utterances.
How has this been tested?
Imported convokit locally in machine to test new functions. Tested all three functions -- speaker, utterance and conversation.
Other information