Dataset creation for backout commits #4159
Thank you, @benjaminmah! Please see my comments. Also, please fix the linting errors (you may want to consider installing pre-commit).
scripts/backout_data_collection.py
    def main():
        download_databases()

        commit_dict, bug_to_commit_dict, bug_dict = preprocess_commits_and_bugs()
We may want to consider the space complexity when iterating over the whole dataset.
Removed unused keys when constructing the dictionaries, and added a cache: the generated dictionaries are saved as JSON files and reused on later runs instead of being rebuilt. Let me know if this needs additional changes or fixes!
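For readers following the thread, the JSON caching idea in the reply above could look roughly like the sketch below. This is only an illustration, not the code from the PR: the helper name `load_or_build`, its parameters, and the cache file path are all hypothetical.

```python
import json
from pathlib import Path


def load_or_build(cache_path, build_fn):
    """Return the dictionary cached at cache_path, or build and cache it.

    Hypothetical helper: build_fn stands in for whatever function
    constructs the dictionary from the commit/bug databases.
    """
    path = Path(cache_path)
    if path.exists():
        # A previous run already saved the dictionary, so reuse it.
        with path.open("r", encoding="utf-8") as f:
            return json.load(f)

    # No cache yet: build the dictionary and save it for later runs.
    data = build_fn()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(data, f)
    return data
```

One caveat with this approach: JSON object keys are always strings, so a dictionary keyed by integer bug IDs comes back keyed by strings after a round trip through the cache.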
…gzilla.get_bugs`, removed a few tqdm lines
… found, and number of commits with multiple non backed out commits following it
… out commits is <= 2
…he dataset, separated by filename and split into `added_lines` and `removed_line`.
…he fix commit to extract the exact fix.
…t 2 years. Added batch file writing to reduce memory load.
…iff from fix commit
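The commit list mentions batch file writing to reduce memory load. A minimal sketch of that idea is shown below, under stated assumptions: the function name `write_in_batches`, the `batch_size` default, and the JSON Lines output format are illustrative choices, not details taken from the PR.

```python
import json


def write_in_batches(records, output_path, batch_size=1000):
    """Write records to a JSON Lines file, flushing every batch_size items.

    Hypothetical helper: records can be any iterable (including a
    generator), so the full dataset never has to sit in memory at once.
    Returns the number of records written.
    """
    batch = []
    written = 0
    with open(output_path, "w", encoding="utf-8") as f:
        for record in records:
            batch.append(json.dumps(record))
            if len(batch) >= batch_size:
                # Write the full batch and release it from memory.
                f.write("\n".join(batch) + "\n")
                written += len(batch)
                batch.clear()
        if batch:
            # Write whatever is left over after the last full batch.
            f.write("\n".join(batch) + "\n")
            written += len(batch)
    return written
```

Feeding the function a generator rather than a list keeps peak memory proportional to `batch_size` rather than to the dataset size, which is the concern raised in the review about space complexity.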
Example diffs extracted:
Script to generate dataset of bug-inducing commits, backout commits, and the subsequent fix commit.
Intended to include: pushdate, desc.