Documentation update #26
…ng newlines which simulated soft-wrap poorly)
…xes a slight error in the SampleNullityDrop hook, where it was checking the threshold against the sample count (rather than the feature count)
…currently only one: SimpleImputation)
…e is currently only one: StandardScaling)
…iKit-Learn's root ModelManager
…he extended discussion of how this implementation works around SciKit-Learn's triple-variant approach!
…ple usage. Woohoo
…r use in the broader context of the framework.
@registered_data_hook("sample_drop_null")
class SampleNullityDrop(NullityDrop):
    """
    Data hook which will automatically remove samples in the dataset which contain more than some threshold amount of
Just to check my understanding: SampleNullityDrop drops rows (samples), while FeatureNullityDrop drops columns (features), right?
Correct; I plan on adding a short documentation section clarifying what "feature" and "sample" mean in the context of this library.
Since this is confusing here, though, I'm going to extend the docstrings of the data hooks which refer directly to features/samples to include their definitions.
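To make the sample/feature distinction concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the function names `drop_null_samples` and `drop_null_features` are hypothetical, missing values are represented as `None`, and the threshold is assumed to be a fraction between 0 and 1 (the real hooks may define it differently).

```python
from typing import List, Optional, Sequence

Row = List[Optional[float]]


def null_fraction(values: Sequence[Optional[float]]) -> float:
    """Fraction of entries that are missing (None); 0.0 for an empty sequence."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def drop_null_samples(rows: List[Row], threshold: float) -> List[Row]:
    """Drop rows (samples) whose missing fraction exceeds the threshold.

    The fraction is taken over the number of features in the row,
    not over the number of samples in the dataset.
    """
    return [row for row in rows if null_fraction(row) <= threshold]


def drop_null_features(rows: List[Row], threshold: float) -> List[Row]:
    """Drop columns (features) whose missing fraction exceeds the threshold.

    Here the fraction is taken over the number of samples in the dataset.
    """
    if not rows:
        return []
    keep = [
        j
        for j in range(len(rows[0]))
        if null_fraction([row[j] for row in rows]) <= threshold
    ]
    return [[row[j] for j in keep] for row in rows]


# Hypothetical toy dataset: 3 samples (rows) by 3 features (columns)
data = [
    [1.0, None, 3.0],
    [None, None, 6.0],
    [7.0, None, 9.0],
]

kept_samples = drop_null_samples(data, threshold=0.5)    # removes the second row
kept_features = drop_null_features(data, threshold=0.5)  # removes the middle column
```

In this toy data the second sample is missing two of its three features, so only the sample-wise drop removes it, while the middle feature is missing in every sample, so only the feature-wise drop removes it.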
determine the hyperparameters to use.
* Configuration files denote a parameter as being "trial tunable" by placing a dictionary in the
  place of a constant; an example of this can be seen in the `penalty` parameter for the
* If a target column is specified, it is split off the dataset at this point to isolate it from pre-processing (see below)
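The "dictionary in place of a constant" convention described above can be sketched as follows. This is only an illustration under stated assumptions: the helper `split_tunable` is hypothetical, and the keys inside the `penalty` dictionary (`type`, `choices`) are invented for the example rather than taken from the framework's actual configuration schema.

```python
from typing import Any, Dict, Tuple


def split_tunable(params: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, dict]]:
    """Separate constant parameters from "trial tunable" ones.

    A parameter is treated as tunable when its value is a dictionary
    rather than a constant, mirroring the convention described above.
    """
    constants: Dict[str, Any] = {}
    tunable: Dict[str, dict] = {}
    for name, value in params.items():
        if isinstance(value, dict):
            tunable[name] = value
        else:
            constants[name] = value
    return constants, tunable


# Hypothetical parameter block; the dictionary keys are assumptions
example_params = {
    "max_iter": 1000,  # constant: used as-is in every trial
    "penalty": {"type": "categorical", "choices": ["l1", "l2"]},  # tunable per trial
}

constants, tunable = split_tunable(example_params)
```

Here `max_iter` ends up in `constants` and `penalty` in `tunable`; how the tunable dictionary is then turned into a sampled value per trial is left to the framework.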
What do you think about adding some explanatory figure (e.g., from your slides)? As you might remember, it took me a while to understand the concepts of replicate, trial, and split.
Yup, this was next on the docket. Just looking into how to set up Sphinx w/ AutoDoc (so we're not locked to GitHub's wiki should they decide to become tosspots in the future).
Thanks a lot for improving the documentation, @SomeoneInParticular! I left a few minor comments and suggestions.
Addendum suggested by valosekj Co-authored-by: Jan Valosek <[email protected]>
Added additional (common) parameter, as suggested by Jan Co-authored-by: Jan Valosek <[email protected]>
… step is documenting how to interpret it!
…bit nicer to navigate now
… the context of a tutorial
…d from a MOOP analysis
…MOOP runs to one another via plotting
…oded as an "object" type if the database write is interrupted during a MOOPs run
… multi-run comparison
…e run using MOOP's results
Going to merge this as is, as a number of other lab members are starting to look into using this tool for their own research. Further fixes can be managed in later PRs.
No description provided.