@valosekj
Member

No description provided.

kalum.ost added 17 commits January 27, 2025 14:53
…ng newlines which simulated soft-wrap poorly)
…ng newlines which simulated soft-wrap poorly)
…xes a slight error in the SampleNullityDrop hook, where it was checking the threshold against sample count (rather than feature count)
…he extended discussion of how this implementation works around SciKit-Learn's triple-variant approach!
…r use in the broader context of the framework.
@registered_data_hook("sample_drop_null")
class SampleNullityDrop(NullityDrop):
"""
Data hook which will automatically remove samples in the dataset which contain more than some threshold amount of
Member Author

Just to check that I understand correctly: SampleNullityDrop drops rows (samples), while FeatureNullityDrop drops columns (features), right?

Collaborator

Correct; I plan on adding a short piece of documentation clarifying what "feature" and "sample" mean in the context of this library.

Since this is confusing here, though, I'm also going to extend the docstrings of the data hooks that refer directly to features/samples so that they include these definitions.
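
As a rough illustration of the sample/feature distinction discussed above, here is a minimal, standard-library sketch. It is not the library's implementation: the function names, the list-of-rows table layout, the use of `None` as the null marker, and the reading of the threshold as a fraction are all assumptions made for the example.

```python
from typing import Optional, Sequence

# Assumption: a table is a list of rows, and None marks a null entry.
Row = Sequence[Optional[float]]


def null_fraction(values: Sequence[Optional[float]]) -> float:
    """Fraction of entries in `values` that are null (None)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def drop_null_samples(table: list[Row], threshold: float) -> list[Row]:
    """Drop rows (samples) whose null fraction, measured across that
    row's features, exceeds `threshold`."""
    return [row for row in table if null_fraction(row) <= threshold]


def drop_null_features(table: list[Row], threshold: float) -> list[list]:
    """Drop columns (features) whose null fraction, measured across all
    samples, exceeds `threshold`."""
    if not table:
        return []
    n_cols = len(table[0])
    keep = [
        j for j in range(n_cols)
        if null_fraction([row[j] for row in table]) <= threshold
    ]
    return [[row[j] for j in keep] for row in table]


table = [
    [1.0, None, 3.0],
    [None, None, 6.0],
    [7.0, None, 9.0],
]
rows_kept = drop_null_samples(table, threshold=0.5)   # removes the second row
cols_kept = drop_null_features(table, threshold=0.5)  # removes the middle column
```

In this sketch the sample-wise check divides by the number of features in a row, and the feature-wise check divides by the number of samples, which is the distinction the commit message about the threshold fix refers to.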

determine the hyperparameters to use.
* Configuration files denote a parameter as being "trial tunable" by placing a dictionary in the
place of a constant; an example of this can be seen in the `penalty` parameter for the
* If a target column is specified, it is split off the dataset at this point to isolate it from pre-processing (see below)
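
The two conventions in the excerpt above (a dictionary in place of a constant marks a "trial tunable" parameter, and the target column is split off before pre-processing) can be sketched as follows. This is only a hedged illustration: `split_tunable`, `split_target`, the row layout, and the keys inside the `penalty` dictionary are hypothetical and are not taken from the library's configuration schema.

```python
from typing import Any


def split_tunable(params: dict[str, Any]) -> tuple[dict[str, Any], dict[str, dict]]:
    """Separate constant parameters from "trial tunable" ones.

    Assumption: a dictionary value marks a tunable parameter, and any
    other value is treated as a constant.
    """
    constants = {k: v for k, v in params.items() if not isinstance(v, dict)}
    tunables = {k: v for k, v in params.items() if isinstance(v, dict)}
    return constants, tunables


def split_target(rows: list[dict[str, Any]], target: str):
    """Split the target column off each row so that later
    pre-processing only ever sees the feature columns."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    targets = [row[target] for row in rows]
    return features, targets


# Hypothetical configuration: the keys inside the `penalty` dictionary
# are invented for this example.
model_params = {
    "max_iter": 1000,
    "penalty": {"type": "categorical", "choices": ["l1", "l2"]},
}
constants, tunables = split_tunable(model_params)

# Hypothetical dataset with a "label" target column.
rows = [{"age": 40, "label": 1}, {"age": 35, "label": 0}]
features, targets = split_target(rows, target="label")
```

Here `constants` holds only `max_iter`, while `penalty` lands in `tunables` because its value is a dictionary; how the real framework interprets that dictionary is not shown in the excerpt.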
Member Author

What do you think about adding some explanatory figure (e.g., from your slides)? As you might remember, it took me a while to understand the concepts of replicate, trial, and split.

Collaborator

@SomeoneInParticular SomeoneInParticular Jan 31, 2025

Yup, this was next on the docket. Just looking into how to set up Sphinx w/ AutoDoc (so we're not locked to GitHub's wiki should they decide to become tosspots in the future).

Member Author

@valosekj valosekj left a comment

Thanks a lot for improving the documentation, @SomeoneInParticular! I left a few minor comments and suggestions.

@valosekj valosekj added the documentation Improvements or additions to documentation label Jan 31, 2025
SomeoneInParticular and others added 2 commits January 31, 2025 12:12
Addendum suggested by valosekj

Co-authored-by: Jan Valosek <[email protected]>
Added additional (common) parameter, as suggested by Jan

Co-authored-by: Jan Valosek <[email protected]>
@SomeoneInParticular SomeoneInParticular mentioned this pull request Feb 12, 2025
9 tasks
kalum.ost and others added 23 commits February 12, 2025 16:10
…oded as an "object" type if the database write is interrupted during a MOOPs run
@SomeoneInParticular SomeoneInParticular marked this pull request as ready for review April 20, 2025 20:22
@SomeoneInParticular
Collaborator

Going to merge this as is, since a number of other lab members are starting to look into using this tool for their own research. Further fixes can be managed in later PRs.

@SomeoneInParticular SomeoneInParticular merged commit b103489 into master May 13, 2025
2 checks passed