Documentation update #26

valosekj · 2025-01-31T15:52:16Z

No description provided.


data/hooks/encoding.py

data/hooks/feature_selection.py

valosekj · 2025-01-31T16:07:51Z

data/hooks/feature_selection.py

 @registered_data_hook("sample_drop_null")
 class SampleNullityDrop(NullityDrop):
+    """
+    Data hook which will automatically remove samples in the dataset which contain more than some threshold amount of


Just to check that I understand correctly: SampleNullityDrop drops rows (samples), while FeatureNullityDrop drops columns (features), right?

Correct; I plan on adding a short piece of documentation clarifying what "feature" and "sample" mean in the context of this library.

Since this is evidently confusing here, I'm going to extend the docstrings of the data hooks that refer directly to features or samples so that they include these definitions.
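To make the row/column distinction concrete, here is a minimal pandas sketch of the two behaviours discussed above. It is not the library's implementation: the function names `drop_null_samples` and `drop_null_features` and the fractional `threshold` parameter are assumptions made for illustration. It also shows the point of the fix mentioned in the commit log: a sample's null fraction is measured against the number of features, not the number of samples.

```python
import numpy as np
import pandas as pd


def drop_null_samples(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Drop rows (samples) whose fraction of null features exceeds `threshold`.

    Hypothetical helper; the fraction is taken over the feature (column) count.
    """
    null_fraction = df.isna().sum(axis=1) / df.shape[1]
    return df.loc[null_fraction <= threshold]


def drop_null_features(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Drop columns (features) whose fraction of null samples exceeds `threshold`.

    Hypothetical helper; the fraction is taken over the sample (row) count.
    """
    null_fraction = df.isna().sum(axis=0) / df.shape[0]
    return df.loc[:, null_fraction <= threshold]


# Toy dataset: 4 samples (rows) and 3 features (columns)
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, 6.0, 7.0],
    "c": [np.nan, np.nan, np.nan, 10.0],
})

kept_samples = drop_null_samples(df)    # rows that are at most 50% null
kept_features = drop_null_features(df)  # columns that are at most 50% null
```

With this toy data, the first two rows are more than half null and are dropped by the sample variant, while only column `c` is more than half null and is dropped by the feature variant.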

data/hooks/imputation.py

README.md

valosekj · 2025-01-31T16:25:09Z

README.md

-   determine the hyperparameters to use.
-   * Configuration files denote a parameter as being "trial tunable" by placing a dictionary in the 
-   place of a constant; an example of this can be seen in the `penalty` parameter for the 
+   * If a target column is specified, it is split off the dataset at this point to isolate it from pre-processing (see below)
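The target-splitting step described in the added README line can be pictured with the following pandas sketch. It only illustrates the idea under stated assumptions: `split_target` and the column names are hypothetical and are not taken from the library.

```python
import pandas as pd


def split_target(df: pd.DataFrame, target=None):
    """Separate the target column (if any) from the feature columns.

    Hypothetical helper: returns (features, target_series), where the second
    element is None when no target column was specified.
    """
    if target is None:
        return df, None
    # Pre-processing would then see only the feature columns
    return df.drop(columns=[target]), df[target]


# Toy dataset with one feature column and one target column
data = pd.DataFrame({"x": [1.5, 2.5], "label": [0, 1]})
features, y = split_target(data, "label")
```

Keeping the target out of the feature frame means that later steps such as imputation or scaling cannot alter it.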


What do you think about adding an explanatory figure (e.g., from your slides)? As you might remember, it took me a while to understand the concepts of replicate, trial, and split.

Yup, this was next on the docket. Just looking into how to set up Sphinx w/ AutoDoc (so we're not locked to GitHub's wiki should they decide to become tosspots in the future).

valosekj

Thanks a lot for improving the documentation, @SomeoneInParticular! I left a few minor comments and suggestions.

Addendum suggested by valosekj. Co-authored-by: Jan Valosek

Added additional (common) parameter, as suggested by Jan. Co-authored-by: Jan Valosek

… step is documenting how to interpret it!

…bit nicer to navigate now

… the context of a tutorial

…config tutorial

…d from a MOOP analysis

…MOOP runs to one another via plotting

…oded as an "object" type if the database write is interrupted during a MOOPs run

…ry much WIP

… multi-run comparison

…e run using MOOP's results

…mentation

SomeoneInParticular · 2025-05-13T12:24:12Z

Going to merge this as is, since a number of other lab members are starting to look into using this tool for their own research. Further fixes can be handled in later PRs.

kalum.ost added 17 commits January 27, 2025 14:53

Cleaned up README.md to be a bit more clean and clear (namely, removi…

fadd879

…ng newlines which simulated soft-wrap poorly)

Cleaned up README.md to be a bit more clean and clear (namely, removi…

081f799

…ng newlines which simulated soft-wrap poorly)

Merge branch 'master' into kjo/documentation

a8d2a3f

Added docstring for 'registered_data_hook'

e3ffbac

Updated docstrings for the ABCs for data hooks

b81bc6f

Updated the docstrings of data-encoding hooks

f2f1d59

Updated DocStrings for the 'feature_selection.py' data hooks. Also fi…

9e094d4

…xes a slight error in the SampleNullityDrop hook, where it was checking the threshold against sample count (rather than feature count)

[Minor] Corrected indentation of use-case to match the rest of the do…

07504f4

…c-strings

Added the docstring for the Imputation data hooks (of which there is …

d4d00e5

…currently only one: SimpleImputation)

Added the docstring for the Standardization data hooks (of which ther…

fa1c984

…e is currently only one: StandardScaling)

Added missing docstring for the "evaluate_param" function, used by Sc…

9a7ef4f

…iKit-Learn's root ModelManager

Updated docstrings for the SciKit-Learn Ensemble models (mostly addin…

0ae009d

…g example usage)

Updated docstring for the Linear models provided by this tool. Note t…

4a496c9

…he extended discussion of how this implementation works around SciKit-Learn's triple-variant approach!

Fixed incorrect indentation in the use-case examples for Ensemble mod…

8406ba2

…els. Oops

Extended the docstring of KNeighborsClassifierManager to include exam…

c58a613

…ple usage. Woohoo

Added example usage to the SVC docstring.

cd459ab

Extended the docstring of the tuning utility classes, to clarify thei…

c926333

…r use in the broader context of the framework.