Merged
70 changes: 33 additions & 37 deletions src/diffpy/snmf/snmf_class.py
@@ -4,6 +4,18 @@


class SNMFOptimizer:
"""A self-contained implementation of the stretched NMF algorithm (sNMF),
including sparse stretched NMF.

Instantiating the SNMFOptimizer class runs all the analysis immediately.
The results matrices can then be accessed as instance attributes
of the class (X, Y, and A).

For more information on sNMF, please reference:
Gu, R., Rakita, Y., Lan, L. et al. Stretched non-negative matrix factorization.
npj Comput Mater 10, 193 (2024). https://doi.org/10.1038/s41524-024-01377-5
Contributor:

We would normally do a list of class attributes here: everything that is self.something. This is obviously strongly overlapped with the arguments of the constructor, as many of the attributes get defined in the constructor, but logically they are different. Here we list and describe the class attributes; there we describe the __init__ function arguments.

Contributor Author:

I'm not clear on how I'd distinguish the arguments from the attributes. I understand how they are different semantically, but what part of that is necessary to make clear here? Can you give an example? Those have been helpful.

Contributor:

Everything that is self.something is an attribute (except for methods, which are self.function() and are not considered attributes). So MM, Y0, X0 are attributes, but so are M, N, rng, num_updates, etc.

Inside a function or method, the parameters are the arguments of the function; so for the __init__() function they will be MM, Y0, X0, A, rho, eta, and so on. Some of the descriptions will overlap, but for the function arguments the user needs to know whether each is optional, what the default is, and anything else they need to successfully instantiate the class. People will generally not see the two docstrings at the same time, so there can be some repetition, but try to keep it short but informative.
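The split described above can be sketched with a minimal, hypothetical class (a trimmed stand-in, not the PR's actual code): the class docstring carries an Attributes section in numpydoc style, while the __init__ docstring carries a Parameters section, even though the two lists partly overlap.

```python
# Sketch of the docstring convention under discussion: class-level
# "Attributes" vs. __init__-level "Parameters". The class body here is
# hypothetical and heavily trimmed for illustration.
class SNMFOptimizer:
    """Stretched NMF optimizer (docstring-layout sketch only).

    Attributes
    ----------
    MM : ndarray
        The data matrix being decomposed.
    num_updates : int
        Running count of update steps performed so far.
    """

    def __init__(self, MM, Y0=None):
        """Initialize the optimizer.

        Parameters
        ----------
        MM : ndarray
            The data to be decomposed.
        Y0 : ndarray, optional
            Initial guess for the component weights. Default is None.
        """
        self.MM = MM
        self.Y0 = Y0
        # An attribute that is *not* a constructor argument, so it appears
        # only in the class-level Attributes section.
        self.num_updates = 0
```

Note that num_updates is documented only at class level, since it is not something the caller can pass in.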

Contributor:

pls confirm with a thumbs up if you saw this.

Contributor Author:

Added class attributes. Some of them are surely redundant, but I will save a full cleanup for the next PR.

"""

def __init__(
self,
MM,
@@ -17,48 +29,33 @@ def __init__(
n_components=None,
random_state=None,
):
"""Run sNMF based on an ndarray, parameters, and either a number
of components or a set of initial guess matrices.

Currently instantiating the SNMFOptimizer class runs all the analysis
immediately. The results can then be accessed as instance attributes
of the class (X, Y, and A). Eventually, this will be changed so that
__init__ only prepares for the optimization, which can then be run
using fit_transform.
"""Initialize an instance of SNMF and run the optimization

Parameters
----------
MM: ndarray
A numpy array containing the data to be decomposed. Rows correspond
to different samples/angles, while columns correspond to different
conditions with different stretching. Currently, there is no option
to treat the first column (commonly containing 2theta angles, sample
index, etc) differently, so if present it must be stripped in advance.
The array containing the data to be decomposed. Shape is (length_of_signal,
number_of_conditions).
Y0: ndarray
A numpy array containing initial guesses for the component weights
at each stretching condition, with number of rows equal to the assumed
number of components and number of columns equal to the number of
conditions (same number of columns as MM). Must be provided if
n_components is not provided. Will override n_components if both are
provided.
The array containing initial guesses for the component weights
at each stretching condition. Shape is (number_of_components,
number_of_conditions). Must be provided if n_components is not provided. Will override
Contributor:

normally we would raise an exception if two conflicting things are provided (we don't want to guess which is the right one) unless there is a good functional reason to do it another way. We like to avoid "magic" and the current behavior of the code could be "magic". Please raise an exception unless there is a strong reason to do otherwise.

Contributor Author:

Okay. I don't see any reason for them not to match, so now the user will only be allowed to provide one. This isn't what scikit-learn does, but per your suggestion it makes the most sense for now. The new logic will be: "first, check that exclusively one of n_components and Y0 is provided. If not, raise an exception. If n_components is provided, use that to generate a Y0 with the appropriate size."
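The exclusivity check agreed on above might be sketched like this. The helper name resolve_components and its signature are hypothetical illustrations, not the PR's actual code; only the rule ("exactly one of n_components and Y0, and generate Y0 from n_components when needed") comes from the discussion.

```python
import numpy as np


def resolve_components(n_components=None, Y0=None, n_conditions=10, random_state=None):
    """Return a Y0 weight matrix, enforcing that exactly one of
    n_components / Y0 is provided (hypothetical helper for illustration)."""
    # Exactly one of the two must be given: raise rather than silently
    # preferring one ("avoid magic", per the review comment).
    if (n_components is None) == (Y0 is None):
        raise ValueError(
            "Provide exactly one of n_components or Y0, not both or neither."
        )
    if Y0 is not None:
        return np.asarray(Y0)
    # Generate an initial guess with the appropriate shape.
    rng = np.random.default_rng(random_state)
    return rng.random((n_components, n_conditions))
```

With this shape of check, passing both arguments or neither fails fast with a ValueError instead of the earlier override behavior.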

Contributor:

if scikit-learn does it the "magic" way we may want to/have to conform to that. But for now, let's do it this way.

n_components if both are provided.
X0: ndarray
A numpy array containing initial guesses for the intensities of each
component per row/sample/angle. Has rows equal to the rows of MM and
columns equal to n_components or the number of rows of Y0.
The array containing initial guesses for the intensities of each component per
row/sample/angle. Shape is (length_of_signal, number_of_components).
A: ndarray
A numpy array containing initial guesses for the stretching factor for
each component, at each condition. Has number of rows equal to n_components
or the number of rows of Y0, and columns equal to the number of conditions
(columns of MM).
The array containing initial guesses for the stretching factor for each component,
at each condition. Shape is (number_of_components, number_of_conditions).
rho: float
A stretching factor that influences the decomposition. Zero corresponds to
no stretching present. Relatively insensitive and typically adjusted in
powers of 10.
The float which sets a stretching factor that influences the decomposition.
Zero corresponds to no stretching present. Relatively insensitive and typically
adjusted in powers of 10.
eta: float
A sparsity factor that influences the decomposition. Should be set to zero
for non-sparse data such as PDF. Can be used to improve results for sparse
data such as XRD, but due to instability, should be used only after first
selecting the best value for rho.
The float which sets a sparsity factor that influences the decomposition.
Should be set to zero for non-sparse data such as PDF. Can be used to improve
results for sparse data such as XRD, but due to instability, should be used
only after first selecting the best value for rho.
max_iter: int
The maximum number of times to update each of A, X, and Y before stopping
the optimization.
@@ -71,10 +68,9 @@
be overridden by Y0 if that is provided, but must be provided if no Y0 is
provided.
random_state: int
Used to set a reproducible seed for the initial matrices used in the
optimization. Due to the non-convex nature of the problem, results may vary
even with the same initial guesses, so this does not make the program
deterministic.
The integer which acts as a reproducible seed for the initial matrices used in
the optimization. Due to the non-convex nature of the problem, results may vary
even with the same initial guesses, so this does not make the program deterministic.
"""

self.MM = MM