Documentation update with missing parameter info

yusuferentunc · yusuferentunc · commit 8589a62f4bff · 2025-06-18T19:12:49.000+02:00
diff --git a/docs/source/how-to-run.rst b/docs/source/how-to-run.rst
@@ -1,15 +1,15 @@
 How to Run DomHMM
 =================
 
-This section is about how to use DomHMM and how to elaborate on results.
+This section explains how to configure and run DomHMM with several different options.
 
 .. note::
-    In project's ``/example`` directory, you can find real life usage of DomHMM.
+    In the project's ``/example`` directory, you can find real-life examples of DomHMM.
 
 Running DomHMM
 --------------
 
-DomHMM's main class is ``PropertyCalculation``. In a basic example it is initialized as
+DomHMM's main class is ``PropertyCalculation``. In a basic example, it is initialized as
 
 .. code-block::
 
@@ -34,29 +34,29 @@ Main Parameters
 
 Let's dive into each parameter's details.
 
-* In initialization process, ``universe_or_atomgroup`` parameter stands for MDAnalysis universe. It contains your simulation's trajectory and tpr file. It can be created as
+* During initialization, the ``universe_or_atomgroup`` parameter takes an MDAnalysis universe, which contains your simulation's trajectory and topology files. It can be created as
 
 .. code-block::
 
     path2xtc = "YOUR_XTC_FILE.xtc"
     path2tpr = "YOUR_TPR_FILE.tpr"
     universe = mda.Universe(path2tpr, path2xtc)
 
-* ``leaflet_kwargs`` parameter stands for MDAnalysis ``LeafletFinder`` function's arguments. It is used to determine each leaflets residues. ``leaflet_kwargs`` requires head groups of lipids but not sterols.
+* The ``leaflet_kwargs`` parameter holds the arguments passed to MDAnalysis ``LeafletFinder``, which is used to determine each leaflet's lipids. ``leaflet_kwargs`` requires the head groups of lipids but not of sterols.
 
 .. code-block::
 
     # An example where all lipids head group is PO4
     leaflet_kwargs={"select": "name PO4", "pbc": True}
 
-* ``membrane_select`` argument is for atom group selection of universe. It is useful for simulations that are contain non-membrane residues/molecules inside. If universe contains only membrane elements parameter can be leave in default option which is ``all``
+* The ``membrane_select`` argument is the atom group selection of the universe. It is useful for simulations that contain non-membrane residues/molecules. If the universe contains only membrane elements, the parameter can be left at its default, which is ``all``.
 
 .. code-block::
 
     # An example where simulation contains DPPC and DIPC lipids, and CHOL sterol
     membrane_select = "resname DPPC DIPC CHOL"
 
-* ``leaflet_select`` argument is selection options for lipids which can be list of atom groups, list of string queries or automatically finding via LeafletFinder.
+* ``leaflet_select`` argument is a selection option for lipids which can be a list of atom groups, a list of string queries, or automatic via LeafletFinder.
 
 .. code-block::
 
@@ -69,31 +69,31 @@ Let's dive into each parameter's details.
     # Leave leaflet detection to DomHMM via LeafletFinder
     leaflet_select = "auto"
 
-* ``heads`` parameter requires lipids head groups. For atomistic simulations, head molecules' center atom can be entered.
+* The ``heads`` parameter requires lipid head groups. For atomistic simulations, the center atom of the head group can be entered.
 
 .. code-block::
 
     heads = {"DPPC": "PO4", "DIPC": "PO4"}
 
-* ``sterol_heads`` parameter requires sterol head groups. For atomistic simulations, head molecules' center atom can be entered.
+* The ``sterol_heads`` parameter requires sterol head groups. For atomistic simulations, the center atom of the head group can be entered.
 
 .. code-block::
 
-    # Martini Cholestrol example
+    # Martini Cholesterol example
     sterol_heads = {"CHOL": "ROH"}
-    # Atomistic Cholestrol example
+    # Atomistic Cholesterol example
     sterol_heads = {"CHL1": "O3"}
 
-* ``sterol_tails`` parameter requires sterol tail groups. It should be considered that each tail should be entered in same order for each lipids.
+* The ``sterol_tails`` parameter requires sterol tail groups. Note that tails should be entered in the same order for each sterol.
 
 .. code-block::
 
-    # Martini Cholestrol example while ROH head as first element and C1 start of tail as second element
+    # Martini cholesterol example with the ROH head as the first element and C1, the start of the tail, as the second element
     sterol_tails = {"CHOL": ["ROH", "C1"]}
-    # Atomistic Cholestrol example while O3 head as first element and C20 start of tail as second element
+    # Atomistic cholesterol example with the O3 head as the first element and C20, the start of the tail, as the second element
     sterol_tails = {"CHL1": ["O3", "C20"]}
 
-* ``tails`` parameter requires lipids tail groups. It should be considered that each tail should be entered in same order for each lipids.
+* The ``tails`` parameter requires lipid tail groups. Note that tails should be entered in the same order for each lipid.
 
 .. code-block::
 
@@ -102,65 +102,64 @@ Let's dive into each parameter's details.
                  "DIPC": [["C1B", "D2B", "D3B", "C4B"], ["C1A", "D2A", "D3A", "C4A"]]}
 
 
-* For run option, you can have ``start``, ``stop`` and ``step`` options. This options arrange which frame to start, stop. You can also set model to be trained for each *X* frame by setting ``step=X``.
+* The ``run`` method accepts ``start``, ``stop`` and ``step`` options. ``start`` and ``stop`` set the frames at which the analysis begins and ends. You can also train the model on every *X*-th frame by setting ``step=X``.
 
 .. code-block::
 
-    # An example where DomHMM model training starts from 5th frame and ends in 1000th frame while taking each 5th step. First three frames will be 5th, 10th and 15th frames.
+    # An example where DomHMM model training starts at the 5th frame and ends at the 1000th frame, taking every 5th frame. The first three frames will be the 5th, 10th, and 15th frames.
     model.run(start=5, stop=1000, step=5)
 
 .. warning::
-    If detailed post analysis will be conducted on result such as usage of ``Getis_Ord`` results, input order of lipids and sterols should be in same order as in simulation. If simulation lipids are in order of ``DPPC, DIPC, CHOL`` with respect to residue ids, keys of ``heads``, ``tails``, ``sterol_heads``, and ``sterol_tails`` should be in same order just like in this example.
+    If detailed post-analysis will be conducted on the results, such as using the ``Getis_Ord`` results, the input order of lipids and sterols should be the same as in the simulation. If the simulation lipids are in the order ``DPPC, DIPC, CHOL`` with respect to residue IDs, the keys of ``heads``, ``tails``, ``sterol_heads``, and ``sterol_tails`` should be in the same order, just like in this example.
 
 .. note::
 
-    Since DomHMM uses Gaussian Mixture Model and Gaussian-based Hidden Markov Model, it is suggested to not use too short or too long simulations. Short simulations may not create a sensible results and long one would be take too much time to train model. In our examples, we used simulations that contains around 2000 frames and model run is finished around 25-30 minutes.
+    Since DomHMM uses a Gaussian Mixture Model and a Gaussian-based Hidden Markov Model, simulations that are too short or too long are not recommended. Short simulations may not produce sensible results, and long ones take too much time to train the model. In our examples, we used simulations that contain around 2000 frames, and the model run finished in around 25-30 minutes.
 
 Optional Parameters
 -------------------
 
-* ``do_clustering``
-
-Whether to perform the hierarchical clustering or not (Default is True).
-
 * ``asymmetric_membrane``
 
-It needs to be enabled if leaflets are not symmetric. With this option, models are fitted by separated data for each leaflets.
-
-* ``frac``
+It needs to be enabled if the leaflets are not symmetric. With this option, models are fitted on separate data for each leaflet.
 
-Fraction of box length in x and y outside the unit cell considered for area per lipid calculation by Voronoi. It is an optimization process parameter which is set to 0.5 as default.
-
-* ``p_value``
+* ``do_clustering``
 
-Probability value that is used for z-score calculation. It is a determination percentage for domain identification with getis-ord statistic. In default, it is set to 0.05 or %5.
+Whether to perform the hierarchical clustering or not (Default is True).
 
 * ``result_plot``
 
-Plotting option for debugging. While enabled, DomHMM will print Hidden Markov model iterations result, prediction results, Getis-Ord statistic results and clustering result of three frame.
+Plotting option for debugging. While enabled, DomHMM will print Hidden Markov model convergence, prediction results, Getis-Ord statistic results, and clustering results of three frames.
 
 * ``save_plots``
 
 Option for saving result plots in pdf format.
 
 * ``verbose``
 
-Verbose option for debugging. Although, DomHMM doesn't print middle values, it shows which steps are done and shows middle step plots which may give clues about succession of model.
+Verbose option for debugging. It shows which steps are done in the analysis.
+
+* ``lipid_leaflet_rate``
 
+The frame rate for checking lipid leaflet assignments via LeafletFinder. By default, it is 0, which means leaflet assignment is only done at the beginning of the analysis.
+
+* ``sterol_leaflet_rate``
+
+The frame rate for checking sterol leaflet assignments via LeafletFinder. By default, it is 1, which means sterol leaflet assignment is calculated in every frame to capture flip-flops.
 
 * ``gmm_kwargs``
 
-Parameter option for Gaussian Mixture Model training. An example of it is
+Parameter option for Gaussian Mixture Model training. An example of this is
 
 .. code-block::
 
-    gmm_kwargs = {"tol": 1E-4, "init_params": 'k-means++', "verbose": 0,
+    gmm_kwargs = {"tol": 1E-4, "init_params": 'random_from_data', "verbose": 0,
                       "max_iter": 10000, "n_init": 20,
                       "warm_start": False, "covariance_type": "full"}
 
 * ``hmm_kwargs``
 
-Parameter option for Gaussian-based Hidden Markov Model training. An example of it is
+Parameter option for Gaussian-based Hidden Markov Model training. An example of this is
 
 .. code-block::
 
@@ -170,7 +169,7 @@ Parameter option for Gaussian-based Hidden Markov Model training. An example of
 
 * ``trained_hmms``
 
-Parameter option for reusing past DomHMM HMM models. If there are several analysis will be conducted with slightly difference membrane simulations or with different parameter options, first analysis HMM model can be reusable with this parameter.
+Parameter option for reusing previously trained DomHMM HMM models. If several analyses will be conducted with slightly different membrane simulations or with different parameter options, the HMM model from the first analysis can be reused with this parameter.
 
 .. code-block::
 
@@ -183,17 +182,30 @@ Parameter option for reusing past DomHMM HMM models. If there are several analys
     model_2 = domhmm.PropertyCalculation( ... ,
                                          trained_hmms=reuse_hmm_models)
 
+* ``n_init_hmm``
+
+Number of repeats for HMM model training. HMM models can be trained multiple times to achieve better performance.
+
+* ``frac``
+
+The fraction of the box length in x and y outside the unit cell that is considered for the area per lipid calculation by Voronoi. It is an optimization parameter that is set to 0.5 by default.
+
+* ``p_value``
+
+Probability value that is used for the z-score calculation. It is the significance threshold for domain identification with the Getis-Ord statistic. By default, it is set to 0.05 (5%).
+
+
 * ``tmd_protein_list``
 
-Transmembrane domain (tmd) protein list to include area per lipid calculation. Since tmd proteins are take up space in upper, lower or both leaflets, three backbone atoms of protein for each leaflet should be included as in this parameter to increase success of identification.
+Transmembrane domain (TMD) protein list to include in the area per lipid calculation. TMD proteins take up space in the exoplasmic leaflet, the cytoplasmic leaflet, or both. For each leaflet, three backbone atoms of the protein that are close to the lipid head groups should be included in this parameter to increase the success of identification.
 
 .. code-block::
 
-    # Selecting three backbone atoms that is touching to upper leaflet
+    # Selecting three backbone atoms that are touching the exoplasmic leaflet
     upBB = uni.select_atoms('name BB')[0:3]
-    # Selecting three backbone atoms that is touching to lower leaflet
+    # Selecting three backbone atoms that are touching the cytoplasmic leaflet
     loBB = uni.select_atoms('name BB')[-3:]
-    # List can be expended with multiple dictionary objects as in more than one tmd protein scenarios.
+    # List can be expended with multiple dictionary objects as in more than one TMD protein scenario.
     tmd_protein_list = [{"0": upBB, "1": loBB}]
 
-We encourage to check :doc:`tips` section that may contain useful information for your progress.
+We encourage you to check the :doc:`tips` section, which may contain useful information for your progress.
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -1,4 +1,4 @@
-Welcome to DomHMM's documentation!
+Welcome to the DomHMM documentation!
 =========================================================
 
 .. toctree::
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
@@ -30,7 +30,7 @@ For installation, you can directly use pip in project directory.
 Installation for Development
 ------------------------------
 
-This type of installation can be use when pip is not usable, change in source code or contributing DomHMM.
+This type of installation can be used when you need to change the source code for a special use case or want to contribute to DomHMM.
 
 Clone DomHMM's repository and change directory to project directory
 
diff --git a/docs/source/post-analysis.rst b/docs/source/post-analysis.rst
@@ -1,11 +1,11 @@
 Results and Post-Analysis
 ==========================
 
-After running of DomHMM, results are achievable via assigned variable which in this document named ``model``. Besides clustering results of ordered and disorder domains, training data that is used for Hidden Markov Model is also available which contains area per lipid calculation and Scc order parameters calculations for each lipid and sterol.
+After running DomHMM, results are accessible via the assigned variable, which in this document is named ``model``. Besides the clustering results of ordered and disordered domains, the training data used for the Hidden Markov Model is also available; it contains the area per lipid and Scc order parameter calculations for each lipid and sterol.
 
 Domain Cluster Results
 -----------------------
-``Clustering`` is a Python dictionary which contains each frames residue indexes that are assigned to Lo ordered domains.
+``Clustering`` is a Python dictionary that contains, for each frame, the residue indexes that are assigned to ordered (Lo) domains.
 
 ``Clustering`` is a dictionary with two keys ``"0"`` as representing upper leaflet and ``"1"`` as representing lower leaflet.
 
@@ -26,10 +26,10 @@ Domain Cluster Results
 Training Data (Area per lipid and order parameters)
 ---------------------------------------------------
 
-If required for post analysis, user can access area per lipid and order parameters calculations of each lipid. This data is kept objects result data which can be accessed via ``model.results["train_data_per_type"]``.
+If required for post-analysis, the user can access the area per lipid and order parameter calculations of each lipid. This data is kept in the object's results, which can be accessed via ``model.results["train_data_per_type"]``.
 
-``train_data_per_type`` is a Python dictionary which contains lipid and sterol names are keys and three dimension arrays as values. In this three dimension array, each dimension contains residue ids, second dimension contains parameters and third dimension contains each frame's residue leaflet assignments.
-Be aware that both second and third arrays are in same order of residue ids from first array.
+``train_data_per_type`` is a Python dictionary with lipid names as keys and three-row arrays as values. The first row contains residue IDs, the second the training data, and the third each frame's residue leaflet assignments.
+Be aware that both the second and third arrays are in the same order as the residue IDs in the first array.
 
 Here is an example of it.
 
@@ -45,25 +45,25 @@ Here is an example of it.
 
 .. note::
 
-    Each arrays are in ``numpy.array`` format.
+    Each array is in ``numpy.array`` format.
 
 .. note::
-    Parameters array (second array) is keep in order of ``[[apl_1, scc_1_1, scc_1_2],[apl_2, scc_2_1, scc_2_2], ...]``. (apl = Area per Lipid, scc__x= Scc Order Parameter of tail x )
+    The parameters array (second array) is kept in the order ``[[apl_1, scc_1_1, scc_1_2],[apl_2, scc_2_1, scc_2_2], ...]`` (apl = area per lipid, scc__x = Scc order parameter of tail x).
 
 .. note::
-    Leaflet assignment array (third array) is consists of 0s and 1s where 0 means upper leaflet and 1 means lower leaflet. Rows are represents residues which are in some order with residue ids from first array and columns are represents frames.
+    The leaflet assignment array (third array) consists of 0s and 1s, where 0 means the exoplasmic leaflet and 1 means the cytoplasmic leaflet. Rows represent residues, in the same order as the residue IDs in the first array, and columns represent frames.
 
 .. note::
-    Names of lipids and sterols are same names that user gave in tails and heads parameters.
+    The names of lipids and sterols are the same names that the user gave in the tails and heads parameters.
 
 
 Result Saving
 ---------------
-User can save and reload model's itself or required data via `pickle`_.
+Users can save and reload the model itself or required data via `pickle`_.
 
 .. code-block::
 
-    # Model's itself or required result sections can be save via pickle
+    # The model itself or a result section can be saved via pickle
     with open('DomHMM_model.pickle', 'wb') as file:
         pickle.dump(model, file)
 
@@ -72,5 +72,7 @@ User can save and reload model's itself or required data via `pickle`_.
         loaded_module = pickle.load(file)
 
 
+.. note::
+    When loading the full model, the MDAnalysis universe will load the trajectory and topology files from the same location that was given in the analysis run. Therefore, a saved full model can't be loaded if those files no longer exist.
 
 .. _pickle: https://www.mdanalysis.org/pages/mdakits/
diff --git a/docs/source/tips.rst b/docs/source/tips.rst
@@ -3,18 +3,11 @@ Tips for Usage
 
 This page contains useful tips that will improve your experience of DomHMM
 
-* Computation Time
-
-.. tip::
-    In our tests, a Martini molecular dynamics simulation with 2000 frames with 720 lipids took around 25 to 30 minutes.
-
 .. tip::
-    Sometimes Hidden Markov model training may stuck which is out of our control. If your program is taking long time with comparing to reference, you may consider restart it and enable `verbose` option.
+    In our tests, a Martini molecular dynamics simulation with 2000 frames and 720 lipids took around 25 to 30 minutes on an Apple M2 chip.
 
 .. tip::
-    Simultaneously running more than one DomHMM analysis may cause deadlock due to core allocation logic of hmmlearn library.
-
-* Community Support
+    Simultaneously running more than one DomHMM analysis may cause a deadlock due to the core allocation logic of the ``hmmlearn`` library.
 
 .. tip::
     DomHMM is a fresh open source project. If you face any problems or bugs, you can refer it in issue pages of project's repository. We are looking forward to improve our project and support our users.
diff --git a/docs/source/what-is-domhmm.rst b/docs/source/what-is-domhmm.rst
