Commit 8589a62

Documentation update with missing parameter info
1 parent 59a19aa commit 8589a62

6 files changed (+73 / -66 lines)

docs/source/how-to-run.rst

Lines changed: 53 additions & 41 deletions
@@ -1,15 +1,15 @@
 How to Run DomHMM
 =================

-This section is about how to use DomHMM and how to elaborate on results.
+This section explains how to configure and run DomHMM with several different options.

 .. note::
-    In project's ``/example`` directory, you can find real life usage of DomHMM.
+    In the project's ``/example`` directory, you can find real-life examples of DomHMM.

 Running DomHMM
 --------------

-DomHMM's main class is ``PropertyCalculation``. In a basic example it is initialized as
+DomHMM's main class is ``PropertyCalculation``. In a basic example, it is initialized as

 .. code-block::

@@ -34,29 +34,29 @@ Main Parameters

 Let's dive into each parameter's details.

-* In initialization process, ``universe_or_atomgroup`` parameter stands for MDAnalysis universe. It contains your simulation's trajectory and tpr file. It can be created as
+* In the initialization process, the ``universe_or_atomgroup`` parameter takes an MDAnalysis universe, which contains your simulation's trajectory and topology files. It can be created as

 .. code-block::

     path2xtc = "YOUR_XTC_FILE.xtc"
     path2tpr = "YOUR_TPR_FILE.tpr"
     universe = mda.Universe(path2tpr, path2xtc)

-* ``leaflet_kwargs`` parameter stands for MDAnalysis ``LeafletFinder`` function's arguments. It is used to determine each leaflets residues. ``leaflet_kwargs`` requires head groups of lipids but not sterols.
+* The ``leaflet_kwargs`` parameter holds the keyword arguments passed to MDAnalysis ``LeafletFinder``, which is used to assign lipids to each leaflet. Its selection should contain the head groups of lipids, but not those of sterols.

 .. code-block::

     # An example where the head group of every lipid is PO4
     leaflet_kwargs = {"select": "name PO4", "pbc": True}

-* ``membrane_select`` argument is for atom group selection of universe. It is useful for simulations that are contain non-membrane residues/molecules inside. If universe contains only membrane elements parameter can be leave in default option which is ``all``
+* The ``membrane_select`` argument is an atom selection string that restricts the analysis to the membrane. It is useful for simulations that also contain non-membrane residues or molecules. If the universe contains only membrane components, the parameter can be left at its default value, ``all``.

 .. code-block::

     # An example where the simulation contains DPPC and DIPC lipids and the sterol CHOL
     membrane_select = "resname DPPC DIPC CHOL"

-* ``leaflet_select`` argument is selection options for lipids which can be list of atom groups, list of string queries or automatically finding via LeafletFinder.
+* The ``leaflet_select`` argument defines how lipids are assigned to leaflets. It can be a list of atom groups, a list of selection strings, or ``"auto"`` for automatic detection via LeafletFinder.

 .. code-block::

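The ``membrane_select`` string and the ``leaflet_kwargs`` selection both repeat residue and atom names that also appear in the ``heads`` and ``sterol_heads`` parameters. As a minimal sketch, both can be derived from those dictionaries so that they stay consistent. ``build_selections`` is a hypothetical helper, not part of DomHMM. The strings it produces use only standard MDAnalysis selection syntax (``resname``, ``name``, ``and``, ``or``). Following the text above, sterols are left out of the LeafletFinder selection.

```python
def build_selections(heads, sterol_heads, pbc=True):
    """Hypothetical helper (not part of DomHMM) that derives selection parameters.

    heads        -- {lipid resname: head atom name}, e.g. {"DPPC": "PO4"}
    sterol_heads -- {sterol resname: head atom name}, e.g. {"CHOL": "ROH"}
    Returns (membrane_select, leaflet_kwargs).
    """
    # Membrane selection: every lipid and sterol residue name, in input order
    resnames = list(heads) + list(sterol_heads)
    membrane_select = "resname " + " ".join(resnames)

    # LeafletFinder selection: lipid head groups only, sterols are left out
    head_terms = [f"(resname {res} and name {atom})" for res, atom in heads.items()]
    leaflet_kwargs = {"select": " or ".join(head_terms), "pbc": pbc}
    return membrane_select, leaflet_kwargs


membrane_select, leaflet_kwargs = build_selections(
    heads={"DPPC": "PO4", "DIPC": "PO4"},
    sterol_heads={"CHOL": "ROH"},
)
print(membrane_select)  # resname DPPC DIPC CHOL
print(leaflet_kwargs["select"])  # (resname DPPC and name PO4) or (resname DIPC and name PO4)
```

When every lipid shares the same head bead, the shorter ``"name PO4"`` form used above is sufficient. The per-residue form is mainly useful for atomistic systems in which lipids have different head atoms.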
@@ -69,31 +69,31 @@ Let's dive into each parameter's details.
     # Leave leaflet detection to DomHMM via LeafletFinder
     leaflet_select = "auto"

-* ``heads`` parameter requires lipids head groups. For atomistic simulations, head molecules' center atom can be entered.
+* The ``heads`` parameter maps each lipid name to its head-group atom. For atomistic simulations, the central atom of the head group can be entered.

 .. code-block::

     heads = {"DPPC": "PO4", "DIPC": "PO4"}

-* ``sterol_heads`` parameter requires sterol head groups. For atomistic simulations, head molecules' center atom can be entered.
+* The ``sterol_heads`` parameter maps each sterol name to its head-group atom. For atomistic simulations, the central atom of the head group can be entered.

 .. code-block::

-    # Martini Cholestrol example
+    # Martini cholesterol example
     sterol_heads = {"CHOL": "ROH"}
-    # Atomistic Cholestrol example
+    # Atomistic cholesterol example
     sterol_heads = {"CHL1": "O3"}

-* ``sterol_tails`` parameter requires sterol tail groups. It should be considered that each tail should be entered in same order for each lipids.
+* The ``sterol_tails`` parameter defines the sterol tail atoms. They must be entered in the same order for every sterol: the head atom first, then the first atom of the tail.

 .. code-block::

-    # Martini Cholestrol example while ROH head as first element and C1 start of tail as second element
+    # Martini cholesterol example with ROH (head) as the first element and C1 (start of the tail) as the second element
     sterol_tails = {"CHOL": ["ROH", "C1"]}
-    # Atomistic Cholestrol example while O3 head as first element and C20 start of tail as second element
+    # Atomistic cholesterol example with O3 (head) as the first element and C20 (start of the tail) as the second element
     sterol_tails = {"CHL1": ["O3", "C20"]}

-* ``tails`` parameter requires lipids tail groups. It should be considered that each tail should be entered in same order for each lipids.
+* The ``tails`` parameter defines the lipid tail atoms. The tails must be entered in the same order for every lipid.

 .. code-block::

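Two requirements make a quick check worthwhile before a long run: tails must be listed in the same order for every lipid, and post-analysis expects the dictionary keys to follow the residue order of the simulation. The sketch below uses a hypothetical ``check_inputs`` helper (not a DomHMM function). It relies on two assumptions, both marked in the code. First, "same order" is taken to imply that every lipid defines the same number of tails. Second, each sterol entry is assumed to be ``[head atom, first tail atom]``, as in the examples above. The DPPC bead names are assumed Martini names used only for illustration.

```python
def check_inputs(resname_order, heads, tails, sterol_heads, sterol_tails):
    """Hypothetical pre-run check of DomHMM input dictionaries (not a DomHMM function).

    resname_order -- residue names in the order they appear in the simulation
    Returns a list of problem descriptions; an empty list means no problem was found.
    """
    problems = []

    # Every lipid needs both a head and tails, listed in the same key order
    if list(heads) != list(tails):
        problems.append("heads and tails keys differ or are in a different order")

    # Assumption: "same order for each lipid" implies the same number of tails
    tail_counts = {name: len(chains) for name, chains in tails.items()}
    if len(set(tail_counts.values())) > 1:
        problems.append(f"lipids define different numbers of tails: {tail_counts}")

    # Assumption: sterol tails are [head atom, first tail atom], as in the examples
    if list(sterol_heads) != list(sterol_tails):
        problems.append("sterol_heads and sterol_tails keys differ or are misordered")
    for name, atoms in sterol_tails.items():
        if len(atoms) != 2 or atoms[0] != sterol_heads.get(name):
            problems.append(f"{name}: expected [head atom, first tail atom]")

    # Keys should follow the residue order of the simulation
    for label, keys in (("lipid", list(heads)), ("sterol", list(sterol_heads))):
        missing = [key for key in keys if key not in resname_order]
        positions = [resname_order.index(key) for key in keys if key in resname_order]
        if missing or positions != sorted(positions):
            problems.append(f"{label} keys do not follow the simulation order")
    return problems


# Martini example; the DPPC bead names are assumed for illustration
heads = {"DPPC": "PO4", "DIPC": "PO4"}
tails = {"DPPC": [["C1B", "C2B", "C3B", "C4B"], ["C1A", "C2A", "C3A", "C4A"]],
         "DIPC": [["C1B", "D2B", "D3B", "C4B"], ["C1A", "D2A", "D3A", "C4A"]]}
sterol_heads = {"CHOL": "ROH"}
sterol_tails = {"CHOL": ["ROH", "C1"]}

print(check_inputs(["DPPC", "DIPC", "CHOL"], heads, tails, sterol_heads, sterol_tails))  # []
```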
@@ -102,65 +102,64 @@ Let's dive into each parameter's details.
              "DIPC": [["C1B", "D2B", "D3B", "C4B"], ["C1A", "D2A", "D3A", "C4A"]]}


-* For run option, you can have ``start``, ``stop`` and ``step`` options. This options arrange which frame to start, stop. You can also set model to be trained for each *X* frame by setting ``step=X``.
+* The ``run`` method accepts ``start``, ``stop``, and ``step`` options, which set the frames at which the analysis starts and stops. Setting ``step=X`` trains the model on every *X*-th frame only.

 .. code-block::

-    # An example where DomHMM model training starts from 5th frame and ends in 1000th frame while taking each 5th step. First three frames will be 5th, 10th and 15th frames.
+    # An example where DomHMM model training starts at frame 5 and stops before frame 1000, using every 5th frame. The first three frames used are frames 5, 10, and 15.
     model.run(start=5, stop=1000, step=5)

 .. warning::
-    If detailed post analysis will be conducted on result such as usage of ``Getis_Ord`` results, input order of lipids and sterols should be in same order as in simulation. If simulation lipids are in order of ``DPPC, DIPC, CHOL`` with respect to residue ids, keys of ``heads``, ``tails``, ``sterol_heads``, and ``sterol_tails`` should be in same order just like in this example.
+    If detailed post-analysis will be conducted on the results (for example, using the ``Getis_Ord`` results), lipids and sterols should be entered in the same order as in the simulation. If the simulation's lipids are ordered ``DPPC, DIPC, CHOL`` by residue ID, the keys of ``heads``, ``tails``, ``sterol_heads``, and ``sterol_tails`` should follow the same order, as in the examples above.

 .. note::

-    Since DomHMM uses Gaussian Mixture Model and Gaussian-based Hidden Markov Model, it is suggested to not use too short or too long simulations. Short simulations may not create a sensible results and long one would be take too much time to train model. In our examples, we used simulations that contains around 2000 frames and model run is finished around 25-30 minutes.
+    Since DomHMM uses a Gaussian Mixture Model and a Gaussian-based Hidden Markov Model, very short and very long simulations are not recommended. Short simulations may not produce sensible results, and long ones take a long time to train the model. In our examples, we used simulations containing around 2000 frames, and the model run finished in around 25-30 minutes.

 Optional Parameters
 -------------------

-* ``do_clustering``
-
-Whether to perform the hierarchical clustering or not (Default is True).
-
 * ``asymmetric_membrane``

-It needs to be enabled if leaflets are not symmetric. With this option, models are fitted by separated data for each leaflets.
-
-* ``frac``
+Enable this option if the leaflets are not symmetric. With this option, separate models are fitted to the data of each leaflet.

-Fraction of box length in x and y outside the unit cell considered for area per lipid calculation by Voronoi. It is an optimization process parameter which is set to 0.5 as default.
-
-* ``p_value``
+* ``do_clustering``

-Probability value that is used for z-score calculation. It is a determination percentage for domain identification with getis-ord statistic. In default, it is set to 0.05 or %5.
+Whether to perform hierarchical clustering (default: ``True``).

 * ``result_plot``

-Plotting option for debugging. While enabled, DomHMM will print Hidden Markov model iterations result, prediction results, Getis-Ord statistic results and clustering result of three frame.
+Plotting option for debugging. When enabled, DomHMM shows plots of the Hidden Markov model convergence, prediction results, Getis-Ord statistic results, and clustering results for three frames.

 * ``save_plots``

 Option for saving result plots in PDF format.

 * ``verbose``

-Verbose option for debugging. Although, DomHMM doesn't print middle values, it shows which steps are done and shows middle step plots which may give clues about succession of model.
+Verbose option for debugging. It reports which steps of the analysis have been completed.
+
+* ``lipid_leaflet_rate``

+The interval, in frames, at which lipid leaflet assignments are rechecked via LeafletFinder. The default is 0, which means leaflet assignment is done only at the beginning of the analysis.
+
+* ``sterol_leaflet_rate``
+
+The interval, in frames, at which sterol leaflet assignments are rechecked via LeafletFinder. The default is 1, which means sterol leaflet assignment is recalculated in every frame to capture flip-flops.

 * ``gmm_kwargs``

-Parameter option for Gaussian Mixture Model training. An example of it is
+Keyword arguments for Gaussian Mixture Model training. For example:

 .. code-block::

-    gmm_kwargs = {"tol": 1E-4, "init_params": 'k-means++', "verbose": 0,
+    gmm_kwargs = {"tol": 1E-4, "init_params": 'random_from_data', "verbose": 0,
                   "max_iter": 10000, "n_init": 20,
                   "warm_start": False, "covariance_type": "full"}

 * ``hmm_kwargs``

-Parameter option for Gaussian-based Hidden Markov Model training. An example of it is
+Keyword arguments for Gaussian-based Hidden Markov Model training. For example:

 .. code-block::

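The options ``start``, ``stop``, ``step``, ``lipid_leaflet_rate``, and ``sterol_leaflet_rate`` together decide which frames are analyzed and when leaflet assignments are refreshed. The sketch below assumes that ``run`` uses the usual MDAnalysis ``AnalysisBase.run`` slicing semantics (0-based frame indices, exclusive ``stop``). It also assumes that a rate of *N* > 0 means a LeafletFinder check on every *N*-th analyzed frame. Both helpers are hypothetical and do not reproduce DomHMM's internal code.

```python
def analyzed_frames(n_frames, start=None, stop=None, step=None):
    """Frame indices visited by run(start, stop, step), assuming slice semantics."""
    return list(range(n_frames))[start:stop:step]


def leaflet_update_frames(frames, rate):
    """Hypothetical: frames on which leaflet assignment is (re)computed.

    rate == 0 -> only once, on the first analyzed frame
    rate == N -> on every N-th analyzed frame, starting with the first one
    """
    if rate == 0:
        return frames[:1]
    return frames[::rate]


frames = analyzed_frames(2000, start=5, stop=1000, step=5)
print(frames[:3], frames[-1], len(frames))  # [5, 10, 15] 995 199
print(leaflet_update_frames(frames, rate=0))  # [5]  (lipid_leaflet_rate default)
print(len(leaflet_update_frames(frames, rate=1)))  # 199  (sterol_leaflet_rate default)
```

Under these assumptions, the ``model.run(start=5, stop=1000, step=5)`` example visits 199 frames, and the last one is frame 995.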
@@ -170,7 +169,7 @@ Parameter option for Gaussian-based Hidden Markov Model training. An example of

 * ``trained_hmms``

-Parameter option for reusing past DomHMM HMM models. If there are several analysis will be conducted with slightly difference membrane simulations or with different parameter options, first analysis HMM model can be reusable with this parameter.
+Option for reusing HMM models trained in a previous DomHMM run. If several analyses are conducted on slightly different membrane simulations or with different parameter options, the HMM models from the first analysis can be reused via this parameter.

 .. code-block::

@@ -183,17 +182,30 @@ Parameter option for reusing past DomHMM HMM models. If there are several analys
     model_2 = domhmm.PropertyCalculation( ... ,
                                          trained_hmms=reuse_hmm_models)

+* ``n_init_hmm``
+
+Number of times the HMM training is repeated. Training the HMM several times can lead to a better-performing model.
+
+* ``frac``
+
+Fraction of the box length in x and y outside the unit cell that is considered in the Voronoi-based area per lipid calculation. It is an optimization parameter with a default value of 0.5.
+
+* ``p_value``
+
+Probability value used in the z-score calculation for domain identification with the Getis-Ord statistic. The default is 0.05 (5%).
+
+
 * ``tmd_protein_list``

-Transmembrane domain (tmd) protein list to include area per lipid calculation. Since tmd proteins are take up space in upper, lower or both leaflets, three backbone atoms of protein for each leaflet should be included as in this parameter to increase success of identification.
+List of transmembrane domain (TMD) proteins to include in the area per lipid calculation. TMD proteins take up space in both the exoplasmic and cytoplasmic leaflets. For each leaflet, three backbone atoms of the protein that lie close to the lipid head groups should be included in this parameter to improve identification.

 .. code-block::

-    # Selecting three backbone atoms that is touching to upper leaflet
+    # Select three backbone atoms that touch the exoplasmic leaflet
     upBB = universe.select_atoms('name BB')[0:3]
-    # Selecting three backbone atoms that is touching to lower leaflet
+    # Select three backbone atoms that touch the cytoplasmic leaflet
     loBB = universe.select_atoms('name BB')[-3:]
-    # List can be expended with multiple dictionary objects as in more than one tmd protein scenarios.
+    # Add one dictionary per protein when there is more than one TMD protein
     tmd_protein_list = [{"0": upBB, "1": loBB}]

-We encourage to check :doc:`tips` section that may contain useful information for your progress.
+We encourage you to check the :doc:`tips` section, which contains further useful information.
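The ``frac`` parameter relates to a common technique for Voronoi tessellations in periodic boxes. Points near the box edges are copied into neighboring periodic images, so that the cells of boundary lipids are closed correctly. The sketch below shows only this padding step, for 2D head-group positions in a rectangular box. It assumes that ``frac`` is the fraction of the box length kept beyond each edge. It illustrates the idea and is not DomHMM's implementation; the tessellation itself is not computed. ``pad_periodic_images`` is a hypothetical helper.

```python
from itertools import product


def pad_periodic_images(points, box_x, box_y, frac=0.5):
    """Return points plus periodic images within frac * box length of the unit cell.

    points -- list of (x, y) positions inside the box [0, box_x) x [0, box_y)
    The original points come first, followed by the added images.
    """
    lo_x, hi_x = -frac * box_x, (1 + frac) * box_x
    lo_y, hi_y = -frac * box_y, (1 + frac) * box_y
    padded = list(points)
    for x, y in points:
        # Try the 8 neighbouring periodic images of each point
        for sx, sy in product((-1, 0, 1), repeat=2):
            if sx == 0 and sy == 0:
                continue
            ix, iy = x + sx * box_x, y + sy * box_y
            if lo_x <= ix <= hi_x and lo_y <= iy <= hi_y:
                padded.append((ix, iy))
    return padded


# A point near the corner gains 3 images; a central point gains none at frac=0.2
padded = pad_periodic_images([(1.0, 1.0), (5.0, 5.0)], 10.0, 10.0, frac=0.2)
print(len(padded))  # 5
```

A larger ``frac`` adds more images, trading computation time for more robust cells near the box edges.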

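The ``p_value`` can be read as a significance level for the Getis-Ord z-scores. Assume a one-sided test in which a residue counts as a hot spot when its z-score exceeds the critical value. The threshold for the default of 0.05 then follows from the standard normal distribution, which Python provides as ``statistics.NormalDist``. Whether DomHMM applies a one- or two-sided cutoff is not stated here, so treat this as an illustration. ``critical_z`` and ``significant`` are hypothetical helpers.

```python
from statistics import NormalDist


def critical_z(p_value=0.05, two_sided=False):
    """Critical z-score of the standard normal distribution for a given p-value."""
    tail = p_value / 2 if two_sided else p_value
    return NormalDist().inv_cdf(1 - tail)


def significant(z_scores, p_value=0.05):
    """Indices of z-scores above the one-sided critical value (hot spots)."""
    threshold = critical_z(p_value)
    return [i for i, z in enumerate(z_scores) if z > threshold]


print(round(critical_z(0.05), 3))                  # 1.645
print(round(critical_z(0.05, two_sided=True), 3))  # 1.96
print(significant([0.3, 1.7, 2.5, -2.1]))          # [1, 2]
```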
docs/source/index.rst

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-Welcome to DomHMM's documentation!
+Welcome to the DomHMM documentation!
 =========================================================

 .. toctree::

docs/source/installation.rst

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ For installation, you can directly use pip in project directory.
 Installation for Development
 ------------------------------

-This type of installation can be use when pip is not usable, change in source code or contributing DomHMM.
+This type of installation is intended for modifying the source code for special use cases or for contributing to DomHMM.

 Clone DomHMM's repository and change into the project directory.


docs/source/post-analysis.rst

Lines changed: 13 additions & 11 deletions
@@ -1,11 +1,11 @@
 Results and Post-Analysis
 ==========================

-After running of DomHMM, results are achievable via assigned variable which in this document named ``model``. Besides clustering results of ordered and disorder domains, training data that is used for Hidden Markov Model is also available which contains area per lipid calculation and Scc order parameters calculations for each lipid and sterol.
+After running DomHMM, the results are available through the object the analysis was assigned to, which is named ``model`` in this document. Besides the clustering results for ordered and disordered domains, the training data used for the Hidden Markov model is also available. It contains the area per lipid and Scc order parameter calculations for each lipid and sterol.

 Domain Cluster Results
 -----------------------
-``Clustering`` is a Python dictionary which contains each frames residue indexes that are assigned to Lo ordered domains.
+``Clustering`` is a Python dictionary that contains, for each frame, the indices of the residues assigned to liquid-ordered (Lo) domains.

 ``Clustering`` is a dictionary with two keys: ``"0"`` for the upper leaflet and ``"1"`` for the lower leaflet.

@@ -26,10 +26,10 @@ Domain Cluster Results
 Training Data (Area per lipid and order parameters)
 ---------------------------------------------------

-If required for post analysis, user can access area per lipid and order parameters calculations of each lipid. This data is kept objects result data which can be accessed via ``model.results["train_data_per_type"]``.
+If required for post-analysis, users can access the area per lipid and order parameter calculations of each lipid. This data is stored in the object's results and can be accessed via ``model.results["train_data_per_type"]``.

-``train_data_per_type`` is a Python dictionary which contains lipid and sterol names are keys and three dimension arrays as values. In this three dimension array, each dimension contains residue ids, second dimension contains parameters and third dimension contains each frame's residue leaflet assignments.
-Be aware that both second and third arrays are in same order of residue ids from first array.
+``train_data_per_type`` is a Python dictionary with lipid and sterol names as keys and three arrays per key as values. The first array contains the residue IDs, the second the training data, and the third the leaflet assignment of each residue in each frame.
+Be aware that the second and third arrays follow the same residue order as the first array.

 Here is an example of it.

@@ -45,25 +45,25 @@ Here is an example of it.

 .. note::

-    Each arrays are in ``numpy.array`` format.
+    Each array is a NumPy array (``numpy.ndarray``).

 .. note::
-    Parameters array (second array) is keep in order of ``[[apl_1, scc_1_1, scc_1_2],[apl_2, scc_2_1, scc_2_2], ...]``. (apl = Area per Lipid, scc__x= Scc Order Parameter of tail x )
+    The parameters array (second array) is ordered as ``[[apl_1, scc_1_1, scc_1_2], [apl_2, scc_2_1, scc_2_2], ...]``, where ``apl`` is the area per lipid and ``scc_n_x`` is the Scc order parameter of tail *x*.

 .. note::
-    Leaflet assignment array (third array) is consists of 0s and 1s where 0 means upper leaflet and 1 means lower leaflet. Rows are represents residues which are in some order with residue ids from first array and columns are represents frames.
+    The leaflet assignment array (third array) consists of 0s and 1s, where 0 means the exoplasmic leaflet and 1 means the cytoplasmic leaflet. Rows represent residues, in the same order as the residue IDs in the first array, and columns represent frames.

 .. note::
-    Names of lipids and sterols are same names that user gave in tails and heads parameters.
+    Lipid and sterol names are the same as those given in the ``heads``, ``tails``, ``sterol_heads``, and ``sterol_tails`` parameters.


 Result Saving
 ---------------
-User can save and reload model's itself or required data via `pickle`_.
+Users can save and reload the model itself or the required data via `pickle`_.

 .. code-block::

-    # Model's itself or required result sections can be save via pickle
+    # Save the whole model, or only the required result section, with pickle
     with open('DomHMM_model.pickle', 'wb') as file:
         pickle.dump(model, file)

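The leaflet assignment array described in the notes above can be summarized without any DomHMM-specific code. The sketch below uses made-up numbers in place of one entry of ``model.results["train_data_per_type"]``. It assumes the entry can be unpacked into its three arrays (residue IDs, training data, leaflet assignments). It counts the residues in leaflet 0 per frame and lists the residues whose assignment changes, such as flip-flopping sterols. ``leaflet_summary`` is a hypothetical helper that works on nested lists as well as on NumPy arrays.

```python
def leaflet_summary(entry):
    """Summarize one train_data_per_type entry: [resids, training data, leaflets].

    leaflets -- one row per residue (same order as resids), one column per frame,
                using 0 and 1 as described in the notes above
    """
    resids, _training_data, leaflets = entry
    n_frames = len(leaflets[0])
    # Number of residues assigned to leaflet 0 in each frame
    leaflet0_per_frame = [sum(1 for row in leaflets if row[f] == 0) for f in range(n_frames)]
    # Residues whose leaflet assignment changes at least once (e.g. sterol flip-flops)
    flipping = [rid for rid, row in zip(resids, leaflets) if len(set(row)) > 1]
    return leaflet0_per_frame, flipping


# Made-up stand-in for model.results["train_data_per_type"]["CHOL"]
chol_entry = [
    [101, 102, 103],      # residue IDs
    None,                 # training data, not used here
    [[0, 0, 0, 0],        # residue 101 stays in leaflet 0
     [1, 1, 1, 1],        # residue 102 stays in leaflet 1
     [0, 0, 1, 1]],       # residue 103 switches after frame 1
]
leaflet0_per_frame, flipping = leaflet_summary(chol_entry)
print(leaflet0_per_frame)  # [2, 2, 1, 1]
print(flipping)            # [103]
```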
@@ -72,5 +72,7 @@ User can save and reload model's itself or required data via `pickle`_.
         loaded_model = pickle.load(file)


+.. note::
+    When loading the full model, the MDAnalysis universe reloads the trajectory and topology files from the paths given in the analysis run. Therefore, a saved full model cannot be loaded if those files no longer exist.

 .. _pickle: https://www.mdanalysis.org/pages/mdakits/
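A pickled full model can only be reloaded while the original trajectory and topology files still exist, so saving only the result data can be more portable. A minimal sketch follows. A made-up dictionary stands in for the data so that the example runs on its own; in practice you might pickle an entry such as ``model.results["train_data_per_type"]`` instead. This avoids the file dependency only if the saved objects do not reference the universe (for example, through AtomGroups).

```python
import os
import pickle
import tempfile

# Made-up stand-in for result data taken from a finished DomHMM run
results_to_save = {"CHOL": [[101, 102], None, [[0, 0], [1, 1]]]}

path = os.path.join(tempfile.mkdtemp(), "DomHMM_results.pickle")

# Save only the result data; reloading it needs no trajectory or topology file
with open(path, "wb") as file:
    pickle.dump(results_to_save, file)

with open(path, "rb") as file:
    loaded_results = pickle.load(file)

print(loaded_results == results_to_save)  # True
```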

docs/source/tips.rst

Lines changed: 2 additions & 9 deletions
@@ -3,18 +3,11 @@ Tips for Usage

 This page contains useful tips that will improve your experience with DomHMM.

-* Computation Time
-
-.. tip::
-    In our tests, a Martini molecular dynamics simulation with 2000 frames with 720 lipids took around 25 to 30 minutes.
-
 .. tip::
-    Sometimes Hidden Markov model training may stuck which is out of our control. If your program is taking long time with comparing to reference, you may consider restart it and enable `verbose` option.
+    In our tests, a Martini molecular dynamics simulation with 2000 frames and 720 lipids took around 25 to 30 minutes on an Apple M2 chip.

 .. tip::
-    Simultaneously running more than one DomHMM analysis may cause deadlock due to core allocation logic of hmmlearn library.
-
-* Community Support
+    Running more than one DomHMM analysis simultaneously may cause a deadlock due to the core allocation logic of the ``hmmlearn`` library.

 .. tip::
     DomHMM is a new open-source project. If you encounter any problems or bugs, please report them on the issues page of the project's repository. We look forward to improving the project and supporting our users.
