This section is about how to use DomHMM and how to elaborate on results.
4
+
This section explains how to configure and run DomHMM with several different options.
5
5
6
6
.. note::
7
-
In project's ``/example`` directory, you can find reallife usage of DomHMM.
7
+
In the project's ``/example`` directory, you can find real-life examples of DomHMM.
8
8
9
9
Running DomHMM
10
10
--------------
11
11
12
-
DomHMM's main class is ``PropertyCalculation``. In a basic example it is initialized as
12
+
DomHMM's main class is ``PropertyCalculation``. In a basic example, it is initialized as
13
13
14
14
.. code-block::
15
15
@@ -34,29 +34,29 @@ Main Parameters
34
34
35
35
Let's dive into each parameter's details.
36
36
37
-
* In initialization process, ``universe_or_atomgroup`` parameter stands for MDAnalysis universe. It contains your simulation's trajectory and tpr file. It can be created as
37
+
* In the initialization process, the ``universe_or_atomgroup`` parameter stands for an MDAnalysis universe. It contains your simulation's trajectory and topology files. It can be created as
38
38
39
39
.. code-block::
40
40
41
41
path2xtc = "YOUR_XTC_FILE.xtc"
42
42
path2tpr = "YOUR_TPR_FILE.tpr"
43
43
universe = mda.Universe(path2tpr, path2xtc)
44
44
45
-
* ``leaflet_kwargs`` parameter stands for MDAnalysis ``LeafletFinder`` function's arguments. It is used to determine each leaflets residues. ``leaflet_kwargs`` requires head groups of lipids but not sterols.
45
+
* ``leaflet_kwargs`` parameter stands for MDAnalysis ``LeafletFinder`` function's arguments. It is used to determine each leaflet's lipids. ``leaflet_kwargs`` requires head groups of lipids but not sterols.
* ``membrane_select`` argument is for atom group selection of universe. It is useful for simulations that are contain non-membrane residues/molecules inside. If universe contains only membrane elementsparameter can be leave in default option which is ``all``
52
+
* The ``membrane_select`` argument is an atom group selection on the universe. It is useful for simulations that contain non-membrane residues or molecules. If the universe contains only membrane elements, the parameter can be left at its default, ``all``.
53
53
54
54
.. code-block::
55
55
56
56
# An example where simulation contains DPPC and DIPC lipids, and CHOL sterol
57
57
membrane_select = "resname DPPC DIPC CHOL"
58
58
59
-
* ``leaflet_select`` argument is selection options for lipids which can be list of atom groups, list of string queries or automatically finding via LeafletFinder.
59
+
* ``leaflet_select`` argument is a selection option for lipids which can be a list of atom groups, a list of string queries, or automatic via LeafletFinder.
60
60
61
61
.. code-block::
62
62
@@ -69,31 +69,31 @@ Let's dive into each parameter's details.
69
69
# Leave leaflet detection to DomHMM via LeafletFinder
70
70
leaflet_select = "auto"
71
71
72
-
* ``heads`` parameter requires lipids head groups. For atomistic simulations, head molecules' center atom can be entered.
72
+
* ``heads`` parameter requires lipids head groups. For atomistic simulations, the head molecules' center atom can be entered.
73
73
74
74
.. code-block::
75
75
76
76
heads = {"DPPC": "PO4", "DIPC": "PO4"}
77
77
78
-
* ``sterol_heads`` parameter requires sterol head groups. For atomistic simulations, head molecules' center atom can be entered.
78
+
* ``sterol_heads`` parameter requires sterol head groups. For atomistic simulations, the head molecules' center atom can be entered.
79
79
80
80
.. code-block::
81
81
82
-
# Martini Cholestrol example
82
+
# Martini Cholesterol example
83
83
sterol_heads = {"CHOL": "ROH"}
84
-
# Atomistic Cholestrol example
84
+
# Atomistic Cholesterol example
85
85
sterol_heads = {"CHL1": "O3"}
86
86
87
-
* ``sterol_tails`` parameter requires sterol tail groups. It should be considered that each tail should be entered in same order for each lipids.
87
+
* ``sterol_tails`` parameter requires sterol tail groups. It should be considered that each tail should be entered in the same order for each lipid.
88
88
89
89
.. code-block::
90
90
91
-
# Martini Cholestrol example while ROH head as first element and C1 start of tail as second element
91
+
# Martini Cholesterol example with the ROH head as the first element and C1, the start of the tail, as the second element
92
92
sterol_tails = {"CHOL": ["ROH", "C1"]}
93
-
# Atomistic Cholestrol example while O3 head as first element and C20 start of tail as second element
93
+
# Atomistic Cholesterol example with the O3 head as the first element and C20, the start of the tail, as the second element
94
94
sterol_tails = {"CHL1": ["O3", "C20"]}
95
95
96
-
* ``tails`` parameter requires lipids tail groups. It should be considered that each tail should be entered in same order for each lipids.
96
+
* ``tails`` parameter requires lipids tail groups. It should be considered that each tail should be entered in the same order for each lipid.
97
97
98
98
.. code-block::
99
99
@@ -102,65 +102,64 @@ Let's dive into each parameter's details.
* For run option, you can have ``start``, ``stop`` and ``step`` options. This options arrange which frame to start, stop. You can also set model to be trained for each *X* frame by setting ``step=X``.
105
+
* For the run option, you can set ``start``, ``stop`` and ``step``. These options determine the frames at which the analysis starts and stops. You can also train the model on every *X*-th frame by setting ``step=X``.
106
106
107
107
.. code-block::
108
108
109
-
# An example where DomHMM model training starts from 5th frame and ends in 1000th frame while taking each 5th step. First three frames will be 5th, 10th and 15th frames.
109
+
# An example where DomHMM model training starts from the 5th frame and ends in the 1000th frame while taking each 5th step. The first three frames will be the 5th, 10th, and 15th frames.
110
110
model.run(start=5, stop=1000, step=5)
111
111
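The frame selection described above can be previewed with plain Python. This is a minimal sketch that assumes ``model.run`` follows the usual MDAnalysis convention of slicing the trajectory with zero-based frame indices and an exclusive ``stop``; that assumption is not confirmed by this page.

```python
# Assumed convention: frames are picked like trajectory[start:stop:step].
start, stop, step = 5, 1000, 5

frame_indices = list(range(start, stop, step))

print(frame_indices[:3])   # first three analysed frame indices
print(frame_indices[-1])   # last analysed frame index (stop itself is excluded)
print(len(frame_indices))  # number of frames used for training
```

Under that assumption, the first three frames are 5, 10 and 15, which matches the comment in the example above.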
112
112
.. warning::
113
-
If detailed postanalysis will be conducted on result such as usage of ``Getis_Ord`` results, input order of lipids and sterols should be in same order as in simulation. If simulation lipids are in order of ``DPPC, DIPC, CHOL`` with respect to residue ids, keys of ``heads``, ``tails``, ``sterol_heads``, and ``sterol_tails`` should be in same order just like in this example.
113
+
If detailed post-analysis will be conducted on the results, such as using the ``Getis_Ord`` results, the input order of lipids and sterols should be the same as in the simulation. If the simulation lipids are ordered ``DPPC, DIPC, CHOL`` with respect to residue IDs, the keys of ``heads``, ``tails``, ``sterol_heads``, and ``sterol_tails`` should be in the same order, just like in this example.
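A quick way to check the ordering requirement before starting a run is sketched below. The helper ``follows_order`` is hypothetical and not part of DomHMM; it only relies on Python dictionaries preserving insertion order.

```python
def follows_order(keys, reference):
    """Return True if keys appear in the same relative order as in reference."""
    remaining = iter(reference)
    # "key in remaining" consumes the iterator up to the match, so later keys
    # must be found after earlier ones.
    return all(key in remaining for key in keys)


# Residue order in the simulation, as in the example above.
simulation_order = ["DPPC", "DIPC", "CHOL"]

heads = {"DPPC": "PO4", "DIPC": "PO4"}
sterol_heads = {"CHOL": "ROH"}
wrong_heads = {"DIPC": "PO4", "DPPC": "PO4"}

print(follows_order(heads, simulation_order))         # True
print(follows_order(sterol_heads, simulation_order))  # True
print(follows_order(wrong_heads, simulation_order))   # False
```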
114
114
115
115
.. note::
116
116
117
-
Since DomHMM uses Gaussian Mixture Model and Gaussian-based Hidden Markov Model, it is suggested to not use too short or too long simulations. Short simulations may not create a sensible results and long one would be take too much time to train model. In our examples, we used simulations that contains around 2000 frames and model run is finished around 25-30 minutes.
117
+
Since DomHMM uses a Gaussian Mixture Model and a Gaussian-based Hidden Markov Model, it is suggested not to use simulations that are too short or too long. Short simulations may not produce sensible results, and long ones would take too much time to train the model. In our examples, we used simulations that contain around 2000 frames, and the model run finished in around 25-30 minutes.
118
118
119
119
Optional Parameters
120
120
-------------------
121
121
122
-
* ``do_clustering``
123
-
124
-
Whether to perform the hierarchical clustering or not (Default is True).
125
-
126
122
* ``asymmetric_membrane``
127
123
128
-
It needs to be enabled if leaflets are not symmetric. With this option, models are fitted by separated data for each leaflets.
129
-
130
-
* ``frac``
124
+
It needs to be enabled if leaflets are not symmetric. With this option, models are fitted by separated data for each leaflet.
131
125
132
-
Fraction of box length in x and y outside the unit cell considered for area per lipid calculation by Voronoi. It is an optimization process parameter which is set to 0.5 as default.
133
-
134
-
* ``p_value``
126
+
* ``do_clustering``
135
127
136
-
Probability value that is used for z-score calculation. It is a determination percentage for domain identification with getis-ord statistic. In default, it is set to 0.05 or %5.
128
+
Whether to perform the hierarchical clustering or not (Default is True).
137
129
138
130
* ``result_plot``
139
131
140
-
Plotting option for debugging. While enabled, DomHMM will print Hidden Markov model iterations result, prediction results, Getis-Ord statistic results and clustering result of three frame.
132
+
Plotting option for debugging. While enabled, DomHMM will print Hidden Markov model convergence, prediction results, Getis-Ord statistic results, and clustering results of three frames.
141
133
142
134
* ``save_plots``
143
135
144
136
Option for saving result plots in pdf format.
145
137
146
138
* ``verbose``
147
139
148
-
Verbose option for debugging. Although, DomHMM doesn't print middle values, it shows which steps are done and shows middle step plots which may give clues about succession of model.
140
+
Verbose option for debugging. It shows which steps are done in the analysis.
141
+
142
+
* ``lipid_leaflet_rate``
149
143
144
+
The frame rate for checking lipids leaflet assignments via LeafletFinder. In the default option, it is equal to 0 which means leaflet assignment is only done at the beginning of the analysis.
145
+
146
+
* ``sterol_leaflet_rate``
147
+
148
+
The frame rate for checking sterols leaflet assignments via LeafletFinder. In the default option, it is equal to 1 which means sterols leaflet assignment will be calculated in every time frame to capture flip-flops.
150
149
151
150
* ``gmm_kwargs``
152
151
153
-
Parameter option for Gaussian Mixture Model training. An example of it is
152
+
Parameter option for Gaussian Mixture Model training. An example of this is
Parameter option for Gaussian-based Hidden Markov Model training. An example of it is
162
+
Parameter option for Gaussian-based Hidden Markov Model training. An example of this is
164
163
165
164
.. code-block::
166
165
@@ -170,7 +169,7 @@ Parameter option for Gaussian-based Hidden Markov Model training. An example of
170
169
171
170
* ``trained_hmms``
172
171
173
-
Parameter option for reusing past DomHMM HMM models. If there are several analysis will be conducted with slightly difference membrane simulations or with different parameter options, first analysis HMM model can be reusable with this parameter.
172
+
Parameter option for reusing past DomHMM HMM models. If several analyses will be conducted on slightly different membrane simulations or with different parameter options, the HMM model from the first analysis can be reused with this parameter.
174
173
175
174
.. code-block::
176
175
@@ -183,17 +182,30 @@ Parameter option for reusing past DomHMM HMM models. If there are several analys
183
182
model_2 = domhmm.PropertyCalculation( ... ,
184
183
trained_hmms=reuse_hmm_models)
185
184
185
+
* ``n_init_hmm``
186
+
187
+
Number of repeats for HMM model training. HMM models can be trained multiple times to achieve better performance.
188
+
189
+
* ``frac``
190
+
191
+
Fraction of the box length in x and y outside the unit cell that is considered for the area per lipid calculation by Voronoi. It is an optimization parameter that is set to 0.5 by default.
192
+
193
+
* ``p_value``
194
+
195
+
Probability value that is used for the z-score calculation. It is the significance level for domain identification with the Getis-Ord statistic. By default, it is set to 0.05 (5%).
196
+
197
+
186
198
* ``tmd_protein_list``
187
199
188
-
Transmembrane domain (tmd) protein list to include area per lipid calculation. Since tmd proteins are take up space in upper, lower or both leaflets, three backbone atoms of protein for each leaflet should be included as in this parameter to increase success of identification.
200
+
Transmembrane domain (TMD) protein list to include in the area per lipid calculation. TMD proteins take up space in the exoplasmic leaflet, the cytoplasmic leaflet, or both. Three backbone atoms of the protein that are positioned close to the lipid head groups should be included in this parameter to increase the success of identification.
189
201
190
202
.. code-block::
191
203
192
-
# Selecting three backbone atoms that is touching to upper leaflet
204
+
# Selecting three backbone atoms that are touching the exoplasmic leaflet
193
205
upBB = uni.select_atoms('name BB')[0:3]
194
-
# Selecting three backbone atoms that is touching to lower leaflet
206
+
# Selecting three backbone atoms that are touching the endoplasmic leaflet
195
207
loBB = uni.select_atoms('name BB')[-3:]
196
-
# List can be expended with multiple dictionary objects as in more than one tmd protein scenarios.
208
+
# The list can be extended with multiple dictionary objects when there is more than one TMD protein.
197
209
tmd_protein_list = [{"0": upBB, "1": loBB}]
198
210
199
-
We encourage to check :doc:`tips` section that may contain useful information for your progress.
211
+
We encourage you to check the :doc:`tips` section, which may contain useful information for your progress.
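To give a feeling for how the ``p_value`` option above relates to a z-score threshold, here is a minimal sketch using only the Python standard library. Whether DomHMM applies a one-sided or a two-sided test is not stated on this page, so both conventions are shown as assumptions.

```python
from statistics import NormalDist

p_value = 0.05  # the documented default
standard_normal = NormalDist()  # mean 0, standard deviation 1

# Critical z-score if the whole probability mass sits in one tail (assumption).
z_one_sided = standard_normal.inv_cdf(1 - p_value)

# Critical z-score if the probability mass is split over both tails (assumption).
z_two_sided = standard_normal.inv_cdf(1 - p_value / 2)

print(round(z_one_sided, 3))
print(round(z_two_sided, 3))
```

A smaller ``p_value`` raises the threshold in both conventions, so fewer lipids would pass a significance test of this kind.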
docs/source/post-analysis.rst
13 additions & 11 deletions
@@ -1,11 +1,11 @@
1
1
Results and Post-Analysis
2
2
==========================
3
3
4
-
After running of DomHMM, results are achievable via assigned variable which in this document named ``model``. Besides clustering results of ordered and disorder domains, training data that is used for Hidden Markov Model is also available which contains area per lipid calculation and Scc order parameters calculations for each lipid and sterol.
4
+
After running DomHMM, results are accessible via the assigned variable, which in this document is named ``model``. Besides the clustering results of ordered and disordered domains, the training data used for the Hidden Markov model is also available; it contains the area per lipid and Scc order parameter calculations for each lipid and sterol.
5
5
6
6
Domain Cluster Results
7
7
-----------------------
8
-
``Clustering`` is a Python dictionary which contains each frames residue indexes that are assigned to Lo ordered domains.
8
+
``Clustering`` is a Python dictionary that contains, for each frame, the residue indices that are assigned to ordered (Lo) domains.
9
9
10
10
``Clustering`` is a dictionary with two keys: ``"0"`` representing the upper leaflet and ``"1"`` representing the lower leaflet.
11
11
@@ -26,10 +26,10 @@ Domain Cluster Results
26
26
Training Data (Area per lipid and order parameters)
If required for postanalysis, user can access area per lipid and order parameters calculations of each lipid. This data is kept objects result data which can be accessed via ``model.results["train_data_per_type"]``.
29
+
If required for post-analysis, the user can access the area per lipid and order parameter calculations of each lipid. This data is kept in the object's result data, which can be accessed via ``model.results["train_data_per_type"]``.
30
30
31
-
``train_data_per_type`` is a Python dictionary which contains lipid and sterol names are keys and three dimension arrays as values. In this three dimension array, each dimension contains residue ids, second dimension contains parameters and third dimension contains each frame's residue leaflet assignments.
32
-
Be aware that both second and third arrays are in same order of residue ids from first array.
31
+
``train_data_per_type`` is a Python dictionary that contains lipid names as keys and three rowed arrays as values. The first row contains residue IDs, the second training data, and the third each frame's residue leaflet assignments.
32
+
Be aware that both the second and third arrays are in the same order of residue IDs from the first array.
33
33
34
34
Here is an example of it.
35
35
@@ -45,25 +45,25 @@ Here is an example of it.
45
45
46
46
.. note::
47
47
48
-
Each arrays are in ``numpy.array`` format.
48
+
Each array is in ``numpy.array`` format.
49
49
50
50
.. note::
51
-
Parameters array (second array) is keep in order of ``[[apl_1, scc_1_1, scc_1_2],[apl_2, scc_2_1, scc_2_2], ...]``. (apl = Area per Lipid, scc__x= Scc Order Parameter of tail x )
51
+
Parameters array (second array) is kept in order of ``[[apl_1, scc_1_1, scc_1_2],[apl_2, scc_2_1, scc_2_2], ...]``. (apl = Area per Lipid, scc__x = Scc Order Parameter of tail x)
52
52
53
53
.. note::
54
-
Leaflet assignment array (third array) is consists of 0s and 1s where 0 means upper leaflet and 1 means lower leaflet. Rows are represents residues which are in some order with residue ids from first array and columns are represents frames.
54
+
The leaflet assignment array (third array) consists of 0s and 1s where 0 means exoplasmic leaflet and 1 means endoplasmic leaflet. Rows represent residues, in the same order as the residue IDs from the first array, and columns represent frames.
55
55
56
56
.. note::
57
-
Names of lipids and sterols are same names that user gave in tails and heads parameters.
57
+
Names of lipids and sterols are the same names that users gave in tails and heads parameters.
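The layout of the leaflet assignment array described in the notes above can be explored with a few NumPy operations. This is a minimal sketch on mock data; the residue IDs and values are invented for illustration and do not come from a DomHMM run.

```python
import numpy as np

# Mock first array: residue IDs (hypothetical values).
residue_ids = np.array([11, 12, 13])

# Mock third array: rows are residues, columns are frames; 0 and 1 label the two leaflets.
leaflets = np.array([
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
])

# Number of residues assigned to leaflet 0 in each frame.
leaflet0_per_frame = (leaflets == 0).sum(axis=0)

# Residues whose assignment differs from their first frame at any point (flip-flops).
flipped = np.any(leaflets != leaflets[:, [0]], axis=1)

print(leaflet0_per_frame)
print(residue_ids[flipped])
```

Because rows share their order with the residue ID array, a boolean mask computed on the assignment array can be applied directly to the IDs.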
58
58
59
59
60
60
Result Saving
61
61
---------------
62
-
User can save and reload model's itself or required data via `pickle`_.
62
+
Users can save and reload the model itself or required data via `pickle`_.
63
63
64
64
.. code-block::
65
65
66
-
# Model's itself or required result sections can be save via pickle
66
+
# Model itself or result section can be saved via pickle
67
67
with open('DomHMM_model.pickle', 'wb') as file:
68
68
pickle.dump(model, file)
69
69
@@ -72,5 +72,7 @@ User can save and reload model's itself or required data via `pickle`_.
72
72
loaded_module = pickle.load(file)
73
73
74
74
75
+
.. note::
76
+
When loading the full model, the MDAnalysis universe will load the trajectory and topology files from the same directory that was given in the analysis run. Therefore, a saved full model can't be loaded if those files no longer exist.
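One way around the limitation in the note above is to pickle only the required result data instead of the full model. The sketch below uses a stand-in dictionary in place of ``model.results["train_data_per_type"]`` and assumes that the stored result data holds no reference to the universe; neither the values nor the file name come from DomHMM.

```python
import os
import pickle
import tempfile

# Stand-in for model.results["train_data_per_type"] (mock values, plain lists instead of arrays).
train_data = {
    "DPPC": [
        [1, 2],                                    # residue IDs
        [[0.61, 0.35, 0.33], [0.64, 0.31, 0.30]],  # parameters per residue
        [[0, 0], [1, 1]],                          # leaflet assignment per frame
    ]
}

# Hypothetical file name inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "DomHMM_train_data.pickle")

# Save only the result data, not the model object itself.
with open(path, "wb") as file:
    pickle.dump(train_data, file)

# Reload it later without needing the trajectory or topology files.
with open(path, "rb") as file:
    reloaded = pickle.load(file)

print(reloaded == train_data)
```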
docs/source/tips.rst
2 additions & 9 deletions
@@ -3,18 +3,11 @@ Tips for Usage
3
3
4
4
This page contains useful tips that will improve your experience with DomHMM.
5
5
6
-
* Computation Time
7
-
8
-
.. tip::
9
-
In our tests, a Martini molecular dynamics simulation with 2000 frames with 720 lipids took around 25 to 30 minutes.
10
-
11
6
.. tip::
12
-
Sometimes Hidden Markov model training may stuck which is out of our control. If your program is taking long time with comparing to reference, you may consider restart it and enable `verbose` option.
7
+
In our tests, a Martini molecular dynamics simulation with 2000 frames and 720 lipids took around 25 to 30 minutes on an Apple M2 chip.
13
8
14
9
.. tip::
15
-
Simultaneously running more than one DomHMM analysis may cause deadlock due to core allocation logic of hmmlearn library.
16
-
17
-
* Community Support
10
+
Simultaneously running more than one DomHMM analysis may cause a deadlock due to the core allocation logic of the ``hmmlearn`` library.
18
11
19
12
.. tip::
20
13
DomHMM is a fresh open-source project. If you face any problems or bugs, you can report them on the issues page of the project's repository. We look forward to improving our project and supporting our users.