Open
43 commits
1eed1f8
Adding ensemble metrics to sidebars.
vmullig Feb 8, 2022
9156829
Updating main RosettaScripts page.
vmullig Feb 8, 2022
35cb717
Adding page for CentralTendency metric.
vmullig Feb 8, 2022
cc023e6
Updating auto-generated docs.
vmullig Feb 8, 2022
a18aca6
Adding auto-generated ensemble metric docs.
vmullig Feb 8, 2022
3e8cc7f
Updating CentralTendency ensemble metric doc.
vmullig Feb 8, 2022
0e5918f
Working on documentation for EnsembleMetrics.
vmullig Feb 10, 2022
b3730e1
Fleshing out EnsembleMetric documentation.
vmullig Feb 10, 2022
97d7892
Updating auto-generated docs.
vmullig Feb 10, 2022
ca9eebd
Adding note about accessing named values.
vmullig Feb 10, 2022
bbfdb09
Adding note about filtering.
vmullig Feb 10, 2022
9dd6e89
Revising text slightly.
vmullig Feb 10, 2022
9629b4b
Adding note about MPI mode.
vmullig Feb 10, 2022
3301900
Adding example of internal generation mode.
vmullig Feb 10, 2022
a5778bb
Adding note about multithreading.
vmullig Feb 10, 2022
a045962
Updating note about multi-threading.
vmullig Feb 11, 2022
eb93e85
Adding example for mode 3.
vmullig Feb 11, 2022
62a9158
Add EnsembleFilter docs to filter list.
vmullig Feb 11, 2022
d38608f
Moving some filters that were in the wrong folder.
vmullig Feb 11, 2022
11215f2
Adding documentation for EnsembleFilter.
vmullig Feb 11, 2022
b7e9ef7
Minor typos.
vmullig Feb 11, 2022
fc6e674
Expanding note about mode.
vmullig Feb 11, 2022
08dc81e
Minor tweak.
vmullig Feb 11, 2022
0b2cc38
Merge remote-tracking branch 'origin/master' into vmullig/ensemble_me…
vmullig Feb 25, 2022
af0cded
Updating note about MPI.
vmullig Feb 25, 2022
41dacf1
Updating CentralTendency and FragmentScore auto-generated docs.
vmullig Feb 25, 2022
faa5e70
Merge branch 'vmullig/ensemble_metric_doc' into vmullig/ensemble_metr…
vmullig Feb 25, 2022
4d5dc12
Adding auto-generated docs for the PCA ensemble metric.
vmullig Feb 25, 2022
a282ac4
Updating EnsembleMetrics page.
vmullig Feb 25, 2022
7e6a155
Updating CentralTendency page.
vmullig Feb 25, 2022
5e5eabc
Initial commit of PrincipalComponentAnalysis page.
vmullig Feb 25, 2022
22503cc
Merge remote-tracking branch 'origin/master' into vmullig/ensemble_me…
vmullig Mar 11, 2022
99146f7
Updating auto-generated docs.
vmullig Mar 11, 2022
e23b820
Merge branch 'vmullig/ensemble_metric_doc' into vmullig/ensemble_metr…
vmullig Mar 11, 2022
af0a491
Merge remote-tracking branch 'origin/master' into vmullig/ensemble_me…
vmullig Apr 18, 2022
4fa4729
Update documentation with mention of support in MPIFileBufJobDistribu…
vmullig Apr 18, 2022
2549d1b
Merge remote-tracking branch 'origin/master' into vmullig/ensemble_me…
vmullig Apr 27, 2022
bfe5514
Merge branch 'vmullig/ensemble_metric_doc' into vmullig/ensemble_metr…
vmullig Apr 27, 2022
f2cbac0
Merge remote-tracking branch 'origin/master' into vmullig/ensemble_me…
vmullig Jul 2, 2022
9c8700e
Merge branch 'vmullig/ensemble_metric_doc' into vmullig/ensemble_metr…
vmullig Jul 2, 2022
802ce09
Merge remote-tracking branch 'origin/master' into vmullig/ensemble_me…
vmullig Oct 11, 2022
1aed331
Merge branch 'vmullig/ensemble_metric_doc' into vmullig/ensemble_metr…
vmullig Oct 11, 2022
b922fd9
Merge branch 'vmullig/ensemble_metric_mpi_docs' into vmullig/pca_metr…
vmullig Oct 11, 2022
@@ -0,0 +1,41 @@
# CentralTendency Ensemble Metric
*Back to [[SimpleMetrics]] page.*
## CentralTendency Ensemble Metric

[[_TOC_]]

### Description

The Central Tendency metric accepts as input a real-valued [[SimpleMetric|SimpleMetrics]] and applies it to each pose in an ensemble, collecting a series of values. At reporting time, the metric computes measures of central tendency (mean, median, and mode), plus other descriptive statistics about the distribution of the measured value over the ensemble (standard deviation, standard error, min, max, range).

### Author and history

Created Tuesday, 8 February 2022 by Vikram K. Mulligan, Center for Computational Biology, Flatiron Institute ([email protected]). This was the first [[EnsembleMetric|EnsembleMetrics]] implemented.

### Interface

[[include:ensemble_metric_CentralTendency_type]]

### Named values produced

Measure | Name (used for the [[EnsembleFilter]]) | Description
--------|----------------------------------------|------------
Mean | mean | The average of the values measured for the poses in the ensemble.
Median | median | When values measured from all of the poses in the ensemble are listed in increasing order, this is the middle value. If the number of poses in the ensemble is even, the middle two values are averaged.
Mode | mode | The most frequently seen value in the values measured from the poses in the ensemble. If more than one value appears with equal frequency and this frequency is highest, the values are averaged.
Standard Deviation | stddev | Estimate of the standard deviation of the measured values, defined as sqrt( sum_i( ( S_i - mean )^2 ) / N ), where S_i is the ith sample, mean is the average of all the samples, and N is the number of samples.
Standard Error | stderr | Estimate of the standard error of the mean, defined by stddev / sqrt(N), where N is the number of samples.
Min | min | The minimum value seen.
Max | max | The maximum value seen.
Range | range | The largest value seen minus the smallest.
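
As a rough illustration of how the named values in the table relate to one another, the following is a minimal Python sketch. It is not the Rosetta implementation: the function name `central_tendency` and the sample values are hypothetical, and only the formulas are taken from the table above.

```python
import math
from collections import Counter


def central_tendency(values):
    """Compute the named values described in the table for a list of floats."""
    n = len(values)
    if n == 0:
        raise ValueError("At least one measured value is required.")
    ordered = sorted(values)
    mean = sum(ordered) / n

    # Median: middle value, or the average of the two middle values if n is even.
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2.0

    # Mode: most frequent value; ties for highest frequency are averaged.
    counts = Counter(ordered)
    top = max(counts.values())
    tied = [v for v, c in counts.items() if c == top]
    mode = sum(tied) / len(tied)

    # Standard deviation divides by N (not N - 1), as in the table.
    stddev = math.sqrt(sum((s - mean) ** 2 for s in ordered) / n)
    stderr = stddev / math.sqrt(n)

    return {
        "mean": mean,
        "median": median,
        "mode": mode,
        "stddev": stddev,
        "stderr": stderr,
        "min": ordered[0],
        "max": ordered[-1],
        "range": ordered[-1] - ordered[0],
    }


# Hypothetical values measured by a SimpleMetric over a six-pose ensemble.
stats = central_tendency([2.0, 4.0, 4.0, 5.0, 7.0, 8.0])
```

The dictionary keys mirror the names in the second column of the table, which are the names that would be passed to the [[EnsembleFilter]].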

#### Note about mode

The mode of a set of floating-point numbers can be thrown off by floating-point error. For instance, two poses may have energies of -3.7641 kJ/mol, but the process of computing that energy may result in slightly different values at the 15th decimal place. This could prevent the metric from recognizing this as the most frequent value. Mode is most useful as a metric when the "floating-point" values are actually integers (for instance, given a [[SimpleMetric|SimpleMetrics]] like the [[SelectedResidueCountMetric]], which returns integer counts).
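
The effect described above can be reproduced in a few lines of Python. This is only a sketch: the `mode` helper is hypothetical, and rounding before counting is shown as one possible workaround in user-side analysis, not as something the CentralTendency metric is known to do.

```python
from collections import Counter


def mode(values):
    """Most frequent value; ties for the highest frequency are averaged."""
    counts = Counter(values)
    top = max(counts.values())
    tied = [v for v, c in counts.items() if c == top]
    return sum(tied) / len(tied)


# Two values that are both nominally 0.3, computed by different routes,
# plus one clearly different value.
values = [0.1 + 0.2, 0.3, 0.5]

# 0.1 + 0.2 differs from 0.3 in the last bits, so every value is unique,
# all three tie, and the "mode" degenerates to their average (roughly 0.37).
exact_mode = mode(values)

# Rounding to a fixed number of decimal places before counting recovers 0.3.
rounded_mode = mode([round(v, 6) for v in values])
```

With integer-valued metrics the two computations agree, which is why mode is most dependable in that case.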

## See Also

* [[SimpleMetrics]]: Available SimpleMetrics.
* [[PrincipalComponentAnalysis EnsembleMetric|PrincipalComponentAnalysis]]: An EnsembleMetric that determines major degrees of freedom of motion from an ensemble of perturbed poses.
* [[EnsembleMetrics]]: Available EnsembleMetrics.
* [[I want to do x]]: Guide to choosing a tool in Rosetta.
@@ -0,0 +1,64 @@
# PrincipalComponentAnalysis Ensemble Metric
*Back to [[SimpleMetrics]] page.*
## PrincipalComponentAnalysis Ensemble Metric

[[_TOC_]]

### Description

The PrincipalComponentAnalysis EnsembleMetric is intended to be applied to an ensemble of structures that represent small perturbations of an input structure. For each structure, it generates a vector of user-defined degrees of freedom. These are typically backbone (and, optionally, side-chain) torsions, but they can include bond angles, bond lengths, rigid-body translations and rotations, and even Cartesian coordinates. Once a degree-of-freedom vector has been stored for each pose in an ensemble, this EnsembleMetric performs principal component analysis. This identifies the linear combinations of degrees of freedom that allow the greatest motion. When the analysis is weighted by Rosetta energy, the combined degrees of freedom that allow the greatest motion _while remaining in low-energy regions of conformation space_ are identified.

This EnsembleMetric can be used in conjunction with the [[visualize_principal_components_of_motion]] application in order to make movies showing these motion vectors.

### Author and history

Created Thursday, 25 February 2022 by Vikram K. Mulligan, Center for Computational Biology, Flatiron Institute ([email protected]). This was the second [[EnsembleMetric|EnsembleMetrics]] implemented.

### Interface

[[include:ensemble_metric_PrincipalComponentAnalysis_type]]

### Note about weighting by energy

By default, the analysis is weighted by energy. This has two effects:

- When computing the centre of the distribution of conformations, each sample contributes in proportion to its Boltzmann probability, computed as `exp(-E_i/(kbT))/Z`, where `E_i` is the energy of the sample, `kbT` is the user-specified Boltzmann temperature (with 0.62 kcal/mol representing physiological temperature), and `Z` is the partition function (equal to the sum of `exp(-E_i/(kbT))` for all samples `i`).
- Once the centre is subtracted off of each degree-of-freedom vector, each one is scaled by its Boltzmann probability. This reduces the influence of the highest-energy samples and increases the influence of the lowest-energy samples.

The net effect of all of this is that the analysis ends up extracting degrees of freedom of motion that represent allowed motions that keep the protein in (mostly) low-energy conformations while still permitting as much motion as possible.
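
The two effects listed above can be sketched in plain Python. This is an illustration under stated assumptions rather than the Rosetta code: the function names and the toy data are hypothetical, the energies are shifted by their minimum purely for numerical stability (the shift cancels in the ratio), and power iteration is used here to find only the first principal component instead of a full eigendecomposition.

```python
import math


def boltzmann_weights(energies, kbt=0.62):
    """Return exp(-E_i/kbT)/Z for every sample."""
    e_min = min(energies)  # Shift avoids overflow; it cancels when dividing by Z.
    factors = [math.exp(-(e - e_min) / kbt) for e in energies]
    z = sum(factors)  # Partition function of the shifted energies.
    return [f / z for f in factors]


def weighted_leading_component(dof_vectors, energies, kbt=0.62, iterations=500):
    """Return (centre, first principal component, its eigenvalue)."""
    weights = boltzmann_weights(energies, kbt)
    dim = len(dof_vectors[0])

    # Effect 1: the centre is the Boltzmann-weighted average of the samples.
    centre = [sum(w * x[d] for w, x in zip(weights, dof_vectors)) for d in range(dim)]

    # Effect 2: each centred vector is scaled by its Boltzmann probability.
    scaled = [[w * (x[d] - centre[d]) for d in range(dim)]
              for w, x in zip(weights, dof_vectors)]

    # Scatter matrix of the scaled, centred vectors.
    scatter = [[sum(r[a] * r[b] for r in scaled) for b in range(dim)]
               for a in range(dim)]

    def times_scatter(v):
        return [sum(scatter[a][b] * v[b] for b in range(dim)) for a in range(dim)]

    # Power iteration from an arbitrary, normalized starting vector.
    vec = [1.0 / (d + 1) for d in range(dim)]
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        nxt = times_scatter(vec)
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break  # All samples coincide with the centre; no motion to report.
        vec = [v / norm for v in nxt]

    eigenvalue = sum(v * sv for v, sv in zip(vec, times_scatter(vec)))
    return centre, vec, eigenvalue


# Toy ensemble: two degrees of freedom, motion mostly along the first one.
dofs = [[-2.0, 0.1], [-1.0, -0.1], [0.0, 0.0], [1.0, 0.1], [2.0, -0.1]]
energies = [0.0, 0.0, 0.0, 0.0, 0.0]
centre, component, eigenvalue = weighted_leading_component(dofs, energies)
```

With equal energies every sample receives the same weight and the result reduces to an ordinary (uniformly scaled) principal component analysis; raising the energy of one sample shrinks its contribution to both the centre and the scatter matrix.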

### Use with residue selectors

TODO TODO TODO

### Example

TODO TODO TODO

### Outputs

#### Report and report file

The primary output is a machine- and human-readable degree-of-freedom file. This contains:

- A binary silent structure that allows reconstruction of a pose representing the centre (or energy-weighted centre) of the ensemble of degree-of-freedom vectors.
- A vector of degree-of-freedom identities, annotating the subsequent degree-of-freedom vectors.
- A set of degree-of-freedom eigenvectors from principal component analysis. These are arranged in order, with the first accounting for the greatest part of the variance in structure, the second for the second-greatest, etc. All are normalized.
- A set of degree-of-freedom eigenvalues from principal component analysis. These indicate the relative contributions of each motion vector to the variance.

This file may be read by Rosetta's [[visualize_principal_components_of_motion]] application in order to generate pose series animating the major degrees of freedom of motion. Future support is also planned for connecting this to the [[parametric design code|MakeBundleMover]] so that one may sample along the first few motion vectors during protein design or docking, allowing limited backbone flexibility while still restricting the dimensionality of the degree-of-freedom space.
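
To make the roles of the eigenvalues and eigenvectors concrete, here is a small Python sketch. It does not parse the degree-of-freedom file, and how the visualization application actually builds its frames is not described here; the function names, the sample numbers, and the idea of stepping linearly along one eigenvector are assumptions made for illustration.

```python
def variance_fractions(eigenvalues):
    """Fraction of the total variance accounted for by each motion vector."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def displace_along(centre, eigenvector, amplitude):
    """Degree-of-freedom vector reached by moving from the centre along one motion vector."""
    return [c + amplitude * e for c, e in zip(centre, eigenvector)]


# Hypothetical eigenvalues, already sorted from largest to smallest.
fractions = variance_fractions([6.0, 3.0, 1.0])

# Hypothetical centre (two torsions, in degrees) and a normalized eigenvector.
centre = [60.0, -45.0]
eigenvector = [0.6, 0.8]
frames = [displace_along(centre, eigenvector, a) for a in (-5.0, 0.0, 5.0)]
```

Here the first motion vector would account for 60% of the variance, and the three frames are degree-of-freedom vectors on either side of the centre along that vector.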

#### Named values produced

TODO TODO TODO

### Limitations

Principal component analysis is fundamentally a linear algebra technique. This means that it is restricted to identifying motion vectors that are _linear combinations_ of the degrees of freedom being considered. While it is a good approximation to say that motions are linear on a small scale, larger-scale motions cannot be captured in this way. Various nonlinear principal manifold analysis methods exist which could be applied more generally, but that is the subject of future research.
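
A toy calculation shows why a linear motion vector is only a local approximation. Suppose the true motion is a rotation, so that a point travels on a unit circle, while the linear vector at the starting point is the tangent to that circle. The sketch below (hypothetical function name, not Rosetta code) measures how far a step along the tangent drifts from the circle.

```python
import math


def tangent_step_error(step):
    """Distance off the unit circle after moving `step` along the tangent at (1, 0)."""
    x, y = 1.0, step  # Start at (1, 0); the tangent there points along +y.
    return math.hypot(x, y) - 1.0


small = tangent_step_error(0.05)  # Small motion: error near step**2 / 2.
large = tangent_step_error(1.0)   # Large motion: error of about 0.41.
```

The error grows roughly quadratically with the step size, which is why the linear description holds for small perturbations and breaks down for larger-scale motions.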

## See Also

* [[SimpleMetrics]]: Available SimpleMetrics.
* [[CentralTendency EnsembleMetric|CentralTendency]]: An EnsembleMetric that computes mean, median, and mode of values produced by a [[SimpleMetric|SimpleMetrics]].
* [[EnsembleMetrics]]: Available EnsembleMetrics.
* [[I want to do x]]: Guide to choosing a tool in Rosetta.
@@ -20,6 +20,8 @@

* [[Simple Metrics | SimpleMetrics]]

* [[Ensemble Metrics|EnsembleMetrics]]

* [[Filters|Filters-RosettaScripts]]

* [[FeaturesReporters|Features-reporter-overview]]
@@ -14,6 +14,10 @@

* [[Filters|Filters-RosettaScripts]]

* [[Simple Metrics|SimpleMetrics]]

* [[Ensemble Metrics|EnsembleMetrics]]

* [[Residue Selectors|ResidueSelectors]]

* [[PackerPalettes|PackerPalette]]
@@ -14,6 +14,10 @@

* [[Residue Selectors|ResidueSelectors]]

* [[Simple Metrics|SimpleMetrics]]

* [[Ensemble Metrics|EnsembleMetrics]]

* [[PackerPalettes|PackerPalette]]

* [[Filters|Filters-RosettaScripts]]
@@ -37,6 +37,7 @@ Filter | Description
**[[CompoundStatement|CompoundStatementFilter]]** | Uses previously defined filters with logical operations to construct a compound filter.
**[[CombinedValue|CombinedValueFilter]]** | Weighted sum of multiple filters.
**[[CalculatorFilter]]** | Combine multiple filters with a mathematical expression.
**[[EnsembleFilter]]** | Filter based, not on a property of a single pose, but on a property of an _ensemble_ of many poses.
**[[ReplicateFilter]]** | Repeat a filter multiple times and average.
**[[Boltzmann|BoltzmannFilter]]** | Boltzmann weighted sum of positive/negative filters.
**[[MoveBeforeFilter]]** | Apply a mover before applying the filter.
2 changes: 2 additions & 0 deletions scripting_documentation/RosettaScripts/Filters/_Sidebar.md
@@ -20,6 +20,8 @@

* [[Simple Metrics | SimpleMetrics]]

* [[Ensemble Metrics | EnsembleMetrics]]

* [[Filters|Filters-RosettaScripts]]

* [[FeaturesReporters|Features-reporter-overview]]
@@ -0,0 +1,132 @@
# EnsembleFilter
*Back to [[SimpleMetrics]] page.*
*Back to [[Filters | Filters-RosettaScripts]] page.*
## EnsembleFilter

Created by Vikram K. Mulligan ([email protected]) on 10 February 2022.

[[_TOC_]]

### Description

This filter takes as input an [[EnsembleMetric|EnsembleMetrics]] that has been used to evaluate some set of properties of an ensemble of poses, retrieves a named floating-point value from the metric, and filters based on whether that value is greater than, equal to, or less than some threshold. (Note that [[EnsembleMetrics]] evaluate a property of a collection or _ensemble_ of poses, not of a single pose. This makes this filter unusual: where most discard a trajectory based on the state of a single pose, this can discard a trajectory based on the state of a large ensemble of poses -- for example, based on many sampled conformations of a single design.)
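
The decision the filter makes can be sketched in Python as follows. This is a conceptual illustration only. The function `ensemble_filter_passes` and the dictionary of named values are hypothetical, and of the acceptance-mode names below only `less_than_or_equal` appears in the example on this page; the other names are assumptions, so consult the options section for the real list.

```python
import operator

# Only "less_than_or_equal" is confirmed by the example on this page;
# the remaining keys are hypothetical names used for illustration.
ACCEPTANCE_MODES = {
    "less_than": operator.lt,
    "less_than_or_equal": operator.le,
    "greater_than": operator.gt,
    "greater_than_or_equal": operator.ge,
}


def ensemble_filter_passes(named_values, named_value, threshold,
                           acceptance_mode="less_than_or_equal"):
    """Return True if the requested named value satisfies the threshold test."""
    value = named_values[named_value]  # e.g. "mean" from a CentralTendency metric
    compare = ACCEPTANCE_MODES[acceptance_mode]
    return compare(value, threshold)


# Hypothetical named values reported by an EnsembleMetric for one ensemble.
report = {"mean": 2.5, "median": 2.0, "max": 9.0}
passes_on_mean = ensemble_filter_passes(report, "mean", 4.0)
passes_on_max = ensemble_filter_passes(report, "max", 4.0)
```

The same ensemble passes when filtered on its mean and fails when filtered on its maximum, which shows why the choice of named value matters as much as the threshold.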


### Options

[[include:filter_EnsembleFilter_type]]

### Example

In this example, we load one or more cyclic peptides (provided with the `-in:file:s` or `-in:file:l` command-line options) and, for each peptide, generate an ensemble of slightly perturbed conformations _in memory_, without writing all structures to disk. We then perform ensemble analysis on that ensemble with the [[CentralTendency EnsembleMetric|CentralTendency]] and filter on the results with the EnsembleFilter. Only those peptides that have low-energy ensembles of perturbed conformations pass the filter.

```xml
<ROSETTASCRIPTS>
<!-- Example of using the EnsembleFilter to filter based on the properties of an ensemble of poses
generated from the current pose. -->
<SCOREFXNS>
<ScoreFunction name="r15" weights="ref2015.wts" />
</SCOREFXNS>
<MOVERS>
<!-- The movers that set up, perturb, and relax a cyclic peptide are set up here. We
later bundle the perturbation protocol in a ParsedProtocol: -->
<DeclareBond name="connect_termini" res1="8" res2="1" atom1="C" atom2="N" add_termini="true" />
<GeneralizedKIC name="perturb1" selector_scorefunction="r15" closure_attempts="200"
stop_when_n_solutions_found="1" selector="lowest_rmsd_selector"
>
<AddResidue res_index="3"/>
<AddResidue res_index="4"/>
<AddResidue res_index="5"/>
<AddResidue res_index="6"/>
<AddResidue res_index="7"/>
<SetPivots res1="3" atom1="CA" res2="5" atom2="CA" res3="7" atom3="CA" />
<AddPerturber effect="perturb_dihedral" >
<AddAtoms res1="3" atom1="N" res2="3" atom2="CA" />
<AddAtoms res1="3" atom1="CA" res2="3" atom2="C" />
<AddAtoms res1="4" atom1="N" res2="4" atom2="CA" />
<AddAtoms res1="4" atom1="CA" res2="4" atom2="C" />
<AddAtoms res1="5" atom1="N" res2="5" atom2="CA" />
<AddAtoms res1="5" atom1="CA" res2="5" atom2="C" />
<AddAtoms res1="6" atom1="N" res2="6" atom2="CA" />
<AddAtoms res1="6" atom1="CA" res2="6" atom2="C" />
<AddAtoms res1="7" atom1="N" res2="7" atom2="CA" />
<AddAtoms res1="7" atom1="CA" res2="7" atom2="C" />
<AddValue value="5.0"/>
</AddPerturber>
</GeneralizedKIC>
<GeneralizedKIC name="perturb2" selector_scorefunction="r15" closure_attempts="200"
stop_when_n_solutions_found="1" selector="lowest_rmsd_selector"
>
<AddResidue res_index="7"/>
<AddResidue res_index="1"/>
<AddResidue res_index="2"/>
<AddResidue res_index="3"/>
<AddResidue res_index="4"/>
<SetPivots res1="7" atom1="CA" res2="2" atom2="CA" res3="4" atom3="CA"></SetPivots>
<AddPerturber effect="perturb_dihedral" >
<AddAtoms res1="7" atom1="N" res2="7" atom2="CA" />
<AddAtoms res1="7" atom1="CA" res2="7" atom2="C" />
<AddAtoms res1="1" atom1="N" res2="1" atom2="CA" />
<AddAtoms res1="1" atom1="CA" res2="1" atom2="C" />
<AddAtoms res1="2" atom1="N" res2="2" atom2="CA" />
<AddAtoms res1="2" atom1="CA" res2="2" atom2="C" />
<AddAtoms res1="3" atom1="N" res2="3" atom2="CA" />
<AddAtoms res1="3" atom1="CA" res2="3" atom2="C" />
<AddAtoms res1="4" atom1="N" res2="4" atom2="CA" />
<AddAtoms res1="4" atom1="CA" res2="4" atom2="C" />
<AddValue value="5.0"/>
</AddPerturber>
</GeneralizedKIC>
<FastRelax name="frlx" repeats="1" scorefxn="r15" />
<!-- Bundling the perturbation steps together so that they can be passed
to the CentralTendency EnsembleMetric: -->
<ParsedProtocol name="ensemble_generating_protocol" >
<Add mover="perturb1" />
<Add mover="perturb2" />
<Add mover="frlx" />
</ParsedProtocol>
</MOVERS>
<SIMPLE_METRICS>
<!-- The SimpleMetric that will be passed to the CentralTendency EnsembleMetric: -->
<TotalEnergyMetric name="total_energy" scorefxn="r15" />
</SIMPLE_METRICS>
<ENSEMBLE_METRICS>
<!-- Setting up the EnsembleMetric with both a SimpleMetric and a
ParsedProtocol for generating the ensemble from a given pose: -->
<CentralTendency name="avg_energy" n_threads="0" real_valued_metric="total_energy"
output_mode="tracer_and_file" output_filename="report.txt"
ensemble_generating_protocol="ensemble_generating_protocol"
ensemble_generating_protocol_repeats="20"
/>
</ENSEMBLE_METRICS>
<FILTERS>
<!-- Set up the filter that can discard those peptides that yield an
ensemble with energy above a cutoff threshold: -->
<EnsembleFilter name="filter_on_avg_energy" ensemble_metric="avg_energy"
named_value="mean" filter_acceptance_mode="less_than_or_equal"
threshold="4.0"
/>
</FILTERS>
<PROTOCOLS>
<!-- Set up the peptide, but don't perturb it yet: -->
<Add mover="connect_termini" />
<!-- Accumulate data with the EnsembleMetric for every replicate of the
perturbation protocol (which in this case is run by the EnsembleMetric,
generating each member of the ensemble internally, in memory, without
exporting them): -->
<Add ensemble_metrics="avg_energy" />
<!-- Abandon the jobs that produce bad ensemble properties prior to
writing the structure back to disk: -->
<Add filter="filter_on_avg_energy" />
</PROTOCOLS>
<OUTPUT scorefxn="r15" />
</ROSETTASCRIPTS>
```
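
The control flow of the script above can be mimicked with a toy Python model. Nothing here calls Rosetta: a "pose" is just a list of numbers, `toy_perturb` stands in for the ParsedProtocol, `toy_energy` stands in for the TotalEnergyMetric, and all names are hypothetical. The point is only to show that each ensemble member is created, measured, and discarded in memory before the filter sees the mean.

```python
import random


def run_ensemble_job(start_pose, perturb, metric, repeats, threshold, seed=0):
    """Generate an ensemble in memory, accumulate a metric, and filter on its mean."""
    rng = random.Random(seed)
    values = []
    for _ in range(repeats):
        member = perturb(list(start_pose), rng)  # Work on a copy of the input pose.
        values.append(metric(member))            # Keep the value, not the structure.
    mean = sum(values) / len(values)
    passed = mean <= threshold                   # "less_than_or_equal" acceptance.
    return passed, mean, values


def toy_perturb(pose, rng):
    """Stand-in for the perturbation protocol: jitter every coordinate slightly."""
    return [x + rng.uniform(-0.5, 0.5) for x in pose]


def toy_energy(pose):
    """Stand-in for an energy metric: squared distance from the origin."""
    return sum(x * x for x in pose)


passed, mean_energy, samples = run_ensemble_job(
    [0.0, 0.0, 0.0], toy_perturb, toy_energy, repeats=20, threshold=4.0
)
```

The `repeats=20` and `threshold=4.0` arguments correspond to `ensemble_generating_protocol_repeats="20"` and `threshold="4.0"` in the script.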

### See also

* [[EnsembleMetrics]]: Available EnsembleMetrics
* [[SimpleMetrics]]: Available SimpleMetrics
* [[SimpleMetricFilter]]: Filter on an arbitrary SimpleMetric
* [[Movers|Movers-RosettaScripts]]: Available Movers
* [[I want to do x]]: Guide to choosing a Rosetta protocol.
2 changes: 2 additions & 0 deletions scripting_documentation/RosettaScripts/Movers/_Sidebar.md
@@ -20,6 +20,8 @@

* [[Simple Metrics | SimpleMetrics]]

* [[Ensemble Metrics|EnsembleMetrics]]

* [[Filters|Filters-RosettaScripts]]

* [[FeaturesReporters|Features-reporter-overview]]
1 change: 1 addition & 0 deletions scripting_documentation/RosettaScripts/RosettaScripts.md
@@ -19,6 +19,7 @@ Fleishman SJ, Leaver-Fay A, Corn JE, Strauch EM, Khare SD, et al. (2011) Rosetta
- [[JumpSelectors |JumpSelectors]]
- [[PackerPalettes|PackerPalette]]
- [[SimpleMetrics]]
- [[EnsembleMetrics]]

---------------------

@@ -14,6 +14,10 @@

* [[Residue Selectors|ResidueSelectors]]

* [[Simple Metrics|SimpleMetrics]]

* [[Ensemble Metrics|EnsembleMetrics]]

* [[PackerPalettes|PackerPalette]]

* [[Task Operations|TaskOperations-RosettaScripts]]
@@ -19,6 +19,8 @@
* [[Task Operations|TaskOperations-RosettaScripts]]

* [[Simple Metrics | SimpleMetrics]]

* [[Ensemble Metrics|EnsembleMetrics]]

* [[Filters|Filters-RosettaScripts]]

2 changes: 2 additions & 0 deletions scripting_documentation/RosettaScripts/_Sidebar.md
@@ -22,6 +22,8 @@

* [[Simple Metrics | SimpleMetrics]]

* [[Ensemble Metrics|EnsembleMetrics]]

* [[Filters|Filters-RosettaScripts]]

* [[FeaturesReporters|Features-reporter-overview]]
@@ -20,6 +20,8 @@

* [[Simple Metrics | SimpleMetrics]]

* [[Ensemble Metrics|EnsembleMetrics]]

* [[Filters|Filters-RosettaScripts]]

* [[Features Reporters|Features-reporter-overview]]