diff --git a/scripting_documentation/RosettaScripts/EnsembleMetrics/EnsembleMetrics.md b/scripting_documentation/RosettaScripts/EnsembleMetrics/EnsembleMetrics.md new file mode 100644 index 000000000..5bc56ec9b --- /dev/null +++ b/scripting_documentation/RosettaScripts/EnsembleMetrics/EnsembleMetrics.md @@ -0,0 +1,351 @@ +# EnsembleMetrics +*Back to main [[RosettaScripts|RosettaScripts]] page.* + +Page created Wed, 9 February 2022 by Vikram K. Mulligan, Flatiron Institute (vmulligan@flatironinstitute.org). + +[[_TOC_]] + +## 1. Description + +Just as [[SimpleMetrics]] measure some property of a pose, EnsembleMetrics measure some property of a group (or _ensemble_) of poses. They are designed to be used in two phases. In the _accumulation_ phase, an EnsembleMetric is applied to each pose in an ensemble in sequence, allowing it to store any relevant measurements from that pose that will later be needed to calculate properties of the ensemble. In the _reporting_ phase, the EnsembleMetric generates a report about the properties of the ensemble and writes this report to disk or to tracer. Following reporting, an EnsembleMetric may be _interrogated_ by such modules as the [[EnsembleFilter]], allowing retrieval of any floating-point values computed by the EnsembleMetric for filtering. Alternatively, the EnsembleMetric may be _reset_ for re-use (meaning that accumulated data, but not configuration settings, are wiped). + +##Available EnsembleMetrics + +EnsembleMetric | Description | MPI support? +-------------- | ----------- | ------------ +**[[CentralTendency]]** | Takes a [[real-valued SimpleMetric|SimpleMetrics]], applies it to each pose in an ensemble, and returns measures of central tendency (mean, median, mode) and other measures of the distribution (standard deviation, standard error, etc.). | YES +**[[PNear|PNearEnsembleMetric]]** | Based on a conformational ensemble, computes the propensity to favour a desired state or the lowest-energy state sampled. | YES + +## 2. Usage modes + +EnsembleMetrics have three intended usage modes in [[RosettaScripts]]: + +Mode | Setup | Accumulation Phase | Reporting Phase | Subsequent Interrogation | Subsequent Resetting +---- | ----- | ------------------ | --------------- | ------------------------ | -------------------- +Basic accumulator mode | Added to a protocol at point of accumulation. | The EnsembleMetric is applied to each pose that the RosettaScripts script handles, in sequence. | The EnsembleMetric produces its report at termination of the RosettaScripts application. This report covers all poses seen during this RosettaScripts run. | None. | None. +Internal generation mode | Provided with a ParsedProtocol for generating the ensemble of poses from the input pose, and a number to generate. Added to protocol at point where ensemble should be generated from pose at that point. | Accumulates information about each pose in the ensemble it generates. Poses are then discaded. | The report is provided immediately once the ensemble has been generated. The script then continues with the input pose. | After reporting. | On next nstruct (repeat) or next job. +Multiple pose mover mode | Set to use input from a mover that produces many outputs (a [[MultiplePoseMover]]). Placed in script after such a mover. | Collects data from each pose produced by previous mover. | Reports immediately after collecting data on all poses produced by previous mover. The script then continues on. | After reporting. | On next nstruct (repeat) or next job. + +### 2.1 Example of basic usage + +In this example, the input is a cyclic peptide (provided with the `-in:file:s` commandline option). This script perturbs the peptide backbone, relaxes the peptide, and then applies a [[CentralTendency EnsembleMetric|CentralTendency]] that in turn applies a [[TotalEnergyMetric]], measuring total score. At the end of execution (after repeat execution, a number of times set with the `-nstruct` commandline option), the EnsembleMetric produces a report about the mean, median, mode, etc. of the samples. + +```xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +``` + +### 2.2 Example of internal generation mode + +This example is similar to the example above, only this time, we load one or more cyclic peptides (provided with the `-in:file:s` or `-in:file:l` commandline options), generate a conformational ensemble for each peptide _in memory_, without writing all structures to disk, and perform ensemble analysis on that ensemble, filtering on the results with the [[EnsembleFilter]]. + +```xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +``` + +#### 2.2.1 Multi-threading + +When used in internal generation mode, the EnsembleMetric can generate members of the ensemble in [[parallel threads|Multithreading]]. This uses the [[RosettaThreadManager]], assigning work to available threads up to a user-specied maximum number to request. To set the maximum number of threads to request, use the `n_threads` option (where a setting of zero means to request all available threads). This functionality is only available in multi-threaded builds of Rosetta (built using `extras=cxx11thread` in the `scons` command), and requires that the total number of Rosetta threads be set at the command line using the `-multithreading:total_threads` commandline option. Note that an EnsembleMetric may be assigned fewer than the requested number of threads if other modules are using threads; at a minimum, it is guaranteed to be assigned the calling thread. **Note: this is a _highly_ experimental feature that can fail for many ensemble-generating protocols. When in doubt, it is safest to set `n_threads` to 1 (the default) for an EnsembleMetric.** + +### 2.3 Example of multiple pose mover mode + +The following example uses the [[BundleGridSampler]] mover to grid-sample helical bundle conformations parametrically. For each conformation sampled, the protocol then uses the [[Disulfidize]] mover to generate all possible disulfides joining the helices as an ensemble of poses. It then computes the median disulfide pair energy, and discards conformations for which this energy is above a cutoff. + +```xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +``` + +## 3. Interrogating EnsembleMetric floating-point values by name + +Each EnsembleMetric can return one or more floating-point values describing different features of the ensemble. Each of these has a name associated with it. + +### 3.1 From C++ or Python code + +From C++ (or Python) code, after an EnsembleMetric produces its final report, these values can be interrogated with the `get_metric_by_name()` method. To see all names offered by a particular EnsembleMetric, call `real_valued_metric_names()`: + +```C++ + // C++ pseudo-code: + + // Create an EnsembleMetric: + CentralTendencyEnsembleMetric my_ensemble_metric; + // Configure this EnsembleMetric here. This particular + // example would require a SimpleMetric to be passed to + // it, though in general the setup for EnsembleMetrics + // will vary from EnsembleMetric subclass to subclass. + + for( core::Size i=1; i<=nstruct; ++i ) { + // Generate a pose here. + // ... + + // Collect data from it: + my_ensemble_metric.apply( pose ); + } + + // Produce final report (to tracer or disk, + // depending on configuration): + my_ensemble_metric.produce_final_report(); + + // Get the names of floating point values + // that the EnsembleMetric has calculated: + utility::vector1< std::string > const value_names( + my_ensemble_metric.real_valued_metric_names() + ); + + // Confirm that "median" is a name of a value + // returned by this particular metric: + runtime_assert( value_names.has_value( "median" ) ); //This passes. + + // Get the median value from the ensemble: + core::Real const median_value( + my_ensemble_metric.get_metric_by_name( "median" ) + ); +``` + +### 3.2 Using filters + +In RosettaScripts (or in PyRosetta or even C++ code), when an EnsembleMetric is used in internal generator mode or multiple pose mover mode (_i.e._ it applies itself to an ensemble of poses that it either generates internally or receives from a previous mover) a subsequent [[EnsembleFilter]] may be used to interrogate a named value computed by the EnsembleMetric, and to cause the protocol to pass or fail depending on that property of the ensemble. + +Why would someone want to do this? One example would be if one wanted to write a script that would design a protein, generate for each design a conformational ensemble, and score the propensity to favour the designed conformation (_e.g._ with the planned [[PNear]] EnsembleMetric), then discard those designs that have poor propensity to favour the designed state based on the ensemble analysis. This would ensure that one could produce thousands or tens of thousands of designs in memory, analyze them all, and only write to disk the ones worth carrying forward. Variant patterns include generating initial designs using a low-cost initial design protocol, doing moderate-cost ensemble analysis, discarding poor designs with the EnsembleFilter, and refining those designs that pass the filter using higher-cost refinement protocols. Other similar usage patterns are possible. + +Note that if one simply wants the value produced by the EnsembleMetric to be recorded in the pose, the EnsembleFilter can be used for that purpose as well by setting `confidence="0"` (so that the filter never rejects anything, but only reports). At some point, a SimpleMetric may be written for that purpose. For more information, see the page for the [[EnsembleFilter]]. + +## 4. Note about running in MPI mode + +The [[Message Passing Interface (MPI)|MPI]] permits massively parallel execution of a Rosetta protocol. If an EnsembleMetric is used in basic mode (Section 2.1) using the [[MPI build|Build-Documentation]] of Rosetta, all poses seen _by all processes_ are considered part of the ensemble that is being analysed. At the end of the protocol, all of the instances of the EnsembleMetric on worker processes will report back to the director process with the measurements needed to allow the director process to perform the analysis on the whole ensemble. This can be convenient for rapidly analysing very large ensembles generated in memory across a large cluster, without needing to write thousands or millions of structuers to disk. This functionality is currently only available in the [[JD2]] version of the [[RosettaScripts]] application, and only when the [[MPIWorkPoolJobDistributor|JD2]] (the default MPI JD2 job distributor) or the MPIFileBufJobDistributor (the default MPI JD2 job distributor for use with Rosetta silent files) is used. Support for [[JD3|RosettaScripts-JD3]] is planned. + +Note that EnsembleMetrics that run in different MPI processes, and which generate ensembles internally using either a generating protocol (Section 2.2) or a multiple pose mover (Section 2.3), report immediately on the ensemble seen locally _in that process_. In this case, no information is shared between processes. + +As a final note, some EnsembleMetrics may not support MPI job collection. These should tell you so with a suitable error message at parse time (i.e. before you run an expensive protocol and try to collect in results). See the table of EnsembleMetrics for MPI support. + +## 5. See Also + +* [[SimpleMetrics]]: Measure a property of a single pose. +* [[Filters|Filters-RosettaScripts]]: Filter on a measured feature of a pose. +* [[EnsembleFilter]]: Filter on a property of an ensemble of poses. +* [[Movers|Movers-RosettaScripts]]: Modify a pose. +* [[I want to do x]]: Guide to choosing a Rosetta protocol. diff --git a/scripting_documentation/RosettaScripts/EnsembleMetrics/ensemble_metric_pages/CentralTendency.md b/scripting_documentation/RosettaScripts/EnsembleMetrics/ensemble_metric_pages/CentralTendency.md new file mode 100644 index 000000000..0e1d7c6b3 --- /dev/null +++ b/scripting_documentation/RosettaScripts/EnsembleMetrics/ensemble_metric_pages/CentralTendency.md @@ -0,0 +1,41 @@ +# CentralTendency Ensemble Metric +*Back to [[EnsembleMetrics]] page.* +## CentralTendency Ensemble Metric + +[[_TOC_]] + +### Description + +The Central Tendency metric accepts as input a real-valued [[SimpleMetric|SimpleMetrics]]. It then applies it to each pose in an ensemble, collecting a series of values. At reporting time, the metric computes measures of central tendency (mean, median, and mode), plus other descriptive statistics about the distribution of the measured value over the ensemble (standard deviation, standard error, min, max, range). + +### Author and history + +Created Tuesday, 8 February 2022 by Vikram K. Mulligan, Center for Computational Biology, Flatiron Institute (vmulligan@flatironinstitute.org). This was the first [[EnsembleMetric|EnsembleMetrics]] implemented + +### Interface + +[[include:ensemble_metric_CentralTendency_type]] + +### Named values produced + +Measure | Name (used for the [[EnsembleFilter]]) | Description +--------|----------------------------------------|------------ +Mean | mean | The average of the values measured for the poses in the ensemble. +Median | median | When values measured from all of hte poses in the ensemble are listed in increasing order, this is the middle value. If the number of poses in the ensemble is even, the middle two values are averaged. +Mode | mode | The most frequently seen value in the values measured from the poses in the environment. If more than one value appears with equal frequency and this frequency is highest, the values are averaged. +Standard Deviation | stddev | Estimate of the standard deviation of the mean, defined as the sqrt( sum_i( S_i - mean )^2 / N ), where S_i is the ith sample, mean is the average of all the samples, and N is the number of samples. +Standard Error | stderr | Estimate of the standard error of the mean, defined by stddev / sqrt(N), where N is the number of samples. +Min | min | The minimum value seen. +Max | max | The maximum value seen. +Range | range | the largest value seen minus the smallest. + +#### Note about mode + +The mode of a set of floating-point numbers can be thrown off by floating-point error. For instance, two poses may have energies of -3.7641 kJ/mol, but the process of computing that energy may result in slightly different values at the 15th decimal point. This could prevent the filter from recognizing this is at the most frequent value. Mode is most useful as a metric when the "floating-point" values are actually integers (for instance, given a [[SimpleMetric|SimpleMetrics]] like the [[SelectedResidueCountMetric]], which returns integer counts). + +##See Also + +* [[SimpleMetrics]]: Available SimpleMetrics. +* [[EnsembleMetrics]]: Available EnsembleMetrics. +* [[PNear ensemble metric|PNearEnsembleMetric]]: An ensemble metric that computes propensity to favour a desired conformation given a conformational ensemble. +* [[I want to do x]]: Guide to choosing a tool in Rosetta. \ No newline at end of file diff --git a/scripting_documentation/RosettaScripts/EnsembleMetrics/ensemble_metric_pages/PNearEnsembleMetric.md b/scripting_documentation/RosettaScripts/EnsembleMetrics/ensemble_metric_pages/PNearEnsembleMetric.md new file mode 100644 index 000000000..6ef977938 --- /dev/null +++ b/scripting_documentation/RosettaScripts/EnsembleMetrics/ensemble_metric_pages/PNearEnsembleMetric.md @@ -0,0 +1,67 @@ +# PNear Ensemble Metric +*Back to [[EnsembleMetrics]] page.* +## PNear Ensemble Metric + +[[_TOC_]] + +### Description + +PNear is a metric used to describe the propensity of a sequence to favour a particular conformational state. It was first described in Bhardwaj, Mulligan, Bahl _et al_. (2016) _Nature_ 538(7625):329-335, doi: 10.1038/nature19791. The PNear [[EnsembleMetric|EnsembleMetrics]] computes PNear given a conformational ensemble generated by some protocol. It can optionally compute the propensity to favour some desired state, provided with the `-in:file:native` commandline option or with the `native_file` option in RosettaScripts, or it can compute the propensity to favour the lowest-energy state observed in the ensemble. + +As with any ensemble metric, the analysis can be performed for an ensemble of structures on disk, an ensemble generated on the fly in memory (but never written to disk), or even a distributed ensemble sampled on many nodes across a large cluster via MPI (with no single node ever seeing all of the structures). + +### Author and history + +Created Thursday, 10 March 2022 by Vikram K. Mulligan, Center for Computational Biology, Flatiron Institute (vmulligan@flatironinstitute.org). + +### Details of the calculation + +PNear is defined as follows: + +![Expression defining PNear](/scripting_documentation/RosettaScripts/EnsembleMetrics/ensemble_metric_pages/PNear_Eqn.png) + +In the above, _N_ is the number of structures in the ensemble, _Ei_ and _rmsdi_ are the energy and the RMSD to target conformation of the ith sample, respectively, and λ (lambda) and kBT are two parameters controlling the calculation of PNear. The parameter λ, measured in Angstroms, defines how close a structure has to be to the target conformation (either the user-provided native state or the lowest-energy state in the ensemble) in order for it to be considered "close enough" to count as contributing to high propensity to favour the target state. The parameter kBT, meaured in kcal/mol, is a Boltzmann temperature that determines how much each sampled conformation contributes to distribution of states as its energy rises. + +The value of PNear ranges from 0 to 1, with values close to 0 indicating that the molecule has very low propensity to favour the desired conformation and values close to 1 indicating that it has very high propensity to do so. It may be thought of as the Boltzmann probability of being near to the desired conformation, where "near" is defined fuzzily by a Gaussian function of RMSD with breadth λ. By defining this fuzzily rather than sharply, numerical instability on repeated sampling runs is avoided: small changes in the distribution of samples result in _small_ changes in PNear, but could result in _large_ changes in the values of earlier metrics that used hard cutoffs to determine whether a sampled state was "native-like" or not. + +### Interface + +[[include:ensemble_metric_PNear_type]] + +### Vector-valued metrics produced + +The PNear ensemble metric produces two vectors of outputs that can be accessed from C++ or Python by calling the `PNearEnsembleMetric::get_realvector_metric_value_by_name( std::string const & name, core::Size const index )` method. These are called "pnear_to_native" and "pnear_to_lowestE". There is one entry in the vector for every scoring function provided when configuring the PNear ensemble metric. (For ordinary use, when only one scoring function is provided, these are 1-vectors.) + +### Example usage + +The following script reads a series of PDB files or structures from a silent file (passed in with one of the `-in:file:s`, `-in:file:l` or `-in:file:silent` options), aligns each to a native structure (`inputs/native.pdb`), scores each with Rosetta's `ref2015` scoring function, and computes PNear to both the native structure and to the lowest-energy structure in the ensemble of input structuers. + + +```xml + + + + + + + + + + + + + + +``` + +##See Also + +* [[EnsembleMetrics]]: Available EnsembleMetrics. +* [[SimpleMetrics]]: Available SimpleMetrics. +* [[CentralTendency ensemble metric|CentralTendency]]: An ensemble metric that computs mean, median, mode, etc. of values produced by a [[SimpleMetric|SimpleMetrics]]. +* [[I want to do x]]: Guide to choosing a tool in Rosetta. \ No newline at end of file diff --git a/scripting_documentation/RosettaScripts/EnsembleMetrics/ensemble_metric_pages/PNear_Eqn.png b/scripting_documentation/RosettaScripts/EnsembleMetrics/ensemble_metric_pages/PNear_Eqn.png new file mode 100644 index 000000000..f946fa265 Binary files /dev/null and b/scripting_documentation/RosettaScripts/EnsembleMetrics/ensemble_metric_pages/PNear_Eqn.png differ diff --git a/scripting_documentation/RosettaScripts/FeaturesReporter/_Sidebar.md b/scripting_documentation/RosettaScripts/FeaturesReporter/_Sidebar.md index de27f22ed..4d334d69b 100644 --- a/scripting_documentation/RosettaScripts/FeaturesReporter/_Sidebar.md +++ b/scripting_documentation/RosettaScripts/FeaturesReporter/_Sidebar.md @@ -20,6 +20,8 @@ * [[Simple Metrics | SimpleMetrics]] + * [[Ensemble Metrics|EnsembleMetrics]] + * [[Filters|Filters-RosettaScripts]] * [[FeaturesReporters|Features-reporter-overview]] diff --git a/scripting_documentation/RosettaScripts/FeaturesReporter/features_reporters/_Sidebar.md b/scripting_documentation/RosettaScripts/FeaturesReporter/features_reporters/_Sidebar.md index 4faf9cccc..209569d19 100644 --- a/scripting_documentation/RosettaScripts/FeaturesReporter/features_reporters/_Sidebar.md +++ b/scripting_documentation/RosettaScripts/FeaturesReporter/features_reporters/_Sidebar.md @@ -14,6 +14,10 @@ * [[Filters|Filters-RosettaScripts]] + * [[Simple Metrics|SimpleMetrics]] + + * [[Ensemble Metrics|EnsembleMetrics]] + * [[Residue Selectors|ResidueSelectors]] * [[PackerPalettes|PackerPalette]] diff --git a/scripting_documentation/RosettaScripts/FeaturesReporter/rscripts/_Sidebar.md b/scripting_documentation/RosettaScripts/FeaturesReporter/rscripts/_Sidebar.md index 94af5a392..3800286b4 100644 --- a/scripting_documentation/RosettaScripts/FeaturesReporter/rscripts/_Sidebar.md +++ b/scripting_documentation/RosettaScripts/FeaturesReporter/rscripts/_Sidebar.md @@ -14,6 +14,10 @@ * [[Residue Selectors|ResidueSelectors]] + * [[Simple Metrics|SimpleMetrics]] + + * [[Ensemble Metrics|EnsembleMetrics]] + * [[PackerPalettes|PackerPalette]] * [[Filters|Filters-RosettaScripts]] diff --git a/scripting_documentation/RosettaScripts/Filters/Filters-RosettaScripts.md b/scripting_documentation/RosettaScripts/Filters/Filters-RosettaScripts.md index 5beb4a849..8518db1fe 100644 --- a/scripting_documentation/RosettaScripts/Filters/Filters-RosettaScripts.md +++ b/scripting_documentation/RosettaScripts/Filters/Filters-RosettaScripts.md @@ -37,6 +37,7 @@ Filter | Description **[[CompoundStatement|CompoundStatementFilter]]** | Uses previously defined filters with logical operations to construct a compound filter. **[[CombinedValue|CombinedValueFilter]]** | Weighted sum of multiple filters. **[[CalculatorFilter]]** | Combine multiple filters with a mathematical expression. +**[[EnsembleFilter]]** | Filter based, not on a property of a single pose, but on a property of an _ensemble_ of many poses. **[[ReplicateFilter]]** | Repeat a filter multiple times and average. **[[Boltzmann|BoltzmannFilter]]** | Boltzmann weighted sum of positive/negative filters. **[[MoveBeforeFilter]]** | Apply a mover before applying the filter. diff --git a/scripting_documentation/RosettaScripts/Filters/_Sidebar.md b/scripting_documentation/RosettaScripts/Filters/_Sidebar.md index b37eb316e..3752c97cf 100644 --- a/scripting_documentation/RosettaScripts/Filters/_Sidebar.md +++ b/scripting_documentation/RosettaScripts/Filters/_Sidebar.md @@ -20,6 +20,8 @@ * [[Simple Metrics | SimpleMetrics]] + * [[Ensemble Metrics | EnsembleMetrics]] + * [[Filters|Filters-RosettaScripts]] * [[FeaturesReporters|Features-reporter-overview]] diff --git a/scripting_documentation/RosettaScripts/Filters/AlignmentAAFinderFilter.md b/scripting_documentation/RosettaScripts/Filters/filter_pages/AlignmentAAFinderFilter.md similarity index 100% rename from scripting_documentation/RosettaScripts/Filters/AlignmentAAFinderFilter.md rename to scripting_documentation/RosettaScripts/Filters/filter_pages/AlignmentAAFinderFilter.md diff --git a/scripting_documentation/RosettaScripts/Filters/AlignmentGapInserterFilter.md b/scripting_documentation/RosettaScripts/Filters/filter_pages/AlignmentGapInserterFilter.md similarity index 100% rename from scripting_documentation/RosettaScripts/Filters/AlignmentGapInserterFilter.md rename to scripting_documentation/RosettaScripts/Filters/filter_pages/AlignmentGapInserterFilter.md diff --git a/scripting_documentation/RosettaScripts/Filters/ChainBreakFilter.md b/scripting_documentation/RosettaScripts/Filters/filter_pages/ChainBreakFilter.md similarity index 100% rename from scripting_documentation/RosettaScripts/Filters/ChainBreakFilter.md rename to scripting_documentation/RosettaScripts/Filters/filter_pages/ChainBreakFilter.md diff --git a/scripting_documentation/RosettaScripts/Filters/filter_pages/EnsembleFilter.md b/scripting_documentation/RosettaScripts/Filters/filter_pages/EnsembleFilter.md new file mode 100644 index 000000000..7dfbda089 --- /dev/null +++ b/scripting_documentation/RosettaScripts/Filters/filter_pages/EnsembleFilter.md @@ -0,0 +1,132 @@ +# EnsembleFilter +*Back to [[SimpleMetrics]] page.* +*Back to [[Filters | Filters-RosettaScripts]] page.* +## EnsembleFilter + +Created by Vikram K. Mulligan (vmulligan@flatironinstitute.org) on 10 February 2022. + +[[_TOC_]] + +### Description + +This filter takes as input an [[EnsembleMetric|EnsembleMetrics]] that has been used to evaluate some set of properties of an ensemble of filters, retrives a named floating-point value from the metric, and filters based on whether that value is greater than, equal to, or less than some threshold. (Note that [[EnsembleMetrics]] evaluate a property of a collection or _ensemble_ poses, not of a single pose. This makes this filter unusual: where most discard a trajectory based on the state of a single pose, this can discard a trajectory based on the state of large ensemble of poses -- for example, based on many sampled conformatinos of a single design.) + + +### Options + +[[include:filter_SimpleMetricFilter_type]] + +### Example: + +In this example, we load one or more cyclic peptides (provided with the `-in:file:s` or `-in:file:l` commandline options), generate a conformational ensemble of slightly perturbed conformations for each peptide _in memory_, without writing all structures to disk, and perform ensemble analysis on that ensemble with the [[CentralTendency EnsembleMetric|CentralTendency]], filtering on the results with the EnsembleFilter. Only those peptides that have low-energy ensembles of perturbed conformations pass the filter. + +```xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +``` + +### See also + +* [[EnsembleMetrics]]: Available SimpleMetrics +* [[SimpleMetrics]]: Available SimpleMetrics +* [[SimpleMetricFilter]]: Filter on an arbitrary SimpleMetric +* [[Movers|Movers-RosettaScripts]]: Available Movers +* [[I want to do x]]: Guide to choosing a Rosetta protocol. \ No newline at end of file diff --git a/scripting_documentation/RosettaScripts/Filters/FragQualFilter.md b/scripting_documentation/RosettaScripts/Filters/filter_pages/FragQualFilter.md similarity index 100% rename from scripting_documentation/RosettaScripts/Filters/FragQualFilter.md rename to scripting_documentation/RosettaScripts/Filters/filter_pages/FragQualFilter.md diff --git a/scripting_documentation/RosettaScripts/Filters/FragmentScoreFilter.md b/scripting_documentation/RosettaScripts/Filters/filter_pages/FragmentScoreFilter.md similarity index 100% rename from scripting_documentation/RosettaScripts/Filters/FragmentScoreFilter.md rename to scripting_documentation/RosettaScripts/Filters/filter_pages/FragmentScoreFilter.md diff --git a/scripting_documentation/RosettaScripts/Filters/HelixHelixAngleFilter.md b/scripting_documentation/RosettaScripts/Filters/filter_pages/HelixHelixAngleFilter.md similarity index 100% rename from scripting_documentation/RosettaScripts/Filters/HelixHelixAngleFilter.md rename to scripting_documentation/RosettaScripts/Filters/filter_pages/HelixHelixAngleFilter.md diff --git a/scripting_documentation/RosettaScripts/Filters/HolesFilter.md b/scripting_documentation/RosettaScripts/Filters/filter_pages/HolesFilter.md similarity index 100% rename from scripting_documentation/RosettaScripts/Filters/HolesFilter.md rename to scripting_documentation/RosettaScripts/Filters/filter_pages/HolesFilter.md diff --git a/scripting_documentation/RosettaScripts/Filters/LongestContinuousApolarSegmentFilter.md b/scripting_documentation/RosettaScripts/Filters/filter_pages/LongestContinuousApolarSegmentFilter.md similarity index 100% rename from scripting_documentation/RosettaScripts/Filters/LongestContinuousApolarSegmentFilter.md rename to scripting_documentation/RosettaScripts/Filters/filter_pages/LongestContinuousApolarSegmentFilter.md diff --git a/scripting_documentation/RosettaScripts/Filters/MPSpanAngleFilter.md b/scripting_documentation/RosettaScripts/Filters/filter_pages/MPSpanAngleFilter.md similarity index 100% rename from scripting_documentation/RosettaScripts/Filters/MPSpanAngleFilter.md rename to scripting_documentation/RosettaScripts/Filters/filter_pages/MPSpanAngleFilter.md diff --git a/scripting_documentation/RosettaScripts/Filters/SequenceDistanceFilter.md b/scripting_documentation/RosettaScripts/Filters/filter_pages/SequenceDistanceFilter.md similarity index 100% rename from scripting_documentation/RosettaScripts/Filters/SequenceDistanceFilter.md rename to scripting_documentation/RosettaScripts/Filters/filter_pages/SequenceDistanceFilter.md diff --git a/scripting_documentation/RosettaScripts/Filters/SpanTopologyMatchPoseFilter.md b/scripting_documentation/RosettaScripts/Filters/filter_pages/SpanTopologyMatchPoseFilter.md similarity index 100% rename from scripting_documentation/RosettaScripts/Filters/SpanTopologyMatchPoseFilter.md rename to scripting_documentation/RosettaScripts/Filters/filter_pages/SpanTopologyMatchPoseFilter.md diff --git a/scripting_documentation/RosettaScripts/Filters/TMsAACompFilter.md b/scripting_documentation/RosettaScripts/Filters/filter_pages/TMsAACompFilter.md similarity index 100% rename from scripting_documentation/RosettaScripts/Filters/TMsAACompFilter.md rename to scripting_documentation/RosettaScripts/Filters/filter_pages/TMsAACompFilter.md diff --git a/scripting_documentation/RosettaScripts/Movers/_Sidebar.md b/scripting_documentation/RosettaScripts/Movers/_Sidebar.md index b37eb316e..dd495fff9 100644 --- a/scripting_documentation/RosettaScripts/Movers/_Sidebar.md +++ b/scripting_documentation/RosettaScripts/Movers/_Sidebar.md @@ -20,6 +20,8 @@ * [[Simple Metrics | SimpleMetrics]] + * [[Ensemble Metrics|EnsembleMetrics]] + * [[Filters|Filters-RosettaScripts]] * [[FeaturesReporters|Features-reporter-overview]] diff --git a/scripting_documentation/RosettaScripts/RosettaScripts.md b/scripting_documentation/RosettaScripts/RosettaScripts.md index 8162b50ea..87c20cb24 100644 --- a/scripting_documentation/RosettaScripts/RosettaScripts.md +++ b/scripting_documentation/RosettaScripts/RosettaScripts.md @@ -19,6 +19,7 @@ Fleishman SJ, Leaver-Fay A, Corn JE, Strauch EM, Khare SD, et al. (2011) Rosetta - [[JumpSelectors |JumpSelectors]] - [[PackerPalettes|PackerPalette]] - [[SimpleMetrics]] +- [[EnsembleMetrics]] --------------------- diff --git a/scripting_documentation/RosettaScripts/SimpleMetrics/_Sidebar.md b/scripting_documentation/RosettaScripts/SimpleMetrics/_Sidebar.md index 19e142d78..e5e662775 100644 --- a/scripting_documentation/RosettaScripts/SimpleMetrics/_Sidebar.md +++ b/scripting_documentation/RosettaScripts/SimpleMetrics/_Sidebar.md @@ -14,6 +14,10 @@ * [[Residue Selectors|ResidueSelectors]] + * [[Simple Metrics|SimpleMetrics]] + + * [[Ensemble Metrics|EnsembleMetrics]] + * [[PackerPalettes|PackerPalette]] * [[Task Operations|TaskOperations-RosettaScripts]] diff --git a/scripting_documentation/RosettaScripts/TaskOperations/_Sidebar.md b/scripting_documentation/RosettaScripts/TaskOperations/_Sidebar.md index f41e9d5c9..d2c0e52c8 100644 --- a/scripting_documentation/RosettaScripts/TaskOperations/_Sidebar.md +++ b/scripting_documentation/RosettaScripts/TaskOperations/_Sidebar.md @@ -19,6 +19,8 @@ * [[Task Operations|TaskOperations-RosettaScripts]] * [[Simple Metrics | SimpleMetrics]] + + * [[Ensemble Metrics|EnsembleMetrics]] * [[Filters|Filters-RosettaScripts]] diff --git a/scripting_documentation/RosettaScripts/_Sidebar.md b/scripting_documentation/RosettaScripts/_Sidebar.md index b217aad09..003412fb8 100644 --- a/scripting_documentation/RosettaScripts/_Sidebar.md +++ b/scripting_documentation/RosettaScripts/_Sidebar.md @@ -22,6 +22,8 @@ * [[Simple Metrics | SimpleMetrics]] + * [[Ensemble Metrics|EnsembleMetrics]] + * [[Filters|Filters-RosettaScripts]] * [[FeaturesReporters|Features-reporter-overview]] diff --git a/scripting_documentation/RosettaScripts/composite_protocols/_Sidebar.md b/scripting_documentation/RosettaScripts/composite_protocols/_Sidebar.md index 33d6bcd71..35255fec7 100644 --- a/scripting_documentation/RosettaScripts/composite_protocols/_Sidebar.md +++ b/scripting_documentation/RosettaScripts/composite_protocols/_Sidebar.md @@ -20,6 +20,8 @@ * [[Simple Metrics | SimpleMetrics]] + * [[Ensemble Metrics|EnsembleMetrics]] + * [[Filters|Filters-RosettaScripts]] * [[Features Reporters|Features-reporter-overview]] diff --git a/scripting_documentation/RosettaScripts/xsd/ensemble_metric_CentralTendency_type.md b/scripting_documentation/RosettaScripts/xsd/ensemble_metric_CentralTendency_type.md new file mode 100644 index 000000000..6e5c2d553 --- /dev/null +++ b/scripting_documentation/RosettaScripts/xsd/ensemble_metric_CentralTendency_type.md @@ -0,0 +1,33 @@ + + +_Autogenerated Tag Syntax Documentation:_ + +--- +An ensemble metric that takes a real-valued simple metric, applies it to all poses in an ensemble, and calculates measures of central tendency (mean, median, mode) and other statistics about the distribution (standard deviation, standard error of the mean, min, max, range, etc.). Values that this ensemble metric returns are referred to in scripts as: mean, median, mode, stddev, stderr, min, max, and range. + +References and author information for the CentralTendency ensemble metric: + +CentralTendencyEnsembleMetric SimpleMetric's author(s): +Vikram K. Mulligan, Systems Biology group, Center for Computational Biology, Flatiron Institute [vmulligan@flatironinstitute.org] (Created the ensemble metric framework and wote the CentralTendency ensemble metric.) + +```xml + +``` + +- **label_prefix**: If provided, this prefix is prepended to the label for this ensemble metric (with an underscore after the prefix and before the ensemble metric name). +- **label_suffix**: If provided, this suffix is appended to the label for this ensemble metric (with an underscore after the ensemble metric name and before the suffix). +- **output_mode**: The output mode for reports from this ensemble metric. Default is 'tracer'. Allowed modes are: 'tracer', 'tracer_and_file', or 'file'. +- **output_filename**: The file to which the ensemble metric report will be written if output mode is 'tracer_and_file' or 'file'. Note that this filename will have the job name and number prepended so that each report is unique. +- **ensemble_generating_protocol**: An optional ParsedProtocol or other mover for generating an ensemble from the current pose. This protocol will be applied repeatedly (ensemble_generating_protocol_repeats times) to generate the ensemble of structures. Each generated pose will be measured by this metric, then discarded. The ensemble properties are then reported. If not provided, the current pose is measured and the report will be produced later (e.g. at termination with the JD2 rosetta_scripts application). +- **ensemble_generating_protocol_repeats**: The number of times that the ensemble_generating_protocol is applied. This is the maximum number of structures in the ensemble (though the actual number may be smaller if the protocol contains filters or movers that can fail for some attempts). Only used if an ensemble-generating protocol is provided with the ensemble_generating_protocol option. Defaults to 1. +- **n_threads**: The number of threads to request for generating ensembles in parallel. This is only used in multi-threaded compilations of Rosetta (compiled with extras=cxx11thread), and only when an ensemble-generating protocol is provided with the ensemble_generating_protocol option. A value of 0 means to use all available threads. In single-threaded builds, this must be set to 0 or 1. Defaults to 1. NOTE THAT MULTI-THREADING IS HIGHLY EXPERIMENTAL AND LIKELY TO FAIL FOR MANY ENSEMBLE-GENERATING PROTOCOLS. When in doubt, leave this set to 1. +- **use_additional_output_from_last_mover**: If true, this ensemble metric will use the additional output from the previous pose (assuming the previous pose generates multiple outputs) as the ensemble, analysing it and producing a report immediately. If false, then it will behave normally. False by default. +- **real_valued_metric**: (REQUIRED) The name of a real-valued simple metric defined previously. Required input. + +--- diff --git a/scripting_documentation/RosettaScripts/xsd/ensemble_metric_PNear_type.md b/scripting_documentation/RosettaScripts/xsd/ensemble_metric_PNear_type.md new file mode 100644 index 000000000..9946ce070 --- /dev/null +++ b/scripting_documentation/RosettaScripts/xsd/ensemble_metric_PNear_type.md @@ -0,0 +1,48 @@ + + +_Autogenerated Tag Syntax Documentation:_ + +--- +An ensemble metric that computes PNear, an estimate of the Boltzmann probability that a molecule's conformation is close to a desired conformation, based on an ensemble of sampled conformations and their energies. PNear was described in Bhardwaj, Bahl, Mulligan et al. (2016). Accurate de novo design of hyperstable constrained peptides. Nature 538(7625):329-335. doi: 10.1038/nature19791. Given N samples, it is defined by: PNear = sum_i( exp( -(R_i/lambda)^2) exp(-E_i/(kbt)) ) / sum_j( exp(-E_j/(kbt)) ), where E_i and E_j are the energies of the ith and jth samples, respectively, and R_i is the RMSD of the ith sample to the reference state. + +```xml + +``` + +- **label_prefix**: If provided, this prefix is prepended to the label for this ensemble metric (with an underscore after the prefix and before the ensemble metric name). +- **label_suffix**: If provided, this suffix is appended to the label for this ensemble metric (with an underscore after the ensemble metric name and before the suffix). +- **output_mode**: The output mode for reports from this ensemble metric. Default is 'tracer'. Allowed modes are: 'tracer', 'tracer_and_file', or 'file'. +- **output_filename**: The file to which the ensemble metric report will be written if output mode is 'tracer_and_file' or 'file'. Note that this filename will have the job name and number prepended so that each report is unique. +- **ensemble_generating_protocol**: An optional ParsedProtocol or other mover for generating an ensemble from the current pose. This protocol will be applied repeatedly (ensemble_generating_protocol_repeats times) to generate the ensemble of structures. Each generated pose will be measured by this metric, then discarded. The ensemble properties are then reported. If not provided, the current pose is measured and the report will be produced later (e.g. at termination with the JD2 rosetta_scripts application). +- **ensemble_generating_protocol_repeats**: The number of times that the ensemble_generating_protocol is applied. This is the maximum number of structures in the ensemble (though the actual number may be smaller if the protocol contains filters or movers that can fail for some attempts). Only used if an ensemble-generating protocol is provided with the ensemble_generating_protocol option. Defaults to 1. +- **n_threads**: The number of threads to request for generating ensembles in parallel. This is only used in multi-threaded compilations of Rosetta (compiled with extras=cxx11thread), and only when an ensemble-generating protocol is provided with the ensemble_generating_protocol option. A value of 0 means to use all available threads. In single-threaded builds, this must be set to 0 or 1. Defaults to 1. NOTE THAT MULTI-THREADING IS HIGHLY EXPERIMENTAL AND LIKELY TO FAIL FOR MANY ENSEMBLE-GENERATING PROTOCOLS. When in doubt, leave this set to 1. +- **use_additional_output_from_last_mover**: If true, this ensemble metric will use the additional output from the previous pose (assuming the previous pose generates multiple outputs) as the ensemble, analysing it and producing a report immediately. If false, then it will behave normally. False by default. +- **compute_pnear_to_native**: Should PNear be computed using the structure provided with the -in:file:natie option as the reference state? Defaults to true. If multiple scorefunctions are provided, one PNear value is computed using each in turn. +- **compute_pnear_to_lowestE**: Should PNear be computed using the structure provided with the -in:file:native option as the reference state? Defaults to false. If multiple scorefunctions are provided, one PNear value is computed using each in turn. +- **scorefxns**: (REQUIRED) A comma-separated list of previously-defined scoring functions. At least one must be provided. +- **kbt**: The Boltzmann temperature, used to compute Boltzmann probabilities of conformational states. Defaults to 0.62 kcal/mol, which is approximately physiological temperature. Must be positive. +- **lambda**: The breadth of the Gaussian, in Angstroms, used to determine whether the RMSD to the reference state is small enough for a sampled state to be considered 'near-native' or not. The default value, 2.5 A, is appropriate for small proteins. For peptides, a value of 1.5 A to 2.0 A is more appropriate. Very large proteins or protein complexes might warrant a larger value. Must be positive. +- **superimpose_for_rmsd**: Set whether the poses should be superimposed (aligned) for the RMSD calculation. The default is true, which is appropriate for PNear calculations for folding. For PNear calculations for docking, users may want to set this to false (so that docked configuratinos that preserve backbone conformation but involve a different rigid-body transform than native register as high-RMSD states). +- **use_backbone_in_rmsd**: Set whether backbone heavyatoms should be used in computing RMSD values. True by default. If no residue selector is provided with the backbone_residue_selector option, all residues in the pose are used if this option is set to true. +- **use_CB_in_rmsd**: If this option is true, then the first sidechain heavyatom (CB in canonical amino acids) is included with the backbone heavyatoms when computing heavyatom RMSD. Requires use_backbone_in_rmsd to be set to true. False by default. +- **use_sidechain_in_rmsd**: Set whether sidechain heavyatoms should be used in computing RMSD values. False by default. If no residue selector is provided with the sidehchain_residue_selector option, all residues in the pose are used if this option is set to true. +- **native_file**: A PDB, CIF, or other structure file for the native pose. Only used if compute_pnear_to_native is true. If this is left as an empty string (the default), then the file passed on the commandline with -in:file:native is used. +- **native_preparation_protocol**: A mover applied to the native pose on load. (This can be useful for setting up cyclic geometry, for example.) Not used if not provided. +- **backbone_residue_selector**: An optional residue selector to select the residues whose backbone heavyatoms (and first sidechain heavyatoms, if use_CB_in_rmsd is set to true) are used when computing RMSD values. If this is not provided, all positions are used. This option does nothing unless use_backbone_in_rmsd is set to true. The name of a previously declared residue selector or a logical expression of AND, NOT (!), OR, parentheses, and the names of previously declared residue selectors. Any capitalization of AND, NOT, and OR is accepted. An exclamation mark can be used instead of NOT. Boolean operators have their traditional priorities: NOT then AND then OR. For example, if selectors s1, s2, and s3 have been declared, you could write: 's1 or s2 and not s3' which would select a particular residue if that residue were selected by s1 or if it were selected by s2 but not by s3. +- **sidechain_residue_selector**: An optional residue selector to select the residues whose sidechain heavyatoms are used when computing RMSD values. If this is not provided, all positions are used. This option does nothing unless use_sidechain_in_rmsd is set to true. The name of a previously declared residue selector or a logical expression of AND, NOT (!), OR, parentheses, and the names of previously declared residue selectors. Any capitalization of AND, NOT, and OR is accepted. An exclamation mark can be used instead of NOT. Boolean operators have their traditional priorities: NOT then AND then OR. For example, if selectors s1, s2, and s3 have been declared, you could write: 's1 or s2 and not s3' which would select a particular residue if that residue were selected by s1 or if it were selected by s2 but not by s3. + +--- diff --git a/scripting_documentation/RosettaScripts/xsd/filter_EnsembleFilter_type.md b/scripting_documentation/RosettaScripts/xsd/filter_EnsembleFilter_type.md new file mode 100644 index 000000000..394234913 --- /dev/null +++ b/scripting_documentation/RosettaScripts/xsd/filter_EnsembleFilter_type.md @@ -0,0 +1,26 @@ + + +_Autogenerated Tag Syntax Documentation:_ + +--- +A filter that filters based on some named float-valued property measured by an EnsembleMetric. Note that the value produced by the EnsembleMetric is based on an ensemble generated earlier in the protocol, presumably from the pose on which we are currently filtering. + +References and author information for the EnsembleFilter filter: + +EnsembleFilter Filter's author(s): +Vikram K. Mulligan, Systems Biology Group, Center for Computational Biology, Flatiron Institute. [vmulligan@flatironinstitute.org] (Wrote the EnsembleFilter.) + +```xml + +``` + +- **ensemble_metric**: (REQUIRED) A previously-defined EnsembleMetric that produces at least one floating-point value. This filter will filter a pose based on that value. +- **named_value**: (REQUIRED) A named floating-point value produced by the EnsembleMetric, on which this filter will filter. +- **threshold**: The threshold for rejecting a pose. +- **filter_acceptance_mode**: The criterion for ACCEPTING a pose. For instance, if the value returned by the ensemble metric is greater than the threshold, and the mode is 'less_than_or_equal' (the default mode), then the pose is rejected. Allowed modes are: 'greater_than', 'less_than', 'greater_than_or_equal', 'less_than_or_equal', 'equal', and 'not_equal'. +- **confidence**: Probability that the pose will be filtered out if it does not pass this Filter + +--- diff --git a/scripting_documentation/RosettaScripts/xsd/filter_FragmentScoreFilter_type.md b/scripting_documentation/RosettaScripts/xsd/filter_FragmentScoreFilter_type.md index cf17faf3b..288cf53a7 100644 --- a/scripting_documentation/RosettaScripts/xsd/filter_FragmentScoreFilter_type.md +++ b/scripting_documentation/RosettaScripts/xsd/filter_FragmentScoreFilter_type.md @@ -13,7 +13,7 @@ Filter based on any score that can be calculated in fragment_picker. outputs_name="(pose &string;)" csblast="(&string;)" blast_pgp="(&string;)" placeholder_seqs="(&string;)" sparks-x="(&string;)" sparks-x_query="(&string;)" psipred="(&string;)" - vall_path="(/scratch/benchmark/W.hojo-1/rosetta.Hojo-1/master/main/database//sampling/vall.jul19.2011.gz &string;)" + vall_path="(/home/vikram/rosetta_devcopy/Rosetta/main/database//sampling/vall.jul19.2011.gz &string;)" frags_scoring_config="(&string;)" n_frags="(200 &non_negative_integer;)" n_candidates="(1000 &non_negative_integer;)" print_to_pdb="(false &xs:boolean;)" diff --git a/scripting_documentation/RosettaScripts/xsd/mover_ParsedProtocol_type.md b/scripting_documentation/RosettaScripts/xsd/mover_ParsedProtocol_type.md index 2a4c1874f..5284f7d94 100644 --- a/scripting_documentation/RosettaScripts/xsd/mover_ParsedProtocol_type.md +++ b/scripting_documentation/RosettaScripts/xsd/mover_ParsedProtocol_type.md @@ -11,8 +11,8 @@ This is a special mover that allows making a single compound mover and filter ve apply_probability="(ℜ)" resume_support="(false &bool;)" > + ensemble_metrics="(&string;)" apply_probability="(ℜ)" + report_at_end="(true &bool;)" never_rerun_filter="(false &bool;)" /> @@ -33,6 +33,7 @@ Subtag **Add**: The steps to be applied. - **filter**: The filter whose execution is desired - **metrics**: A comma-separated list of metrics to run at this point. - **labels**: A comma-separated list of labels to use for the provided metrics in the output. If empty/missing, use the metric names from the metrics setting. If '-', use the metric's default. +- **ensemble_metrics**: A comma-separated list of ensemble metrics to add at this point. Ensemble metrics will collect information about the pose at this point, and will later report statistics about the ensemble of poses that they have seen. - **apply_probability**: by default equal probability for all tags - **report_at_end**: Report filter value via filter re-evaluation on final pose after conclusion of protocol. Otherwise report filter value as evaluated mid-protocol. - **never_rerun_filter**: Never run this filter after the original apply-time run. Use this option to avoid expensive re-runs when reporting