Skip to content

evolbeginner/phyloHessianWrapper

Repository files navigation

PhyloHessianWrapper

A computational tool for phylogenetic Hessian matrix estimation, enabling divergence time inference under (complex) amino acid substitution models with MCMCtree's approximate likelihood method (dos Reis et al., 2017).

Table of Contents

Installation

ruby additional_scripts/check_dependency.rb

Dependencies

Ensure these tools are installed:

  • Julia (>=1.6)
  • Ruby (≥ 2.7)
  • R (≥ 4.0)
  • IQ-Tree2 (≥ 2.0) or PhyML (≥ 3.1)

Quick-start

  1. Preparation

Get the species tree from MCMCTree or Codeml. This can be done by running MCMCtree with usedata = 0 in mcmctree.ctl then run sed !4d out.BV > ref.tre. This step is important as phyloHessian uses R package phytools to get the order of tips which are generally different from that used by PAML.

In the given example, ref.tre can be obtained by the following bash command where example/dating/ori/combined/in.BV is the in.BV file generated by running Codeml or MCMCtree with the default settings.

sed '4!d' example/dating/ori/combined/in.BV
  1. Run the following (with LG+G model and compare it with the result of Codeml)

Enter the folder example/ and run phyloHessianWrapper.rb.

cd example/;

ruby ../phyloHessianWrapper.rb \
  -s sim/alignment/combined.fas \
  --reftree ref.tre \
  -m LG+G \
  --outdir LG+G_ph \
  --force \
  --cpu 10
  1. Check results (optional)

Compare the lnL of the tree calculated by phylohessian from LG+G_ph/inBV/info and that calculated by IQ-Tree (LG+G_ph/iqtree/iqtree.log). They should be identical. Also, confirm that the result is the same as that in LG+G_ph-ref/.

  1. run MCMCTree

Copy the file LG+G_ph/inBV/in.BV to the folder for MCMCTree dating/LG+G/combined/. Make sure that usedata = 2 in.BV 3 is specified in the control file mcmctree.ctl (for an example, see example/dating/LG+G/combined/mcmctree.ctl). Then run in Bash

mcmctree mcmctree.ctl

Then compare the results with those in the folder ori/ which uses Codeml to calculate the Hessian. Don't they look very similar?

  1. Run MCMCTree with a more complex model say LG+C60+G+I.

Arguments

Required arguments

  • -s <value>: Sequence alignment file in Fasta format.

    • Options: Path to a Fasta file.
    • Default: None.
  • -t|--tree <value>: An unrooted treefile.

    • Options: Path to a treefile.
    • Default: None.
  • -m|--model <value>: An unrooted treefile.

    • Options: Substitution model.
    • Default: None.
  • --outdir <value>: Output folder.

    • Options: Path to a directory.
    • Default: None.
  • -r|--reftree|--ref_tree|--ref_tre|--reftree <value>: reference tree output from MCMCtree. This ensures the order of the gradient and Hessian are compatible with MCMCtree.

    • Options: can be generated by sed '4!d' in.BV (see the example above).
    • Defauly: None.

Optional arguments

  • --hessian_type

    • Options: STK2004 (outer product of scores estimator [OPS], default in CODEML) or fd (calculation on the 2nd-order derivative directly, much slower)
    • Default: STK2004 (OPS)
  • --cpu <value>: Number of threads to use.

    • Default: 1
  • --phylo_prog <value>: Program used for phylogenetic reconstruction (estimation of branch lengths, topology is fixed for MCMCtree).

    • Options: iqtree or phyml.
    • Default: iqtree
  • --pmsf: Perform a PMSF approximation (only valid for mixture models).

    • Options: Can also be specified by -m LG+C10+PMSF. See Wang et al., 2018 for details.
    • Default: None
  • --force: Remove the outdir if it exists.

    • Options: (Flag - presence enables the functionality). Presence of the flag triggers the removal.
    • Default: The outdir is not removed.
  • --no_mwopt: Disable optimization of mixture weights (not recommended).

    • Options: if specified --mwopt for IQ-Tree will be turned off.
    • Default: Mixture weight optimization is enabled.

Outputs

  • julia/ some information of the tree structure
  • iqtree/ or phyml output of tree reconstruction
  • bl/ some information about the branch lengths
  • ph/inBV/ the folder containing the most important outputs
    • info the lnL of the tree evaluated at the MLEs (maximum-likelihood estimates) of all parameters
    • in.BV the file that contains the MLEs of branch lengths, gradients, and Hessian.
  • execution_{TIME}.log the log file

How to complete molecular dating using MCMCtree

  1. Copy the file into your working directory
    Command: cp ph/inBV/in.BV .
  2. Edit the control file
    File: mcmctree.ctl Set: usedata = 3 in.BV
  3. Run the analysis
    Command: mcmctree mcmctree.ctl

Notes:

  • Step 1 assumes your current directory is where you want to run mcmctree. Adjust paths if needed.
  • Preserve spacing in the control file to match mcmctree’s expected format.
  • See also the very nicely written tutorial dos Reis, et al. 2017

More_examples

ruby ../phyloHessianWrapper.rb -s sim/alignment/combined.fas --reftree ref.tre -m LG+C60+G+PMSF --outdir outdir --force --cpu 10
  • LG+C60+G without --mwopt (use the default weights for each mixtures)
ruby ../phyloHessianWrapper.rb -s sim/alignment/combined.fas --reftree ref.tre -m LG+C60+G --outdir outdir --force --cpu 10 --no_mwopt
  • EX2+G with PhyML (need to have phyml-mixSS installed)
ruby ../phyloHessianWrapper.rb -s sim/alignment/combined.fas --reftree ref.tre -m EX2+G --outdir outdir --force --cpu 10 --phylo_prog phyml

Literature-cited

You might want to cite the following in case you use phyloHessian in case they are involved in your analysis.

And tools (potentially) involved

  • Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, Von Haeseler A, Lanfear R. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Molecular biology and evolution. 2020 May 1;37(5):1530-4.
  • Revell LJ. phytools 2.0: an updated R ecosystem for phylogenetic comparative methods (and other things). PeerJ. 2024 Jan 5;12:e16505.

Others

  • Wang HC, Minh BQ, Susko E, Roger AJ. Modeling site heterogeneity with posterior mean site frequency profiles accelerates accurate phylogenomic estimation. Systematic biology. 2018 Mar 1;67(2):216-35.
  • dos Reis M, Álvarez-Carretero S, Yang Z. MCMCTree tutorials. https://gensoft.pasteur.fr/docs/paml/4.9j/MCMCtree.Tutorials.pdf. 2017 Apr 28

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published