A computational tool for phylogenetic Hessian matrix estimation, enabling divergence time inference under (complex) amino acid substitution models with MCMCtree's approximate likelihood method (dos Reis et al., 2017).
ruby additional_scripts/check_dependency.rbEnsure these tools are installed:
- Julia (>=1.6)
- Ruby (≥ 2.7)
- R (≥ 4.0)
- IQ-Tree2 (≥ 2.0) or PhyML (≥ 3.1)
- Preparation
Get the species tree from MCMCTree or Codeml. This can be done by running MCMCtree with usedata = 0 in mcmctree.ctl then run sed !4d out.BV > ref.tre. This step is important as phyloHessian uses R package phytools to get the order of tips which are generally different from that used by PAML.
In the given example, ref.tre can be obtained by the following bash command where example/dating/ori/combined/in.BV is the in.BV file generated by running Codeml or MCMCtree with the default settings.
sed '4!d' example/dating/ori/combined/in.BV- Run the following (with LG+G model and compare it with the result of
Codeml)
Enter the folder example/ and run phyloHessianWrapper.rb.
cd example/;
ruby ../phyloHessianWrapper.rb \
-s sim/alignment/combined.fas \
--reftree ref.tre \
-m LG+G \
--outdir LG+G_ph \
--force \
--cpu 10- Check results (optional)
Compare the lnL of the tree calculated by phylohessian from LG+G_ph/inBV/info and that calculated by IQ-Tree (LG+G_ph/iqtree/iqtree.log). They should be identical. Also, confirm that the result is the same as that in LG+G_ph-ref/.
- run MCMCTree
Copy the file LG+G_ph/inBV/in.BV to the folder for MCMCTree dating/LG+G/combined/. Make sure that usedata = 2 in.BV 3 is specified in the control file mcmctree.ctl (for an example, see example/dating/LG+G/combined/mcmctree.ctl). Then run in Bash
mcmctree mcmctree.ctlThen compare the results with those in the folder ori/ which uses Codeml to calculate the Hessian. Don't they look very similar?
- Run MCMCTree with a more complex model say LG+C60+G+I.
Required arguments
-
-s <value>: Sequence alignment file in Fasta format.- Options: Path to a Fasta file.
- Default: None.
-
-t|--tree <value>: An unrooted treefile.- Options: Path to a treefile.
- Default: None.
-
-m|--model <value>: An unrooted treefile.- Options: Substitution model.
- Default: None.
-
--outdir <value>: Output folder.- Options: Path to a directory.
- Default: None.
-
-r|--reftree|--ref_tree|--ref_tre|--reftree <value>: reference tree output fromMCMCtree. This ensures the order of the gradient and Hessian are compatible withMCMCtree.- Options: can be generated by
sed '4!d' in.BV(see the example above). - Defauly: None.
- Options: can be generated by
Optional arguments
-
--hessian_type- Options: STK2004 (outer product of scores estimator [OPS], default in
CODEML) or fd (calculation on the 2nd-order derivative directly, much slower) - Default: STK2004 (OPS)
- Options: STK2004 (outer product of scores estimator [OPS], default in
-
--cpu <value>: Number of threads to use.- Default: 1
-
--phylo_prog <value>: Program used for phylogenetic reconstruction (estimation of branch lengths, topology is fixed forMCMCtree).- Options:
iqtreeorphyml. - Default:
iqtree
- Options:
-
--pmsf: Perform a PMSF approximation (only valid for mixture models).- Options: Can also be specified by
-m LG+C10+PMSF. See Wang et al., 2018 for details. - Default: None
- Options: Can also be specified by
-
--force: Remove theoutdirif it exists.- Options: (Flag - presence enables the functionality). Presence of the flag triggers the removal.
- Default: The
outdiris not removed.
-
--no_mwopt: Disable optimization of mixture weights (not recommended).- Options: if specified
--mwoptforIQ-Treewill be turned off. - Default: Mixture weight optimization is enabled.
- Options: if specified
julia/some information of the tree structureiqtree/orphymloutput of tree reconstructionbl/some information about the branch lengthsph/inBV/the folder containing the most important outputsinfothe lnL of the tree evaluated at the MLEs (maximum-likelihood estimates) of all parametersin.BVthe file that contains the MLEs of branch lengths, gradients, and Hessian.
execution_{TIME}.logthe log file
- Copy the file into your working directory
Command:cp ph/inBV/in.BV . - Edit the control file
File:mcmctree.ctlSet:usedata = 3 in.BV - Run the analysis
Command:mcmctree mcmctree.ctl
Notes:
- Step 1 assumes your current directory is where you want to run mcmctree. Adjust paths if needed.
- Preserve spacing in the control file to match mcmctree’s expected format.
- See also the very nicely written tutorial dos Reis, et al. 2017
- LG+C60+G under PMSF Wang et al., 2018
ruby ../phyloHessianWrapper.rb -s sim/alignment/combined.fas --reftree ref.tre -m LG+C60+G+PMSF --outdir outdir --force --cpu 10- LG+C60+G without
--mwopt(use the default weights for each mixtures)
ruby ../phyloHessianWrapper.rb -s sim/alignment/combined.fas --reftree ref.tre -m LG+C60+G --outdir outdir --force --cpu 10 --no_mwopt- EX2+G with
PhyML(need to havephyml-mixSSinstalled)
ruby ../phyloHessianWrapper.rb -s sim/alignment/combined.fas --reftree ref.tre -m EX2+G --outdir outdir --force --cpu 10 --phylo_prog phymlYou might want to cite the following in case you use phyloHessian in case they are involved in your analysis.
- Wang S, Meade A. Molecular Clock Dating of Deep-Time Evolution Using Complex Mixture Models. https://www.biorxiv.org/content/10.1101/2025.07.17.665246v1. bioRxiv. 2025:2025-07.
And tools (potentially) involved
- Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, Von Haeseler A, Lanfear R. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Molecular biology and evolution. 2020 May 1;37(5):1530-4.
- Revell LJ. phytools 2.0: an updated R ecosystem for phylogenetic comparative methods (and other things). PeerJ. 2024 Jan 5;12:e16505.
Others
- Wang HC, Minh BQ, Susko E, Roger AJ. Modeling site heterogeneity with posterior mean site frequency profiles accelerates accurate phylogenomic estimation. Systematic biology. 2018 Mar 1;67(2):216-35.
- dos Reis M, Álvarez-Carretero S, Yang Z. MCMCTree tutorials. https://gensoft.pasteur.fr/docs/paml/4.9j/MCMCtree.Tutorials.pdf. 2017 Apr 28