cmoyer-x/Dendrogram

Structural vs. Sequence Phylogeny

Tanglegram-Based Congruence Analysis

This project compares a structural guide tree generated by FoldMason against an amino acid maximum-likelihood tree inferred with IQ-TREE. The objective is to evaluate whether protein structural similarity recapitulates sequence-based phylogeny.

The workflow produces:

  • A publication-quality tanglegram (PNG)
  • Quantitative tree congruence metrics (CSV)
  • Fully reproducible outputs parameterized by a single prefix variable

Repository Structure

.
├── data/
│   ├── lysB_structure.nw        # FoldMason structural guide tree
│   └── lysB_AA.contree          # IQ-TREE AA consensus tree
├── results/                     # Auto-created; outputs written here
├── scripts/
│   └── tree_comparison.R
└── README.md

Usage

1. Install dependencies

install.packages(c("ape", "phangorn", "dendextend", "viridisLite"))

2. Configure and run

Open scripts/tree_comparison.R and set the parameters at the top of the file:

prefix      <- "lysB"    # shared filename prefix for inputs and outputs
data_dir    <- "data"    # directory containing input trees
out_dir     <- "results" # directory for outputs (created automatically)
h_rel       <- 0.25      # relative height for cluster cutoff (0–1)
random_seed <- 1         # for reproducible untangling

Then source the script:

source("scripts/tree_comparison.R")

3. Outputs

| File | Description |
| --- | --- |
| results/<prefix>_AA_vs_Struct_tanglegram.png | High-resolution tanglegram (300 dpi) |
| results/<prefix>_AA_vs_Struct_tree_metrics.csv | Normalized RF distance and entanglement score |
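
As a rough illustration of how a single prefix can drive every path, the sketch below assembles the input and output file names from the parameters listed under "Configure and run". The variable names `struct_tree_file`, `aa_tree_file`, `png_file`, and `csv_file` are hypothetical and may not match the ones used in scripts/tree_comparison.R.

```r
# Hypothetical sketch: derive all file paths from the configuration parameters.
prefix   <- "lysB"
data_dir <- "data"
out_dir  <- "results"

# Input trees, following the file names shown under Repository Structure
struct_tree_file <- file.path(data_dir, paste0(prefix, "_structure.nw"))
aa_tree_file     <- file.path(data_dir, paste0(prefix, "_AA.contree"))

# Output files, following the naming pattern documented under Outputs
png_file <- file.path(out_dir, paste0(prefix, "_AA_vs_Struct_tanglegram.png"))
csv_file <- file.path(out_dir, paste0(prefix, "_AA_vs_Struct_tree_metrics.csv"))

# Create the output directory if it does not exist yet
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
```

Changing `prefix` to another gene name would then switch both the inputs read and the outputs written, provided the input files follow the same naming pattern.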

Workflow

Step 1 — Input & Preprocessing

Trees are read with ape::read.tree(). Only taxa present in both trees are retained (shared tip intersection). Both trees are midpoint-rooted, polytomies are resolved only if necessary (multi2di()), and trees are ladderized for consistent visual ordering.

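These steps make the two topologies directly comparable. Pruning and rerooting do not change the relationships among the retained taxa, and resolving a polytomy with multi2di() only adds zero-length branches.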
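
The preprocessing steps might look roughly like the sketch below. It uses toy Newick strings in place of the real input files, and it assumes midpoint rooting is done with `phangorn::midpoint()`; the helper `prep_tree()` is hypothetical and the actual script may organize these calls differently.

```r
library(ape)
library(phangorn)

# Toy Newick strings standing in for the real input files (hypothetical data)
t_struct <- read.tree(text = "((A:0.1,B:0.2):0.3,(C:0.2,D:0.1):0.4,E:0.5,F:0.3);")
t_aa     <- read.tree(text = "(((A:0.1,C:0.2):0.1,B:0.3):0.2,(D:0.2,E:0.1):0.3,G:0.4);")

# Hypothetical helper: prune, root, resolve, and ladderize one tree
prep_tree <- function(tree, shared) {
  tree <- keep.tip(tree, shared)                 # retain only the shared taxa
  tree <- midpoint(tree)                         # midpoint rooting (assumed: phangorn)
  if (!is.binary(tree)) tree <- multi2di(tree)   # resolve polytomies only if present
  ladderize(tree)                                # consistent visual ordering
}

shared <- intersect(t_struct$tip.label, t_aa$tip.label)
tL <- prep_tree(t_struct, shared)   # structural tree (left)
tR <- prep_tree(t_aa, shared)       # amino acid tree (right)
```

Here tips F and G are dropped because each occurs in only one tree, leaving five shared taxa in both `tL` and `tR`.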

Step 2 — Dendrogram Conversion (Ultrametric-Free)

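Maximum-likelihood trees are generally not ultrametric (no molecular clock is assumed), so they cannot be converted directly into the ultrametric dendrogram objects that dendextend requires. To work around this:

  1. Cophenetic (patristic) distance matrices are computed from each tree
  2. Hierarchical clustering (hclust(..., method = "average")) is applied to each distance matrix
  3. The resulting hclust objects are converted to dendrogram objects for use with dendextend

This retains the relative grouping of taxa closely enough for tanglegram comparison, although average-linkage clustering is not guaranteed to reproduce every split of the original tree.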
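
A minimal sketch of this conversion, using a toy non-ultrametric tree, is shown below. The helper name `phylo_to_dend()` is hypothetical; only the sequence of calls (cophenetic distances, average-linkage clustering, dendrogram conversion) follows the steps described above.

```r
library(ape)

# Toy non-ultrametric tree (hypothetical data)
tree <- read.tree(text = "((A:0.1,B:0.4):0.2,(C:0.3,D:0.05):0.5);")

# Hypothetical helper: phylo -> patristic distances -> hclust -> dendrogram
phylo_to_dend <- function(tree) {
  d  <- as.dist(cophenetic(tree))        # patristic distances between tips
  hc <- hclust(d, method = "average")    # average-linkage clustering
  as.dendrogram(hc)
}

dend <- phylo_to_dend(tree)
```

The resulting dendrogram is ultrametric by construction, which is what dendextend needs, while the input tree itself is left unchanged.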

Step 3 — Structural Clustering & Coloring

Clusters are derived exclusively from the structural (left) dendrogram using a relative height cutoff:

cut height  =  h_rel × max(dendrogram height)

These structural clusters determine:

  • Tip label colors (applied to both trees)
  • Tanglegram connecting line colors

Branch edges are rendered in neutral gray (gray20) to keep visual focus on cross-tree correspondence. Colors are drawn from the viridis palette for colorblind accessibility.
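
The relative cutoff can be illustrated with a small example. The sketch below works on an hclust object, whose largest merge height equals the root height of the corresponding dendrogram; the toy distance matrix and the names `cut_h`, `cluster_pal`, and `tip_colors` are assumptions made for illustration.

```r
library(viridisLite)

# Toy distance matrix standing in for structural patristic distances (hypothetical)
tips <- c("A", "B", "C", "D")
m <- matrix(1, 4, 4, dimnames = list(tips, tips))
m["A", "B"] <- m["B", "A"] <- 0.1
m["C", "D"] <- m["D", "C"] <- 0.2
diag(m) <- 0

hc    <- hclust(as.dist(m), method = "average")
h_rel <- 0.25
cut_h <- h_rel * max(hc$height)        # convert the relative cutoff to an absolute height

clusters    <- cutree(hc, h = cut_h)   # named integer vector: tip label -> cluster id
cluster_pal <- viridis(length(unique(clusters)))
tip_colors  <- setNames(cluster_pal[clusters], names(clusters))
```

With these distances the tree height is 1, so the cut falls at 0.25 and separates the tips into two clusters, {A, B} and {C, D}. Because `tip_colors` is keyed by tip label, the same mapping can be reused to color labels on the sequence tree.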

Step 4 — Untangling & Export

The dendrograms are untangled with dendextend::untangle(..., method = "step2side") to minimize line crossings, then rendered to a 12 × 9 inch PNG at 300 dpi.
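
The untangling and export step could be sketched as follows, using two toy dendrograms built from random data rather than the real trees. The plotting arguments passed to `tanglegram()` and the output file name are illustrative choices, not necessarily those used in the script.

```r
library(dendextend)

# Two toy dendrograms over the same labels (hypothetical data)
set.seed(1)
m  <- matrix(rnorm(40), nrow = 8, dimnames = list(LETTERS[1:8], NULL))
dL <- as.dendrogram(hclust(dist(m), method = "average"))
dR <- as.dendrogram(hclust(dist(m), method = "complete"))

# Rotate branches to reduce line crossings; returns a dendlist of both trees
dl  <- untangle(dL, dR, method = "step2side")
ent <- entanglement(dl)   # 0 = perfectly aligned, 1 = fully entangled

# Render to a 12 x 9 inch PNG at 300 dpi (written to a temp dir in this sketch)
out_file <- file.path(tempdir(), "toy_tanglegram.png")
png(out_file, width = 12, height = 9, units = "in", res = 300)
tanglegram(dl, highlight_distinct_edges = FALSE,
           common_subtrees_color_lines = FALSE, margin_inner = 6)
dev.off()
```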


Quantitative Congruence Metrics

| Metric | Method | Interpretation |
| --- | --- | --- |
| Normalized Robinson–Foulds | phangorn::RF.dist(unroot(tL), unroot(tR), normalize = TRUE) | 0 = identical topology · 1 = maximally different |
| Entanglement | dendextend::entanglement() | 0 = perfect visual alignment · 1 = maximal crossing |
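
To make the Robinson–Foulds metric concrete, the sketch below computes the normalized distance for two toy five-taxon trees and writes it to a CSV. The column layout of the CSV and the file name are assumptions; the real metrics file may be structured differently.

```r
library(ape)
library(phangorn)

# Two toy trees on the same taxa with conflicting groupings (hypothetical data)
tL <- read.tree(text = "((A,B),(C,D),E);")
tR <- read.tree(text = "((A,C),(B,D),E);")

# A tree compared with itself shares every split, so the distance is 0
rf_same <- RF.dist(unroot(tL), unroot(tL), normalize = TRUE)

# The two toy trees share no internal splits, so the distance is large
rf_diff <- RF.dist(unroot(tL), unroot(tR), normalize = TRUE)

# Hypothetical CSV layout: one row per metric
metrics  <- data.frame(metric = "normalized_RF", value = as.numeric(rf_diff))
csv_file <- file.path(tempdir(), "toy_tree_metrics.csv")
write.csv(metrics, csv_file, row.names = FALSE)
```

The trees are unrooted before comparison because the Robinson–Foulds distance used here counts differing bipartitions, which do not depend on root placement.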

Biological Interpretation

This workflow addresses a core evolutionary question:

Do structural similarity relationships mirror amino acid phylogeny?

| Result | Interpretation |
| --- | --- |
| Low RF + low entanglement | Strong congruence between structural and sequence evolution |
| High RF | Potential structural convergence, evolutionary decoupling, domain rearrangement, or alignment artifacts |
| Low RF + high entanglement | Topologically similar trees with differing tip ordering; consider re-running with a different random_seed |

Requirements

| Package | Purpose |
| --- | --- |
| ape | Tree I/O, manipulation, midpoint rooting |
| phangorn | Robinson–Foulds distance |
| dendextend | Tanglegram construction and entanglement scoring |
| viridisLite | Colorblind-safe cluster palette |

About

An R script that builds a tanglegram comparing a structural protein phylogeny with an amino acid phylogeny.
