Sapling

Sapling is a method that infers a small set of backbone trees on a subset of the mutations; together, these backbone trees summarize the entire set of possible phylogenies. Sapling can also grow given backbone trees into full phylogenies. Finally, Sapling can directly output full phylogenies that are within a specified fraction 1-rho of optimality.

Dependencies

Sapling requires the following Python packages:

numpy
pandas
tqdm
one of cvxopt/gurobipy/fastppm

If you choose to use fastppm, make sure that its Python library can be found via $PYTHONPATH; placing it in the current directory also works. For example:

❯ ls -alFh fastppm*
lrwxr-xr-x 1 melkebir 72 Jan 28 14:33 fastppm.cpython-311-darwin.so -> /Users/melkebir/Projects/fastppm/build/src/fastppm.cpython-311-darwin.so*
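Before running Sapling, it can help to confirm that at least one regression backend is visible to the interpreter. The following is a minimal sketch, not part of Sapling itself: the helper name `available_backends` is hypothetical, and the module names are assumed to match the package names listed above.

```python
import importlib.util

# Module names are assumptions taken from the dependency list above.
CANDIDATE_BACKENDS = ("fastppm", "cvxopt", "gurobipy")


def available_backends(candidates=CANDIDATE_BACKENDS):
    """Return the candidate modules that Python can locate, without importing them."""
    return [
        name for name in candidates
        if importlib.util.find_spec(name) is not None
    ]


if __name__ == "__main__":
    found = available_backends()
    if found:
        print("Backends found:", ", ".join(found))
    else:
        print("No backend found; check $PYTHONPATH or the current directory.")
```

Run it from the directory where you intend to run sapling.py, since the current directory is part of the search path in that case.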

I/O Format

Sapling takes a TSV (tab-separated values) file as input. The first line contains the column names. The following columns are expected; all are required except cluster_index:

sample_index: (0, 1, ..., m-1) for m samples.
mutation_index: (0, 1, ..., n-1) for n mutations.
var: The number of variant reads supporting the mutation.
depth: The total read depth at the locus.
cluster_index (optional): (0, 1, ..., k-1) for k mutation clusters. If not provided, cluster_index defaults to mutation_index.
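The column layout above can be produced with Python's standard csv module. This is a minimal sketch under stated assumptions: the helper `write_input_tsv` is hypothetical, the read counts are made up for illustration, and the column order shown is only one possibility.

```python
import csv
import io

# Required columns from the list above; cluster_index is omitted,
# so it would default to mutation_index.
COLUMNS = ["sample_index", "mutation_index", "var", "depth"]


def write_input_tsv(rows, handle):
    """Write read-count rows as a tab-separated table with a header line."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        # Variant reads are a subset of all reads at the locus.
        if not 0 <= row["var"] <= row["depth"]:
            raise ValueError(f"var must lie between 0 and depth: {row}")
        writer.writerow(row)


# Made-up counts for m=2 samples and n=2 mutations (illustration only).
rows = [
    {"sample_index": 0, "mutation_index": 0, "var": 40, "depth": 100},
    {"sample_index": 0, "mutation_index": 1, "var": 10, "depth": 90},
    {"sample_index": 1, "mutation_index": 0, "var": 35, "depth": 110},
    {"sample_index": 1, "mutation_index": 1, "var": 0, "depth": 95},
]

buffer = io.StringIO()
write_input_tsv(rows, buffer)
print(buffer.getvalue(), end="")
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the in-memory buffer.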

An example input file is example/example_input.tsv.

The output is a TSV file with one row per tree. The first line contains the column names, including:

tree: The index of the tree.
llh: The log-likelihood value of the tree.
pi_i: The parent index for mutation i (where i is the mutation index). A value of -1 indicates that node i is a root node.
f_p_i: The inferred frequency of mutation i in sample p.
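An output row can be turned back into a tree by reading the parent columns. The sketch below is hedged: it assumes the columns are literally named `pi_<i>` and `f_<p>_<i>` with integer indices, which is one reading of the description above. The helper `parse_tree_row` and the sample table are made up for illustration.

```python
import csv
import io


def parse_tree_row(row):
    """Split one output row into roots, parent-child edges, and frequencies."""
    parents = {}
    freqs = {}
    for column, value in row.items():
        parts = column.split("_")
        if parts[0] == "pi" and len(parts) == 2:
            # pi_<i>: parent of mutation i (-1 marks a root).
            parents[int(parts[1])] = int(value)
        elif parts[0] == "f" and len(parts) == 3:
            # f_<p>_<i>: frequency of mutation i in sample p.
            freqs[(int(parts[1]), int(parts[2]))] = float(value)
    roots = sorted(i for i, p in parents.items() if p == -1)
    edges = sorted((p, i) for i, p in parents.items() if p != -1)
    return {
        "tree": int(row["tree"]),
        "llh": float(row["llh"]),
        "roots": roots,
        "edges": edges,
        "freqs": freqs,
    }


# Made-up table: one tree over three mutations in one sample.
SAMPLE = (
    "tree\tllh\tpi_0\tpi_1\tpi_2\tf_0_0\tf_0_1\tf_0_2\n"
    "0\t-12.5\t-1\t0\t0\t0.8\t0.3\t0.4\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")
trees = [parse_tree_row(row) for row in reader]
print(trees[0]["roots"], trees[0]["edges"])
```

If the real column names differ from this assumption, only the two `parts[0]` checks need to change.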

An example output file is example/example_output.tsv.

Usage instructions

      usage: sapling.py [-h] [-f F] [--init_trees INIT_TREES] [-o O] [--sep SEP] [-a RHO] [-t TAU] [-l ELL] [--big_expand] [-b BEAM_WIDTH] [-L {cvxopt,pLP,fastppm}] [--alt_roots] [-m] [--use_clusters]

      Sapling is an algorithm for summarizing and inferring tumor phylogenies from bulk DNA sequencing data

      options:
      -h, --help            show this help message and exit
      -f F                  Input filename with mutation read counts (default: STDIN)
      --init_trees INIT_TREES
                              Input filename with initial backbone trees to expand
      -o O                  Output filename to store trees and frequencies (default: STDOUT)
      --sep SEP             Input/output column separator (default: \t)
      -a RHO, --rho RHO     Rho parameter, minimum deviation allowed from max likelihood (default: 0.9, ignored when beam_width specified)
      -t TAU, --tau TAU     Tau parameter, maximum number of backbone trees (default: 5)
      -l ELL, --ell ELL     Ell parameter, minimum number of mutations (default: -1, unlimited)
      --big_expand          Use big expand (new mutations are anywhere, not just as leaves or splitting a single edge)
      -b BEAM_WIDTH, --beam_width BEAM_WIDTH
                              Maximum beam width (default: -1, limited only by --rho)
      -L {cvxopt,pLP,fastppm}
                              Regression method (default: fastppm)
      --alt_roots           Explore alternative root nodes
      -m, --poly_clonal_root
                              Allow poly clonal root node
      --use_clusters        Use provided clustering (taking median read depth and using average frequency for variant counts)

Example command:

python sapling.py --tau 5 --rho 0.9 < example/example_input.tsv > example/example_output.tsv

This will output up to tau=5 backbone trees with a likelihood cut-off of rho=0.9. The output of the above command on the example input is written to example/example_output.tsv.

Infer full trees

To infer full trees, use the following options:

python sapling.py --tau -1 --ell -1 --beam_width 100 < example/example_input.tsv > example/example_full_trees.tsv

This will use a beam width of 100 to enumerate up to 100 trees containing all mutations. The output is written to example/example_full_trees.tsv.

Expand given backbone trees into full trees

Example command:

python sapling.py --tau -1 --ell -1 --rho 0.9 --init_trees example/example_output.tsv -f example/example_input.tsv -o example/example_expand.tsv

This will expand the given backbone trees into full trees (no restrictions on tau and ell) that are within a factor of 1-rho = 1-0.9 = 0.1 of optimality. The output of the above command is written to example/example_expand.tsv.
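To make the cut-off concrete, here is one possible reading of it, sketched on log-likelihood values such as those in the llh column. This is an assumption rather than a description of Sapling's internals: it treats rho as a ratio of likelihoods, so a tree is kept when its likelihood is at least rho times the best likelihood. The helper `within_cutoff` and the values are hypothetical.

```python
import math


def within_cutoff(llhs, rho):
    """Flag log-likelihoods whose likelihood is at least rho times the maximum.

    Assumed interpretation: L(T) >= rho * L_max, which in log space is
    llh(T) >= llh_max + log(rho).
    """
    if not 0 < rho <= 1:
        raise ValueError("rho must be in (0, 1]")
    threshold = max(llhs) + math.log(rho)
    return [llh >= threshold for llh in llhs]


# Made-up log-likelihoods for three trees (illustration only).
llhs = [-100.0, -100.05, -100.2]
print(within_cutoff(llhs, rho=0.9))
```

Under this reading, rho=1 keeps only trees that tie with the best one, and smaller values of rho admit progressively worse trees.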
