
Commit 618dcb9

Update 1.5.2_Viral_Taxonomy_and_Phylogeny_II.md
1 parent 5b22621 commit 618dcb9

1 file changed: +15 / -46 lines

_episodes/1.5.2_Viral_Taxonomy_and_Phylogeny_II.md

@@ -73,29 +73,22 @@ Phylogenetic trees come in several output types but here the above commands prod
 #### The final terL tree might look something like this:
 ![example_unrooted_terL](../assets/img/example_terL_tree.png)
 
-This `.tree` file _can_ be opened with dendroscope (if you downloaded it locally) or [iTOL](https://itol.embl.de/) or another tree-visualization software **HOWEVER** the full terL tree we will produce will have 5000+ leaves, which will be difficult to load into tree-viewers and might crash your computer. Instead, we will post-process the tree using the `ete3` library in Python. We've built two python scripts using this library to shrink the size of tree outputs. The locations of the tree-pruning scripts will be in your python_scripts directory `python_scripts/1.5_collapse_non_target_clades.py` and `python_scripts/1.5_trim_tree_to_500_neighbors.py`. Below are explanations of these scripts.
+This `.tree` file _can_ be opened with Dendroscope (if you downloaded it locally), [iTOL](https://itol.embl.de/), or other tree-visualization software. **HOWEVER**, the full terL tree we will produce will have 5000+ leaves, which will be difficult to load into tree viewers and might crash your computer when you try to find your viruses.
 
-### Post-process large tree --> smaller tree
+Instead, you could post-process the tree using the `ete3` library in Python.
 
-#### Prune the tree
-
-`# 1.5_trim_tree_to_500_neighbors.py`
-
-**User inputs:**
-1. Newick `.tree` file
-2. A user-selected viral contig of interest
-
-**What the script does:**
-Prunes the terL tree around your selected contig by selecting 500 of the closest neighbours (based on branch lengths)
+> ## Exercise - Post-process large tree --> smaller tree
+>
+> There are a few ways to post-process trees to make them viewable or to highlight our viruses of interest.
+> One way would be to build a script that takes a single contig and the tree file as input and prunes the tree around that contig, keeping a set number of the closest reference viruses. Pruning a tree means keeping only certain clades or leaves.
+> Instead of pruning, you can also "collapse" clades to make the tree manageable to view: each collapsed clade is replaced by a single leaf. In very large trees, you will probably encounter large clades far away from your sequences of interest that can be collapsed and replaced. For this, you might want to provide all your viruses of interest as input.
+> The `ete3` library is a very useful resource for processing trees.
+>{: .source}
+{: .challenge}
 
-**Script outputs:**
-1. A pruned tree with only 500 leaves
-2. An itol annotation file
+We've built two Python scripts with this library to shrink tree outputs. Both are in your `python_scripts` directory: `python_scripts/1.5_collapse_non_target_clades.py` and `python_scripts/1.5_trim_tree_to_500_neighbors.py`.
+`1.5_trim_tree_to_500_neighbors.py` trims your tree around a contig of your choice, and `1.5_collapse_non_target_clades.py` collapses clades that don't contain viruses from our dataset. Both scripts also include a section that generates iTOL annotation files.
 
-**Usage**
-```bash
-python3 trim_tree_to_500_neighbors.py terL_MSA_trimmed.tree contig_name terl_contigXX_pruned.tree contigXX_itol_annotation.txt
-```
 > ## python script to trim tree around 1 contig
 >```python
 ># trim_tree_to_500_neighbors.py
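The trimming idea in this hunk is to keep a chosen contig plus its N closest neighbours by branch length. It can be sketched without `ete3`. This is a minimal, stdlib-only illustration and not the course's `1.5_trim_tree_to_500_neighbors.py`. It makes the following assumptions:

- Leaf names are unique.
- The Newick string carries branch lengths.
- `parse_newick`, `patristic_distances`, `closest_leaves` and `TOY_TREE` are hypothetical names.

The sketch only selects which leaves to keep. With `ete3`, the actual pruning would typically be done with `Tree.prune(keep, preserve_branch_length=True)`.

```python
import re
from collections import defaultdict

# Hypothetical toy tree; real input would be the contents of the terL .tree file.
TOY_TREE = "((contig_1:0.1,refA:0.2):0.3,(refB:0.4,(refC:0.1,refD:0.1):0.5):0.2);"


def parse_newick(newick):
    """Parse a Newick string into an undirected graph with branch lengths.

    Returns (graph, leaves): graph maps node -> [(neighbour, branch_length)],
    leaves lists leaf names in file order. Internal nodes get generated ids;
    internal-node labels (e.g. support values) are ignored and quoted names
    are not supported.
    """
    tokens = re.findall(r"[(),;]|[^(),;]+", newick)
    parent, length, leaves = {}, {}, []
    stack, prev, closed, n_internal = [], None, None, 0
    for tok in tokens:
        tok = tok.strip()
        if not tok:
            continue                      # whitespace between tokens
        if tok == "(":                    # start of a new internal node
            node = f"_internal{n_internal}"
            n_internal += 1
            parent[node] = stack[-1] if stack else None
            stack.append(node)
        elif tok == ")":                  # internal node finished
            closed = stack.pop()
        elif tok in (",", ";"):
            closed = None
        else:                             # "name:length" or ":length"
            name, _, branch = tok.partition(":")
            if prev == ")":               # label/length of the node just closed
                node = closed
            else:                         # a leaf
                node = name.strip()
                parent[node] = stack[-1] if stack else None
                leaves.append(node)
            length[node] = float(branch) if branch else 0.0
        prev = tok

    graph = defaultdict(list)
    for node, par in parent.items():
        if par is not None:
            branch = length.get(node, 0.0)
            graph[node].append((par, branch))
            graph[par].append((node, branch))
    return graph, leaves


def patristic_distances(graph, source):
    """Sum of branch lengths from `source` to every node (tree paths are unique)."""
    dist = {source: 0.0}
    todo = [source]
    while todo:
        node = todo.pop()
        for neighbour, branch in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + branch
                todo.append(neighbour)
    return dist


def closest_leaves(newick, target, n):
    """Return `target` plus its `n` closest leaves by patristic distance."""
    graph, leaves = parse_newick(newick)
    if target not in leaves:
        raise ValueError(f"{target!r} is not a leaf of the tree")
    dist = patristic_distances(graph, target)
    others = sorted((leaf for leaf in leaves if leaf != target),
                    key=lambda leaf: (dist[leaf], leaf))
    return [target] + others[:n]


if __name__ == "__main__":
    print(closest_leaves(TOY_TREE, "contig_1", 2))  # ['contig_1', 'refA', 'refB']
```

Sorting on `(distance, name)` makes ties deterministic. A single distance pass from the target is enough because every pair of leaves in a tree is joined by exactly one path.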
@@ -141,32 +134,7 @@ python3 trim_tree_to_500_neighbors.py terL_MSA_trimmed.tree contig_name terl_con
 >{: .source}
 {: .solution}
 
-#### Collapse tree branches
-
-`# 1.5_collapse_non_target_clades.py`
 
-**User inputs:**
-1. Newick `.tree` file
-2. A file `user_leaves.txt` containing a list of contig ids of interest like below
-
-```
-# user_leaves.txt
-contig_0001_CDS_0001
-contig_0002_CDS_0005
-contig_0003_CDS_0023
-```
-
-**What the script does:**
-Collapses clades in the tree that are over 100 leaves and do not contain your contigs. *Note: these clades are no longer available to view in the collapsed tree.*
-
-**Script outputs:**
-1. A tree with collapsed clades
-2. An itol annotation file for the user-selected contigs
-
-**Usage**
-```bash
-python3 collapse_non_target_clades.py terL_MSA_trimmed.tree user_leaves.txt terl_collapsed.tree collapsed_itol_annotation.txt
-```
 > ## python script to collapse clades except our viruses
 >```python
 ># collapse_non_target_clades.py
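This hunk describes collapsing clades of over 100 leaves that contain none of your contigs into a single leaf. The idea can be sketched on a simplified tree. The sketch is hypothetical and is not the course's `1.5_collapse_non_target_clades.py`. Its simplifications and assumptions:

- Clades are nested Python tuples and leaves are strings.
- Branch lengths are ignored.
- The placeholder naming `collapsed_<n>_<size>_leaves` is invented for illustration.
- `read_user_leaves` skips `#` lines when reading `user_leaves.txt`; the real script may not do this.

```python
# Hypothetical toy topology: tuples are clades, strings are leaves.
TOY_CLADES = (
    ("contig_0001_CDS_0001", ("refA", "refB")),
    (("refC", "refD"), ("refE", "refF", "refG", "refH")),
)


def read_user_leaves(lines):
    """Contig ids of interest, one per line; blank and '#' lines are skipped (assumption)."""
    return {line.strip() for line in lines
            if line.strip() and not line.strip().startswith("#")}


def leaves_of(node):
    """All leaf names below `node`."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves_of(child)]


def collapse(node, targets, min_size, collapsed):
    """Copy of `node` in which every largest clade with more than `min_size`
    leaves and no target leaf is replaced by one placeholder leaf.
    The members of each collapsed clade are recorded in `collapsed`."""
    if isinstance(node, str):
        return node
    members = leaves_of(node)
    if len(members) > min_size and not targets.intersection(members):
        label = f"collapsed_{len(collapsed) + 1}_{len(members)}_leaves"
        collapsed[label] = members
        return label
    return tuple(collapse(child, targets, min_size, collapsed) for child in node)


def to_newick(node):
    """Topology-only Newick string (no branch lengths, no trailing ';')."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


if __name__ == "__main__":
    targets = read_user_leaves(["# user_leaves.txt", "contig_0001_CDS_0001"])
    collapsed = {}
    result = collapse(TOY_CLADES, targets, min_size=2, collapsed=collapsed)
    print(to_newick(result) + ";")
    # ((contig_0001_CDS_0001,(refA,refB)),collapsed_1_6_leaves);
```

Clades are checked top-down, so only the largest qualifying clade is collapsed, never its sub-clades. `leaves_of` is recomputed at every level, which is quadratic in the worst case but fine for a sketch. The toy run uses `min_size=2` so the effect is visible, whereas the removed text describes a threshold of 100.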
@@ -240,6 +208,9 @@ python3 collapse_non_target_clades.py terL_MSA_trimmed.tree user_leaves.txt terl
 >
 > Use the sbatch script below to make your tree and post-process it. You will have to change the contig name in the sbatch script. **Note:** See exercise question #10 below - you might want to plot this 'pet contig' here, or the one from yesterday!
 > **Please include a graphic of the tree in your lab books with your own contigs labeled or highlighted somehow.**
+>
+> Please pause once you have made the trees! We will visualize the tree together with a demo in iTOL.
+>
 > {: .source}
 {: .challenge}
 
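One way to highlight your own contigs for the lab-book figure is an iTOL annotation file that colours their labels. The sketch below assumes iTOL's TREE_COLORS dataset layout, which consists of:

- a `TREE_COLORS` header,
- a `SEPARATOR TAB` line,
- a `DATA` line,
- then one `NODE_ID`, `label`, `COLOR` line per leaf.

Check this layout against the colors/styles template on the iTOL help pages before uploading. The function name `itol_label_colors` and the default colour are hypothetical, and this is not necessarily the format the course scripts write.

```python
def itol_label_colors(contigs, color="#ff0000"):
    """Text of an iTOL TREE_COLORS annotation that colours the given leaf labels.

    Assumed layout (verify against iTOL's colors/styles template):
    TREE_COLORS header, SEPARATOR TAB, DATA, then one
    NODE_ID<TAB>label<TAB>COLOR line per leaf.
    """
    lines = ["TREE_COLORS", "SEPARATOR TAB", "DATA"]
    for contig in dict.fromkeys(contigs):   # drop duplicates, keep input order
        lines.append(f"{contig}\tlabel\t{color}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Contig ids taken from the user_leaves.txt example above.
    print(itol_label_colors(["contig_0001_CDS_0001", "contig_0002_CDS_0005"]), end="")
```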
@@ -290,8 +261,6 @@ python3 collapse_non_target_clades.py terL_MSA_trimmed.tree user_leaves.txt terl
 >{: .source}
 {: .solution}
 
-Please stop here! We will visualize the tree together with a demo in itol.
-
 ### vConTACT3
 
 vConTACT3 has an underlying assumption that the fraction of shared genes between two viruses represents their evolutionary relationship. The vConTACT3 gene-sharing network closely correlates with the ICTV taxonomy.
