Materials for EN.601.449/649 Computational Genomics: Applied Comparative Genomics

schatzlab/appliedgenomics2025


JHU EN.601.449/EN.601.649: Computational Genomics: Applied Comparative Genomics

Prof: Michael Schatz (mschatz @ cs.jhu.edu)
TA: Mahler Revsine (mrevsin1 @ jh.edu)
Class Hours: Monday + Wednesday @ 3:00p - 4:15p Hodson 316
Schatz Office Hours: By appointment
Revsine Office Hours: TBD and by appointment

The primary goal of the course is to ground students in the fundamental theory and applications of computational genomics so that they leave empowered to conduct independent genomic analyses. We will study the leading computational and quantitative approaches for comparing and analyzing genomes, starting from raw sequencing data. The course will focus on human genomics and human medical applications, but the techniques will be broadly applicable across the tree of life. Topics will include genome assembly & comparative genomics, variant identification & analysis, gene expression & regulation, personal genome analysis, and cancer genomics. A major focus will be on applying deep learning and machine learning to these problems. Grading will be based on assignments, two exams, class presentations, and a significant class project. There are no formal course prerequisites, although the course will require familiarity with UNIX scripting and/or programming to complete the assignments and course project.

Prerequisites

Course Resources:

Related Courses & Readings

Related Textbooks

Schedule

Class Date Day Topic Assignments Readings
1 8/25/25 M Introduction Sign up for Piazza * Molecular Structure of Nucleic Acids (Watson and Crick, 1953, Nature)
* Biological data sciences in genome research (Schatz, 2015, Genome Research)
* Big Data: Astronomical or Genomical? (Stephens et al, 2015, PLOS Biology)
2 8/27/25 W Genomics Technologies Ass1 * Coming of age: ten years of next-generation sequencing technologies (Goodwin et al, 2016, Nature Reviews Genetics)
* Guide to k-mer approaches for genomics across the tree of life (Jenike et al., 2024, arXiv)
* 9/1/25 M $${\color{red}\text{Labor Day}}$$
3 9/3/25 W Assembly and WGA * Toward simplifying and accurately formulating fragment assembly (Myers, 1995, J. Comp. Bio.)
* Velvet: Algorithms for de novo short read assembly using de Bruijn graphs (Zerbino and Birney, 2008, Genome Research)
* SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing (Bankevich, et al. 2012, J Comput Biol)
* MUMmer: Alignment of Whole Genomes (Delcher et al, 1999, NAR)
4 9/8/25 M The Human Genome * The complete sequence of a human genome (Nurk et al, 2022, Science)
* Approaching complete genomes, transcriptomes and epi-omes with accurate long-read sequencing (Kovaka et al, 2023, Nature Methods)
* A draft human pangenome reference (Liao et al, 2023, Nature)
* Beyond the Human Genome Project: The Age of Complete Human Genome Sequences and Pangenome References (Taylor et al., 2024, Annual Review of Genomics and Human Genetics)
5 9/10/25 W Read Mapping Ass2 * How to map billions of short reads onto genomes (Trapnell and Salzberg, 2009, Nature Biotech)
* Sapling: Accelerating Suffix Array Queries with Learned Data Models (Kirsche et al, 2020, Bioinformatics)
6 9/15/25 M BWT * Bowtie: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome (Langmead et al, 2009, Genome Biology)
* BWA-MEM: Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM (Li, 2013, arXiv)
* Minimap2: pairwise alignment for nucleotide sequences (Li, Bioinformatics, 2018)
7 9/17/25 W Variant Analysis * Haplotype-based variant detection from short-read sequencing (Garrison and Marth, arXiv, 2012)
* The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data (McKenna et al, 2010, Genome Research)
* A universal SNP and small-indel variant caller using deep neural networks (Poplin et al, 2018, Nature Biotechnology)
* SAM/BAM/Samtools: The Sequence Alignment/Map format and SAMtools (Li et al, 2009, Bioinformatics)
* IGV: Integrative genomics viewer (Robinson et al, 2011, Nature Biotech)
8 9/22/25 M Intro to ML * What are decision trees? (Kingsford and Salzberg, 2008, Nature Biotechnology)
* What is a hidden Markov model? (Eddy, 2004, Nature Biotechnology)
* Deep learning in biomedicine (Wainberg et al, 2018, Nature Biotechnology)
* Visualizing Data Using t-SNE
9 9/24/25 W CNN + DeepVariant Ass3 * ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky et al., 2012, NIPS)
* A universal SNP and small-indel variant caller using deep neural networks (Poplin et al. 2018, Nature Biotech)
10 9/29/25 M Population Genetics * An integrated map of genetic variation from 1,092 human genomes (1000 Genomes Consortium, 2012, Nature)
* Analysis of protein-coding genetic variation in 60,706 humans (Lek et al, 2016, Nature)
* A Draft Sequence of the Neandertal Genome (Green et al. 2010, Science)
* Excavating Neandertal and Denisovan DNA from the genomes of Melanesian individuals (Vernot et al. 2016. Science)
* Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) (Schatz et al, 2022, Cell Genomics)
11 10/1/25 W Clinical Genomics * Genome-Wide Association Studies (Bush & Moore, 2012, PLOS Comp Bio)
* The contribution of de novo coding mutations to autism spectrum disorder (Iossifov et al, 2014, Nature)
12 10/6/25 M Review 1 Project Proposal
13 10/8/25 W $${\color{orange}\text{Exam 1 (In class)}}$$
14 10/13/25 M Functional Analysis 1: RNA-seq * RNA-Seq: a revolutionary tool for transcriptomics (Wang et al, 2009. Nature Reviews Genetics)
* Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks (Trapnell et al, 2012, Nature Protocols)
* Salmon provides fast and bias-aware quantification of transcript expression (Patro et al, 2017, Nature Methods)
* Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications (Krueger and Andrews, 2011, Bioinformatics)
15 10/15/25 W Functional Analysis 2: Single Cell Genomics Ass4 * Ginkgo: Interactive analysis and assessment of single-cell copy-number variations (Garvin et al, 2015, Nature Methods)
* The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells (Trapnell et al, Nature Biotech, 2014)
* Eleven grand challenges in single-cell data science (Lahnemann et al, Genome Biology, 2020)
16 10/20/25 M Functional Analysis 3: Ab initio gene finding * BLAST: Basic Local Alignment Search Tool
* Glimmer: Microbial gene identification using interpolated Markov models
* MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects
* BEDTools: a flexible suite of utilities for comparing genomic features (Quinlan & Hall, 2010, Bioinformatics)
17 10/22/25 W Functional Analysis 4: Methyl-seq, ChIP-seq, and Hi-C Prelim report assigned * ChIP-seq and beyond: new and improved methodologies to detect and characterize protein-DNA interactions (Furey, 2012, Nature Reviews Genetics)
* PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls (Rozowsky et al. 2009. Nature Biotech)
* Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome (Lieberman-Aiden et al, 2009, Science)
18 10/27/25 M Functional Analysis 5: Regulatory States, ENCODE, GTEx, RoadMap * An integrated encyclopedia of DNA elements in the human genome (The ENCODE Project Consortium, Nature, 2012)
* Genetic effects on gene expression across human tissues (GTEx Consortium, Nature, 2017)
* Integrative analysis of 111 reference human epigenomes (Roadmap Epigenomics Consortium, Nature, 2015)
* ChromHMM: automating chromatin-state discovery and characterization (Ernst & Kellis, 2012, Nature Methods)
* Segway: Unsupervised pattern discovery in human chromatin structure through genomic segmentation (Hoffman et al, 2012, Nature Methods)
19 10/29/25 W Transformers Ass5 * Attention is all you need (Vaswani et al. 2017, arXiv)
20 11/3/25 M Enformer + Other DL applications * Effective gene expression prediction from sequence by integrating long-range interactions (Avsec et al., 2021, Nature Methods)
* Personal transcriptome variation is poorly explained by current genomic deep learning models (Huang et al., 2023, Nature Genetics)
* Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings (Sasse et al., 2023, Nature Genetics)
21 11/5/25 W AlphaGenome, Evo2, and related models * AlphaGenome: advancing regulatory variant effect prediction with a unified DNA sequence model (Avsec et al, 2025, bioRxiv)
* Genome modeling and design across all domains of life with Evo 2 (Brixi et al, 2025, bioRxiv)
22 11/10/25 M Review 2 Final Report Assigned
23 11/12/25 W $${\color{orange}\text{Exam 2 (In class)}}$$
24 11/17/25 M Wrap up * Deep Learning Sequence Models for Transcriptional Regulation (Sokolova et al., 2024, Annual Review of Genomics and Human Genetics)
* AlphaFold: Highly accurate protein structure prediction with AlphaFold (Jumper et al, 2021, Nature)
25 11/19/25 W In-class presentation
* 11/24/25 M $${\color{red}\text{Thanksgiving Break}}$$
* 11/26/25 W $${\color{red}\text{Thanksgiving Break}}$$
26 12/1/25 M In-class presentation
27 12/3/25 W In-class presentation
* 12/10/25 W Draft Report Due
* 12/11/25 Th Final project presentation
* 12/12/25 F Final project presentation
* 12/15/25 M Final project presentation
* 12/16/25 Tu Final Report Due
