This repo collects scripts and output data used to generate figures for the haplotype segment-matching analysis in Scheib et al. (2018). It also contains an outline of the derivation of some relevant quantities required for the likelihood analysis.
Scripts used for haplotype segment matching in Scheib et al. (2018):
- extract_sites.py: Helper functions for the extraction and collation of relevant sequence information into the "*.sites" format used in pymatch.py
- pymatch.py: Main functions for the segment scoring and matching procedure
- scatter_plot.py: Generating the scatter plot used in the publication from the data provided here
- likelihood_plot.py: Generating the likelihood heatmap used in the publication
Collected here is the output produced by pymatch.py from ARG samples (not uploaded) generated by ARGweaver. The data can be used directly with likelihood_plot.py and scatter_plot.py. As an example, the file Pima_anzick_CK-13.out.2000.txt in directory Pima_anzick_CK-13.out corresponds to the segment scores identified for the 2000th MCMC sample of the ARG generated by ARGweaver. The comparison uses the modern Pima population (phased haplotypes from the Simons Genome Project) and the ancient sequences Anzick-1 from Rasmussen et al. (2014) and CK-13 from Scheib et al. (2018).
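The naming convention in the example above can be used to locate the output file for a given MCMC sample. The following is a minimal sketch, not part of the repo's scripts: it assumes every output file follows the pattern <prefix>.out.<sample>.txt seen in the example, and the helper names sample_index and list_samples are hypothetical. It only inspects file names and makes no assumption about file contents.

```python
import re
from pathlib import Path

# Assumed pattern, inferred from the single example file name:
#   Pima_anzick_CK-13.out.2000.txt -> prefix "Pima_anzick_CK-13.out", sample 2000
_SAMPLE_RE = re.compile(r"^(?P<prefix>.+\.out)\.(?P<sample>\d+)\.txt$")


def sample_index(filename):
    """Return the MCMC sample number encoded in a file name, or None if it does not match."""
    match = _SAMPLE_RE.match(Path(filename).name)
    return int(match.group("sample")) if match else None


def list_samples(directory):
    """Return {sample number: path} for all matching files in a directory, sorted by sample."""
    found = {}
    for path in Path(directory).glob("*.txt"):
        index = sample_index(path.name)
        if index is not None:
            found[index] = path
    return dict(sorted(found.items()))
```

For instance, list_samples("Pima_anzick_CK-13.out") would return a dictionary keyed by sample number, from which the path for sample 2000 can be looked up before passing it to the plotting scripts.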
A derivation of the segment matching probabilities is shown in method.pdf.
This repo will remain available permanently, but will be deprecated after a forthcoming (2018) publication by Tariq Desai and Aylwyn Scally.