GitHub - BGD-UAB/iMKTData: Pipeline to extract unfolded site frequency spectrum from 1000GP VCF and DGN alignments

iMKT data repository

Pipelines to extract the unfolded site frequency spectrum (SFS) from 1000GP VCF files and DGN alignments. Each pipeline returns a tab-delimited file containing the unfolded SFS, the number of analyzable sites (computed on the largest transcript) and divergence, by gene, population and MKT functional class (0-fold: selected class; 4-fold: neutral class). The output of both pipelines allows the analysis of Drosophila melanogaster and human proteins through the iMKT web service and the iMKT R package.
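The 0-fold and 4-fold classes mentioned above can be illustrated with a minimal sketch. It assumes the standard genetic code (NCBI table 1) and an in-frame coding sequence containing only A, C, G and T; the function names `site_class` and `classify_cds` are hypothetical and are not taken from the notebooks. A position is 0-fold when every substitution changes the amino acid and 4-fold when every substitution is synonymous.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1 (standard code); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def site_class(codon, pos):
    """Classify position `pos` (0, 1 or 2) of `codon` as '0-fold', '4-fold' or 'other'."""
    codon = codon.upper()
    aa = CODON_TABLE[codon]  # assumes only A, C, G, T (no ambiguity codes)
    # Count how many of the four bases at this position keep the same amino acid;
    # the original base always counts, so the result is between 1 and 4.
    same = sum(CODON_TABLE[codon[:pos] + b + codon[pos + 1:]] == aa for b in BASES)
    return {1: "0-fold", 4: "4-fold"}.get(same, "other")


def classify_cds(cds):
    """Return one class label per nucleotide of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return [site_class(cds[i - i % 3:i - i % 3 + 3], i % 3) for i in range(usable)]


print(classify_cds("ATGGCT"))
# ['0-fold', '0-fold', '0-fold', '0-fold', '0-fold', '4-fold']
```

Two- and three-fold degenerate positions are lumped into "other" here, since only the 0-fold and 4-fold classes enter the MKT comparison described above.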
This repository only includes the raw code needed to obtain the main results. The notebooks/ folder includes the two main Jupyter Notebooks, running on a Python 3.6 kernel, which execute the pipeline step by step. The src/ folder contains the raw scripts needed to execute the pipeline. Please note that multiple steps could be parallelized; in that case, create your own custom bash scripts or run them manually on your server.
The pipelines were developed in the conda environment defined in imktData.yml, on a local server with 100 GB of RAM and 16 Intel(R) Xeon(R) CPUs.
In addition, structure.sh, deposited in src/, creates the folder structure we used to complete the whole process. If you decide to execute it, overwrite the resulting notebooks/ and src/ folders with the ones deposited in this repository.
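As noted above, several steps can be parallelized. A minimal Python sketch of one way to do this, assuming each step is an independent command (for example, one job per chromosome), is shown below. The commands are hypothetical placeholders, not the repository's scripts, and `max_workers=16` simply mirrors the 16 CPUs mentioned above.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_step(cmd):
    """Run one command in a child process and return (cmd, exit code)."""
    # PIPE plus universal_newlines keeps this Python 3.6 compatible
    # (capture_output and text were only added in Python 3.7).
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    return cmd, result.returncode


def run_parallel(commands, max_workers=16):
    """Run independent commands concurrently; results keep the input order."""
    # Threads suffice because each job runs in its own child process,
    # so the Python GIL does not limit throughput.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_step, commands))


if __name__ == "__main__":
    # Hypothetical per-chromosome job: replace with the real pipeline step.
    commands = [[sys.executable, "-c", "print('chr{}')".format(c)] for c in range(1, 23)]
    failed = [cmd for cmd, code in run_parallel(commands) if code != 0]
    print("{} of {} jobs failed".format(len(failed), len(commands)))
```

Threads are chosen over processes because the heavy work happens in external programs; a process pool would only add start-up cost here.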

Data retrieval

Executing the pipelines requires downloading the following files. Paths will also need to be changed accordingly.

D. melanogaster population genomic data.

Variation data generated by the Drosophila Genome Nexus (DGN), together with divergence data between D. melanogaster and D. simulans, were retrieved from PopFly (Hervás et al. 2017) in FASTA format (also available on the DGN website). Recombination data were obtained from Comeron et al. 2012.
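To show what "unfolded" means for aligned FASTA data, here is a minimal sketch that polarizes each site with a single outgroup sequence (here standing in for D. simulans). This is an assumption for illustration, not necessarily how the notebooks polarize sites; `unfolded_sfs` is a hypothetical name. Sites with missing data and sites with more than two alleles are skipped, and sites where every ingroup sequence differs from the outgroup are counted as divergence.

```python
def unfolded_sfs(ingroup, outgroup):
    """Return (sfs, divergence, analyzable) for equally long aligned sequences.

    sfs[i] counts sites where i + 1 of the n ingroup sequences carry the
    derived allele; the outgroup base is taken as ancestral (assumption).
    """
    n = len(ingroup)
    sfs = [0] * (n - 1)
    divergence = 0
    analyzable = 0
    for i, anc in enumerate(outgroup.upper()):
        column = [seq[i].upper() for seq in ingroup]
        # Skip sites with missing or ambiguous data in the outgroup or any sequence.
        if anc not in "ACGT" or any(b not in "ACGT" for b in column):
            continue
        # Keep only sites with at most two alleles (ingroup plus outgroup).
        if len(set(column) | {anc}) > 2:
            continue
        analyzable += 1
        derived = sum(b != anc for b in column)
        if derived == n:
            divergence += 1  # fixed difference with the outgroup
        elif derived > 0:
            sfs[derived - 1] += 1  # segregating site with `derived` copies
    return sfs, divergence, analyzable


print(unfolded_sfs(["ACGTA", "ACGTT", "ATGTT", "ACGAT"], "ACCTA"))
# ([2, 0, 1], 1, 5)
```

Using a single outgroup ignores ancestral-state misidentification; the sketch only shows how derived-allele counts map onto SFS bins.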

Human population genomic data.

Genome variation data and information on the ancestral state of the variants generated by the 1000 Genomes Project (1000GP) Phase III (1000 Genomes Project Consortium 2015), together with divergence between humans and chimpanzees, were retrieved from PopHuman (Casillas et al. 2018) in Variant Call Format (VCF). Recombination data were obtained from Bhérer et al. 2017. The pilot mask used to exclude low-quality variants was downloaded from the 1000GP FTP site.
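For the VCF input, a minimal sketch of turning one record into a derived-allele count is shown below. It rests on several assumptions that should be checked against the actual files: the ancestral allele is stored in an `AA` INFO field whose first `|`-separated item is the allele (uppercase meaning high confidence), and the pilot mask is a FASTA in which `P` marks sites that pass. All function names are hypothetical, and per-population spectra would require passing only that population's sample columns.

```python
def read_fasta_sequence(lines):
    """Concatenate the sequence lines of a single-record FASTA (e.g. one mask chromosome)."""
    return "".join(line.strip() for line in lines if not line.startswith(">"))


def parse_info(info):
    """Turn a VCF INFO string such as 'AC=1;AA=A|||' into a dict (flags map to '')."""
    fields = {}
    for field in info.split(";"):
        key, _, value = field.partition("=")
        fields[key] = value
    return fields


def derived_count(vcf_line, mask):
    """Return (derived allele count, called haplotypes) for one record, or None if filtered."""
    cols = vcf_line.rstrip("\n").split("\t")
    pos, ref, alt = int(cols[1]), cols[3].upper(), cols[4].upper()
    if len(ref) != 1 or len(alt) != 1:  # biallelic SNPs only; indels and 'A,T' are skipped
        return None
    if mask[pos - 1] != "P":  # assumption: 'P' marks sites passing the pilot mask
        return None
    # Assumption: ancestral allele = first '|'-separated item of AA; lowercase
    # (low-confidence) calls never match the uppercased REF/ALT and are dropped.
    ancestral = parse_info(cols[7]).get("AA", "").split("|")[0]
    if ancestral not in (ref, alt):
        return None
    alleles = []
    for sample in cols[9:]:  # genotype columns follow FORMAT (column 9)
        genotype = sample.split(":")[0].replace("/", "|")
        alleles.extend(a for a in genotype.split("|") if a != ".")
    derived_code = "1" if ancestral == ref else "0"
    return alleles.count(derived_code), len(alleles)


mask = read_fasta_sequence([">1", "NP", "PP"])
record = "1\t2\t.\tA\tG\t.\tPASS\tAC=1;AA=A|||\tGT\t0|1\t0|0"
print(derived_count(record, mask))  # (1, 4)
```

Each returned pair can then be binned into an unfolded SFS exactly as in the alignment-based case.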

Jupyter notebooks

01_D. melanogaster pipeline (python/bash)

02_Human pipeline (python/bash)

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

iMKT data repository

Data retrieve

D. melanogaster population genomic data.

Human population genomic data.

Jupyter notebooks

01_D. melanogaster pipeline (python/bash)

02_Human pipeline (python/bash)

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages