Imputation on single-cell SNP array data

This repository is designed to perform imputation on SNP-array data to fill gaps in single-cell genomic data. This is particularly important for addressing the challenges posed by Whole Genome Amplification (WGA), a common technique in single-cell genomics that can introduce significant background noise and result in missing genetic information.

Description - Overview of the project's purpose and goals
Getting started - Instructions on how to begin with this project
Bioinformatic parameters - Explanation and details of the bioinformatic parameters used throughout the pipeline
Repository structure - A layout of the repository's architecture, describing the purpose of each file or directory
References - Tools used in the project
Authors - List of contributors to the project
Acknowledgments - Credits and thanks to those who helped with the project

Description

The repository computes statistics on SNP-array data to assess whether imputation can recover the information lost in single-cell data. The analysis begins with the creation of a bulk reference from five different bulk datasets, followed by data pre-processing. Next, similarity coefficients and recall are calculated to compare the data before and after imputation. Finally, descriptive statistics are computed and plots are created to show the results.
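
The before/after comparison can be illustrated with a minimal Python sketch. The README does not state how the similarity coefficient and recall are defined, so the definitions below are assumptions made for illustration: similarity is taken as genotype concordance over sites called in both datasets, and recall as the fraction of bulk-called sites that the single cell reproduces. The function names and the toy genotypes are hypothetical and are not taken from the repository's scripts.

```python
from typing import Dict, Optional, Tuple

Site = Tuple[str, int]        # (chromosome, position)
Genotype = Optional[str]      # e.g. "0/1"; None marks a missing call


def concordance(sc: Dict[Site, Genotype], bulk: Dict[Site, Genotype]) -> float:
    """Assumed similarity: share of sites called in both sets that agree."""
    shared = [s for s, g in sc.items() if g is not None and bulk.get(s) is not None]
    if not shared:
        return 0.0
    return sum(sc[s] == bulk[s] for s in shared) / len(shared)


def recall(sc: Dict[Site, Genotype], bulk: Dict[Site, Genotype]) -> float:
    """Assumed recall: share of bulk-called sites matched by the single cell."""
    truth = [s for s, g in bulk.items() if g is not None]
    if not truth:
        return 0.0
    return sum(sc.get(s) == bulk[s] for s in truth) / len(truth)


# Toy data (hypothetical): a bulk reference and one cell before/after imputation.
bulk = {("chr1", 100): "0/1", ("chr1", 200): "1/1",
        ("chr1", 300): "0/0", ("chr1", 400): "0/1"}
before = {("chr1", 100): "0/1", ("chr1", 200): None,
          ("chr1", 300): "0/0", ("chr1", 400): None}
after = {("chr1", 100): "0/1", ("chr1", 200): "1/1",
         ("chr1", 300): "0/0", ("chr1", 400): "1/1"}

for label, cell in (("before", before), ("after", after)):
    print(label, concordance(cell, bulk), recall(cell, bulk))
```

In this toy case imputation fills both missing sites, one correctly and one incorrectly, so recall rises while concordance over called sites drops. This is why reporting the two measures side by side can be more informative than either alone.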

Getting started

To reproduce this analysis, it is essential to set up a Conda environment containing all the necessary libraries (specified in the requirements.txt file). After setting up the environment, it is important to run the following scripts in the specified order.

Use the functions from get_gdna_consensus.py to manipulate and analyze genomic DNA (gDNA) data: they perform various operations ranging from data concatenation, filtering, cleaning and analysis to visualization and data transformation.
Use the functions from get_references_map.py for downloading large genomic data files: they automate the process of downloading, unzipping, and organizing genomic data files into specified directories.
Use the functions from data_processing_pre_imputation.py for processing, filtering, and analyzing genomic data, particularly focused on single-cell (SC) genomics and consensus genomic DNA (gDNA) data.
Use the functions from get_positions_to_exclude.py to .
Use the functions from imputation.py to perform genetic imputation for each chromosome.
Use the functions from data_processing_post_imputation.py to .
Use the functions from creating_statistics.py to .
Use the functions from creating_plots.py to .
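
As a rough illustration of the per-chromosome imputation step, the sketch below builds one Beagle command line per chromosome. It is not the code in imputation.py: the helper name, the file paths, and the chromosome naming are hypothetical placeholders. Only the jar name comes from this repository, and `gt`, `ref`, `map`, `chrom` and `out` are standard Beagle 5.4 arguments.

```python
from typing import List

BEAGLE_JAR = "beagle.22Jul22.46e.jar"  # jar file listed in this repository


def beagle_command(chrom: str, gt_vcf: str, ref_vcf: str,
                   map_file: str, out_prefix: str,
                   jar: str = BEAGLE_JAR) -> List[str]:
    """Build the argument list for one Beagle run restricted to one chromosome."""
    return [
        "java", "-jar", jar,
        f"gt={gt_vcf}",       # target genotypes (single-cell VCF)
        f"ref={ref_vcf}",     # phased reference panel
        f"map={map_file}",    # PLINK-format genetic map
        f"chrom={chrom}",     # must match the chromosome naming in the VCFs
        f"out={out_prefix}",  # Beagle appends .vcf.gz and .log
    ]


# Hypothetical paths; adjust them to the actual layout under data/.
commands = [
    beagle_command(
        chrom=str(n),
        gt_vcf="data/raw/single_cell.vcf.gz",
        ref_vcf=f"data/reference/chr{n}.vcf.gz",
        map_file=f"data/map/chr{n}.map",
        out_prefix=f"data/imputed/chr{n}",
    )
    for n in range(1, 23)
]

print(" ".join(commands[0]))
```

Each command can then be executed with `subprocess.run(cmd, check=True)`, assuming Java is installed and the input files exist. Building the argument lists separately from running them makes it easy to inspect or log the exact calls first.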

Bioinformatic parameters

Repository structure

File	Description
data/	This folder must contain a subfolder called "raw" that holds your own input data, including the single-cell and bulk VCF files
requirements.txt	File listing the names and versions of the packages installed in the virtual environment used to run the imputation
beagle.22Jul22.46e.jar	Beagle jar file used to perform the imputation

References

Beagle 5.4

Authors

Contact me at marcor@dtu.dk for more details or explanations.

Acknowledgements

I would like to extend my heartfelt gratitude to KU and CCS (Center for Chromosome Stability) for providing the essential resources and support that have been fundamental to the development and success of the Eva Hoffmann group's projects.
