This is a tutorial on causal inference for genomics data, written for the NORBIS Research School course "Genomics for Precision Medicine".
In this tutorial, you will look at a few TF/gene pairs from data published in [1]:
- to study the distributions of expression values,
- to perform linear regression to estimate causal effects,
- to compare with results obtained with the causal Findr tests [2,3],
- and to check on databases what the known interactions are.
Note: the Jupyter notebooks (files with the .ipynb extension) can be run locally if you have Python and Jupyter installed on your system, or you can load them in Google Colab.
The notebook Covariates_notebook.ipynb is set up to download the covariates file (SI_Data_02_covariates.xlsx) and generate a graphical representation of these covariates for the differential gene expression data from [1].
The notebook Causal_Inference_with_Linear_Regression_20210618.ipynb is set up to download the data files of differential gene expression data corrected for covariates. After that you will perform the following tasks:
- Make a visualisation of gene expression values.
- Perform linear regression on gene pairs and visualise the result.
- Answer the following question: "Are the distributions of the Regulator and Target Genes' expression values different for the possible Marker (eQTL) genotypes?"
- Check on databases what the known interactions are.
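The regression and genotype-comparison tasks above can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's code: the variable names, the effect sizes, and the helper functions `fit_line` and `mean_by_genotype` are assumptions made for this example, and the simulated values stand in for the expression data from [1].

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic stand-in data (NOT the values from [1]); effect sizes are assumptions.
n_samples = 1000
genotype = rng.integers(0, 2, size=n_samples)               # marker (eQTL) genotype: 0 or 1
regulator = 1.0 * genotype + rng.normal(0, 1, n_samples)    # regulator gene expression
target = 0.5 * regulator + rng.normal(0, 1, n_samples)      # target gene expression


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    design = np.column_stack([x, np.ones_like(x)])
    (slope, intercept), *_ = np.linalg.lstsq(design, y, rcond=None)
    return slope, intercept


def mean_by_genotype(values, genotype):
    """Mean expression within each genotype class."""
    return {int(g): float(values[genotype == g].mean()) for g in np.unique(genotype)}


# Regress the target on the regulator.
slope, intercept = fit_line(regulator, target)

# Compare expression between the two marker genotypes.
regulator_means = mean_by_genotype(regulator, genotype)
target_means = mean_by_genotype(target, genotype)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print("regulator means by genotype:", regulator_means)
print("target means by genotype:", target_means)
```

In this simulation the marker shifts the regulator, and the regulator in turn shifts the target, so both genes show different mean expression between the genotype groups. With real data, a regression slope alone does not establish causal direction, which is why the tutorial compares against the Findr tests [2,3].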
There are two more files in this repository:
- Example_1_covariate_regression_on_expression_data.py: a Python script that does the covariate correction.
- Example_2_yeast_run_findr.py: allows you to run the Findr tests on the differential gene expression data. For more information about Findr see [3].
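One common way to correct expression data for covariates is to regress each gene on the covariates and keep the residuals. The sketch below illustrates that idea on synthetic data. It is an assumption about the general approach, not a copy of Example_1_covariate_regression_on_expression_data.py; the function name `regress_out` and the array shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible

# Synthetic stand-in data: rows are samples, columns are covariates or genes.
n_samples, n_genes, n_covariates = 200, 5, 2
covariates = rng.normal(size=(n_samples, n_covariates))
true_effects = rng.normal(size=(n_covariates, n_genes))
noise = rng.normal(scale=0.5, size=(n_samples, n_genes))
expression = covariates @ true_effects + noise


def regress_out(expression, covariates):
    """Return the residuals of expression after a linear fit on the covariates."""
    # Add an intercept column so the gene means are removed as well.
    design = np.column_stack([np.ones(len(covariates)), covariates])
    coefficients, *_ = np.linalg.lstsq(design, expression, rcond=None)
    return expression - design @ coefficients


corrected = regress_out(expression, covariates)
print(corrected.shape)
```

By construction, the residuals are uncorrelated with the covariates, so any remaining association between genes cannot be explained by a linear effect of those covariates.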
- [1] Albert et al. (2018) eLife, 7, e35471. Genetics of trans-regulatory variation in gene expression. DOI
- [2] Ludl & Michoel. (2021) Molecular Omics, 17, 241-251. Comparison between instrumental variable and mediation-based methods for reconstructing causal gene networks in yeast. DOI -- GitHub repo
- [3] Wang & Michoel. (2017) PLOS Computational Biology, 13(8), e1005703. Efficient and accurate causal inference with hidden confounders from genome-transcriptome variation data. DOI -- GitHub repo