Skip to content

Perturbation of hESC differentiation by EZH2 inhibition

Notifications You must be signed in to change notification settings

heteyszabolcs/hESC_EZH2i

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 

Repository files navigation

Study of the effect of EZH2 inhibition in hESC

Gene regulatory network (GRN) inference:

GRN inference and in silico perturbation simulation by CellOracle (v0.16.0)

Codes: utils/CellOracle_scripts/*.py

${\color{red}Important:}$

For CellOracle dependencies it's worth making a pyenv virtual environment with Python version 3.8 (or above). CellOracle installation is documented thouroughly here.

Necessary inputs for hESC EZH2i CellOracle analysis:

  • scRNA-Seq

In our case scRNA-Seq was initially processed by Seurat (see utils/sc_rna_workflow.R). As CellOracle needs scRNA-Seq data in h5ad format (AnnData), at first we must convert Seurat data object into h5ad. See: utils/create_h5ad_from_seurat.R script. Additionally, we must export the raw read count tables of the Seurat objects (e.g. seurat_obj@assays$RNA@counts), as CellOracle object needs the raw count table.

Seurat version: Seurat_5.0.2

  • scATAC-Seq

For CellOracle's base GRN inference, we must quantify the co-accessibility sites by cicero. The cicero outputs (connections and the peaks used by cicero) are mandatory inputs for our celloracle_atac_prep Python file. The cicero analysis (utils/cicero.R) needs computational resources (worth running on Uppmax). Also, base GRN configuration script needs a reference genome in fasta format (in our case, it was a UCSC hg38 fasta). For the cicero analysis, scATAC-Seq data were processed by standard scATAC Seurat workflow (e.g. here) and afterwards only the most variable scATAC-Seq peaks were considered in the co-accessibility estimation.

cicero version: cicero_1.3.9, with monocle3_1.3.5 + celloracle_atac_prep.py needs available BEDTools in the path

  • pseudotime analysis

Ideally, it is estimated by scanpy (see here: Pseudotime calculation). Pseudotime calculation (celloracle_pseudotime.py) needs the processed scRNA-Seq object in h5ad format.

Optional input:

The GRN construction might be supported by manually adding TF-target gene pairs (e.g. statistically significant TF-target gene predictions of Pando)

CellOracle run:

If the inputs are properly generated, we must run the individual Python scripts sequentially:

  1. Pseudotime calculation
  2. scATAC-Seq preprocessing ( = cicero output processing)
  3. base GRN construction
  4. in silico perturbations

GRN inference by Pando

Codes:

utils/pando.R

utils/pando_alvis.R (for cluster Alvis)

utils/integration.R - multi-omic data integration by Seurat

Aims:

  • Gene regulatory network (GRN) reconstruction of EZH2i-treated naïve epiblast-like cells (ELCs).
  • Finding lineage specific transcription factor (TF) hierarchy that interplays with EZH2 inhibition.

Reference: PANDO Nature paper - Fleck et al. (2022)

Data and results: hESC scATAC-Seq and hESC scRNA-Seq:

@ Uppmax: /crex/proj/snic2020-6-3/SZABOLCS/hESC_EZH2i/data  

GRN inferences:

@ Uppmax: /crex/proj/snic2020-6-3/SZABOLCS/hESC_EZH2i/results/GRN

Pando provides a flexible framework that incorporates multi-omic data (in our case scATAC-Seq and scRNA-Seq) to infer a global gene regulatory network. Our aim was to describe the main transcription factor changes due to EZH2 inhibition in ELCs.

Pando leverages multimodal single-cell genomic measurements and models gene expression through TF - peak (=ATAC-Seq peak) interactions.

Four main steps of Pando:

(1) Selecting candidate regulatory genomic regions (coming from scATAC data layer).

(2) Scanning regions for transcription factor binding motifs (motif enrichment).

(3) Selecting region-TF pairs for each target gene (target gene = candidate gene).

(4) Constructing a regression model with region-TF pairs as independent variables and the expression of the target gene as the response variable.

Pando requires these inputs for GRN initialization:

  • coembedded (merged) single-cell omic assays: scATAC-Seq, scRNA-Seq - mandatory (see utils/integration.R)
  • coembedded_Seurat_object[['peaks']] = scATAC-Seq assay
  • coembedded_Seurat_object[['RNA']] = scRNA-Seq assay
  • candidate genes (targets) to consider for GRN inference - mandatory
  • motif dataset for TF motif finding - PANDO has built-in dataset
  • candidate regions - e.g. cis-regulatory data (SCREEN) - optional, if it is not set every scATAC-Seq peak region is considered.

Pando's version controlling is sensitive. Recently only Pando v1.1.0 runs successfully with Seurat v5 (SeuratObject_5.0.1, Seurat_5.0.3, Signac_1.12.0):

devtools::install_github("quadbio/[email protected]")

About

Perturbation of hESC differentiation by EZH2 inhibition

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published