A pipeline to 1) filter and annotate Hi-C or SMC1 HiChIP contact frequency data with enhancers and promoters, 2) cluster these data to identify enhancer-promoter hubs (networks of spatially interacting regulatory elements within the nucleus), and 3) compare hubs across conditions.
Uses Hi-C chromatin contact frequency, H3K27ac ChIP-seq peaks, ATAC-seq peaks, transcription start sites (TSSs), and RNA-seq gene expression to identify spatial interactions between enhancers (i.e. accessible H3K27ac peaks) and promoters (i.e. actively transcribed, accessible TSSs). Each spatial interaction between regulatory elements is then assigned a normalized contact frequency score, and the interactions are filtered to yield a data table of valid spatial interactions that can be used to identify hubs. Example data are provided to illustrate the required input format.
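The enhancer and promoter definitions above can be illustrated with a minimal Python sketch. It is not the pipeline's own code: the function names, the promoter window (`flank`), and the expression cutoff (`min_expr`) are hypothetical choices made for illustration, and intervals are assumed to be half-open, BED-style coordinates.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open like BED


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def accessible_enhancers(h3k27ac: List[Interval],
                         atac: List[Interval]) -> List[Interval]:
    """Keep H3K27ac peaks that overlap at least one ATAC-seq peak."""
    return [p for p in h3k27ac if any(overlaps(p, a) for a in atac)]


def active_promoters(tss: Dict[str, Tuple[str, int]],
                     expression: Dict[str, float],
                     atac: List[Interval],
                     min_expr: float = 1.0,   # hypothetical cutoff
                     flank: int = 1000        # hypothetical window size
                     ) -> Dict[str, Interval]:
    """Keep TSS windows that are expressed and overlap an ATAC-seq peak."""
    promoters = {}
    for gene, (chrom, pos) in tss.items():
        window = (chrom, max(0, pos - flank), pos + flank)
        expressed = expression.get(gene, 0.0) >= min_expr
        if expressed and any(overlaps(window, a) for a in atac):
            promoters[gene] = window
    return promoters


# Toy inputs (made up, not the repository's example data).
h3k27ac = [("chr1", 100, 500), ("chr1", 5000, 5600)]
atac = [("chr1", 400, 700), ("chr1", 9900, 10100)]
tss = {"GENE_A": ("chr1", 10000), "GENE_B": ("chr1", 20000)}
expr = {"GENE_A": 12.5, "GENE_B": 30.0}

enhancers = accessible_enhancers(h3k27ac, atac)
promoters = active_promoters(tss, expr, atac)
```

In this toy example only the first H3K27ac peak overlaps an ATAC-seq peak, and only GENE_A is both expressed and accessible. The linear scan over peaks is for readability; genome-scale inputs would normally use an interval index or a tool such as bedtools.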
Uses SMC1 HiChIP data, H3K27ac ChIP-seq peaks, transcription start sites (TSSs), and RNA-seq gene expression to identify spatial interactions between enhancers (i.e. H3K27ac peaks) and promoters (i.e. actively transcribed TSSs). A HiChIP interaction caller is first used to detect significant SMC1 HiChIP interactions (not shown here). These interactions are then filtered so that only interactions between putative regulatory elements are used to identify hubs. Example data are provided to illustrate the required input format.
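The filtering step described above, keeping only interactions whose two anchors both fall on putative regulatory elements, might look roughly like the following Python sketch. The data layout (anchor pairs as intervals, elements as a name-to-interval mapping) and the function names are assumptions for illustration and do not reflect the pipeline's actual input format.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open like BED


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def elements_at(anchor: Interval, elements: Dict[str, Interval]) -> List[str]:
    """Names of regulatory elements overlapping one interaction anchor."""
    return sorted(name for name, iv in elements.items() if overlaps(anchor, iv))


def filter_interactions(interactions: List[Tuple[Interval, Interval]],
                        elements: Dict[str, Interval]):
    """Keep interactions where both anchors overlap a regulatory element."""
    kept = []
    for a1, a2 in interactions:
        e1, e2 = elements_at(a1, elements), elements_at(a2, elements)
        if e1 and e2:
            kept.append((a1, a2, e1, e2))
    return kept


# Toy inputs (made up): one enhancer, one promoter, two called interactions.
elements = {
    "enh1": ("chr1", 100, 500),
    "GENE_A_promoter": ("chr1", 9000, 11000),
}
interactions = [
    (("chr1", 0, 5000), ("chr1", 5000, 10000)),    # enhancer to promoter
    (("chr1", 0, 5000), ("chr1", 50000, 55000)),   # second anchor unannotated
]

valid = filter_interactions(interactions, elements)
```

Only the first interaction survives, because the second has an anchor that overlaps no enhancer or promoter.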
Once valid spatial interactions between regulatory elements are determined, hub_pipeline.sh (for two conditions) or hub_pipeline_single.sh (for a single condition) can be used to cluster interactions and construct enhancer-promoter hubs. Hubs are characterized by their within-hub spatial interaction counts, their regulatory element counts, and the genes broadly contained within them. The hub_pipeline.sh script can also compare hubs across two conditions on the basis of their genomic overlap and interaction counts, in order to identify differential hubs in silico. Finally, calculate_hyperconnected_hubs.R can be used to identify hyperconnected (hyperinteracting) hubs on the basis of interaction count.
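To make the idea of a hub concrete, here is a small Python sketch that groups interactions into hubs and reports the per-hub interaction and element counts mentioned above. This text does not specify the clustering algorithm the scripts use, so the sketch substitutes the simplest possible stand-in: connected components of the interaction graph, found with union-find. The function name `build_hubs` and the element labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_hubs(interactions: List[Tuple[str, str]]) -> List[Dict]:
    """Group elements into connected components (a stand-in for clustering)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Register unseen elements, then walk to the root with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two components joined by each interaction.
    for a, b in interactions:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Summarize each component by its elements and interaction count.
    hubs = defaultdict(lambda: {"elements": set(), "n_interactions": 0})
    for a, b in interactions:
        hub = hubs[find(a)]
        hub["elements"].update((a, b))
        hub["n_interactions"] += 1

    # Most highly interacting hubs first.
    return sorted(hubs.values(), key=lambda h: h["n_interactions"], reverse=True)


# Toy interactions between made-up enhancers and promoters.
toy = [
    ("enh1", "promA"),
    ("enh2", "promA"),
    ("enh1", "enh2"),
    ("enh3", "promB"),
]
hubs = build_hubs(toy)
for hub in hubs:
    print(sorted(hub["elements"]), hub["n_interactions"])
```

Ranking hubs by interaction count, as in the last step, is the kind of summary from which hyperconnected hubs could be selected. The actual cutoff used by calculate_hyperconnected_hubs.R is not stated here, so none is assumed.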