This repository contains code and data from Koche et al., "Extrachromosomal circular DNA drives oncogenic genome remodeling in neuroblastoma", Nat Genet (2020).
Circle Calls
- Data/CircleseqCircles.bed contains Circle-seq circle calls for our dataset
- Data/WGSinferredCircles_eccDNA.txt contains WGS-inferred circle calls for our dataset (eccDNA only)
- Data/WGSinferredCircles_ecDNA.txt contains WGS-inferred circle calls for our dataset (ecDNA only)
WGS Rearrangement Calls
- Data/MergedSV_StrictestFiltering_CircleSeqCircleAnnotation.txt contains the merged and filtered interchromosomal rearrangement calls (see Structural Variant Merging), overlapped with Circle-seq circle calls for classification (available upon request)
- Data/MergedSV_StrictestFiltering_WGSCircleAnnotation.txt contains the merged and filtered interchromosomal rearrangement calls (see Structural Variant Merging), overlapped with WGS-inferred circle calls for classification (available upon request)
- Data/PedPanCanSVs.csv contains structural variants in the DKFZ Pediatric Pan-Cancer dataset published by Gröbner et al. 2018 (data downloaded from the corresponding R2 platform)
- Data/PedPanCanMeta.csv contains metadata, such as the tumor entity, for the DKFZ Pediatric Pan-Cancer dataset
Expression Data
- Data/berlin_cohort_rnaseq_fpkm.txt is RNA-seq data for the Berlin neuroblastoma cohort (available upon request)
- Data/peifer_54nb_fpkms.txt is RNA-seq data published by Peifer et al.
Clinical Data
- Data/ClinicalData.csv specifies risk group and survival data
Other
- Data/BroadIntervals/b37-[...]_MinusBlckLst.txt specifies the mappable, non-blacklisted genome we use for randomization analyses
- script_merge_translocations[...].py collapses interchromosomal structural variants from different callers
- script_filter_mapqBAM_3.0.py filters collapsed interchromosomal structural variants using common criteria
- script_merge_intraSVs_[...].py collapses intrachromosomal structural variants from different callers
- script_filter_atleast2callers_2.0.py filters collapsed intrachromosomal structural variants using common criteria
- script_merge_circularized_regions_[...].py overlaps interchromosomal structural variants with circle calls and classifies accordingly
- script_merge_circularized_regions_[...]_intraSV_[...].py overlaps intrachromosomal structural variants with circle calls and classifies accordingly
- script_search_integrations_[...].py searches for two-breakpoint patterns indicative of circle integration into a chromosome
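To illustrate the idea behind collapsing calls from different callers and then keeping only calls with multi-caller support, here is a minimal sketch. It is not the code in the scripts above: the `Breakend` record, the greedy grouping, the 500 bp tolerance, and the caller names are all assumptions made for illustration, and the sketch assumes the two breakpoints of each call are already in a consistent order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakend:
    """Hypothetical record for one interchromosomal call (not the repository's format)."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int
    caller: str


def collapse_calls(calls, max_dist=500):
    """Greedily group calls whose two breakpoints both lie within max_dist bp.

    max_dist is an assumed tolerance, not a value taken from the repository.
    """
    merged = []  # each entry: {"rep": Breakend, "callers": set of caller names}
    ordered = sorted(calls, key=lambda c: (c.chrom_a, c.pos_a, c.chrom_b, c.pos_b))
    for call in ordered:
        for group in merged:
            rep = group["rep"]
            if (rep.chrom_a == call.chrom_a and rep.chrom_b == call.chrom_b
                    and abs(rep.pos_a - call.pos_a) <= max_dist
                    and abs(rep.pos_b - call.pos_b) <= max_dist):
                group["callers"].add(call.caller)
                break
        else:
            # No existing group matched, so this call starts a new one.
            merged.append({"rep": call, "callers": {call.caller}})
    return merged


def at_least_n_callers(merged, n=2):
    """Keep only collapsed calls supported by at least n distinct callers."""
    return [group for group in merged if len(group["callers"]) >= n]


calls = [
    Breakend("1", 1000, "5", 2000, "caller_x"),
    Breakend("1", 1100, "5", 2050, "caller_y"),
    Breakend("2", 500, "7", 900, "caller_x"),
]
merged = collapse_calls(calls)
supported = at_least_n_callers(merged)
```

In this toy input, the first two calls collapse into one group supported by two callers, while the third call remains a single-caller group and is filtered out.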
Tree-shaped rearrangement regions are referred to as Palm Trees throughout the code.
- RunAll.R runs the scripts in the correct order
- Parse*.R and Prepare*.R read data from different sources and create tidy representations for further analysis
- CallPalmTrees.R uses the output of structural variant merging to call palm trees (a.k.a. tree-shaped rearrangements or clusters of rearrangements)
- CallPalmTreesPedPanCan.R calls palm trees on published data from Gröbner et al. 2018 (using the same settings as in CallPalmTrees.R)
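As a rough illustration of what calling a cluster of rearrangements can look like, the following sketch chains breakpoints on a single chromosome whenever neighboring positions are close together, and reports chains with enough breakpoints. This is not the algorithm in CallPalmTrees.R: the function name, the single-linkage chaining, and both thresholds (`max_gap`, `min_breakpoints`) are assumptions chosen only to make the idea concrete.

```python
def call_clusters(positions, max_gap=1_000_000, min_breakpoints=3):
    """Return (start, end, n_breakpoints) for each chain of nearby breakpoints.

    positions: breakpoint coordinates on one chromosome.
    max_gap and min_breakpoints are illustrative assumptions.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the current chain.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_breakpoints:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Do not forget the last chain.
    if len(current) >= min_breakpoints:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


example = [100, 500_000, 900_000, 10_000_000, 50_000_000, 50_100_000]
clusters = call_clusters(example)
```

Here only the first three positions form a chain of at least three breakpoints; the isolated breakpoint and the pair near 50 Mb are not reported.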
- CircosPlots.R plots the identified rearrangements and palm tree regions
- PalmTreeStackPlot.R creates plots that integrate all CN, SV and Circle-seq information
- GeneralPalmTreeStatistics.R analyses the number and length distributions of palm trees
- PalmTreeDensity.R plots the genome-wide recurrence of palm trees
- PedPanCanStats.R explores palm tree occurrence in the PedPanCan dataset
- CompareBerlinCohortToPedPanCan.R compares palm tree prevalence between the Peifer/Berlin dataset and the PedPanCan neuroblastoma cases
- AnalyseNBCallPalmTreesRandomization.Rmd explores 500 synthetic datasets each for our cohort and the PedPanCan dataset. The synthetic datasets were obtained by randomizing breakpoint positions during palm tree calling. They are used to estimate the false-discovery rate attributable to the number of rearrangements per sample.
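The randomization idea can be sketched as follows: keep the number of breakpoints fixed, redraw their positions at random, rerun the caller, and compare the average number of calls on synthetic data with the number of calls on the real data. This is a simplified sketch, not the analysis in the Rmd file: the function names, the uniform sampling over a single chromosome, and the ratio used as the false-discovery estimate are assumptions. Only the default of 500 synthetic datasets comes from the description above.

```python
import random


def randomized_call_counts(n_breakpoints, chrom_length, call_fn,
                           n_datasets=500, seed=0):
    """Count calls made by call_fn on datasets with randomized breakpoints.

    call_fn is any function mapping a list of positions to a list of calls.
    Uniform sampling over one chromosome is an assumption of this sketch.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_datasets):
        positions = [rng.randrange(chrom_length) for _ in range(n_breakpoints)]
        counts.append(len(call_fn(positions)))
    return counts


def estimated_fdr(observed_calls, random_counts):
    """Mean number of calls under randomization divided by the observed calls."""
    if observed_calls == 0:
        return 0.0  # nothing was called, so nothing can be a false discovery
    expected_false = sum(random_counts) / len(random_counts)
    return min(1.0, expected_false / observed_calls)


# Stand-in caller that always reports exactly one call, for demonstration only.
counts = randomized_call_counts(10, 1_000_000, lambda positions: ["call"])
fdr = estimated_fdr(4, counts)
```

With a real caller in place of the stand-in, samples with many rearrangements would be expected to produce more calls by chance, which is the effect the randomization is meant to quantify.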
- PalmTreeGenes.R analyzes which genes can be found within palm tree regions
- PalmTreeTargetsCloseToOncogenes.R statistically analyses whether palm tree target sites are significantly enriched in the neighborhood of cancer-related genes
- ExpressionAnalysis.R screens for deregulated genes close to palm tree associated rearrangements
- RegionSampling.R contains sampling methods to randomly sample regions from a masked or unmasked genome
- Overlap[...]CircleCalls.R statistically analyses the degree of overlap between palm tree regions and eccDNA / ecDNA inferred from WGS / Circle-seq
- MergeWGSCSOverlapPlots.R merges results from the preceding scripts for integrative plotting
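Sampling a random region from a masked genome can be sketched like this: given a list of allowed intervals (for example mappable, non-blacklisted ones), pick an interval with probability proportional to the number of valid start positions it offers, then pick a start position uniformly within it. The function name and the half-open `(chrom, start, end)` tuples are assumptions of this sketch and are not taken from RegionSampling.R.

```python
import random


def sample_region(allowed, length, rng):
    """Sample one region of the given length lying entirely inside an allowed interval.

    allowed: list of (chrom, start, end) half-open intervals (assumed format).
    rng: a random.Random instance, passed in for reproducibility.
    """
    # Number of valid start positions per interval (0 if the interval is too short).
    weights = [max(0, end - start - length + 1) for _, start, end in allowed]
    if sum(weights) == 0:
        raise ValueError("no allowed interval can hold a region of this length")
    chrom, start, end = rng.choices(allowed, weights=weights, k=1)[0]
    region_start = rng.randrange(start, end - length + 1)
    return chrom, region_start, region_start + length


rng = random.Random(42)
allowed = [("1", 0, 100), ("2", 50, 60)]
regions = [sample_region(allowed, 20, rng) for _ in range(200)]
```

Weighting by the number of valid start positions makes every admissible region equally likely, so long intervals are not under-sampled relative to short ones. In the toy input, the interval on chromosome 2 is too short to hold a 20 bp region and is never chosen.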
- ClinicalData.R explores the prognostic significance of palm trees
- CircleJunctionAnalyseSvabaBreakpoints.R investigates sequence characteristics of reconstructed Circle-seq circle junctions
- SVAnalyseSvabaBreakpoints.R investigates sequence characteristics of interchromosomal rearrangement calls made by SvABA
- MemeAnalysis.sh runs MEME on several breakpoint-associated sequences
- myHomology.R obtains microhomology estimates from the reference genome, given accurate breakpoint coordinates
- Repeats.R tests for association of breakpoints with repetitive regions
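One common way to estimate microhomology from reference sequence alone is to count how far the junction could be shifted in either direction without changing the resulting sequence. The sketch below does that for two reference sequences and a pair of breakpoint coordinates. It is not the implementation in myHomology.R: the function name and coordinate convention are assumptions, and strand orientation is ignored for simplicity.

```python
def microhomology_length(left_ref, left_end, right_ref, right_start):
    """Length of microhomology at a junction joining left_ref[:left_end]
    to right_ref[right_start:].

    Both arguments are plain reference strings on the same strand
    (an assumption of this sketch).
    """
    # Bases just past the left breakpoint that match the retained right side.
    fwd = 0
    while (left_end + fwd < len(left_ref)
           and right_start + fwd < len(right_ref)
           and left_ref[left_end + fwd] == right_ref[right_start + fwd]):
        fwd += 1
    # Bases just before the right breakpoint that match the retained left side.
    bwd = 0
    while (left_end - 1 - bwd >= 0
           and right_start - 1 - bwd >= 0
           and left_ref[left_end - 1 - bwd] == right_ref[right_start - 1 - bwd]):
        bwd += 1
    return fwd + bwd


left = "AAACGTTTT"
right = "GGGCGTCCC"
homology_a = microhomology_length(left, 3, right, 3)
homology_b = microhomology_length(left, 5, right, 5)
```

Both breakpoint placements in the example describe the same junction sequence, because the shared `CGT` makes the exact position ambiguous, so both yield the same microhomology length.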
If you have any questions concerning code or data, please do not hesitate to contact us at henssenlab@gmail.com.