Releases · epigen/enrichment_analysis

08 Jun 13:27

sreichl

v3.0.0

0829687

v3.0.0 - Configurable GREAT region annotation and smaller aggregate outputs

Latest

This release changes how GREAT-associated regions and genes are exported to reduce runtime, file size, duplicated data, and Excel compatibility problems caused by very large annotation cells.

Features (breaking change)

Added great_parameters:map_associated_regions in the config to control how many significant GREAT terms are annotated with associated query regions and genes in individual query result tables. This update to the configuration file represents the breaking change.
The new default is 1, so users can inspect an example annotated term without paying the cost of annotating every significant term.
Supported values:
- 0: do not annotate GREAT terms with associated regions/genes
- positive integer, for example 5: annotate that many top significant terms ranked by the configured adjusted p-value column
- -1: annotate all significant terms, restoring the previous behavior
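
The three supported values can be read as a selection rule over the significant terms. The following is a minimal Python sketch of that rule, not the workflow's actual implementation: the function name, the `p_adjust` column name, the 0.05 cutoff, and the term structure are all assumptions made for illustration.

```python
def select_terms_to_annotate(terms, map_associated_regions, p_col="p_adjust", alpha=0.05):
    """Return the terms that would receive region/gene annotation.

    terms: list of dicts, one per GREAT term (hypothetical structure).
    map_associated_regions: 0 (none), a positive integer (top N), or -1 (all).
    p_col and alpha are illustrative assumptions, not workflow settings.
    """
    if map_associated_regions < -1:
        raise ValueError("map_associated_regions must be -1, 0, or a positive integer")
    # Keep significant terms only, most significant first.
    significant = sorted(
        (t for t in terms if t[p_col] <= alpha),
        key=lambda t: t[p_col],
    )
    if map_associated_regions == -1:
        return significant  # previous behavior: annotate everything significant
    return significant[:map_associated_regions]  # 0 yields an empty list


terms = [
    {"id": "GO:A", "p_adjust": 0.010},
    {"id": "GO:B", "p_adjust": 0.200},
    {"id": "GO:C", "p_adjust": 0.001},
    {"id": "GO:D", "p_adjust": 0.030},
]
default_selection = select_terms_to_annotate(terms, 1)
```

With the default of 1, only the single most significant term in this toy table would be annotated.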

Changes

GREAT group-level aggregated result CSV files no longer include the regions and annotated_genes columns.
These columns remain available in individual GREAT query result CSV files when annotation is enabled with map_associated_regions.
This avoids duplicating very large region/gene annotation strings across aggregate files and prevents aggregate outputs from becoming unnecessarily large.
GREAT parameters are passed through Snakemake rule params instead of being read directly from the config inside the GREAT analysis script.

Documentation

Updated the example config and schema with great_parameters:map_associated_regions.
Updated the README and config documentation to warn that GREAT region/gene annotation can take a long time, substantially increase file size, and break Excel usage because cells can exceed Excel's 32,767 character limit.
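
To check whether an existing result table would run into the Excel limit mentioned above, the cell lengths can be inspected before opening the file. This is a hedged sketch using only the Python standard library; the `term` and `regions` column names and the example data are hypothetical and not taken from the workflow's output.

```python
import csv
import io

EXCEL_CELL_CHAR_LIMIT = 32_767  # maximum number of characters in one Excel cell


def find_oversized_cells(csv_text, limit=EXCEL_CELL_CHAR_LIMIT):
    """Return (row_number, column_name, length) for every cell above the limit."""
    oversized = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column, value in row.items():
            if value is not None and len(value) > limit:
                oversized.append((row_number, column, len(value)))
    return oversized


# Hypothetical result table with one very long `regions` cell.
long_regions = ";".join(f"chr1:{i}-{i + 500}" for i in range(0, 3_000_000, 1000))
csv_text = f"term,regions\nGO:short,chr1:0-500\nGO:long,{long_regions}\n"
hits = find_oversized_cells(csv_text)
```

Any row reported here would be truncated or rejected by Excel, which is a reason to keep map_associated_regions small for tables meant to be opened there.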

What's Changed

Limit GREAT region annotations by @martin5555555555 in #63

Full Changelog: v2.1.0...v3.0.0

Contributors

martin5555555555

07 May 10:25

martin5555555555

v2.1.0

921efa4

v2.1.0 - Additional genomic regions analysis outputs and new summary visualization

This release expands the enrichment workflow with the addition of a 'specific' group summary plot, additional pycisTarget and GREAT outputs, more robust handling of sparse or empty inputs, and updated documentation and infrastructure.

Features

The group summary outputs are now *_summary_topTerms.png and *_summary_specificTerms.png. The latter is a new group-level summary plot highlighting terms that are more specific to individual groups.
Extended pycisTarget outputs with additional exported tables for motif hits and cistromes.
Extended GREAT result tables with associated query regions and annotated genes for significant terms.
Added helpers/features_to_bed.py, a helper script for converting feature ID lists into BED files by mapping IDs to genomic coordinates from an annotation table.
Added schema validation for workflow configuration and annotation files in workflow/schemas/.
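
The idea behind a feature-to-BED conversion like the one listed above can be sketched in a few lines. This is not the content of helpers/features_to_bed.py: the function name, the annotation layout (feature ID mapped to chromosome, start, end), and the assumption that the annotation coordinates are already BED-style are all illustrative.

```python
def features_to_bed_lines(feature_ids, annotation):
    """Map feature IDs to BED lines using an annotation lookup.

    annotation: dict of feature_id -> (chrom, start, end), a hypothetical layout
    with coordinates assumed to be BED-style already.
    Returns (bed_lines, missing_ids) so unmapped IDs can be reported.
    """
    bed_lines, missing = [], []
    for feature_id in feature_ids:
        coords = annotation.get(feature_id)
        if coords is None:
            missing.append(feature_id)
            continue
        chrom, start, end = coords
        bed_lines.append(f"{chrom}\t{start}\t{end}\t{feature_id}")
    return bed_lines, missing


annotation = {
    "geneA": ("chr1", 1000, 2000),
    "geneB": ("chr2", 500, 900),
}
bed_lines, missing = features_to_bed_lines(["geneB", "geneX", "geneA"], annotation)
```

Returning the unmapped IDs separately keeps the BED output clean while still making missing features visible.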

Changes

Removed the previous heatmap-based group summary outputs; group summaries are now provided as bubble plots only.
Removed filtered aggregate _sig.csv outputs; group-level aggregation now focuses on the complete _all.csv table and summary plots.
Updated the example configuration and test setup. The default config now runs a minimal test spanning all workflow functionalities. The data for this test is restored via test/setup_test_resources.sh.

Bug Fixes

Improved handling of empty BED, query, and background inputs in region-based workflows.
Fixed coordinate convention mismatches between BED inputs and GREAT-derived region-gene association outputs. Coordinates are now consistently exported using BED-style indexing (0-based start, exclusive end).
Improved aggregation and summary plotting for sparse result sets, including very small matrices and empty outputs.
Added stable fallback behavior for empty enrichment and summary plots so reporting remains informative when no significant results are found.
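
The coordinate convention fix above concerns the usual off-by-one between interval conventions. As a minimal sketch, assuming the mismatch was between 1-based, end-inclusive intervals (as used by R's IRanges/GenomicRanges) and BED's 0-based, end-exclusive intervals, the conversion looks like this. The function names are hypothetical and not part of the workflow.

```python
def one_based_to_bed(start, end):
    """Convert a 1-based, end-inclusive interval to BED (0-based, end-exclusive)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end  # only the start shifts; the end value stays the same


def bed_to_one_based(start, end):
    """Convert a BED interval back to 1-based, end-inclusive coordinates."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return start + 1, end


bed_interval = one_based_to_bed(101, 200)        # covers 100 bases
round_trip = bed_to_one_based(*bed_interval)
```

Both conventions describe the same 100 bases here; only the start coordinate differs by one.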

Documentation

Expanded guidance on skipping selected enrichment tools when they are not needed.
Added example Snakemake rule templates for downloading common enrichment resources, including gene set databases, cisTarget resources, and LOLA region databases in /helpers/database_download_rules.md.
Updated the README to describe the current output structure, BED indexing conventions, and new summary plot outputs.

Infrastructure

Added GitHub Actions for CI, container image generation, and conda environment pinning.
Added Snakemake containerization support.

What's Changed

Add summary plot for specific terms & remove summary heatmaps. by @bednarsky in #37
accept empty bed files, inform users for empty plots, tested the top-n plot for all softwares, handles the +1 shift of Irange ok by @martin5555555555 in #40
Pycis and Great retrieval, download rules helpers, config validation schema by @martin5555555555 in #42
new tested wget download rules by @martin5555555555 in #46
Refactor plot messages and annotation loading by @martin5555555555 in #48
Test data and github action yaml by @martin5555555555 in #47
Containerize by @martin5555555555 in #49
Containerize + Pinning envs by @martin5555555555 in #50
Update container.yaml to setup resources by @martin5555555555 in #51
Still containerize by @martin5555555555 in #52
new secret by @martin5555555555 in #53
Pycis env version + Pin action by @martin5555555555 in #54
Update pin-conda-envs.yaml by @martin5555555555 in #55
Update conda env pins by @github-actions[bot] in #56
Update container.yaml by @martin5555555555 in #57
update pycisTarget env by @martin5555555555 in #59
Update conda env pins by @github-actions[bot] in #60
Update pycisTarget.yaml by @martin5555555555 in #61

New Contributors

@martin5555555555 made their first contribution in #40
@github-actions[bot] made their first contribution in #56

Full Changelog: v2.0.3...v2.1.0

Contributors

martin5555555555 and bednarsky

25 Jun 17:59

sreichl

v2.0.3

394de99

v2.0.3 - minor improvement

Full Changelog: v2.0.2...v2.0.3

27 May 15:17

sreichl

v2.0.2

c7db298

v2.0.2 - Minor fixes

Make all resource files input to rules

Full Changelog: v2.0.1...v2.0.2

20 Dec 14:29

sreichl

v2.0.1

6be93bc

v2.0.1 - enable module usage using `github()` directive

to enable module usage using github() directive
- source utils.R via `params` instead of `snakemake@source`
- comment global.yaml (now requires full snakemake installation, not minimal)
add nodefaults to all env YAML and comment global.env usage
fix stringi version

What's Changed

Fixing stringi version to fix env by @bednarsky in #27

New Contributors

@bednarsky made their first contribution in #27

Full Changelog: v2.0.0...v2.0.1

Contributors

bednarsky

13 Sep 13:39

sreichl

v2.0.0

e8a14b0

v2.0.0 - Snakemake 8 compatible

Breaking change: Requires Snakemake >= v8.20.1

Full Changelog: v1.0.1...v2.0.0

07 Jul 14:27

sreichl

v1.0.1

03b64cd

v1.0.1 - bug fixes and exception handling

Bug fixes and exception handling.

Full Changelog: v1.0.0...v1.0.1

12 Jun 17:04

sreichl

v1.0.0

60dfa07

v1.0.0 - stable version with new features, complete docs and examples

Features

Enrichment Analysis Methods:
- Region Set Analysis:
  - LOLA: Genomic Locus Overlap Enrichment Analysis.
  - GREAT: Genomic Regions Enrichment of Annotations Tool using rGREAT.
  - pycisTarget: Motif enrichment analysis in region sets to identify high-confidence transcription factor (TF) cistromes.
- Gene Set Analysis:
  - Over-representation Analysis (ORA): Using GSEApy's enrich() function.
  - RcisTarget: Motif enrichment analysis in gene sets to identify high-confidence TF cistromes.
- Region-based Gene Set Analysis:
  - Region-gene associations obtained using (r)GREAT.
  - Complementary ORA using GSEApy and TFBS motif enrichment analysis using RcisTarget.
- Preranked Gene Set Analysis:
  - Preranked GSEA using GSEApy's prerank() function.
Database Support:
- Local databases for GSEApy and (r)GREAT
  - GMT files e.g., from MSigDB or Enrichr.
  - (custom) JSON file support.
- LOLA databases from LOLA Region Databases or custom created.
- cisTarget databases for pycisTarget and RcisTarget.
Group Aggregation:
- Aggregation of results per method and database.
- Filtered aggregation retaining only statistically significant terms.
Visualization:
- Enrichment dot plots for each query, method, and database combination.
- Hierarchically clustered heatmaps and bubble plots for group summaries.

Documentation

Usage Instructions:
- Steps to download relevant databases and configure the analysis.
- Commands for running the workflow and generating reports.
Examples: Provided example queries and databases with instructions for running a complete analysis.
Links and Resources:
- GitHub repository, Zenodo repository, and Snakemake Workflow Catalog entry.
- Recommended compatible MR.PARETO modules for upstream processing and analyses.
- Web versions of some tools and databases for region/gene sets.

Beware: All packages were updated to their latest versions, so results might differ. If possible, rerunning is recommended. The workflow's functionality expanded significantly, so many changes were introduced, especially in the configuration.

Thanks to early adopters @dariarom94, @Rubbert, and @bednarsky for testing and providing constructive feedback.

Bug fixes and performance improvements are not mentioned.

Full Changelog: v0.1.1...v1.0.0

Contributors

Rubbert, dariarom94, and bednarsky

08 Apr 14:00

sreichl

v0.1.1

e1a7e2d

v0.1.1 - small improvements, documentation and citation information

What's Changed

Create LICENSE by @sreichl in #5

Full Changelog: v0.1.0...v0.1.1

Contributors

sreichl

15 Jan 13:11

sreichl

v0.1.0

969319a

v0.1.0 - stable version with complete docs and examples

features

enrichment analysis methods
- region-sets
  - LOLA: Genomic Locus Overlap Enrichment Analysis is run locally.
  - GREAT using rGREAT: Genomic Regions Enrichment of Annotations Tool is queried remotely (requires a working internet connection).
- gene-sets
  - over-representation analysis (ORA) using GSEApy enrich() function performs Fisher’s exact test (i.e., hypergeometric test) and is run locally.
  - preranked gene-set enrichment analysis (preranked GSEA) using GSEApy prerank() function performs preranked GSEA and is run locally.
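
The over-representation test named above can be written out directly. This is a minimal sketch of a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) using only the Python standard library. It illustrates the statistic and is not GSEApy's implementation; the function and argument names are assumptions.

```python
from math import comb


def ora_p_value(overlap, query_size, set_size, background_size):
    """P(X >= overlap) when drawing query_size genes from the background.

    set_size is the number of background genes that belong to the gene set.
    """
    max_overlap = min(query_size, set_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(set_size, i) * comb(background_size - set_size, query_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


# 20 background genes, 5 of them in the gene set, 5 query genes, 3 overlapping.
p = ora_p_value(overlap=3, query_size=5, set_size=5, background_size=20)
```

The p-value is the probability of seeing at least the observed overlap by chance, so smaller values indicate stronger over-representation.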

Note: All genomic region sets are also subjected to gene-set ORA, leveraging the region-gene associations of each query and background region set obtained using GREAT. This extends the region-set enrichment perspective to databases that are not supported by region-based tools.

resources (databases) for both gene-based analyses are either downloaded (Enrichr) or copied from local JSON or GMT files.
- all Enrichr databases can be queried (enrichr_dbs).
- local JSON database files can be queried (local_json_dbs).
- local GMT database files (e.g., from MSigDB) can be queried (local_gmt_dbs).
group aggregation of results per method and database
- results of all queries belonging to the same group are aggregated per method and database.
- a filtered version taking the union of all statistically significant terms per query is also saved.
visualization
- region/gene-set specific enrichment dot plots are generated for each query, method, and database combination where the top terms are ranked (along the y-axis) by the mean rank of statistical significance, effect-size, and overlap with the goal to make the results more balanced and interpretable.
- group summary/overview
  - the union of the most significant terms per query, method, and database within a group is determined.
  - their effect-size and statistical significance are visualized as hierarchically clustered heatmaps.
  - a hierarchically clustered bubble plot encoding both effect-size and significance is provided.
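
The mean-rank ordering used for the enrichment dot plots described above can be illustrated with a short sketch. It assumes smaller p-values and larger effect sizes and overlaps are better, and it breaks ties by input order. The field names and the tie handling are assumptions, not the workflow's actual code.

```python
def rank_positions(values, descending=False):
    """Ordinal ranks (1 = best); ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def order_by_mean_rank(terms):
    """Sort term names by the mean of their significance, effect-size, and overlap ranks."""
    p_ranks = rank_positions([t["p_value"] for t in terms])
    e_ranks = rank_positions([t["effect_size"] for t in terms], descending=True)
    o_ranks = rank_positions([t["overlap"] for t in terms], descending=True)
    mean_ranks = [(p + e + o) / 3 for p, e, o in zip(p_ranks, e_ranks, o_ranks)]
    order = sorted(range(len(terms)), key=lambda i: mean_ranks[i])
    return [terms[i]["term"] for i in order]


terms = [
    {"term": "A", "p_value": 0.001, "effect_size": 1.5, "overlap": 3},
    {"term": "B", "p_value": 0.010, "effect_size": 4.0, "overlap": 20},
    {"term": "C", "p_value": 0.020, "effect_size": 1.0, "overlap": 10},
]
ranking = order_by_mean_rank(terms)
```

Term A has the smallest p-value but a small overlap, so term B, which does well on all three criteria, ends up first. This is the balancing effect the mean rank is meant to achieve.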

documentation

complete documentation of used software, all features, and methods
a minimal example to test all supported features
external resources
