Add BEYOND workflow #1162
Open
mdgrv wants to merge 12 commits into
- Add metadata/calculate_proportions: participant × subpopulation proportion matrix
- Add interpret/pathway_enrichment: pre-ranked GSEA/ORA via GSEApy
- Add workflows/beyond/atlas_building: Nextflow workflow scaffold
- Add resources_test_scripts/beyond_test_data.sh: synthetic test data generator
- Add src/authors/marilyn_degraeve.yaml: author entry
- Update src/interpret/lianapy/config.vsh.yaml: update author to marilyn_degraeve
- Add BEYOND_plan.md, summary_docs.md, tasks.md: planning and reference docs
- Update CHANGELOG.md with new components

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements PHATE (Potential of Heat-diffusion for Affinity-based Transition Embedding) as a Viash component in the dimred namespace. Takes any .obsm matrix as input (X_pca or proportion vectors from metadata/calculate_proportions) and stores the embedding in .obsm["X_phate"].

Key parameters: n_components, knn, decay, t (auto or fixed), gamma, random_state. Uses meta_cpus for n_jobs. Mirrors dimred/umap structure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extend resources_test_scripts/beyond_test_data.sh to generate 3 intermediate h5mu files: proportions_output, pseudotime_output, dynamics_output (simulating calculate_proportions, palantir, and pseudotime_dynamics outputs respectively)
- Rewrite test_data_script.sh for 6 components to delegate to beyond_test_data.sh and reference shared files; remove inline synthetic data generation
- Fix calculate_proportions/test.py: spurious .T on prop_df caused shape assertion (3,4) to fail against transposed (4,3) result
- Add new component sources: cluster/cellular_communities, stats/trait_associations, trajectory/palantir, trajectory/pseudotime_dynamics, trajectory/via

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
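For reviewers, the shape issue fixed in calculate_proportions/test.py can be illustrated with a minimal sketch. It is not the component's actual code: the column names `participant_id` and `subpopulation`, the helper `proportion_matrix`, and the toy data (3 participants, 4 subpopulations) are all assumptions chosen to match the (3,4) assertion mentioned above.

```python
import pandas as pd


def proportion_matrix(obs, participant_col, subpop_col):
    """Return a participant x subpopulation matrix whose rows sum to 1."""
    # normalize="index" divides each row (participant) by its own cell count.
    return pd.crosstab(obs[participant_col], obs[subpop_col], normalize="index")


# Hypothetical toy atlas: 3 participants with 4 cells each, 4 subpopulations.
obs = pd.DataFrame({
    "participant_id": ["p1"] * 4 + ["p2"] * 4 + ["p3"] * 4,
    "subpopulation": ["A", "B", "C", "D",
                      "A", "A", "B", "C",
                      "D", "D", "D", "A"],
})

prop_df = proportion_matrix(obs, "participant_id", "subpopulation")

# Participants are rows, subpopulations are columns.
assert prop_df.shape == (3, 4)
# A stray transpose flips the orientation, so a (3, 4) assertion fails on it.
assert prop_df.T.shape == (4, 3)
```

Under these assumptions, asserting on `prop_df` directly (without `.T`) is what keeps the test aligned with the participant × subpopulation orientation described in the changelog.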
in BEYOND configs and scripts
Changelog
BEYOND methodology — 8 new components and 2 new workflows implementing the
Habib-lab BEYOND pipeline for cellular community discovery in snRNA-seq data
(reference: naomihabiblab/BEYOND_DLPFC):
Components:
- `metadata/calculate_proportions`: computes a participant × subpopulation cell proportion matrix from a single-cell atlas; stores results in `.uns["proportions"]` (column-first dict) and `.obsm["proportions"]` (per-cell proportion vectors).
- `dimred/phate`: computes a PHATE embedding from any `.obsm` matrix (e.g. `X_pca` or proportion vectors); stores result in `.obsm["X_phate"]`. Supports configurable `knn`, `decay`, `t`, `gamma`, and `n_components`.
- `trajectory/palantir`: computes pseudotime and fate probabilities using Palantir (1.3.3 API); stores results in `obs["palantir_pseudotime"]`, `obs["palantir_entropy"]`, `obsm["palantir_fate_probabilities"]`, and `uns["palantir_waypoints"]`. Supports automatic start-cell selection from a cluster label or an explicit barcode.
- `trajectory/via`: computes pseudotime using the VIA graph-based algorithm; stores results in `obs["via_pseudotime"]` and `uns["via_graph"]`. Accepts any `.obsm` key as input embedding and a cluster label or integer index as trajectory root.
- `trajectory/pseudotime_dynamics`: fits a cubic spline (`scipy.interpolate.UnivariateSpline`) of participant proportion versus pseudotime per subpopulation; stores fitted curves, peak pseudotime, R², and p-value in `.uns["dynamics"]`.
- `cluster/cellular_communities`: detects cellular communities by combining co-occurrence similarity (Pearson correlation of participant proportion vectors) and dynamics similarity (Pearson correlation of fitted proportion curves); applies hierarchical (Ward) or spectral clustering; stores community labels in `obs["community_id"]` and full metadata in `uns["cellular_communities"]`.
- `interpret/pathway_enrichment`: performs pre-ranked GSEA or ORA on DESeq2 results using GSEApy; supports Enrichr library names or custom GMT files; stores results in `.uns["pathway_enrichment"]` and writes per-library CSV files.
- `stats/trait_associations` (new namespace): tests associations between subpopulation proportions and clinical/biological traits using linear mixed models (statsmodels MixedLM) or OLS when no random effect is specified; applies Benjamini-Hochberg (FDR) or Bonferroni multiple-testing correction across all (subpopulation, trait) pairs; stores results in `.uns["trait_associations"]` and optionally writes a CSV.

Workflows:
- `workflows/beyond/atlas_building`: end-to-end atlas construction from per-donor h5mu files, covering QC filtering, integration (Harmony), cell-type annotation (CellTypist), and per-cell-type Leiden subclustering to produce the subpopulation labels required by the trajectory analysis workflow.
- `workflows/beyond/trajectory_analysis`: full BEYOND trajectory inference from an annotated atlas h5mu; runs all 8 steps (proportions → PHATE → Palantir → VIA → pseudotime dynamics → cellular communities → trait associations → pathway enrichment) and emits a single enriched h5mu with all results.
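
To make the pseudotime dynamics and cellular communities steps easier to review, here is a rough sketch of the idea on synthetic data. It is not the code in this PR: the toy inputs, the equal weighting of the two similarity matrices, and the use of `1 - similarity` as a distance for Ward linkage are all assumptions made for illustration, and the R² and p-value computations are left out.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.interpolate import UnivariateSpline
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)

# Hypothetical inputs: one pseudotime value per participant (sorted, as the
# spline fit expects increasing x) and a participant x subpopulation matrix.
n_participants = 40
pseudotime = np.sort(rng.uniform(0.0, 1.0, n_participants))
trend = np.column_stack([
    pseudotime,              # subpopulation 0 rises along pseudotime
    pseudotime ** 2,         # subpopulation 1 rises along pseudotime
    1.0 - pseudotime,        # subpopulation 2 falls along pseudotime
    (1.0 - pseudotime) ** 2, # subpopulation 3 falls along pseudotime
])
proportions = trend + rng.normal(0.0, 0.05, trend.shape)

# Dynamics: fit a cubic spline (k=3) per subpopulation and evaluate on a grid.
grid = np.linspace(pseudotime.min(), pseudotime.max(), 50)
curves = np.column_stack([
    UnivariateSpline(pseudotime, proportions[:, j], k=3)(grid)
    for j in range(proportions.shape[1])
])
peak_pseudotime = grid[curves.argmax(axis=0)]

# Similarities: Pearson correlation between subpopulations, computed once on
# the raw proportion vectors and once on the fitted curves.
co_occurrence = np.corrcoef(proportions.T)
dynamics = np.corrcoef(curves.T)
combined = 0.5 * (co_occurrence + dynamics)  # equal weights: an assumption

# Convert similarity to a dissimilarity and cluster with Ward linkage.
distance = 1.0 - combined
np.fill_diagonal(distance, 0.0)
tree = linkage(squareform(distance, checks=False), method="ward")
labels = fcluster(tree, t=2, criterion="maxclust")
```

In this toy setup the two rising subpopulations end up in one community and the two falling ones in the other. Ward linkage formally assumes Euclidean distances, so applying it to a correlation-based dissimilarity is a heuristic.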
Issue ticket number and link
Closes #1161
Checklist before requesting a review
I have performed a self-review of my code
Conforms to the Contributor's guide
Check the correct box. Does this PR contain:
Proposed changes are described in the CHANGELOG.md
CI tests succeed!