
Add BEYOND workflow #1162

Open
mdgrv wants to merge 12 commits into openpipelines-bio:main from mdgrv:beyond

mdgrv commented Apr 10, 2026

Changelog

  • BEYOND methodology — 8 new components and 2 new workflows implementing the
    Habib-lab BEYOND pipeline for cellular community discovery in snRNA-seq data
    (reference: naomihabiblab/BEYOND_DLPFC):

    Components:

    • metadata/calculate_proportions: computes a participant × subpopulation cell
      proportion matrix from a single-cell atlas; stores results in .uns["proportions"]
      (column-first dict) and .obsm["proportions"] (per-cell proportion vectors).

    • dimred/phate: computes a PHATE embedding from any .obsm matrix (e.g. X_pca
      or proportion vectors); stores result in .obsm["X_phate"]. Supports configurable
      knn, decay, t, gamma, and n_components.

    • trajectory/palantir: computes pseudotime and fate probabilities using Palantir
      (1.3.3 API); stores results in .obs["palantir_pseudotime"], .obs["palantir_entropy"],
      .obsm["palantir_fate_probabilities"], and .uns["palantir_waypoints"]. Supports
      automatic start-cell selection from a cluster label or an explicit barcode.

    • trajectory/via: computes pseudotime using the VIA graph-based algorithm; stores
      results in .obs["via_pseudotime"] and .uns["via_graph"]. Accepts any .obsm key
      as input embedding and a cluster label or integer index as trajectory root.

    • trajectory/pseudotime_dynamics: fits a cubic spline (scipy.interpolate.UnivariateSpline)
      of participant proportion versus pseudotime per subpopulation; stores fitted curves,
      peak pseudotime, R², and p-value in .uns["dynamics"].

    • cluster/cellular_communities: detects cellular communities by combining
      co-occurrence similarity (Pearson correlation of participant proportion vectors) and
      dynamics similarity (Pearson correlation of fitted proportion curves); applies
      hierarchical (Ward) or spectral clustering; stores community labels in
      .obs["community_id"] and full metadata in .uns["cellular_communities"].

    • interpret/pathway_enrichment: performs pre-ranked GSEA or ORA on DESeq2 results
      using GSEApy; supports Enrichr library names or custom GMT files; stores results in
      .uns["pathway_enrichment"] and writes per-library CSV files.

    • stats/trait_associations (new namespace): tests associations between
      subpopulation proportions and clinical/biological traits using linear mixed models
      (statsmodels MixedLM) or OLS when no random effect is specified; applies
      Benjamini-Hochberg (FDR) or Bonferroni (family-wise) multiple-testing correction
      across all (subpopulation, trait) pairs; stores results in
      .uns["trait_associations"] and optionally writes a CSV.
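
For reviewers, here is a minimal sketch of the idea behind metadata/calculate_proportions and the co-occurrence half of cluster/cellular_communities. It is not the component code: the toy data, the column names `participant` and `subpopulation`, and the choice to run Ward linkage on each subpopulation's correlation profile are all assumptions made for illustration. The dynamics-similarity term is left out.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical per-cell metadata: one row per cell (toy data, not from the PR).
cells = pd.DataFrame({
    "participant": ["p1"] * 4 + ["p2"] * 4 + ["p3"] * 4 + ["p4"] * 4,
    "subpopulation": [
        "A", "A", "B", "C",
        "A", "B", "B", "C",
        "C", "C", "C", "A",
        "B", "C", "C", "C",
    ],
})

# Participant x subpopulation proportion matrix; each row sums to 1.
proportions = pd.crosstab(
    cells["participant"], cells["subpopulation"], normalize="index"
)

# Co-occurrence similarity: Pearson correlation between subpopulation
# columns, computed across participants.
similarity = proportions.corr(method="pearson")

# Assumption: treat each subpopulation's correlation profile (one row of the
# similarity matrix) as a feature vector and apply Ward linkage to it.
tree = linkage(similarity.to_numpy(), method="ward")

# Cut the tree into at most two communities (the number is arbitrary here).
labels = fcluster(tree, t=2, criterion="maxclust")
community_id = pd.Series(labels, index=similarity.index, name="community_id")

print(proportions.round(2))
print(community_id)
```

With real data the similarity matrix would presumably be combined with the dynamics similarity before clustering; how the two are weighted is not stated in this description.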

    Workflows:

    • workflows/beyond/atlas_building: end-to-end atlas construction from per-donor
      h5mu files — QC filtering, integration (Harmony), cell-type annotation (CellTypist),
      and per-cell-type Leiden subclustering to produce the subpopulation labels required
      by the trajectory analysis workflow.

    • workflows/beyond/trajectory_analysis: full BEYOND trajectory inference from an
      annotated atlas h5mu — runs all 8 steps (proportions → PHATE → Palantir → VIA →
      pseudotime dynamics → cellular communities → trait associations → pathway enrichment)
      and emits a single enriched h5mu with all results.
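
The trait-associations step of the trajectory workflow corrects p-values across all (subpopulation, trait) pairs. As a rough illustration of how the two correction options differ, here is a small NumPy sketch; the function names and the example p-values are hypothetical and this is not the component's implementation.

```python
import numpy as np


def bonferroni(pvalues):
    """Family-wise correction: multiply each p-value by the number of tests."""
    p = np.asarray(pvalues, dtype=float)
    return np.minimum(p * p.size, 1.0)


def benjamini_hochberg(pvalues):
    """FDR correction: scale the i-th smallest p-value by m / i, then make
    the adjusted values monotone non-decreasing in the original ranking."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest p-value downwards enforces monotonicity.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Hypothetical p-values, one per (subpopulation, trait) pair.
pvals = [0.01, 0.04, 0.03, 0.20]
p_bonf = bonferroni(pvals)
p_bh = benjamini_hochberg(pvals)
print(p_bonf)
print(p_bh)
```

Bonferroni is the more conservative of the two, so its adjusted p-values are never smaller than the Benjamini-Hochberg ones. In practice `statsmodels.stats.multitest.multipletests` offers both methods (`"bonferroni"` and `"fdr_bh"`); whether the component calls it is not stated here.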

Issue ticket number and link

Closes #1161

Checklist before requesting a review

  • I have performed a self-review of my code

  • Conforms to the Contributor's guide

  • Check the correct box. Does this PR contain:

    • Breaking changes
    • New functionality
    • Major changes
    • Minor changes
    • Documentation
    • Bug fixes
  • Proposed changes are described in the CHANGELOG.md

  • CI tests succeed!

mdgrv and others added 9 commits March 16, 2026 14:51
- Add metadata/calculate_proportions: participant × subpopulation proportion matrix
- Add interpret/pathway_enrichment: pre-ranked GSEA/ORA via GSEApy
- Add workflows/beyond/atlas_building: Nextflow workflow scaffold
- Add resources_test_scripts/beyond_test_data.sh: synthetic test data generator
- Add src/authors/marilyn_degraeve.yaml: author entry
- Update src/interpret/lianapy/config.vsh.yaml: update author to marilyn_degraeve
- Add BEYOND_plan.md, summary_docs.md, tasks.md: planning and reference docs
- Update CHANGELOG.md with new components

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements PHATE (Potential of Heat-diffusion for Affinity-based
Transition Embedding) as a Viash component in the dimred namespace.
Takes any .obsm matrix as input (X_pca or proportion vectors from
metadata/calculate_proportions) and stores the embedding in .obsm["X_phate"].

Key parameters: n_components, knn, decay, t (auto or fixed), gamma,
random_state. Uses meta_cpus for n_jobs. Mirrors dimred/umap structure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extend resources_test_scripts/beyond_test_data.sh to generate 3
  intermediate h5mu files: proportions_output, pseudotime_output,
  dynamics_output (simulating calculate_proportions, palantir, and
  pseudotime_dynamics outputs respectively)
- Rewrite test_data_script.sh for 6 components to delegate to
  beyond_test_data.sh and reference shared files; remove inline
  synthetic data generation
- Fix calculate_proportions/test.py: spurious .T on prop_df caused
  shape assertion (3,4) to fail against transposed (4,3) result
- Add new component sources: cluster/cellular_communities,
  stats/trait_associations, trajectory/palantir,
  trajectory/pseudotime_dynamics, trajectory/via

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
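
To illustrate the test fix mentioned in the commit above: with 3 participants and 4 subpopulations, a participant × subpopulation matrix has shape (3, 4), and a stray `.T` flips it to (4, 3). The sketch below uses toy data and assumed column names; only `prop_df` is a name taken from the commit message.

```python
import pandas as pd

# Toy data: 3 participants, 4 subpopulations (hypothetical values).
cells = pd.DataFrame({
    "participant": ["p1", "p1", "p2", "p2", "p3", "p3", "p3", "p3"],
    "subpopulation": ["s1", "s2", "s3", "s4", "s1", "s2", "s3", "s4"],
})

# Rows are participants, columns are subpopulations.
prop_df = pd.crosstab(
    cells["participant"], cells["subpopulation"], normalize="index"
)

shape_expected = prop_df.shape       # participant x subpopulation
shape_transposed = prop_df.T.shape   # what a spurious .T would produce
print(shape_expected, shape_transposed)
```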
@mdgrv mdgrv marked this pull request as draft April 24, 2026 13:59
@mdgrv mdgrv marked this pull request as ready for review April 24, 2026 15:20
Development

Successfully merging this pull request may close these issues.

Add BEYOND workflows

1 participant