feat: add harreman for metabolic exchange inference in spatial transcriptomics #3806
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,125 @@ | ||
| # Harreman | ||
|
|
||
| **Harreman** (`scvi.external.harreman`) is a toolkit for inferring metabolic exchanges and cell-cell communication in tissues using spatial transcriptomics data. | ||
|
|
||
| The advantages of Harreman are: | ||
|
|
||
| - Infers spatially-resolved metabolic gene programs using local autocorrelation | ||
| - Identifies cell-cell metabolic communication and ligand-receptor interactions using spatial proximity graphs | ||
| - Supports multiple spatial technologies (Visium, Slide-seq, and others) | ||
| - Scalable to large spatial datasets | ||
| - Supports both parametric and non-parametric significance testing | ||
|
|
||
| The limitations of Harreman include: | ||
|
|
||
| - Requires spatial coordinates to be available in `adata.obsm` | ||
| - Cell communication inference requires a ligand-receptor or metabolite transporter database | ||
|
|
||
| ```{topic} Tutorials: | ||
|
|
||
| - {doc}`/tutorials/notebooks/spatial/harreman_tutorial` | ||
| ``` | ||
|
|
||
| ```{topic} External links: | ||
|
|
||
| - [Harreman documentation](https://harreman.readthedocs.io) | ||
| - [Harreman GitHub](https://github.com/YosefLab/Harreman) | ||
| ``` | ||
|
|
||
| ## Overview | ||
|
|
||
| Harreman operates in three main steps: | ||
|
|
||
| 1. **Spatial graph construction** ({func}`~scvi.external.harreman.tl.compute_knn_graph`): builds a spatial proximity graph from cell coordinates, supporting both k-nearest neighbors and radius-based neighborhoods, with optional Gaussian kernel weighting. | ||
|
|
||
| 2. **Local autocorrelation** ({func}`~scvi.external.harreman.hs.compute_local_autocorrelation`): identifies spatially variable genes using the local autocorrelation statistic from the Hotspot algorithm (DeTomaso and Yosef, *Cell Systems*, 2021), supporting DANB, Bernoulli, and normal count models. | ||
|
|
||
| 3. **Cell communication** ({func}`~scvi.external.harreman.tl.compute_cell_communication`): infers spatially-resolved metabolic exchanges and ligand-receptor interactions between neighboring cells using HarremanDB and CellChatDB. | ||
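The graph-construction step above can be illustrated with a brute-force k-nearest-neighbor search. This is a minimal, dependency-free sketch and not the implementation behind `compute_knn_graph`; the helper name `knn_graph` and the toy coordinates are assumptions made for illustration.

```python
import math


def knn_graph(coords, n_neighbors):
    """Return, for each cell, the indices of its n_neighbors nearest cells."""
    neighbors = []
    for i, ci in enumerate(coords):
        # Euclidean distance from cell i to every other cell (self excluded).
        dists = [(math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:n_neighbors]])
    return neighbors


# Toy 2D coordinates for four cells (hypothetical data).
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
graph = knn_graph(coords, n_neighbors=2)
```

A radius-based neighborhood would keep every cell within a fixed distance instead of the first `n_neighbors` entries. The quadratic cost of this loop is acceptable only for toy inputs.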
|
|
||
| ## Generative process | ||
|
|
||
| At the coarsest level, Harreman partitions the tissue into modules of different metabolic functions based on enzyme co-expression. At the following stage, Harreman formulates hypotheses about which metabolites are exchanged across the tissue or within each spatial zone. Moving to a finer resolution, Harreman can also infer which specific cell subsets participate in the exchange of distinct metabolic activities inside each zone. | ||
|
|
||
| For proteins composed of multiple subunits, Harreman computes either an arithmetic or a geometric mean of the expression values of the corresponding genes; the arithmetic form is: | ||
|
|
||
| ```{math} | ||
| :nowrap: true | ||
|
|
||
| \begin{align} | ||
| X_{ai} &= \frac{\sum_{l \in S_l} X_{a_li}}{|S_l|}; \quad X_{bj} = \frac{\sum_{r \in S_r} X_{b_rj}}{|S_r|} | ||
| \end{align} | ||
| ``` | ||
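The two aggregation options for multi-subunit proteins can be compared on a toy example. This is a sketch under stated assumptions: the function names and the expression values are hypothetical, and treating a zero-expressed subunit as a geometric mean of zero is a choice made here, not something the text specifies.

```python
import math


def arithmetic_mean(values):
    """Arithmetic mean of subunit expression values in one cell."""
    return sum(values) / len(values)


def geometric_mean(values):
    """Geometric mean of subunit expression values in one cell."""
    # Assumption: any zero-expressed subunit yields a mean of zero.
    if any(v == 0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical expression of two subunits in a single cell.
both_expressed = [4.0, 1.0]
one_missing = [4.0, 0.0]

am_both = arithmetic_mean(both_expressed)
gm_both = geometric_mean(both_expressed)
am_missing = arithmetic_mean(one_missing)
gm_missing = geometric_mean(one_missing)
```

The geometric mean is the stricter choice: it drops to zero when any subunit is absent, whereas the arithmetic mean still reports expression for an incomplete complex.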
|
|
||
| ### Test statistic 1: Spatial autocorrelation | ||
|
|
||
| Spatially variable genes are identified using the following autocorrelation statistic: | ||
|
|
||
| ```{math} | ||
| :nowrap: true | ||
|
|
||
| \begin{align} | ||
| H_{a} &= \sum_{i}\sum_{j} w_{ij}X_{ai}X_{aj} | ||
| \end{align} | ||
| ``` | ||
|
|
||
| where $w_{ij}$ represents the communication strength between neighboring cells, computed using a Gaussian kernel: | ||
|
|
||
| ```{math} | ||
| :nowrap: true | ||
|
|
||
| \begin{align} | ||
| \hat{w}_{ij} &= e^{-d_{ij}^2/\sigma_{i}^2} | ||
| \end{align} | ||
| ``` | ||
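The kernel and the autocorrelation statistic can be combined in a short sketch. Several assumptions apply: the kernel values are used directly as $w_{ij}$ (any normalization from $\hat{w}_{ij}$ to $w_{ij}$ is not shown in the text), self-weights are set to zero, the bandwidths $\sigma_i$ are given, and the expression values are already centered. All names and numbers are hypothetical.

```python
import math


def gaussian_kernel_weights(dist, sigma):
    """Kernel weights exp(-d_ij^2 / sigma_i^2), with zero self-weights (assumption)."""
    n = len(dist)
    return [
        [0.0 if i == j else math.exp(-(dist[i][j] ** 2) / sigma[i] ** 2) for j in range(n)]
        for i in range(n)
    ]


def autocorrelation(weights, x):
    """H_a = sum_i sum_j w_ij * x_i * x_j for one gene."""
    n = len(x)
    return sum(weights[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


# Three cells on a line, unit bandwidth for every cell (toy values).
dist = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
sigma = [1.0, 1.0, 1.0]
x_a = [1.0, 1.0, -2.0]  # centered expression of gene a

weights = gaussian_kernel_weights(dist, sigma)
h_a = autocorrelation(weights, x_a)
```

In practice the weight matrix is sparse, because only neighbors in the spatial graph receive a nonzero weight; the dense lists here are for readability only.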
|
|
||
| Significance is assessed by converting $H_a$ to a Z-score and adjusting p-values using the Benjamini-Hochberg procedure. | ||
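The multiple-testing step can be sketched as follows. The conversion from Z-score to p-value assumes a one-sided upper-tail test under a standard normal null, which the text does not state explicitly; the Benjamini-Hochberg adjustment itself is the standard step-up procedure. Function names and p-values are hypothetical.

```python
import math


def upper_tail_p(z):
    """One-sided p-value of a Z-score under a standard normal null (assumption)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


p_at_zero = upper_tail_p(0.0)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```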
|
|
||
| ### Test statistic 2: Spatial co-localization | ||
|
|
||
| Pairwise spatial correlation between genes is computed as: | ||
|
|
||
| ```{math} | ||
| :nowrap: true | ||
|
|
||
| \begin{align} | ||
| H_{ab} &= \sum_{i}\sum_{j} w_{ij} \left(X_{ai}X_{bj} + X_{bi}X_{aj}\right) | ||
| \end{align} | ||
| ``` | ||
|
|
||
| This statistic is used to group genes into spatial modules and to identify cell-type-agnostic metabolic exchange events. | ||
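The pairwise statistic can be written in the same style as the autocorrelation. This is a minimal sketch assuming a dense weight matrix and centered expression vectors; the names and toy values are hypothetical.

```python
def pairwise_correlation(weights, x_a, x_b):
    """H_ab = sum_i sum_j w_ij * (x_ai * x_bj + x_bi * x_aj)."""
    n = len(x_a)
    return sum(
        weights[i][j] * (x_a[i] * x_b[j] + x_b[i] * x_a[j])
        for i in range(n)
        for j in range(n)
    )


# Two neighboring cells with unit weights and no self-weights (toy values).
weights = [[0.0, 1.0], [1.0, 0.0]]
x_a = [1.0, -1.0]
x_b = [2.0, 0.0]

h_ab = pairwise_correlation(weights, x_a, x_b)
h_ba = pairwise_correlation(weights, x_b, x_a)
h_aa = pairwise_correlation(weights, x_a, x_a)
```

Two properties follow from the formula as written: the statistic is symmetric in the two genes, and setting $b = a$ gives twice the autocorrelation $H_a$.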
|
|
||
| ### Test statistic 3: Metabolite autocorrelation | ||
|
|
||
| Gene-pair results are integrated at the metabolite level: | ||
|
|
||
| ```{math} | ||
| :nowrap: true | ||
|
|
||
| \begin{align} | ||
| H_{m} &= \sum_{a,b \in m} H_{ab} | ||
| \end{align} | ||
| ``` | ||
|
|
||
| where $m$ is a metabolite exchanged by genes $a$ and $b$. | ||
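The metabolite-level aggregation amounts to summing pair scores per metabolite. In this sketch the gene and metabolite labels are placeholders rather than entries from HarremanDB, and the dictionary layout is an assumption made for illustration.

```python
def metabolite_scores(pair_scores, metabolite_pairs):
    """H_m = sum of H_ab over the gene pairs (a, b) annotated to metabolite m."""
    return {
        metabolite: sum(pair_scores[pair] for pair in pairs)
        for metabolite, pairs in metabolite_pairs.items()
    }


# Placeholder gene-pair scores H_ab (hypothetical values).
pair_scores = {
    ("gene_a", "gene_b"): 1.5,
    ("gene_a", "gene_c"): 0.5,
    ("gene_d", "gene_e"): -0.25,
}

# Placeholder annotation: which gene pairs exchange which metabolite.
metabolite_pairs = {
    "metabolite_1": [("gene_a", "gene_b"), ("gene_a", "gene_c")],
    "metabolite_2": [("gene_d", "gene_e")],
}

h_m = metabolite_scores(pair_scores, metabolite_pairs)
```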
|
|
||
| ## Usage | ||
|
|
||
| ```python | ||
| import scvi.external.harreman as harreman | ||
|
|
||
| # 1. Build spatial KNN graph | ||
| harreman.tl.compute_knn_graph(adata, compute_neighbors_on_key="spatial", n_neighbors=10) | ||
|
|
||
| # 2. Identify spatially variable genes | ||
| harreman.hs.compute_local_autocorrelation(adata, model="danb") | ||
|
|
||
| # 3. Compute pairwise local correlation | ||
| harreman.hs.compute_local_correlation(adata) | ||
|
|
||
| # 4. Infer cell-cell communication | ||
| harreman.tl.compute_cell_communication(adata) | ||
| ``` | ||
|
|
||
| ## API | ||
|
|
||
| Please see {mod}`scvi.external.harreman` for the full API reference. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -11,6 +11,7 @@ cytovi | |
| decipher | ||
| destvi | ||
| gimvi | ||
| harreman | ||
| linearscvi | ||
| methylanvi | ||
| methylvi | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -55,6 +55,7 @@ dependencies = [ | |
| ] | ||
|
|
||
| [project.optional-dependencies] | ||
| harreman = ["pooch"] | ||
|
Collaborator
No need; we install pooch in the "file_sharing" namespace, and it is mainly for tests/tutorials. If it is needed as a dependency, tell me.
||
| tests = ["pytest", "pytest-pretty", "coverage", "scvi-tools[optional]"] | ||
| editing = ["jupyter", "pre-commit"] | ||
| dev = ["scvi-tools[editing,tests]"] | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3,6 +3,7 @@ | |
| from scvi import settings | ||
| from scvi.utils import error_on_missing_dependencies | ||
|
|
||
| from . import harreman | ||
| from .cellassign import CellAssign | ||
| from .contrastivevi import ContrastiveVI | ||
| from .cytovi import CYTOVI | ||
|
|
@@ -43,6 +44,7 @@ | |
| "RESOLVI", | ||
| "SCVIVA", | ||
| "CYTOVI", | ||
| "harreman", | ||
|
Collaborator
Please capitalize at least the leading H in Harreman.
||
| ] | ||
|
|
||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| from . import datasets as ds | ||
| from . import hotspot as hs | ||
| from . import preprocessing as pp | ||
| from . import tools as tl | ||
|
|
||
| __all__ = ["ds", "hs", "pp", "tl"] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| from .datasets import load_slide_seq_human_lung_dataset, load_visium_mouse_colon_dataset |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,77 @@ | ||
| import os | ||
| import tempfile | ||
|
|
||
| import scanpy as sc | ||
|
|
||
| temp_dir_obj = tempfile.TemporaryDirectory() | ||
|
Collaborator
Created at import time. Lifetime is tied to module GC, not to the function call. This is suitable for tutorials, not for a package codebase. All downloaded datasets silently land in the same temp dir and are deleted unpredictably. Use pooch, as in the other functions we have.
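The lifetime issue raised here can be avoided by resolving the storage location inside the function call rather than at import time. The following is a stdlib-only sketch: `resolve_cache_dir`, `dataset_path`, and the default directory name are hypothetical, and the download itself is left out (a helper such as `pooch.retrieve`, as suggested above, would handle it).

```python
import tempfile
from pathlib import Path


def resolve_cache_dir(cache_dir=None):
    """Hypothetical helper: pick a persistent cache directory at call time."""
    if cache_dir is not None:
        base = Path(cache_dir)
    else:
        # Assumed default location; nothing is created until the function runs.
        base = Path.home() / ".cache" / "harreman_demo"
    base.mkdir(parents=True, exist_ok=True)
    return base


def dataset_path(name, cache_dir=None):
    """Return the file path a dataset would be stored under."""
    return resolve_cache_dir(cache_dir) / f"{name}.h5ad"


# Usage with an explicit, caller-owned directory (temporary here for the demo).
with tempfile.TemporaryDirectory() as tmp:
    path = dataset_path("Parigi_et_al_mouse_colon_d0", cache_dir=tmp)
    created_under_tmp = path.parent == Path(tmp)
```

Because no directory is created when the module is imported, the caller controls where files land and when they are removed.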
||
|
|
||
|
|
||
| def load_visium_mouse_colon_dataset( | ||
| sample: str | None = None, | ||
| ) -> "sc.AnnData": | ||
| """ | ||
| Load the mouse colon 10x Visium dataset. | ||
|
|
||
| Returns | ||
| ------- | ||
| adata : AnnData | ||
| The loaded 10x Visium dataset. | ||
| """ | ||
| dataset_prefix = "Parigi_et_al_mouse_colon" | ||
|
|
||
| samples_path_dict = { | ||
| "d0": "https://figshare.com/ndownloader/files/59325113", | ||
| "d14": "https://figshare.com/ndownloader/files/59325116", | ||
| } | ||
|
|
||
| if sample: | ||
| if sample not in samples_path_dict.keys(): | ||
| raise ValueError(f'"sample" needs to be one of: {list(samples_path_dict.keys())}') | ||
| else: | ||
| adata_path = os.path.join(temp_dir_obj.name, f"{dataset_prefix}_{sample}.h5ad") | ||
| backup_url = samples_path_dict[sample] | ||
| else: | ||
| adata_path = os.path.join(temp_dir_obj.name, f"{dataset_prefix}_unrolled.h5ad") | ||
| backup_url = "https://figshare.com/ndownloader/files/59325119" | ||
|
Collaborator
We won't be able to use figshare anymore; these files need to move to the scverse S3 and be stored there. Also, scanpy's backup_url is OK for tutorials, but pooch is preferred here as well.
||
|
|
||
| adata = sc.read(adata_path, backup_url=backup_url) | ||
|
|
||
| return adata | ||
|
|
||
|
|
||
| def load_slide_seq_human_lung_dataset( | ||
| sample: str | None = None, | ||
| ) -> "sc.AnnData": | ||
| """ | ||
| Load the human lung Slide-seq dataset. | ||
|
|
||
| Returns | ||
| ------- | ||
| adata : AnnData | ||
| The loaded Slide-seq dataset. | ||
| """ | ||
| dataset_prefix = "Liu_et_al_human_lung" | ||
|
|
||
| samples_path_dict = { | ||
|
Collaborator
We need to move everything to the scverse S3, as we did with the rest of the external metadata we use.
||
| "Puck_200727_08": "https://figshare.com/ndownloader/files/59325098", | ||
| "Puck_200727_09": "https://figshare.com/ndownloader/files/59325092", | ||
| "Puck_200727_10": "https://figshare.com/ndownloader/files/59325095", | ||
| "Puck_220408_13": "https://figshare.com/ndownloader/files/59325101", | ||
| "Puck_220408_14": "https://figshare.com/ndownloader/files/59325104", | ||
| "Puck_220408_15": "https://figshare.com/ndownloader/files/59325107", | ||
| "Puck_220408_20": "https://figshare.com/ndownloader/files/59325110", | ||
| } | ||
|
|
||
| if sample: | ||
| if sample not in samples_path_dict.keys(): | ||
| raise ValueError(f'"sample" needs to be one of: {list(samples_path_dict.keys())}') | ||
| else: | ||
| adata_path = os.path.join(temp_dir_obj.name, f"{dataset_prefix}_{sample}.h5ad") | ||
| backup_url = samples_path_dict[sample] | ||
| else: | ||
| adata_path = os.path.join(temp_dir_obj.name, f"{dataset_prefix}.h5ad") | ||
| backup_url = "https://figshare.com/ndownloader/files/59325125" | ||
|
Collaborator
Same as above.
||
|
|
||
| adata = sc.read(adata_path, backup_url=backup_url) | ||
|
|
||
| return adata | ||
|
Collaborator
Why do you have a local copy of Hotspot? Isn't using the latest version from pip enough? If not, why not update Hotspot and release a new version on pip, so that Harreman and others can use it? (major code reuse)
Collaborator
I don't think we need all of Hotspot to be added to scvi-tools (it has its own package for that), just the parts that Harreman needs.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| from .local_autocorrelation import compute_local_autocorrelation, load_metabolic_genes | ||
| from .local_correlation import compute_local_correlation | ||
| from .modules import ( | ||
| calculate_module_scores, | ||
| calculate_super_module_scores, | ||
| compute_top_scoring_modules, | ||
| create_modules, | ||
| integrate_vision_hotspot_results, | ||
| ) |
I saw 4 tutorials in the Harreman docs; is this the only one you are considering adding?