1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ to [Semantic Versioning]. The full commit history is available in the [commit lo

#### Added

- Add {mod}`scvi.external.harreman` for inference of metabolic exchanges in tissues using spatial transcriptomics {pr}`XXXX`.
- Add support for Python 3.14, {pr}`3563`.
- Add support for Pandas3, {pr}`3638`.

8 changes: 8 additions & 0 deletions docs/tutorials/index_spatial.md
@@ -3,6 +3,7 @@
```{toctree}
:maxdepth: 1

notebooks/spatial/harreman_tutorial
notebooks/spatial/resolVI_tutorial
notebooks/spatial/scVIVA_tutorial
notebooks/spatial/DestVI_tutorial
@@ -12,6 +13,13 @@ notebooks/spatial/stereoscope_heart_LV_tutorial
notebooks/spatial/cell2location_lymph_node_spatial_tutorial
```

```{customcard}
Review comment (Collaborator): I saw four tutorials in the Harreman docs. Is this the only one you plan to add?

:path: notebooks/spatial/harreman_tutorial
:tags: Analysis, Spatial-statistics, Metabolic-inference

Infer metabolic exchanges in tissues using spatial transcriptomics with Harreman
```

```{customcard}
:path: notebooks/spatial/resolVI_tutorial
:tags: Analysis, Integration, Transfer-learning, Dimensionality-reduction
125 changes: 125 additions & 0 deletions docs/user_guide/models/harreman.md
@@ -0,0 +1,125 @@
# Harreman

**Harreman** (`scvi.external.harreman`) is a toolkit for inferring metabolic exchanges and cell-cell communication in tissues using spatial transcriptomics data.

The advantages of Harreman are:

- Infers spatially-resolved metabolic gene programs using local autocorrelation
- Identifies cell-cell metabolic communication and ligand-receptor interactions using spatial proximity graphs
- Supports multiple spatial technologies (Visium, Slide-seq, and others)
- Scales to large spatial datasets
- Supports both parametric and non-parametric significance testing

The limitations of Harreman include:

- Requires spatial coordinates to be available in `adata.obsm`
- Cell communication inference requires a ligand-receptor or metabolite transporter database

```{topic} Tutorials:

- {doc}`/tutorials/notebooks/spatial/harreman_tutorial`
```

```{topic} External links:

- [Harreman documentation](https://harreman.readthedocs.io)
- [Harreman GitHub](https://github.com/YosefLab/Harreman)
```

## Overview

Harreman operates in three main steps:

1. **Spatial graph construction** ({func}`~scvi.external.harreman.tl.compute_knn_graph`): builds a spatial proximity graph from cell coordinates, supporting both k-nearest neighbors and radius-based neighborhoods, with optional Gaussian kernel weighting.

2. **Local autocorrelation** ({func}`~scvi.external.harreman.hs.compute_local_autocorrelation`): identifies spatially variable genes using the local autocorrelation statistic from the Hotspot algorithm (DeTomaso and Yosef, *Cell Systems*, 2021), supporting DANB, Bernoulli, and normal count models.

3. **Cell communication** ({func}`~scvi.external.harreman.tl.compute_cell_communication`): infers spatially-resolved metabolic exchanges and ligand-receptor interactions between neighboring cells using HarremanDB and CellChatDB.

## Generative process

At the coarsest level, Harreman partitions the tissue into modules with distinct metabolic functions based on enzyme co-expression. At the next stage, it formulates hypotheses about which metabolites are exchanged across the tissue or within each spatial zone. At a finer resolution, it can also infer which cell subsets participate in each metabolic exchange within a zone.

For proteins composed of multiple subunits, Harreman computes either an arithmetic or a geometric mean of the expression values of the corresponding genes. The arithmetic mean is shown below:

```{math}
:nowrap: true

\begin{align}
X_{ai} &= \frac{\sum_{l \in S_l} X_{a_li}}{|S_l|}; \quad X_{bj} = \frac{\sum_{r \in S_r} X_{b_rj}}{|S_r|}
\end{align}
```
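
As a rough illustration of the subunit aggregation above, the following NumPy sketch computes both variants for one multi-subunit protein. The function name `subunit_mean` and the `(n_subunits, n_cells)` array layout are assumptions made for this example and are not part of the Harreman API.

```python
import numpy as np


def subunit_mean(X_sub, kind="arithmetic"):
    """Aggregate subunit expression into one value per cell.

    X_sub: array of shape (n_subunits, n_cells) (hypothetical layout).
    """
    X_sub = np.asarray(X_sub, dtype=float)
    if kind == "arithmetic":
        # Matches the displayed formula: sum over subunits divided by |S|.
        return X_sub.mean(axis=0)
    if kind == "geometric":
        # |S|-th root of the product over subunits.
        return np.prod(X_sub, axis=0) ** (1.0 / X_sub.shape[0])
    raise ValueError(f"unknown kind: {kind!r}")


X_sub = np.array([[1.0, 4.0], [4.0, 4.0]])  # two subunits, two cells
arithmetic = subunit_mean(X_sub, "arithmetic")
geometric = subunit_mean(X_sub, "geometric")
```

The geometric mean is zero whenever any subunit is unexpressed in a cell, which makes it the stricter of the two choices.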

### Test statistic 1: Spatial autocorrelation

Spatially variable genes are identified using the following autocorrelation statistic:

```{math}
:nowrap: true

\begin{align}
H_{a} &= \sum_{i}\sum_{j} w_{ij}X_{ai}X_{aj}
\end{align}
```

where $w_{ij}$ represents the communication strength between neighboring cells, computed using a Gaussian kernel:

```{math}
:nowrap: true

\begin{align}
\hat{w}_{ij} &= e^{-d_{ij}^2/\sigma_{i}^2}
\end{align}
```
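
The two formulas above can be sketched with dense NumPy arrays. This is only an illustration under stated assumptions: the helper names are hypothetical, the bandwidths $\sigma_i$ are taken as given inputs, and `x` is assumed to already hold the per-cell values $X_{ai}$ of one gene. A production implementation would normally use a sparse neighbor graph instead.

```python
import numpy as np


def gaussian_weights(dist, sigma):
    """w_ij = exp(-d_ij^2 / sigma_i^2) for a dense distance matrix."""
    dist = np.asarray(dist, dtype=float)
    sigma = np.asarray(sigma, dtype=float)[:, None]  # one bandwidth per row i
    return np.exp(-(dist**2) / sigma**2)


def autocorrelation(W, x):
    """H_a = sum_i sum_j w_ij * x_i * x_j."""
    x = np.asarray(x, dtype=float)
    return float(x @ np.asarray(W, dtype=float) @ x)


dist = np.array([[0.0, 1.0], [1.0, 0.0]])  # pairwise distances, two cells
W = gaussian_weights(dist, sigma=[1.0, 1.0])
np.fill_diagonal(W, 0.0)  # assumption: a cell is not its own neighbor
H_a = autocorrelation(W, [2.0, 3.0])
```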

Significance is assessed by converting $H_a$ to a Z-score and adjusting p-values using the Benjamini-Hochberg procedure.
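
The significance step can be sketched as follows. The conversion assumes a one-sided test against a standard normal null, and the null mean and variance of $H_a$ (which depend on the chosen count model) are not reproduced here, so the sketch starts from Z-scores. The function names are illustrative only.

```python
import math

import numpy as np


def z_to_pvalue(z):
    """Upper-tail p-value of a standard normal (one-sided; an assumption)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p-value by n / k.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


pvals = [z_to_pvalue(z) for z in (3.0, 0.5, 2.0)]
fdr = benjamini_hochberg(pvals)
```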

### Test statistic 2: Spatial co-localization

Pairwise spatial correlation between genes is computed as:

```{math}
:nowrap: true

\begin{align}
H_{ab} &= \sum_{i}\sum_{j} w_{ij} \left(X_{ai}X_{bj} + X_{bi}X_{aj}\right)
\end{align}
```

This statistic is used to group genes into spatial modules and to identify cell-type-agnostic metabolic exchange events.
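
A minimal dense-matrix sketch of the pairwise statistic is given below. The function name is hypothetical, and `x_a` and `x_b` are assumed to hold the per-cell values of genes $a$ and $b$.

```python
import numpy as np


def pairwise_correlation(W, x_a, x_b):
    """H_ab = sum_i sum_j w_ij * (x_ai * x_bj + x_bi * x_aj)."""
    W = np.asarray(W, dtype=float)
    x_a = np.asarray(x_a, dtype=float)
    x_b = np.asarray(x_b, dtype=float)
    return float(x_a @ W @ x_b + x_b @ W @ x_a)


W = np.array([[0.0, 1.0], [0.0, 0.0]])  # deliberately asymmetric weights
H_ab = pairwise_correlation(W, [1.0, 2.0], [3.0, 4.0])
H_ba = pairwise_correlation(W, [3.0, 4.0], [1.0, 2.0])
```

Because both orderings of the gene pair are summed, the statistic is symmetric in $a$ and $b$ even when the weight matrix is not.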

### Test statistic 3: Metabolite autocorrelation

Gene-pair results are integrated at the metabolite level:

```{math}
:nowrap: true

\begin{align}
H_{m} &= \sum_{a,b \in m} H_{ab}
\end{align}
```

where the sum runs over the gene pairs $(a, b)$ involved in the exchange of metabolite $m$.
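
The aggregation amounts to summing gene-pair scores per metabolite, as in the sketch below. The gene names, metabolite names, and scores are made up for illustration, and `metabolite_score` is a hypothetical helper.

```python
def metabolite_score(pair_scores, gene_pairs):
    """H_m = sum of H_ab over the gene pairs assigned to one metabolite."""
    return sum(pair_scores[pair] for pair in gene_pairs)


# Hypothetical gene-pair scores H_ab.
pair_scores = {
    ("geneA", "geneB"): 1.5,
    ("geneA", "geneC"): 0.5,
    ("geneD", "geneE"): 2.0,
}
# Hypothetical mapping from metabolite to the gene pairs that exchange it.
metabolite_pairs = {
    "met1": [("geneA", "geneB"), ("geneA", "geneC")],
    "met2": [("geneD", "geneE")],
}
H_m = {m: metabolite_score(pair_scores, pairs) for m, pairs in metabolite_pairs.items()}
```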

## Usage

```python
import scvi.external.harreman as harreman

# 1. Build spatial KNN graph
harreman.tl.compute_knn_graph(adata, compute_neighbors_on_key="spatial", n_neighbors=10)

# 2. Identify spatially variable genes
harreman.hs.compute_local_autocorrelation(adata, model="danb")

# 3. Compute pairwise local correlation
harreman.hs.compute_local_correlation(adata)

# 4. Infer cell-cell communication
harreman.tl.compute_cell_communication(adata)
```

## API

Please see {mod}`scvi.external.harreman` for the full API reference.
1 change: 1 addition & 0 deletions docs/user_guide/models/index.md
@@ -11,6 +11,7 @@ cytovi
decipher
destvi
gimvi
harreman
linearscvi
methylanvi
methylvi
1 change: 1 addition & 0 deletions pyproject.toml
@@ -55,6 +55,7 @@ dependencies = [
]

[project.optional-dependencies]
harreman = ["pooch"]
Review comment (Collaborator): Not needed. We already install pooch in the "file_sharing" group, and it is mainly used for tests and tutorials. Let me know if it is needed as a dependency.

tests = ["pytest", "pytest-pretty", "coverage", "scvi-tools[optional]"]
editing = ["jupyter", "pre-commit"]
dev = ["scvi-tools[editing,tests]"]
2 changes: 2 additions & 0 deletions src/scvi/external/__init__.py
@@ -3,6 +3,7 @@
from scvi import settings
from scvi.utils import error_on_missing_dependencies

from . import harreman
from .cellassign import CellAssign
from .contrastivevi import ContrastiveVI
from .cytovi import CYTOVI
@@ -43,6 +44,7 @@
"RESOLVI",
"SCVIVA",
"CYTOVI",
"harreman",
Review comment (Collaborator): Please capitalize at least the first letter, i.e. "Harreman".

]


6 changes: 6 additions & 0 deletions src/scvi/external/harreman/__init__.py
@@ -0,0 +1,6 @@
from . import datasets as ds
from . import hotspot as hs
from . import preprocessing as pp
from . import tools as tl

__all__ = ["ds", "hs", "pp", "tl"]
1 change: 1 addition & 0 deletions src/scvi/external/harreman/datasets/__init__.py
@@ -0,0 +1 @@
from .datasets import load_slide_seq_human_lung_dataset, load_visium_mouse_colon_dataset
77 changes: 77 additions & 0 deletions src/scvi/external/harreman/datasets/datasets.py
@@ -0,0 +1,77 @@
import os
import tempfile

import scanpy as sc

temp_dir_obj = tempfile.TemporaryDirectory()
Review comment (Collaborator): This is created at import time, so its lifetime is tied to garbage collection of the module rather than to the function call. That is acceptable in a tutorial but not in package code: all downloaded datasets silently land in the same temporary directory and are deleted unpredictably. Please use pooch, as in our other dataset loaders.



def load_visium_mouse_colon_dataset(
    sample: str | None = None,
) -> "sc.AnnData":
    """
    Load the mouse colon 10x Visium dataset.

    Returns
    -------
    adata : AnnData
        The loaded 10x Visium dataset.
    """
    dataset_prefix = "Parigi_et_al_mouse_colon"

    samples_path_dict = {
        "d0": "https://figshare.com/ndownloader/files/59325113",
        "d14": "https://figshare.com/ndownloader/files/59325116",
    }

    if sample:
        if sample not in samples_path_dict.keys():
            raise ValueError(f'"sample" needs to be one of: {list(samples_path_dict.keys())}')
        else:
            adata_path = os.path.join(temp_dir_obj.name, f"{dataset_prefix}_{sample}.h5ad")
            backup_url = samples_path_dict[sample]
    else:
        adata_path = os.path.join(temp_dir_obj.name, f"{dataset_prefix}_unrolled.h5ad")
        backup_url = "https://figshare.com/ndownloader/files/59325119"
Review comment (Collaborator): We will no longer be able to use figshare, so these files need to move to the scverse S3 bucket. Also, scanpy's `backup_url` is fine for tutorials, but pooch is preferred here as well.


    adata = sc.read(adata_path, backup_url=backup_url)

    return adata


def load_slide_seq_human_lung_dataset(
    sample: str | None = None,
) -> "sc.AnnData":
    """
    Load the human lung Slide-seq dataset.

    Returns
    -------
    adata : AnnData
        The loaded Slide-seq dataset.
    """
    dataset_prefix = "Liu_et_al_human_lung"

    samples_path_dict = {
Review comment (Collaborator): Everything needs to move to the scverse S3 bucket, as we did with the rest of the external metadata we use.

        "Puck_200727_08": "https://figshare.com/ndownloader/files/59325098",
        "Puck_200727_09": "https://figshare.com/ndownloader/files/59325092",
        "Puck_200727_10": "https://figshare.com/ndownloader/files/59325095",
        "Puck_220408_13": "https://figshare.com/ndownloader/files/59325101",
        "Puck_220408_14": "https://figshare.com/ndownloader/files/59325104",
        "Puck_220408_15": "https://figshare.com/ndownloader/files/59325107",
        "Puck_220408_20": "https://figshare.com/ndownloader/files/59325110",
    }

    if sample:
        if sample not in samples_path_dict.keys():
            raise ValueError(f'"sample" needs to be one of: {list(samples_path_dict.keys())}')
        else:
            adata_path = os.path.join(temp_dir_obj.name, f"{dataset_prefix}_{sample}.h5ad")
            backup_url = samples_path_dict[sample]
    else:
        adata_path = os.path.join(temp_dir_obj.name, f"{dataset_prefix}.h5ad")
        backup_url = "https://figshare.com/ndownloader/files/59325125"
Review comment (Collaborator): Same as above. We will no longer be able to use figshare, so these files need to move to the scverse S3 bucket. Also, scanpy's `backup_url` is fine for tutorials, but pooch is preferred here as well.


    adata = sc.read(adata_path, backup_url=backup_url)

    return adata
9 changes: 9 additions & 0 deletions src/scvi/external/harreman/hotspot/__init__.py
Review comment (Collaborator): Why do you have a local copy of Hotspot? Isn't the latest release from pip enough? If not, why not update Hotspot and release a new version on pip, so that Harreman and others can use it? That would be significant code reuse.

Review comment (Collaborator): I don't think all of Hotspot needs to be added to scvi-tools (it has its own package for that), just the parts that Harreman needs.

@@ -0,0 +1,9 @@
from .local_autocorrelation import compute_local_autocorrelation, load_metabolic_genes
from .local_correlation import compute_local_correlation
from .modules import (
    calculate_module_scores,
    calculate_super_module_scores,
    compute_top_scoring_modules,
    create_modules,
    integrate_vision_hotspot_results,
)