scverse · oieretxezarreta · May 12, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -9,6 +9,7 @@ to [Semantic Versioning]. The full commit history is available in the [commit lo
 
 #### Added
 
+- Add {mod}`scvi.external.harreman` for inference of metabolic exchanges in tissues using spatial transcriptomics, {pr}`XXXX`.
 - Add support for Python 3.14, {pr}`3563`.
 - Add support for Pandas3, {pr}`3638`.
 

diff --git a/docs/tutorials/index_spatial.md b/docs/tutorials/index_spatial.md
@@ -3,6 +3,7 @@
 ```{toctree}
 :maxdepth: 1
 
+notebooks/spatial/harreman_tutorial
 notebooks/spatial/resolVI_tutorial
 notebooks/spatial/scVIVA_tutorial
 notebooks/spatial/DestVI_tutorial
@@ -12,6 +13,13 @@ notebooks/spatial/stereoscope_heart_LV_tutorial
 notebooks/spatial/cell2location_lymph_node_spatial_tutorial
 ```
 
+```{customcard}
+:path: notebooks/spatial/harreman_tutorial
+:tags: Analysis, Spatial-statistics, Metabolic-inference
+
+Infer metabolic exchanges in tissues using spatial transcriptomics with Harreman
+```
+
 ```{customcard}
 :path: notebooks/spatial/resolVI_tutorial
 :tags: Analysis, Integration, Transfer-learning, Dimensionality-reduction

diff --git a/docs/user_guide/models/harreman.md b/docs/user_guide/models/harreman.md
@@ -0,0 +1,125 @@
+# Harreman
+
+**Harreman** (`scvi.external.harreman`) is a toolkit for inferring metabolic exchanges and cell-cell communication in tissues using spatial transcriptomics data.
+
+The advantages of Harreman are:
+
+- Infers spatially-resolved metabolic gene programs using local autocorrelation
+- Identifies cell-cell metabolic communication and ligand-receptor interactions using spatial proximity graphs
+- Supports multiple spatial technologies (Visium, Slide-seq, and others)
+- Scalable to large spatial datasets
+- Supports both parametric and non-parametric significance testing
+
+The limitations of Harreman include:
+
+- Requires spatial coordinates to be available in `adata.obsm`
+- Cell communication inference requires a ligand-receptor or metabolite transporter database
+
+```{topic} Tutorials:
+
+-   {doc}`/tutorials/notebooks/spatial/harreman_tutorial`
+```
+
+```{topic} External links:
+
+- [Harreman documentation](https://harreman.readthedocs.io)
+- [Harreman GitHub](https://github.com/YosefLab/Harreman)
+```
+
+## Overview
+
+Harreman operates in three main steps:
+
+1. **Spatial graph construction** ({func}`~scvi.external.harreman.tl.compute_knn_graph`): builds a spatial proximity graph from cell coordinates, supporting both k-nearest neighbors and radius-based neighborhoods, with optional Gaussian kernel weighting.
+
+2. **Local autocorrelation** ({func}`~scvi.external.harreman.hs.compute_local_autocorrelation`): identifies spatially variable genes using the local autocorrelation statistic from the Hotspot algorithm (DeTomaso and Yosef, *Cell Systems*, 2021), supporting DANB, Bernoulli, and normal count models.
+
+3. **Cell communication** ({func}`~scvi.external.harreman.tl.compute_cell_communication`): infers spatially-resolved metabolic exchanges and ligand-receptor interactions between neighboring cells using HarremanDB and CellChatDB.
+
+## Generative process
+
+At the coarsest level, Harreman partitions the tissue into modules with different metabolic functions based on enzyme co-expression. At the next stage, Harreman formulates hypotheses about which metabolites are exchanged across the tissue or within each spatial zone. At a finer resolution, Harreman can also infer which specific cell subsets participate in the exchange of each metabolite inside each zone.
+
+For proteins composed of multiple subunits, Harreman computes either an arithmetic or a geometric mean of the expression values of the corresponding genes. The arithmetic-mean case is:
+
+```{math}
+:nowrap: true
+
+\begin{align}
+    X_{ai} &= \frac{\sum_{l \in S_l} X_{a_li}}{|S_l|}; \quad X_{bj} = \frac{\sum_{r \in S_r} X_{b_rj}}{|S_r|}
+\end{align}
+```
+
+### Test statistic 1: Spatial autocorrelation
+
+Spatially variable genes are identified using the following autocorrelation statistic:
+
+```{math}
+:nowrap: true
+
+\begin{align}
+    H_{a} &= \sum_{i}\sum_{j} w_{ij}X_{ai}X_{aj}
+\end{align}
+```
+
+where $w_{ij}$ is the communication strength between neighboring cells $i$ and $j$, derived from the Gaussian-kernel weight $\hat{w}_{ij}$ of their distance $d_{ij}$ with cell-specific bandwidth $\sigma_i$:
+
+```{math}
+:nowrap: true
+
+\begin{align}
+    \hat{w}_{ij} &= e^{-d_{ij}^2/\sigma_{i}^2}
+\end{align}
+```
+
+Significance is assessed by converting $H_a$ to a Z-score and adjusting p-values using the Benjamini-Hochberg procedure.
+
+### Test statistic 2: Spatial co-localization
+
+Pairwise spatial correlation between genes is computed as:
+
+```{math}
+:nowrap: true
+
+\begin{align}
+    H_{ab} &= \sum_{i}\sum_{j} w_{ij} \left(X_{ai}X_{bj} + X_{bi}X_{aj}\right)
+\end{align}
+```
+
+This statistic is used to group genes into spatial modules and to identify cell-type-agnostic metabolic exchange events.
+
+### Test statistic 3: Metabolite autocorrelation
+
+Gene-pair results are integrated at the metabolite level:
+
+```{math}
+:nowrap: true
+
+\begin{align}
+    H_{m} &= \sum_{a,b \in m} H_{ab}
+\end{align}
+```
+
+where $m$ is a metabolite whose exchange is mediated by the products of genes $a$ and $b$.
+
+## Usage
+
+```python
+import scvi.external.harreman as harreman
+
+# 1. Build spatial KNN graph
+harreman.tl.compute_knn_graph(adata, compute_neighbors_on_key="spatial", n_neighbors=10)
+
+# 2. Identify spatially variable genes
+harreman.hs.compute_local_autocorrelation(adata, model="danb")
+
+# 3. Compute pairwise local correlation
+harreman.hs.compute_local_correlation(adata)
+
+# 4. Infer cell-cell communication
+harreman.tl.compute_cell_communication(adata)
+```
+
+## API
+
+Please see {mod}`scvi.external.harreman` for the full API reference.
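
The statistics described in `harreman.md` above can be illustrated with a small NumPy sketch. This is not the Harreman implementation: the function names are hypothetical, the graph is dense for readability, expression values are assumed to be already centered, and the choice of $\sigma_i$ as the distance to the farthest retained neighbor is an assumption made only for this example.

```python
import numpy as np


def subunit_mean(x_subunits, kind="arithmetic"):
    """Collapse a (n_subunits, n_cells) array into one value per cell."""
    x_subunits = np.asarray(x_subunits, dtype=float)
    if kind == "arithmetic":
        return x_subunits.mean(axis=0)
    if kind == "geometric":
        return np.prod(x_subunits, axis=0) ** (1.0 / x_subunits.shape[0])
    raise ValueError("kind must be 'arithmetic' or 'geometric'")


def gaussian_knn_weights(coords, n_neighbors):
    """Unnormalized weights exp(-d_ij^2 / sigma_i^2) on a kNN graph."""
    n = coords.shape[0]
    # Pairwise Euclidean distances between cells.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    # Indices of the n_neighbors closest cells, nearest first.
    nbrs = np.argsort(dist, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nbrs.ravel()
    d = dist[rows, cols].reshape(n, n_neighbors)
    # Assumption for this sketch: sigma_i = distance to the farthest kept neighbor.
    sigma = d[:, -1:]
    w = np.zeros((n, n))
    w[rows, cols] = np.exp(-(d**2) / sigma**2).ravel()
    return w


def local_autocorrelation(w, x_a):
    """H_a = sum_ij w_ij * X_ai * X_aj."""
    return float(x_a @ w @ x_a)


def local_correlation(w, x_a, x_b):
    """H_ab = sum_ij w_ij * (X_ai * X_bj + X_bi * X_aj)."""
    return float(x_a @ w @ x_b + x_b @ w @ x_a)


rng = np.random.default_rng(0)
coords = rng.uniform(size=(50, 2))  # synthetic spatial coordinates
w = gaussian_knn_weights(coords, n_neighbors=5)
x_a = rng.normal(size=50)  # synthetic, already-centered expression
x_b = rng.normal(size=50)
h_a = local_autocorrelation(w, x_a)
h_ab = local_correlation(w, x_a, x_b)
```

By construction `local_correlation` is symmetric in its two genes, and calling it with the same gene twice gives twice the autocorrelation value, which matches the two formulas as written in the guide.
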
diff --git a/docs/user_guide/models/index.md b/docs/user_guide/models/index.md
@@ -11,6 +11,7 @@ cytovi
 decipher
 destvi
 gimvi
+harreman
 linearscvi
 methylanvi
 methylvi

diff --git a/pyproject.toml b/pyproject.toml
@@ -55,6 +55,7 @@ dependencies = [
 ]
 
 [project.optional-dependencies]
+harreman = ["pooch"]
 tests = ["pytest", "pytest-pretty", "coverage", "scvi-tools[optional]"]
 editing = ["jupyter", "pre-commit"]
 dev = ["scvi-tools[editing,tests]"]

diff --git a/src/scvi/external/__init__.py b/src/scvi/external/__init__.py
@@ -3,6 +3,7 @@
 from scvi import settings
 from scvi.utils import error_on_missing_dependencies
 
+from . import harreman
 from .cellassign import CellAssign
 from .contrastivevi import ContrastiveVI
 from .cytovi import CYTOVI
@@ -43,6 +44,7 @@
     "RESOLVI",
     "SCVIVA",
     "CYTOVI",
+    "harreman",
 ]
 
 

diff --git a/src/scvi/external/harreman/__init__.py b/src/scvi/external/harreman/__init__.py
@@ -0,0 +1,6 @@
+from . import datasets as ds
+from . import hotspot as hs
+from . import preprocessing as pp
+from . import tools as tl
+
+__all__ = ["ds", "hs", "pp", "tl"]
diff --git a/src/scvi/external/harreman/datasets/__init__.py b/src/scvi/external/harreman/datasets/__init__.py
@@ -0,0 +1 @@
+from .datasets import load_slide_seq_human_lung_dataset, load_visium_mouse_colon_dataset
diff --git a/src/scvi/external/harreman/datasets/datasets.py b/src/scvi/external/harreman/datasets/datasets.py
@@ -0,0 +1,77 @@
+import os
+import tempfile
+
+import scanpy as sc
+
+temp_dir_obj = tempfile.TemporaryDirectory()
+
+
+def load_visium_mouse_colon_dataset(
+    sample: str | None = None,
+) -> "sc.AnnData":
+    """
+    Load the mouse colon 10x Visium dataset.
+
+    Returns
+    -------
+    adata : AnnData
+        The loaded 10x Visium dataset.
+    """
+    dataset_prefix = "Parigi_et_al_mouse_colon"
+
+    samples_path_dict = {
+        "d0": "https://figshare.com/ndownloader/files/59325113",
+        "d14": "https://figshare.com/ndownloader/files/59325116",
+    }
+
+    if sample:
+        if sample not in samples_path_dict.keys():
+            raise ValueError(f'"sample" needs to be one of: {list(samples_path_dict.keys())}')
+        else:
+            adata_path = os.path.join(temp_dir_obj.name, f"{dataset_prefix}_{sample}.h5ad")
+            backup_url = samples_path_dict[sample]
+    else:
+        adata_path = os.path.join(temp_dir_obj.name, f"{dataset_prefix}_unrolled.h5ad")
+        backup_url = "https://figshare.com/ndownloader/files/59325119"
+
+    adata = sc.read(adata_path, backup_url=backup_url)
+
+    return adata
+
+
+def load_slide_seq_human_lung_dataset(
+    sample: str | None = None,
+) -> "sc.AnnData":
+    """
+    Load the human lung Slide-seq dataset.
+
+    Returns
+    -------
+    adata : AnnData
+        The loaded Slide-seq dataset.
+    """
+    dataset_prefix = "Liu_et_al_human_lung"
+
+    samples_path_dict = {
+        "Puck_200727_08": "https://figshare.com/ndownloader/files/59325098",
+        "Puck_200727_09": "https://figshare.com/ndownloader/files/59325092",
+        "Puck_200727_10": "https://figshare.com/ndownloader/files/59325095",
+        "Puck_220408_13": "https://figshare.com/ndownloader/files/59325101",
+        "Puck_220408_14": "https://figshare.com/ndownloader/files/59325104",
+        "Puck_220408_15": "https://figshare.com/ndownloader/files/59325107",
+        "Puck_220408_20": "https://figshare.com/ndownloader/files/59325110",
+    }
+
+    if sample:
+        if sample not in samples_path_dict.keys():
+            raise ValueError(f'"sample" needs to be one of: {list(samples_path_dict.keys())}')
+        else:
+            adata_path = os.path.join(temp_dir_obj.name, f"{dataset_prefix}_{sample}.h5ad")
+            backup_url = samples_path_dict[sample]
+    else:
+        adata_path = os.path.join(temp_dir_obj.name, f"{dataset_prefix}.h5ad")
+        backup_url = "https://figshare.com/ndownloader/files/59325125"
+
+    adata = sc.read(adata_path, backup_url=backup_url)
+
+    return adata
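
The two loaders in `datasets.py` above repeat the same sample-validation and path-building steps. One possible way to share that logic is sketched below. `resolve_dataset_file` is a hypothetical helper that is not part of this PR, and it only resolves the local path and URL without downloading anything.

```python
import os
import tempfile


def resolve_dataset_file(sample, sample_urls, prefix, default_name, default_url, cache_dir):
    """Return (local_path, url) for one sample, or for the combined file if sample is None."""
    # Note: this checks `is None`, whereas the loaders above treat any falsy value as "no sample".
    if sample is None:
        return os.path.join(cache_dir, default_name), default_url
    if sample not in sample_urls:
        raise ValueError(f'"sample" needs to be one of: {list(sample_urls)}')
    return os.path.join(cache_dir, f"{prefix}_{sample}.h5ad"), sample_urls[sample]


# URLs copied from load_visium_mouse_colon_dataset above.
VISIUM_URLS = {
    "d0": "https://figshare.com/ndownloader/files/59325113",
    "d14": "https://figshare.com/ndownloader/files/59325116",
}

with tempfile.TemporaryDirectory() as cache_dir:
    path, url = resolve_dataset_file(
        "d0",
        VISIUM_URLS,
        "Parigi_et_al_mouse_colon",
        "Parigi_et_al_mouse_colon_unrolled.h5ad",
        "https://figshare.com/ndownloader/files/59325119",
        cache_dir,
    )
```

Each loader could then pass the returned pair to `sc.read(path, backup_url=url)`, as the code in the diff already does.
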
diff --git a/src/scvi/external/harreman/hotspot/__init__.py b/src/scvi/external/harreman/hotspot/__init__.py
@@ -0,0 +1,9 @@
+from .local_autocorrelation import compute_local_autocorrelation, load_metabolic_genes
+from .local_correlation import compute_local_correlation
+from .modules import (
+    calculate_module_scores,
+    calculate_super_module_scores,
+    compute_top_scoring_modules,
+    create_modules,
+    integrate_vision_hotspot_results,
+)