feat: add harreman for metabolic exchange inference in spatial transcriptomics #3806

Open
oieretxezarreta wants to merge 6 commits into scverse:main from oieretxezarreta:add-harreman

@oieretxezarreta

Description

This PR adds Harreman (scvi.external.harreman), a toolkit for inferring
metabolic exchanges in tissues using spatial transcriptomics data.

Changes

  • Add scvi.external.harreman with submodules:
    • tl (tools): KNN graph, cell communication, gene pairs
    • hs (hotspot): local autocorrelation, local correlation, gene modules
    • pp (preprocessing): AnnData setup, interaction database loading
    • ds (datasets): Visium and Slide-seq example datasets
    • pl (plots): visualization utilities
  • Add tests in tests/external/harreman/ (7/7 passing)
  • Add tutorial entry in docs/tutorials/index_spatial.md
  • Add changelog entry under version 1.4.3

@ori-kron-wis added the `on-merge: backport to 1.4.x` label on May 12, 2026
@codecov (bot) commented on May 12, 2026

Codecov Report

❌ Patch coverage is 18.24500% with 1882 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.65%. Comparing base (612157b) to head (2030fcb).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/scvi/external/harreman/plots/plots.py | 4.77% | 459 Missing ⚠️ |
| src/scvi/external/harreman/hotspot/modules.py | 7.36% | 365 Missing ⚠️ |
| src/scvi/external/harreman/hotspot/models.py | 17.46% | 208 Missing ⚠️ |
| ...rc/scvi/external/harreman/preprocessing/anndata.py | 10.62% | 185 Missing ⚠️ |
| src/scvi/external/harreman/tools/api.py | 0.00% | 153 Missing ⚠️ |
| ...cvi/external/harreman/hotspot/local_correlation.py | 34.34% | 151 Missing ⚠️ |
| ...external/harreman/hotspot/local_autocorrelation.py | 41.30% | 135 Missing ⚠️ |
| src/scvi/external/harreman/tools/knn.py | 46.15% | 105 Missing ⚠️ |
| ...c/scvi/external/harreman/preprocessing/database.py | 13.91% | 99 Missing ⚠️ |
| src/scvi/external/harreman/datasets/datasets.py | 21.42% | 22 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (612157b) and HEAD (2030fcb).

HEAD has 21 fewer uploads than BASE.

| Flag | BASE (612157b) | HEAD (2030fcb) |
|---|---|---|
| integration | 24 | 3 |
Additional details and impacted files
```diff
@@             Coverage Diff             @@
##             main    #3806       +/-   ##
===========================================
- Coverage   88.23%   75.65%   -12.58%
===========================================
  Files         229      246       +17
  Lines       22646    27077     +4431
===========================================
+ Hits        19981    20485      +504
- Misses       2665     6592     +3927
```
| Flag | Coverage Δ |
|---|---|
| integration | `61.64% <18.24%> (-9.84%)` ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| src/scvi/external/__init__.py | `85.71% <100.00%> (+0.42%)` ⬆️ |
| src/scvi/external/harreman/__init__.py | `100.00% <100.00%> (ø)` |
| src/scvi/external/harreman/datasets/__init__.py | `100.00% <100.00%> (ø)` |
| src/scvi/external/harreman/hotspot/__init__.py | `100.00% <100.00%> (ø)` |
| src/scvi/external/harreman/plots/__init__.py | `100.00% <100.00%> (ø)` |
| ...c/scvi/external/harreman/preprocessing/__init__.py | `100.00% <100.00%> (ø)` |
| src/scvi/external/harreman/tools/__init__.py | `100.00% <100.00%> (ø)` |
| ...scvi/external/harreman/tools/cell_communication.py | `3.94% <ø> (ø)` |
| src/scvi/external/harreman/datasets/datasets.py | `21.42% <21.42%> (ø)` |
| ...c/scvi/external/harreman/preprocessing/database.py | `13.91% <13.91%> (ø)` |

... and 8 more

Collaborator

@ori-kron-wis left a comment

Main focus of review:

  1. Logic issues in several functions
  2. Legacy syntax in the code
  3. Upload the data to the scverse S3 storage and fetch it with pooch only
  4. Dead code: many unused functions and redundant imports
  5. Not all functions are covered by tests
  6. I think we can make better use of object-oriented design. Right now it's all a bunch of functions with no relation to each other; classes would help a lot to organize it.

Collaborator

Why do you have a local copy of Hotspot? Isn't using the latest release from pip enough? If not, why not update Hotspot and release a new version on pip that Harreman and other packages can use? (Major code reuse.)

Collaborator

I don't think we need to add all of Hotspot to scvi-tools (it has its own package for that), just the parts that Harreman needs.

```python
compute_neighbors_from_distances(
    adata,
    distances_obsp_key,
    n_neighbors,
```
Collaborator

`compute_neighbors_from_distances` is called with the wrong positional arguments (knn.py:94).
Called as: `compute_neighbors_from_distances(adata, distances_obsp_key, n_neighbors, sample_key, verbose)`
Defined as: `def compute_neighbors_from_distances(adata, distances_obsp_key="distances", sample_key=None, verbose=False)`
The function accepts at most 4 arguments but receives 5, so the call raises a TypeError at runtime. Even if one argument were dropped, `n_neighbors` would be bound to `sample_key` by position without any error.
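Both failure modes can be reproduced without Harreman. The sketch uses a stand-in with only the signature quoted above (its body is made up) and `inspect.signature(...).bind`, which applies the same argument-binding rules as a real call.

```python
import inspect


def compute_neighbors_from_distances(
    adata, distances_obsp_key="distances", sample_key=None, verbose=False
):
    """Stand-in with the signature quoted in the review; it just echoes its inputs."""
    return {"distances_obsp_key": distances_obsp_key, "sample_key": sample_key, "verbose": verbose}


sig = inspect.signature(compute_neighbors_from_distances)

# The call as written in knn.py: five positional values for four parameters.
try:
    sig.bind("adata", "distances", 15, "sample", False)
    five_args_ok = True
except TypeError:
    five_args_ok = False  # Python rejects the call before the body runs

# Dropping one value avoids the TypeError but silently binds n_neighbors (15) to sample_key.
silent = sig.bind("adata", "distances", 15, "sample").arguments

# Passing by keyword makes each value land on the intended parameter.
# n_neighbors is not in this signature, so it must either be dropped from the call
# or added to the definition.
fixed = compute_neighbors_from_distances(
    "adata", distances_obsp_key="distances", sample_key="sample", verbose=False
)
```

Which of the two fixes is right depends on whether the function is supposed to use `n_neighbors`; the sketch does not decide that.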

```python
if adata.uns.get("deconv_data", False):
    if verbose:
        print("Adding intra-spot connections...")
    spot_diameter = adata.uns["spot_diameter"]
```
Collaborator

This seems very strict. Are you saying the user's AnnData MUST have an `uns` entry called `spot_diameter`? Or is it inferred automatically during preprocessing (`setup_anndata`)?

```python
    cell_type_key: str,
    database_varm_key: str,
    sample_key: str | None,
    spot_diameter: int,
```
Collaborator

If it is mandatory (?), swap its position with `sample_key`, which can then default to `None`.

```python
if len(inds) <= 1:
    continue
for i, j in itertools.permutations(inds, 2):
    distances[i, j] = spot_diameter / 2
```
Collaborator

Here, for example, `spot_diameter` is not a parameter of the function, not an attribute of any class that holds it, and not inferred in any other way. How can this function run?
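A minimal sketch of the snippet with `spot_diameter` passed in explicitly. `add_intra_spot_connections` and `spot_groups` (lists of cell indices that share a spot) are hypothetical names, and `distances` is assumed to be a dense NumPy array for simplicity; the PR may store it differently.

```python
import itertools

import numpy as np


def add_intra_spot_connections(
    distances: np.ndarray, spot_groups: list[list[int]], spot_diameter: float
) -> np.ndarray:
    """Set the distance between distinct cells of the same spot to half the spot diameter.

    spot_diameter is an explicit argument rather than a free variable.
    """
    out = distances.copy()  # leave the caller's array untouched
    for inds in spot_groups:
        if len(inds) <= 1:
            continue  # a single cell has no intra-spot partners
        for i, j in itertools.permutations(inds, 2):
            out[i, j] = spot_diameter / 2
    return out


d = np.zeros((4, 4))
out = add_intra_spot_connections(d, [[0, 1, 2], [3]], spot_diameter=100.0)
```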

```python
    if hasattr(out_adata.X, "A1")
    else out_adata.X.sum(axis=1) > 0
)
out_adata._inplace_subset_obs(nonzero_mask)
```
Collaborator

`_inplace_subset_obs` is a private AnnData method, so this can break without warning between AnnData versions. Use `out_adata = out_adata[nonzero_mask].copy()` instead.
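The mask itself can also be computed without the `hasattr(..., "A1")` branch. A minimal sketch; `nonzero_cells_mask` is a hypothetical helper, and the AnnData line is shown only as a comment so the block runs with NumPy alone.

```python
import numpy as np


def nonzero_cells_mask(X) -> np.ndarray:
    """Return a 1-D boolean mask of rows (cells) whose counts sum to more than zero.

    np.asarray(...).ravel() gives a flat array whether X.sum(axis=1) returns a 1-D
    ndarray (dense arrays) or an (n, 1) np.matrix (scipy.sparse matrices).
    """
    return np.asarray(X.sum(axis=1)).ravel() > 0


X = np.array([[0, 0], [1, 0], [0, 2]])
mask = nonzero_cells_mask(X)
# With AnnData, the public subsetting API then replaces the private call:
# out_adata = out_adata[nonzero_cells_mask(out_adata.X)].copy()
```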

```python
    database_varm_key: str | None = None,
    model: str | None = None,
    genes: list | None = None,
    use_metabolic_genes: bool | None = False,
```
Collaborator

Why is it always `bool | None`? The type should be `bool`, not `bool | None`, when the default is `False`.

@@ -0,0 +1,522 @@
```python
import time
import os
```
Collaborator

Unused in this file. Beware of unused imports (this is something the pre-commit hooks should fix for you).

Collaborator

Test coverage is critically sparse
The 7 tests only cover compute_knn_graph variants and basic autocorrelation. None of the following are tested:

  • compute_cell_communication / compute_ct_cell_communication
  • create_modules, calculate_module_scores, calculate_super_module_scores
  • extract_interaction_db / database loading
  • datasets.load_* functions
  • preprocessing.setup_deconv_adata

If they are not needed, remove those functions; otherwise, you will need to add tests for them.

```python
    "RESOLVI",
    "SCVIVA",
    "CYTOVI",
    "harreman",
```
Collaborator

Please change it so that Harreman at least starts with a capital H.

````markdown
notebooks/spatial/cell2location_lymph_node_spatial_tutorial
```

```{customcard}
````
Collaborator

I saw 4 tutorials in the Harreman docs. Is this the only one you're considering adding?

2 participants