Merged
16 changes: 13 additions & 3 deletions CHANGELOG.md
@@ -10,14 +10,24 @@ and this project adheres to [Semantic Versioning][].

## [Unreleased]

## [v0.1.3]

### Added
- Added a tutorial on spatial contextualization and niche identification {pr}`23`.
- Implemented a self-mapping mode with only a query dataset {pr}`21`.
- Allow importing a pre-computed dataset of transferred expression values {pr}`21`.
- Allow importing pre-computed neighborhood matrices {pr}`21`.
- Added a tutorial on spatial contextualization and niche identification {pr}`21`.
- Added an equal-weight kernel {pr}`22`.

## [v0.1.2]

### Added
- Included tests for the `check` module, and more tests for the main classes {pr}`15`.
- Implemented the computation of presence scores, following HNOCA-tools {pr}`16`.
- Added a `groupby` parameter to expression transfer evaluation {pr}`16`.
- Added a `test_var_key` parameter to expression transfer evaluation {pr}`19`.
- Added a tutorial on spatial mapping {pr}`19`.

## [v0.1.1]

22 changes: 18 additions & 4 deletions README.md
@@ -16,6 +16,16 @@ k-NN-based mapping of cells across representations to transfer labels, embeddings

Inspired by scanpy's [ingest][] and the [HNOCA-tools][] packages. Check out the [docs][] to learn more, in particular our [tutorials][].

## Key use cases

- Transfer cell type labels and expression values from dissociated to spatial datasets.
- Transfer embeddings between arbitrary query and reference datasets.
- Compute presence scores for query datasets in large reference atlases.
- Identify niches in spatial datasets by contextualizing latent spaces in spatial coordinates.
- Evaluate the results of transferring labels, embeddings and feature spaces using a variety of metrics.

The core idea of `CellMapper` is to separate the method (k-NN graph with some kernel applied to get a mapping matrix) from the application (mapping across arbitrary representations), to be flexible and fast. The tool currently supports [pynndescent][], [sklearn][], [faiss][] and [rapids][] for neighborhood search, implements a variety of graph kernels, and is closely integrated with `AnnData` objects.

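To make the split between method and application concrete, here is a minimal NumPy sketch. It assumes a dense brute-force neighbor search in place of the pynndescent, sklearn, faiss or rapids backends, and two illustrative kernels: a Gaussian kernel whose per-cell bandwidth (distance to the k-th neighbor) is our assumption, and an equal-weight kernel. The helper `knn_mapping_matrix` is hypothetical and is not CellMapper's API. Once the row-normalized mapping matrix exists, any reference representation is mapped with a single matrix product.

```python
import numpy as np


def knn_mapping_matrix(query_emb, ref_emb, k=5, kernel="gauss"):
    """Hypothetical helper: dense, row-normalized (n_query x n_ref) mapping matrix."""
    if k > ref_emb.shape[0]:
        raise ValueError("k cannot exceed the number of reference cells")
    # Brute-force squared Euclidean distances from every query cell to every reference cell
    d2 = ((query_emb[:, None, :] - ref_emb[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k nearest reference cells per query cell, closest first
    idx = np.argsort(d2, axis=1)[:, :k]
    dist = np.sqrt(np.take_along_axis(d2, idx, axis=1))

    if kernel == "equal":
        # Equal-weight kernel: every neighbor contributes the same amount
        weights = np.ones_like(dist)
    elif kernel == "gauss":
        # Gaussian kernel; using the k-th neighbor distance as bandwidth is an assumption
        sigma = np.maximum(dist[:, -1:], 1e-12)
        weights = np.exp(-(dist**2) / (2 * sigma**2))
    else:
        raise ValueError(f"Unknown kernel: {kernel!r}")

    # Scatter neighbor weights into a dense matrix, then normalize each row to sum to 1
    mapping = np.zeros((query_emb.shape[0], ref_emb.shape[0]))
    np.put_along_axis(mapping, idx, weights, axis=1)
    return mapping / mapping.sum(axis=1, keepdims=True)


# Toy data standing in for a joint embedding (.obsm["X_joint"]) and a reference UMAP
rng = np.random.default_rng(0)
ref_joint = rng.normal(size=(50, 10))
query_joint = rng.normal(size=(20, 10))
ref_umap = rng.normal(size=(50, 2))

M = knn_mapping_matrix(query_joint, ref_joint, k=5, kernel="gauss")
query_umap = M @ ref_umap  # the same M can map any other reference representation
```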
## Installation

You need to have Python 3.10 or newer installed on your system.
@@ -37,20 +47,20 @@ There are two alternative options to install ``cellmapper``:

## Getting started

This package assumes that you have ``query`` and ``reference`` AnnData objects, with a joint embedding computed and stored in ``.obsm``. We explicitly do not compute this joint embedding, but there are plenty of methods you can use to obtain one, e.g. [GimVI][] or [ENVI][] for spatial mapping, [GLUE][], [MIDAS][] and [MOFA+][] for modality translation, and [scVI][], [scANVI][] and [scArches][] for query-to-reference mapping. This is just a small selection!

With a joint embedding in ``.obsm["X_joint"]`` at hand, the simplest way to use ``CellMapper`` is as follows:
```Python
from cellmapper import CellMapper

cmap = CellMapper(query, reference).fit(
    use_rep="X_joint", obs_keys="celltype", obsm_keys="X_umap", layer_key="X"
)
```

This will transfer data from the reference to the query dataset, including celltype labels stored in ``reference.obs``, a UMAP embedding stored in ``reference.obsm``, and expression values stored in ``reference.X``.

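A hedged sketch of how a mapping matrix can turn reference annotations into query predictions: a kernel-weighted vote over one-hot encoded labels, with the winning class's weight used as a confidence score. The function `transfer_labels` and the vote-based confidence are assumptions for illustration. The `celltype_pred` and `celltype_conf` keys only mirror the `{label_key}_pred` / `{label_key}_conf` naming that `register_external_predictions` checks for in this PR. Continuous values such as expression go through the same matrix product.

```python
import numpy as np


def transfer_labels(mapping, ref_labels):
    """Hypothetical sketch: kernel-weighted vote over reference labels."""
    classes, codes = np.unique(ref_labels, return_inverse=True)
    # One-hot encode the reference labels: (n_ref x n_classes)
    onehot = np.zeros((len(ref_labels), len(classes)))
    onehot[np.arange(len(ref_labels)), codes] = 1.0
    # Class scores per query cell; each row sums to 1 for a row-normalized mapping
    scores = mapping @ onehot
    return classes[scores.argmax(axis=1)], scores.max(axis=1)


# Toy mapping matrix: 3 query cells x 4 reference cells, rows sum to 1
M = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.2, 0.4, 0.4],
    [0.1, 0.3, 0.3, 0.3],
])
ref_celltype = np.array(["T cell", "T cell", "B cell", "B cell"])
ref_X = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])

query_obs = {}
query_obs["celltype_pred"], query_obs["celltype_conf"] = transfer_labels(M, ref_celltype)
query_X = M @ ref_X  # expression values become a weighted average of neighbors
```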
There are many ways to customize this, e.g. using different methods to compute k-NN graphs and to turn them into mapping matrices, and we implement a few methods to evaluate whether your k-NN transfer was successful. The tool also implements a `self-mapping` mode (only a query object, no reference), which is useful for spatial contextualization. Check out the [docs][] to learn more.

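The self-mapping mode can be pictured as mapping the query onto itself in spatial coordinates. The sketch below assumes an equal-weight kernel over the k nearest spatial neighbors (each cell counts as its own neighbor) and averages a latent representation over that neighborhood; clustering the resulting vectors is one way to call niches. `spatial_context` is a hypothetical helper, not CellMapper's implementation. In the toy example, cell 2 carries a type-B latent state but receives an A-dominated context vector, because two of its three nearest neighbors are type A.

```python
import numpy as np


def spatial_context(coords, latent, k=6):
    """Hypothetical sketch of self-mapping: equal-weight average over spatial neighbors."""
    # Pairwise squared distances between all cells of the single query dataset
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    # k nearest spatial neighbors; with unique coordinates a cell is its own first neighbor
    idx = np.argsort(d2, axis=1)[:, :k]
    # Equal-weight kernel: average the latent vectors over each neighborhood
    return latent[idx].mean(axis=1)


# Two spatially separated groups; the latent space encodes cell state A or B
coords = np.array([[0, 0], [1, 0], [2, 0], [10, 0], [11, 0], [12, 0]], dtype=float)
latent = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [0, 1]], dtype=float)

context = spatial_context(coords, latent, k=3)
```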
## Release notes

@@ -74,7 +84,11 @@ Please cite this GitHub repo if you find CellMapper useful for your research.
[coverage]: https://codecov.io/gh/quadbio/cellmapper
[pre-commit]: https://results.pre-commit.ci/latest/github/quadbio/cellmapper/main
[pypi]: https://pypi.org/project/cellmapper/

[faiss]: https://github.com/facebookresearch/faiss
[pynndescent]: https://github.com/lmcinnes/pynndescent
[sklearn]: https://scikit-learn.org/stable/modules/neighbors.html
[rapids]: https://docs.rapids.ai/api/cuml/stable/api/#nearest-neighbors

[ingest]: https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.ingest.html
[HNOCA-tools]: https://devsystemslab.github.io/HNOCA-tools/
251 changes: 137 additions & 114 deletions docs/notebooks/tutorials/spatial_mapping.ipynb

Large diffs are not rendered by default.

1,212 changes: 1,212 additions & 0 deletions docs/notebooks/tutorials/spatial_smoothing.ipynb

Large diffs are not rendered by default.

104 changes: 104 additions & 0 deletions docs/references.bib
@@ -128,3 +128,107 @@ @article{pijuan2019single
publisher={Nature Publishing Group UK London},
url={https://www.nature.com/articles/s41586-019-0933-9},
}

@article{varrone2024cellcharter,
title={CellCharter reveals spatial cell niches associated with tissue remodeling and cell plasticity},
author={Varrone, Marco and Tavernari, Daniele and Santamaria-Mart{\'\i}nez, Albert and Walsh, Logan A and Ciriello, Giovanni},
journal={Nature Genetics},
volume={56},
number={1},
pages={74--84},
year={2024},
publisher={Nature Publishing Group US New York},
url={https://www.nature.com/articles/s41588-023-01588-4},
}

@article{kim2022unsupervised,
title={Unsupervised discovery of tissue architecture in multiplexed imaging},
author={Kim, Junbum and Rustam, Samir and Mosquera, Juan Miguel and Randell, Scott H and Shaykhiev, Renat and Rendeiro, Andr{\'e} F and Elemento, Olivier},
journal={Nature Methods},
volume={19},
number={12},
pages={1653--1661},
year={2022},
publisher={Nature Publishing Group US New York},
url={https://www.nature.com/articles/s41592-022-01657-2},
}

@article{blampey2024sopa,
title={Sopa: a technology-invariant pipeline for analyses of image-based spatial omics},
author={Blampey, Quentin and Mulder, Kevin and Gardet, Margaux and Christodoulidis, Stergios and Dutertre, Charles-Antoine and Andr{\'e}, Fabrice and Ginhoux, Florent and Courn{\`e}de, Paul-Henry},
journal={Nature Communications},
volume={15},
number={1},
pages={4981},
year={2024},
publisher={Nature Publishing Group UK London},
url={https://www.nature.com/articles/s41467-024-48981-z},
}

@article{birk2025quantitative,
title={Quantitative characterization of cell niches in spatially resolved omics data},
author={Birk, Sebastian and Bonafonte-Pard{\`a}s, Irene and Feriz, Adib Miraki and Boxall, Adam and Agirre, Eneritz and Memi, Fani and Maguza, Anna and Yadav, Anamika and Armingol, Erick and Fan, Rong and others},
journal={Nature Genetics},
pages={1--13},
year={2025},
publisher={Nature Publishing Group US New York},
url={https://www.nature.com/articles/s41588-025-02120-6},
}

@article{xu2024unsupervised,
title={Unsupervised spatially embedded deep representation of spatial transcriptomics},
author={Xu, Hang and Fu, Huazhu and Long, Yahui and Ang, Kok Siong and Sethi, Raman and Chong, Kelvin and Li, Mengwei and Uddamvathanak, Rom and Lee, Hong Kai and Ling, Jingjing and others},
journal={Genome Medicine},
volume={16},
number={1},
pages={12},
year={2024},
publisher={Springer},
url={https://link.springer.com/article/10.1186/s13073-024-01283-x},
}

@article{zhao2021spatial,
title={Spatial transcriptomics at subspot resolution with BayesSpace},
author={Zhao, Edward and Stone, Matthew R and Ren, Xing and Guenthoer, Jamie and Smythe, Kimberly S and Pulliam, Thomas and Williams, Stephen R and Uytingco, Cedric R and Taylor, Sarah EB and Nghiem, Paul and others},
journal={Nature Biotechnology},
volume={39},
number={11},
pages={1375--1384},
year={2021},
publisher={Nature Publishing Group US New York},
url={https://www.nature.com/articles/s41587-021-00935-2},
}

@inproceedings{li2024stargate,
title={STARGATE: Spatial Transcriptomic Analysis with Recurrent and Graph Attention Techniques using Ensemble Learning},
author={Li, Ning and Badai, Jiayidaer and Chen, Dengjie and Xiao, Ming and Zhang, Le},
booktitle={2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)},
pages={5630--5637},
year={2024},
organization={IEEE},
url={https://ieeexplore.ieee.org/abstract/document/10822280},
}

@article{palla2022squidpy,
title={Squidpy: a scalable framework for spatial omics analysis},
author={Palla, Giovanni and Spitzer, Hannah and Klein, Michal and Fischer, David and Schaar, Anna Christina and Kuemmerle, Louis Benedikt and Rybakov, Sergei and Ibarra, Ignacio L and Holmberg, Olle and Virshup, Isaac and others},
journal={Nature Methods},
volume={19},
number={2},
pages={171--178},
year={2022},
publisher={Nature Publishing Group US New York},
url={https://www.nature.com/articles/s41592-021-01358-2},
}

@article{lopez2018deep,
title={Deep generative modeling for single-cell transcriptomics},
author={Lopez, Romain and Regier, Jeffrey and Cole, Michael B and Jordan, Michael I and Yosef, Nir},
journal={Nature Methods},
volume={15},
number={12},
pages={1053--1058},
year={2018},
publisher={Nature Publishing Group US New York},
url={https://www.nature.com/articles/s41592-018-0229-2},
}
5 changes: 5 additions & 0 deletions pyproject.toml
@@ -61,8 +61,13 @@ optional-dependencies.test = [
"squidpy",
]
optional-dependencies.tutorials = [
"cellmapper",
"harmony-pytorch",
"netgraph",
"python-louvain",
"scvi-tools",
"seaborn",
"sopa",
"squidpy",
]

66 changes: 63 additions & 3 deletions src/cellmapper/evaluate.py
@@ -69,19 +69,72 @@ def zscore(x):
class CellMapperEvaluationMixin:
    """Mixin class for evaluation-related methods for CellMapper."""

    def register_external_predictions(
        self, label_key: str, prediction_postfix: str = "pred", confidence_postfix: str = "conf"
    ) -> None:
        """
        Register externally computed predictions for evaluation.

        Parameters
        ----------
        label_key
            Base key in .obs for the label (e.g., 'cell_type').
        prediction_postfix
            Postfix for prediction column in .obs (e.g., 'pred').
            The full column name should be f"{label_key}_{prediction_postfix}".
        confidence_postfix
            Postfix for confidence column in .obs (e.g., 'conf').
            The full column name should be f"{label_key}_{confidence_postfix}".

        Returns
        -------
        None

        Notes
        -----
        Updates the following attributes:

        - ``prediction_postfix``: Postfix for prediction column.
        - ``confidence_postfix``: Postfix for confidence column.
        """
        # Verify that the expected columns exist
        pred_col = f"{label_key}_{prediction_postfix}"
        conf_col = f"{label_key}_{confidence_postfix}"

        if pred_col not in self.query.obs.columns:
            raise ValueError(f"Prediction column '{pred_col}' not found in query.obs")
        if conf_col not in self.query.obs.columns:
            raise ValueError(f"Confidence column '{conf_col}' not found in query.obs")

        # Register the postfixes
        self.prediction_postfix = prediction_postfix
        self.confidence_postfix = confidence_postfix

        logger.info(
            "External predictions registered with prediction_postfix='%s' and confidence_postfix='%s'",
            prediction_postfix,
            confidence_postfix,
        )

    def evaluate_label_transfer(
        self,
        label_key: str,
        prediction_postfix: str | None = None,
        confidence_postfix: str | None = None,
        confidence_cutoff: float = 0.0,
        zero_division: int | Literal["warn"] = 0,
    ) -> None:
        """
        Evaluate label transfer using a k-NN classifier or externally computed predictions.

        Parameters
        ----------
        label_key
            Key in .obs storing ground-truth cell type annotations.
        prediction_postfix
            Postfix for prediction column in .obs. If None, uses self.prediction_postfix.
        confidence_postfix
            Postfix for confidence column in .obs. If None, uses self.confidence_postfix.
        confidence_cutoff
            Minimum confidence score required to include a cell in the evaluation.
        zero_division
@@ -97,8 +150,15 @@ def evaluate_label_transfer(

        - ``label_transfer_metrics``: Dictionary containing accuracy, precision, recall, F1 scores, and excluded fraction.
        """
        # Use provided postfixes if given, otherwise fall back to instance attributes
        pred_postfix = prediction_postfix or self.prediction_postfix
        conf_postfix = confidence_postfix or self.confidence_postfix

        if pred_postfix is None or conf_postfix is None:
            raise ValueError(
                "Label transfer has not been performed. Either call transfer_labels() first "
                "or provide prediction_postfix and confidence_postfix parameters."
            )

        # Extract ground-truth and predicted labels
        y_true = self.query.obs[label_key].dropna()
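`evaluate_label_transfer` reports accuracy, precision, recall, F1 scores and the excluded fraction after filtering cells by `confidence_cutoff`. The following minimal NumPy sketch shows that kind of computation. It assumes that cells with confidence >= the cutoff are kept, that F1 is macro-averaged over the union of true and predicted labels, and that undefined precision or recall counts as 0 (analogous to `zero_division=0`). `evaluate_predictions` is hypothetical. The dictionary keys follow the `{label_key}_pred` / `{label_key}_conf` convention that `register_external_predictions` validates.

```python
import numpy as np


def evaluate_predictions(y_true, y_pred, conf, confidence_cutoff=0.0):
    """Hypothetical sketch of confidence-filtered label-transfer metrics."""
    y_true, y_pred, conf = (np.asarray(a) for a in (y_true, y_pred, conf))
    keep = conf >= confidence_cutoff  # assumption: the cutoff is inclusive
    if not keep.any():
        raise ValueError("No cells pass the confidence cutoff")
    t, p = y_true[keep], y_pred[keep]

    f1_scores = []
    for c in np.unique(np.concatenate([t, p])):
        tp = np.sum((p == c) & (t == c))
        fp = np.sum((p == c) & (t != c))
        fn = np.sum((p != c) & (t == c))
        # Undefined precision/recall/F1 count as 0, like zero_division=0
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom > 0 else 0.0)

    return {
        "accuracy": float(np.mean(t == p)),
        "f1_macro": float(np.mean(f1_scores)),
        "excluded_fraction": float(1.0 - keep.mean()),
    }


# Stand-in for query.obs with externally computed predictions
label_key = "cell_type"
obs = {
    label_key: np.array(["T", "T", "B", "B", "NK"]),
    f"{label_key}_pred": np.array(["T", "B", "B", "B", "T"]),
    f"{label_key}_conf": np.array([0.9, 0.4, 0.8, 0.7, 0.3]),
}
args = (obs[label_key], obs[f"{label_key}_pred"], obs[f"{label_key}_conf"])
metrics_all = evaluate_predictions(*args)
metrics_confident = evaluate_predictions(*args, confidence_cutoff=0.5)
```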