Commit 8ec14dc

0.15.0 release prep v2 (scverse#646)
* fix release refs
* add link to blogpost
1 parent b576024 commit 8ec14dc

9 files changed

Lines changed: 41 additions & 81 deletions

docs/basic.md

Lines changed: 1 addition & 1 deletion
@@ -62,8 +62,8 @@ If you use rapids-singlecell, please cite:
 
 ## News
 
+* 30.04.26 [**v0.15.0 released!**](https://scverse.org/blog/2026-rsc-goes-nanobind/) This release ships precompiled CUDA kernels via [nanobind](https://github.com/wjakob/nanobind) — no CUDA toolkit needed at install time. Prebuilt wheels for x86_64 and aarch64 support CUDA 12 and 13, covering Turing through Blackwell GPUs. Install with `pip install rapids-singlecell-cu13` (or `-cu12`). See the [installation guide](installation.md) for details.
 * 04.03.26 **rapids-singlecell is now on arXiv!** Check out our preprint: [GPU-accelerated single-cell analysis at scale with rapids-singlecell](https://doi.org/10.48550/arXiv.2603.02402)
-* 19.02.26 **v0.15.0 pre-release available!** This release ships precompiled CUDA kernels via [nanobind](https://github.com/wjakob/nanobind) — no CUDA toolkit needed at install time. Prebuilt wheels for x86_64 and aarch64 support CUDA 12 and 13, covering Turing through Blackwell GPUs. Install with `pip install --pre rapids-singlecell-cu13` (or `-cu12`) and help us test! See the [installation guide](installation.md) for details.
 * 01.07.25 *rapids-singlecell* is now an [scverse® core package](https://scverse.org/blog/2025-core-expansion/)
 * 12.06.25 *rapids-singlecell* was highlighted in an other NVIDIA [technical blog post](https://developer.nvidia.com/blog/driving-toward-billion-cell-analysis-and-biological-breakthroughs-with-rapids-singlecell/)
 * 07.08.23 *rapids-singlecell* is now part of scverse® ecosystem.

docs/installation.md

Lines changed: 0 additions & 14 deletions
@@ -26,22 +26,8 @@ mamba env create -f conda/rsc_rapids_26.04_cuda12.yml
 RAPIDS currently doesn't support `channel_priority: strict`; use `channel_priority: flexible` instead
 ```
 
-```{warning}
-The conda environment files on the `main` branch reference the new `rapids-singlecell-cu12`/`-cu13` wheel names, which are currently only available as pre-release.
-Until 0.15.0 is released, use the environment files from the [v0.14.1 tag](https://github.com/scverse/rapids_singlecell/tree/v0.14.1/conda) instead, or add `--pre` to the pip line manually.
-```
-
 ## PyPI
 
-```{note}
-**Pre-release testing:** Version 0.15.0 is currently in pre-release. We'd love for you to test it
-and report any issues! Install the latest release candidate with:
-
-pip install --pre rapids-singlecell-cu13 # or rapids-singlecell-cu12
-
-Please report any problems on [GitHub Issues](https://github.com/scverse/rapids_singlecell/issues).
-```
-
 Starting with version 0.15.0, *rapids-singlecell* ships precompiled CUDA kernels via nanobind.
 Prebuilt wheels are available for **x86_64** and **aarch64** Linux for both CUDA 12 and CUDA 13.
 
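Since the installation notes above distinguish two wheel names, `rapids-singlecell-cu12` and `rapids-singlecell-cu13`, it can help to confirm which one an environment actually contains. The following is a minimal sketch using only the Python standard library. The helper name `installed_rsc_wheels` is hypothetical and not part of rapids-singlecell; the wheel names are taken from the text above.

```python
from importlib.metadata import PackageNotFoundError, version

# Wheel (distribution) names as given in the installation notes above.
WHEEL_NAMES = ("rapids-singlecell-cu12", "rapids-singlecell-cu13")


def installed_rsc_wheels(names=WHEEL_NAMES):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration only; it inspects installed package
    metadata and does not import the package or touch the GPU.
    """
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_rsc_wheels().items():
        print(f"{name}: {ver or 'not installed'}")
```

Checking distribution metadata rather than importing the package keeps the check usable on machines without a GPU.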

docs/release-notes/0.15.0.md

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+### 0.15.0 {small}`2026-04-30`
+
+```{rubric} Features
+```
+* Replaces CuPy RawKernel infrastructure with precompiled **nanobind/CUDA C++** extensions {pr}`455` {smaller}`S Dicks`
+
+All GPU kernels are now compiled at build time via scikit-build-core instead of JIT-compiled on first call, eliminating startup latency and CuPy kernel cache issues across CUDA/driver upgrades.
+Prebuilt wheels are available on PyPI as `rapids-singlecell-cu12` (CUDA 12) and `rapids-singlecell-cu13` (CUDA 13) for both x86_64 and aarch64 — no CUDA toolkit or nvcc required for installation.
+
+nanobind's typed array bindings enforce dtype (e.g., float32 vs float64) and memory layout (C-contiguous vs F-contiguous) at the Python/C++ boundary, catching mismatches with clear `TypeError` messages before they reach the GPU instead of producing silent corruption or cryptic CUDA errors.
+
+Kernels are now proper C++ with headers, templates, and multi-file organization, which will enable more optimized and composable functions that are entirely C++ in future releases.
+
+* Rewrites Harmony clustering and correction loops in C++, removing the ``use_gemm`` parameter and one-hot ``Phi`` matrix in favor of categorical indices. ``correction_method`` now defaults to ``None`` and auto-selects ``batched`` or ``fast`` based on workspace size {pr}`578` {smaller}`S Dicks`
+* Improves numerical accuracy and adds parameters to `tl.rank_genes_groups` Wilcoxon methods: uses ``erfc`` for p-values to avoid underflow, adds ``tie_correct`` and ``use_continuity`` to ``wilcoxon_binned``, and refactors ``Aggregate`` with a unified ``count_mean_var()`` dispatcher and raw ``sq_sum`` output for GPU-resident stats computation {pr}`585` {smaller}`S Dicks`
+* Replace cuML KDE in ``tl.embedding_density`` with a custom CUDA kernel using covariance-aware Gaussian KDE matching ``scipy.stats.gaussian_kde``, removing the cuML dependency and the ``batchsize`` parameter {pr}`590` {smaller}`S Dicks`
+* Allow multiple control groups in ``onesided_distances`` for computing energy distances against several references in a single kernel launch {pr}`601` {smaller}`S Dicks`
+* Add ``contrast_distances`` to ``EDistanceMetric`` for computing energy distances directly from a contrasts DataFrame {pr}`603` {smaller}`S Dicks`
+* Add Dask support for ``highly_variable_genes`` with ``flavor='seurat_v3'`` and ``flavor='seurat_v3_paper'`` {pr}`616` {smaller}`S Dicks`
+* Add Harmony2 support with stabilized diversity penalty, dynamic per-cluster-per-batch ridge regularization, and automatic batch pruning {cite:p}`Patikas2026` {pr}`625` {smaller}`S Dicks`
+
+```{rubric} Performance
+```
+* Improve L2 cache efficiency in ``edistance`` and ``co_occurrence`` kernels by always tiling the smaller group into shared memory, yielding up to 5x speedup for datasets with unequal group sizes {pr}`607` {smaller}`S Dicks`
+
+```{rubric} Bug fixes
+```
+* Fix ``TypeError`` when using nanobind CUDA kernels with RMM managed memory (``managed_memory=True``). Nanobind bindings now accept both ``kDLCUDA`` and ``kDLCUDAManaged`` DLPack device types {pr}`592` {smaller}`S Dicks`
+* Fix multi-GPU ``cudaErrorLaunchFailure`` during cross-device result aggregation when using RMM without pool allocation for very large datasets {pr}`594` {smaller}`S Dicks`
+* Fix ForceAtlas2 random cell ordering by sorting positions by vertex in ``tl.draw_graph`` {pr}`621` {smaller}`L Faure`
+
+```{rubric} Removals
+```
+* Remove `tl.mde` and the `pymde` dependency. The function is still available in `scvi-tools` {pr}`588` {smaller}`S Dicks`
+
+```{rubric} Misc
+```
+* Refactor ``tl.rank_genes_groups`` internals to use categorical integer codes instead of boolean mask matrices {pr}`570` {smaller}`S Dicks`
+* Align RAPIDS 26.04 conda and CI environments with Python 3.14 {pr}`639` {smaller}`S Dicks`
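The release notes above mention that the Wilcoxon methods now use ``erfc`` for p-values to avoid underflow. The sketch below illustrates the general numerical point with the Python standard library only; it is not the library's kernel code. It assumes a two-sided p-value from a normal approximation of the test statistic, and both function names are hypothetical.

```python
import math


def two_sided_p_naive(z: float) -> float:
    """Two-sided normal p-value via 1 - erf; loses all precision for large |z|."""
    # erf(x) rounds to exactly 1.0 in double precision once x is large enough,
    # so the subtraction collapses to 0.0.
    return 1.0 - math.erf(abs(z) / math.sqrt(2.0))


def two_sided_p_erfc(z: float) -> float:
    """Two-sided normal p-value via erfc, which evaluates the tail directly."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    for z in (1.96, 5.0, 10.0):
        print(f"z={z}: naive={two_sided_p_naive(z):.3e}, erfc={two_sided_p_erfc(z):.3e}")
```

For moderate z-scores the two forms agree closely. For strongly significant genes the naive form returns exactly zero, which then breaks log-transformed p-values and multiple-testing corrections, whereas the ``erfc`` form keeps a small positive value.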

docs/release-notes/0.15.0rc3.md

Lines changed: 0 additions & 14 deletions
This file was deleted.

docs/release-notes/0.15.0rc4.md

Lines changed: 0 additions & 14 deletions
This file was deleted.

docs/release-notes/0.15.0rc5.md

Lines changed: 0 additions & 13 deletions
This file was deleted.

docs/release-notes/0.15.0rc6.md

Lines changed: 0 additions & 11 deletions
This file was deleted.

docs/release-notes/0.15.0rc7.md

Lines changed: 0 additions & 5 deletions
This file was deleted.

docs/release-notes/index.md

Lines changed: 1 addition & 9 deletions
@@ -4,15 +4,7 @@
 
 
 ## Version 0.15.0
-```{include} /release-notes/0.15.0rc7.md
-```
-```{include} /release-notes/0.15.0rc6.md
-```
-```{include} /release-notes/0.15.0rc5.md
-```
-```{include} /release-notes/0.15.0rc4.md
-```
-```{include} /release-notes/0.15.0rc3.md
+```{include} /release-notes/0.15.0.md
 ```
 
 ## Version 0.14.0
