### 0.15.0 {small}`2026-04-30`

```{rubric} Features
```
* Replace the CuPy ``RawKernel`` infrastructure with precompiled **nanobind/CUDA C++** extensions {pr}`455` {smaller}`S Dicks`

  All GPU kernels are now compiled at build time with scikit-build-core instead of being JIT-compiled on first call. This eliminates first-call compilation latency and CuPy kernel-cache problems after CUDA or driver upgrades.
  Prebuilt wheels are available on PyPI as ``rapids-singlecell-cu12`` (CUDA 12) and ``rapids-singlecell-cu13`` (CUDA 13) for both x86_64 and aarch64; installing them requires neither a CUDA toolkit nor ``nvcc``.

  nanobind's typed array bindings enforce dtype (e.g., ``float32`` vs ``float64``) and memory layout (C-contiguous vs F-contiguous) at the Python/C++ boundary. Mismatches raise a clear ``TypeError`` before any data reaches the GPU, instead of causing silent corruption or cryptic CUDA errors.

  Kernels are now regular C++ sources organized across multiple files with headers and templates, which will enable more optimized, composable functions written entirely in C++ in future releases.

* Rewrite the Harmony clustering and correction loops in C++, removing the ``use_gemm`` parameter and replacing the one-hot ``Phi`` matrix with categorical batch indices. ``correction_method`` now defaults to ``None``, which auto-selects ``batched`` or ``fast`` based on workspace size {pr}`578` {smaller}`S Dicks`
* Improve numerical accuracy and add parameters for the Wilcoxon methods of ``tl.rank_genes_groups``: p-values are computed with ``erfc`` to avoid underflow, ``wilcoxon_binned`` gains ``tie_correct`` and ``use_continuity``, and ``Aggregate`` is refactored around a unified ``count_mean_var()`` dispatcher with raw ``sq_sum`` output for GPU-resident statistics {pr}`585` {smaller}`S Dicks`
* Replace the cuML KDE in ``tl.embedding_density`` with a custom CUDA kernel implementing a covariance-aware Gaussian KDE that matches ``scipy.stats.gaussian_kde``, removing the cuML dependency and the ``batchsize`` parameter {pr}`590` {smaller}`S Dicks`
* Allow multiple control groups in ``onesided_distances``, so energy distances against several reference groups are computed in a single kernel launch {pr}`601` {smaller}`S Dicks`
* Add ``contrast_distances`` to ``EDistanceMetric`` to compute energy distances directly from a contrasts DataFrame {pr}`603` {smaller}`S Dicks`
* Add Dask support to ``highly_variable_genes`` for ``flavor='seurat_v3'`` and ``flavor='seurat_v3_paper'`` {pr}`616` {smaller}`S Dicks`
* Add Harmony2 support with a stabilized diversity penalty, dynamic per-cluster-per-batch ridge regularization, and automatic batch pruning {cite:p}`Patikas2026` {pr}`625` {smaller}`S Dicks`

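The ``erfc`` change in the Wilcoxon entry is easiest to see numerically. The sketch below is a textbook normal-approximation rank-sum test in plain Python, not the GPU implementation: ``rank_sum_pvalue`` is a hypothetical helper, and its ``tie_correct`` and ``use_continuity`` arguments only mirror the parameter names listed above. The two-sided p-value is computed as ``erfc(|z| / sqrt(2))``, which equals ``2 * (1 - Phi(|z|))`` but does not round to zero for large ``|z|``.

```python
import math


def rank_sum_pvalue(x, y, tie_correct=True, use_continuity=False):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation.

    Hypothetical helper for illustration; not the rapids-singlecell kernel.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign average (1-based) ranks to runs of tied values.
    ranks = [0.0] * n
    tie_sizes = []
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1

    # Mann-Whitney U for the first sample and its null mean and variance.
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 * (n + 1) / 12
    if tie_correct:
        # Ties shrink the variance of the rank sum.
        var -= n1 * n2 * sum(t**3 - t for t in tie_sizes) / (12 * n * (n - 1))

    diff = abs(u - mu)
    if use_continuity:
        diff = max(diff - 0.5, 0.0)
    z = diff / math.sqrt(var)

    # erfc(z / sqrt(2)) == 2 * (1 - Phi(z)), but stays representable for large z.
    return math.erfc(z / math.sqrt(2))
```

For two fully separated samples of 200 values each, ``z`` is about 17. The naive form ``2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))`` then evaluates to exactly ``0.0`` in double precision, while ``rank_sum_pvalue(range(200), range(200, 400))`` still returns a small positive p-value that can be ranked and corrected.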
```{rubric} Performance
```
* Improve L2 cache efficiency in the ``edistance`` and ``co_occurrence`` kernels by always tiling the smaller of the two groups into shared memory, yielding up to a 5x speedup on datasets with unequal group sizes {pr}`607` {smaller}`S Dicks`

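The tiling change relies on the cross term of the energy distance being symmetric: the mean distance over all pairs ``(x, y)`` with ``x`` in A and ``y`` in B is the same whichever group is staged in the tile. The CPU sketch below is only an analogy for the CUDA kernels. ``mean_pairwise_distance`` and ``energy_distance`` are hypothetical names, a Python list slice stands in for a shared-memory tile, and within-group means are taken over all pairs including self-pairs (an assumed convention).

```python
import math


def mean_pairwise_distance(a, b, tile=32):
    """Mean Euclidean distance over all pairs (x, y) with x in a, y in b."""
    # Stage the smaller group in fixed-size tiles (the "shared memory" role)
    # and stream the larger group past each tile. The result is unchanged
    # because the mean over all pairs is symmetric in a and b.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    total = 0.0
    for start in range(0, len(small), tile):
        block = small[start:start + tile]
        for y in large:
            for x in block:
                total += math.dist(x, y)
    return total / (len(a) * len(b))


def energy_distance(a, b):
    """E(A, B) = 2 * mean d(A, B) - mean d(A, A) - mean d(B, B)."""
    return (
        2 * mean_pairwise_distance(a, b)
        - mean_pairwise_distance(a, a)
        - mean_pairwise_distance(b, b)
    )
```

Because the swap only changes which group is staged, ``mean_pairwise_distance(a, b)`` and ``mean_pairwise_distance(b, a)`` return the same value. The speedup itself comes from GPU memory behavior that this CPU sketch does not model.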
```{rubric} Bug fixes
```
* Fix a ``TypeError`` when calling the nanobind CUDA kernels on arrays allocated from RMM managed memory (``managed_memory=True``); the bindings now accept both the ``kDLCUDA`` and ``kDLCUDAManaged`` DLPack device types {pr}`592` {smaller}`S Dicks`
* Fix a multi-GPU ``cudaErrorLaunchFailure`` during cross-device result aggregation on very large datasets when RMM is used without pool allocation {pr}`594` {smaller}`S Dicks`
* Fix the random cell ordering of ForceAtlas2 output in ``tl.draw_graph`` by sorting positions by vertex {pr}`621` {smaller}`L Faure`

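The managed-memory fix comes down to which DLPack device types the bindings accept. The Python sketch below shows the same check from the caller's side using the standard ``__dlpack_device__()`` protocol, which returns a ``(device_type, device_id)`` tuple. ``require_cuda_device`` is a hypothetical helper, not part of rapids-singlecell; the numeric codes are the ``DLDeviceType`` values from ``dlpack.h``.

```python
# DLDeviceType codes from dlpack.h.
KDL_CUDA = 2
KDL_CUDA_MANAGED = 13
ACCEPTED_DEVICE_TYPES = {KDL_CUDA, KDL_CUDA_MANAGED}


def require_cuda_device(array):
    """Return the device id of a DLPack-capable array that lives on a CUDA GPU.

    Hypothetical helper: accepts both ordinary device memory and CUDA managed
    (unified) memory, mirroring the fix described above.
    """
    device_type, device_id = array.__dlpack_device__()
    if device_type not in ACCEPTED_DEVICE_TYPES:
        raise TypeError(
            f"expected a CUDA array (DLPack device type 2 or 13), got {device_type}"
        )
    return device_id
```

A check that allows only ``kDLCUDA`` rejects arrays backed by managed memory, which can report ``kDLCUDAManaged``, with exactly the kind of ``TypeError`` this entry fixes.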
```{rubric} Removals
```
* Remove ``tl.mde`` and the ``pymde`` dependency; the function remains available in ``scvi-tools`` {pr}`588` {smaller}`S Dicks`

```{rubric} Misc
```
* Refactor ``tl.rank_genes_groups`` internals to use categorical integer codes instead of boolean mask matrices {pr}`570` {smaller}`S Dicks`
* Align the RAPIDS 26.04 conda and CI environments with Python 3.14 {pr}`639` {smaller}`S Dicks`

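The switch from boolean mask matrices to categorical integer codes can be sketched in plain Python: one integer per cell replaces a groups-by-cells boolean matrix, and per-group statistics are accumulated in a single pass over the cells. ``encode_categories`` and ``group_means`` are hypothetical helpers, not the internal functions of ``tl.rank_genes_groups``, and categories here are numbered in order of first appearance (an assumption; pandas categoricals sort them by default).

```python
def encode_categories(labels):
    """Map each label to an integer code in order of first appearance."""
    categories = []
    index = {}
    codes = []
    for label in labels:
        if label not in index:
            index[label] = len(categories)
            categories.append(label)
        codes.append(index[label])
    return categories, codes


def group_means(values, codes, n_groups):
    """Per-group mean of one gene's values in a single pass over the cells."""
    sums = [0.0] * n_groups
    counts = [0] * n_groups
    for value, code in zip(values, codes):
        # The code indexes the accumulator directly; no mask row is needed.
        sums[code] += value
        counts[code] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]
```

The codes take one integer per cell regardless of the number of groups, whereas the mask representation grows with groups times cells.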