Skip to content

Commit 42123b3

Browse files
ArshammikIntron7
andauthored
Fix float64 precision loss in sparse Pearson residual kernels (#658)
* Fix float64 precision loss in sparse Pearson residual kernels The CSR and CSC Pearson residual kernels in _cuda/pr/kernels_pr.cuh divided by `sqrtf`, the single-precision square-root intrinsic. Because both kernels are templated on the element type `T`, a `T=double` instantiation silently narrowed the variance term `mu + mu * mu * inv_theta` to float32, evaluated the square root at single precision, and promoted the result back to double. The float64 path of `pp.normalize_pearson_residuals` (and `pp.highly_variable_genes` with `flavor='pearson_residuals'`) was therefore capped at ~7 significant digits regardless of the requested dtype. The dense kernel `dense_norm_res_kernel` already used the overloaded `sqrt` and was unaffected. Replace `sqrtf` with the overloaded `sqrt` on both sparse paths. `sqrt` dispatches to the single-precision root for `T=float` and the double-precision root for `T=double`, so the float32 path is byte-identical to before and only the float64 path changes. Hardware verification (NVIDIA H100 80GB HBM3, CUDA 12.6, sm_90): A standalone harness compiled the real `sparse_norm_res_csr_kernel` verbatim and ran it on a 4000x4000 synthetic CSR count matrix against a host float64 reference. T=double, before fix: max relative error 8.83e-08 (~7.1 digits) T=double, after fix: max relative error 3.97e-16 (~15.4 digits) T=float, before/after: bit-identical (max abs diff 0.0) The float64 path is now ~8 orders of magnitude more accurate; the float32 path is provably unchanged. Add `test_normalize_pearson_residuals_float64_precision` to tests/test_normalization.py. It pins the float64 CSR/CSC output to an analytic float64 reference at rtol/atol 1e-9 -- tight enough to fail on a single-precision result and pass on a genuine float64 one -- across theta in {100, inf}. * add PR number * switch to rsqrt --------- Co-authored-by: Intron7 <severin.dicks@icloud.com> Co-authored-by: Severin Dicks <37635888+Intron7@users.noreply.github.com>
1 parent 61eda66 commit 42123b3

3 files changed

Lines changed: 43 additions & 2 deletions

File tree

docs/release-notes/0.15.1.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@
88
```{rubric} Bug fixes
99
```
1010
* Fixes `tl.rank_genes_groups` returning NaN/zero `logfoldchanges`/`pvals` with `groups=[subset]` and `reference='rest'` {pr}`651` {smaller}`S Dicks`
11+
* Fixes float64 precision loss in `pp.normalize_pearson_residuals` on CSR/CSC input {pr}`658` {smaller}`A Mikaeili & S Dicks`
1112

1213
```{rubric} Misc
1314
```

src/rapids_singlecell/_cuda/pr/kernels_pr.cuh

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@ __global__ void sparse_norm_res_csc_kernel(
2424
++sparse_idx;
2525
}
2626
residuals[res_index] -= mu;
27-
residuals[res_index] /= sqrtf(mu + mu * mu * inv_theta);
27+
residuals[res_index] *= rsqrt(mu + mu * mu * inv_theta);
2828
// clamp to [-clip, clip]
2929
if (residuals[res_index] < -clip) residuals[res_index] = -clip;
3030
if (residuals[res_index] > clip) residuals[res_index] = clip;
@@ -53,7 +53,7 @@ __global__ void sparse_norm_res_csr_kernel(
5353
++sparse_idx;
5454
}
5555
residuals[res_index] -= mu;
56-
residuals[res_index] /= sqrtf(mu + mu * mu * inv_theta);
56+
residuals[res_index] *= rsqrt(mu + mu * mu * inv_theta);
5757

5858
if (residuals[res_index] < -clip) residuals[res_index] = -clip;
5959
if (residuals[res_index] > clip) residuals[res_index] = clip;

tests/test_normalization.py

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -90,6 +90,46 @@ def test_normalize_pearson_residuals_values(sparsity_func, dtype, theta, clip):
9090
assert np.min(output_X) >= -clip
9191

9292

93+
@pytest.mark.parametrize(
94+
"sparsity_func", [csr_matrix, csc_matrix], ids=lambda x: x.__name__
95+
)
96+
@pytest.mark.parametrize("theta", [100.0, np.inf])
97+
def test_normalize_pearson_residuals_float64_precision(sparsity_func, theta):
98+
"""Regression test: float64 precision of the sparse Pearson-residual kernels.
99+
100+
``sparse_norm_res_csr_kernel`` / ``sparse_norm_res_csc_kernel`` (in
101+
``_cuda/pr/kernels_pr.cuh``) previously divided by the single-precision
102+
intrinsic ``sqrtf``. Because the kernels are templated on the element
103+
type, a ``float64`` instantiation silently narrowed the variance term
104+
to ``float32``, capping accuracy at ~7 significant digits regardless of
105+
the requested dtype. The ``rtol``/``atol`` of 1e-9 below is tight enough
106+
to fail on a single-precision result and pass on a genuine float64 one.
107+
"""
108+
rng = np.random.default_rng(0)
109+
counts = rng.poisson(0.3, size=(300, 200)).astype(np.float64)
110+
# ensure every gene and cell has a nonzero total so mu > 0 everywhere
111+
counts[0, :] += 1
112+
counts[:, 0] += 1
113+
X = cp.asarray(counts)
114+
115+
# analytic float64 reference residuals (no clipping)
116+
ns = cp.sum(X, axis=1)
117+
ps = cp.sum(X, axis=0) / cp.sum(X)
118+
mu = cp.outer(ns, ps)
119+
if np.isinf(theta):
120+
reference = (X - mu) / cp.sqrt(mu)
121+
else:
122+
reference = (X - mu) / cp.sqrt(mu + mu**2 / theta)
123+
124+
cudata = AnnData(X=sparsity_func(X, dtype=np.float64))
125+
output = rsc.pp.normalize_pearson_residuals(
126+
cudata, theta=theta, clip=np.inf, inplace=False
127+
)
128+
129+
# the buggy `sqrtf` path is only ~1e-7 accurate; 1e-9 cleanly separates it
130+
cp.testing.assert_allclose(output, reference, rtol=1e-9, atol=1e-9)
131+
132+
93133
@pytest.mark.parametrize("dtype", [np.float32, np.float64])
94134
@pytest.mark.parametrize("sparse", [True, False])
95135
@pytest.mark.parametrize("base", [None, 2, 10])

0 commit comments

Comments
 (0)