perf: add ability to write downcasted indices #2159

Merged
ilan-gold merged 16 commits into main from ig/downcast_indices on Jan 29, 2026
@ilan-gold ilan-gold commented Oct 20, 2025

TODO:

Checks

codecov Bot commented Oct 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.65%. Comparing base (5212db8) to head (1361d36).
⚠️ Report is 100 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2159      +/-   ##
==========================================
- Coverage   84.77%   84.65%   -0.12%     
==========================================
  Files          46       46              
  Lines        7132     7142      +10     
==========================================
  Hits         6046     6046              
- Misses       1086     1096      +10     
| Files with missing lines | Coverage Δ |
|---|---|
| src/anndata/_io/specs/methods.py | 90.78% <100.00%> (+0.14%) ⬆️ |
| src/anndata/_settings.py | 91.86% <100.00%> (+0.04%) ⬆️ |

... and 2 files with indirect coverage changes

@flying-sheep flying-sheep left a comment

looks great!

When reading this in, will the dtype be preserved? And if so, how do certain APIs deal with tiny int dtypes? Like anndata’s concatenation and scanpy’s algorithms?

Comment thread tests/test_io_elementwise.py Outdated
> When reading this in, will the dtype be preserved? And if so, how do certain APIs deal with tiny int dtypes? Like anndata’s concatenation and scanpy’s algorithms?

From what I can tell from the issues and from the tests passing (i.e., the data is read back in and matches the input), scipy will convert the indices for us to match indptr. So this isn't really a scanpy problem. In the future, when supporting different sparse matrix types (such as finch tensors), I think we will hopefully be able to preserve types. I don't see anything in https://graphblas.org/binsparse-specification/ that would indicate that the data types have to match.
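A minimal sketch of the behavior described above, assuming NumPy and SciPy are installed. The downcast step here is a hypothetical illustration (it is not the code added in this PR): it picks the smallest unsigned dtype that can hold the largest possible column index, then rebuilds a CSR matrix from the smaller array to see what scipy does with it.

```python
import numpy as np
from scipy import sparse

# A small CSR matrix to stand in for data held in memory.
dense = np.array([[0, 1, 0], [2, 0, 3], [0, 0, 4]], dtype=np.float32)
mat = sparse.csr_matrix(dense)

# Hypothetical downcast step (an assumption for illustration, not anndata's
# implementation): choose the smallest dtype able to hold the largest
# possible column index, which is n_cols - 1.
small_dtype = np.min_scalar_type(mat.shape[1] - 1)
small_indices = mat.indices.astype(small_dtype)

# Rebuild the matrix from the downcasted indices, roughly as a reader would
# after loading them from disk. indptr keeps its original dtype.
roundtrip = sparse.csr_matrix(
    (mat.data, small_indices, mat.indptr), shape=mat.shape
)

# Compare the dtype of the downcasted array with what scipy ends up storing.
print("downcasted indices dtype:", small_indices.dtype)
print("scipy indices dtype:", roundtrip.indices.dtype)
print("scipy indptr dtype:", roundtrip.indptr.dtype)
```

In this sketch the smaller dtype only exists in the array passed in (standing in for the on-disk array); after construction, `roundtrip.indices` and `roundtrip.indptr` share a dtype, which is consistent with the observation that downstream code sees the usual scipy index types.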

Co-authored-by: Philipp A. <flying-sheep@web.de>
scverse-benchmark Bot commented Oct 23, 2025

Benchmark changes

| Change | Before [5212db8] | After [1361d36] | Ratio | Benchmark (Parameter) |
|---|---|---|---|---|
| - | 21.1±3ms | 15.6±0.4ms | 0.74 | backed_hdf5.BackedHDF5Indexing.time_slice_obs_to_memory('sparse') |
| + | 1.99±0.01ms | 2.36±0.01ms | 1.19 | dataset2d.Dataset2D.time_getitem_slice('h5ad', (-1,), 'cat') |
| + | 301±3μs | 351±6μs | 1.17 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata('alternating', True) |

Comparison: https://github.com/scverse/anndata/compare/5212db80485432719021445084d93407c0ce11b2..1361d36a4dd2e4bc20dfe3f89f03369f131be637
Last changed: Mon, 3 Nov 2025 13:57:48 +0000

More details: https://github.com/scverse/anndata/pull/2159/checks?check_run_id=54361625499

@ilan-gold ilan-gold modified the milestones: 0.12.4, 0.12.5 Oct 27, 2025
@ilan-gold ilan-gold modified the milestones: 0.12.5, 0.12.7 Nov 6, 2025
@flying-sheep flying-sheep modified the milestones: 0.12.7, 0.12.8 Dec 16, 2025
@ilan-gold ilan-gold modified the milestones: 0.12.8, 0.12.9 Jan 27, 2026
@ilan-gold ilan-gold merged commit 7ed86f7 into main Jan 29, 2026
28 checks passed
@ilan-gold ilan-gold deleted the ig/downcast_indices branch January 29, 2026 09:27
meeseeksmachine pushed a commit to meeseeksmachine/anndata that referenced this pull request Jan 29, 2026
flying-sheep pushed a commit that referenced this pull request Jan 29, 2026
…sted indices) (#2318)

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Development

Successfully merging this pull request may close these issues.

Downcast indices for sparse matrices if possible on-disk