Commit d763d2b

Improved readme
1 parent 7f59d99 commit d763d2b

1 file changed

Lines changed: 17 additions & 7 deletions

README.md

@@ -86,7 +86,7 @@ Please open an issue, but before that: make sure that you are running **asymptot
 
 ### What if my adata does not fit in memory?
 Optimizing this use case is highly non-trivial, as efficiently chunking CSR or CSC matrices is much more complex than running `adata[:, idxs]`. Run on a CSR matrix, this command temporarily loads all of the indices into RAM, resulting in a memory footprint almost equivalent to loading everything at once, on top of being extremely slow.
-1. If your adata holds the expression matrix in a dense array, `illico` will work on it transparently because batch-based by design.
+1. If your adata holds the expression matrix in a dense array, `illico` should work on it with very little extra work, because it is batch-based by design.
 2. If your adata holds the expression matrix in a sparse (CSC or CSR) array, you have no choice but to chunk your array manually before running `illico` on batches. But, again, in this case I would advise falling back to other solutions such as `rapids-singlecell`.
 
 ## How it works
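The difference between the dense and sparse cases above comes from how CSC and CSR lay out their nonzeros. The NumPy sketch below is an illustration under stated assumptions: batches are taken over genes (columns, as with `adata[:, idxs]`), the matrix is available as raw `data`/`indices`/`indptr` arrays, and `compress`, `csc_column_batches` and `csr_column_block` are hypothetical helpers, not part of `illico`, `anndata` or `scipy`. For CSC, a batch of columns is one contiguous slice of the arrays; for CSR, extracting the same columns requires scanning the entire `indices` array, which matches the memory cost described for `adata[:, idxs]`.

```python
import numpy as np


def compress(dense, by="col"):
    """Hypothetical helper: return (data, indices, indptr) for a small dense matrix.

    by="col" gives CSC arrays (indices are row numbers),
    by="row" gives CSR arrays (indices are column numbers).
    Illustration only: real data would be read from disk, not densified.
    """
    mat = dense.T if by == "col" else dense
    major, minor = np.nonzero(mat)  # row-major order, so grouped by major axis
    counts = np.bincount(major, minlength=mat.shape[0])
    indptr = np.concatenate([[0], np.cumsum(counts)])
    return mat[major, minor], minor, indptr


def csc_column_batches(data, indices, indptr, n_rows, batch_size):
    """Hypothetical helper: yield (first_column, dense_block) per gene batch of a CSC matrix."""
    n_cols = len(indptr) - 1
    for j0 in range(0, n_cols, batch_size):
        j1 = min(j0 + batch_size, n_cols)
        # Columns [j0, j1) occupy ONE contiguous slice of data/indices,
        # so only this batch's nonzeros are read.
        lo, hi = indptr[j0], indptr[j1]
        local_cols = np.repeat(np.arange(j1 - j0), np.diff(indptr[j0:j1 + 1]))
        block = np.zeros((n_rows, j1 - j0), dtype=data.dtype)
        block[indices[lo:hi], local_cols] = data[lo:hi]
        yield j0, block


def csr_column_block(data, indices, indptr, j0, j1):
    """Hypothetical helper: densify columns [j0, j1) of a CSR matrix."""
    n_rows = len(indptr) - 1
    # Column indices are scattered across all rows, so the WHOLE indices
    # array has to be scanned (and held in memory) for every batch.
    mask = (indices >= j0) & (indices < j1)
    rows = np.repeat(np.arange(n_rows), np.diff(indptr))
    block = np.zeros((n_rows, j1 - j0), dtype=data.dtype)
    block[rows[mask], indices[mask] - j0] = data[mask]
    return block


dense = np.array([[0, 1, 0, 2],
                  [3, 0, 0, 0],
                  [0, 4, 5, 0]], dtype=float)

csc = compress(dense, by="col")
batches = [block for _, block in csc_column_batches(*csc, n_rows=3, batch_size=3)]
assert np.array_equal(np.hstack(batches), dense)

csr = compress(dense, by="row")
assert np.array_equal(csr_column_block(*csr, 1, 3), dense[:, 1:3])
```

In a real out-of-core setting the three arrays would presumably be read lazily from disk (for example from an HDF5 file), which is where the contiguous CSC slice pays off; that storage layer is left out of this sketch.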
@@ -107,17 +107,27 @@ In order for benchmarks to run in a reasonable amount of time, the timings repor
 :bulb: Keep in mind that `pdex` does not implement the *OVR* test.
 
 <p float="center">
-<center><img src="https://github.com/remydubois/illico/blob/main/assets/method-runtimes-comparison.png?raw=true" width="700" />
-<figcaption>Runtime comparison for scanpy, pdex and illico</figcaption></center>
+<center><img src="https://github.com/remydubois/illico/blob/main/assets/method-runtimes-comparison.png?raw=true" width="100%" />
+<figcaption>Runtime comparison for scanpy, pdex and illico on four cell lines.</figcaption></center>
 </p>
 
 ### Scalability
 TODO: this could clearly be improved with a smarter batching strategy
 `illico` scales reasonably well with your compute budget. Below are the processing times on the K562-essential dataset for both OVO and OVR tests as the number of threads increases. As before, a benchmark is defined by:
 1. The data format (CSR or dense) used to store the expression matrix.
 2. The test performed: OVO (`reference="non-targeting"`) or OVR (`reference=None`).
-
+The example below shows that using 8 threads instead of 1 brings a 7-fold speedup for dense OVO tests on the K562 cell line.
 ```bash
+---------------------- benchmark 'k562-dense-ovo': 4 tests -----------------------
+Name (time in s) Mean
+----------------------------------------------------------------------------------
+test_speed_benchmark[k562-dense-100%-illico-ovo-nthreads=8] 29.6962 (1.0)
+test_speed_benchmark[k562-dense-100%-illico-ovo-nthreads=4] 53.4369 (1.80)
+test_speed_benchmark[k562-dense-100%-illico-ovo-nthreads=2] 100.3919 (3.38)
+test_speed_benchmark[k562-dense-100%-illico-ovo-nthreads=1] 208.2443 (7.01)
+----------------------------------------------------------------------------------
+```
+<!-- ```bash
 ---------------------- benchmark 'k562-csr-ovo': 4 tests -----------------------
 Name (time in s) Mean
 --------------------------------------------------------------------------------
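The ratios in parentheses in the `k562-dense-ovo` table are relative to the fastest (8-thread) run, so the 1-thread entry doubles as the 8-thread speedup. A short Python sketch (the `scaling_report` name is illustrative) that recomputes speedup, taken here as the 1-thread time divided by the n-thread time, and parallel efficiency, taken as speedup divided by n, from the mean timings reported in that table:

```python
# Mean wall-clock times in seconds, copied from the 'k562-dense-ovo' table.
TIMINGS = {1: 208.2443, 2: 100.3919, 4: 53.4369, 8: 29.6962}


def scaling_report(timings):
    """Map thread count -> (speedup vs. 1 thread, parallel efficiency)."""
    baseline = timings[1]
    report = {}
    for n_threads, seconds in sorted(timings.items()):
        speedup = baseline / seconds
        report[n_threads] = (speedup, speedup / n_threads)
    return report


for n, (speedup, efficiency) in scaling_report(TIMINGS).items():
    print(f"{n} thread(s): {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

This gives about 7.01x at 8 threads, i.e. roughly 88% parallel efficiency, while the 2-thread run comes out slightly superlinear (about 2.07x).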
@@ -153,8 +163,8 @@ test_speed_benchmark[k562-dense-100%-illico-ovr-nthreads=4] 33.6427 (1.74)
 test_speed_benchmark[k562-dense-100%-illico-ovr-nthreads=2] 63.1888 (3.27)
 test_speed_benchmark[k562-dense-100%-illico-ovr-nthreads=1] 127.4927 (6.60)
 ----------------------------------------------------------------------------------
-```
-### Memory
+``` -->
+<!-- ### Memory
 TODO: Add memit for all solutions, remind that memory footprint grows linearly with number of threads for illico.
 ```
 ============================================================================== MEMRAY REPORT ===============================================================================
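The two test modes timed in these benchmarks, OVO (`reference="non-targeting"`) and OVR (`reference=None`), differ only in which cells form the comparison group: each group is tested either against the control cells or against all remaining cells. The NumPy sketch below shows that choice on top of a textbook asymptotic Wilcoxon rank-sum (Mann-Whitney U) z-score with tie correction. It is a hypothetical illustration (`average_ranks`, `rank_sum_z` and `compare` are made-up names, and `reference` only mirrors the README's parameter name), not `illico`'s implementation.

```python
import numpy as np


def average_ranks(x):
    """Return 1-based ranks (ties share their average rank) and tie-group sizes."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    # Start index of each run of equal values in the sorted array.
    starts = np.flatnonzero(np.r_[True, sorted_x[1:] != sorted_x[:-1]])
    ends = np.r_[starts[1:], len(x)]
    ranks = np.empty(len(x))
    ranks[order] = np.repeat((starts + ends + 1) / 2.0, ends - starts)
    return ranks, (ends - starts).astype(float)


def rank_sum_z(group, ref):
    """Normal-approximation Wilcoxon rank-sum (Mann-Whitney U) z-score, tie-corrected.

    No continuity correction; undefined (zero variance) if every value is tied.
    """
    n1, n2 = len(group), len(ref)
    n = n1 + n2
    ranks, ties = average_ranks(np.concatenate([group, ref]))
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2.0
    tie_term = (ties**3 - ties).sum() / (n * (n - 1))
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term)
    return (u1 - n1 * n2 / 2.0) / np.sqrt(var_u)


def compare(values, labels, group, reference=None):
    """Hypothetical helper: OVO if `reference` is a label, OVR if it is None."""
    in_group = labels == group
    if reference is None:
        in_ref = ~in_group  # OVR: the group vs every other cell
    else:
        in_ref = labels == reference  # OVO: the group vs the control cells only
    return rank_sum_z(values[in_group], values[in_ref])


# One gene measured in three groups of three cells.
labels = np.array(["A"] * 3 + ["non-targeting"] * 3 + ["B"] * 3)
values = np.array([5, 6, 7, 1, 2, 3, 8, 9, 10], dtype=float)

print("OVO z:", compare(values, labels, "A", reference="non-targeting"))
print("OVR z:", compare(values, labels, "A", reference=None))
```

Group A sits entirely above the controls (OVO z of about 1.96) but in the middle of all other cells (OVR z of 0), which is why the two modes can disagree on the same gene.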
@@ -194,7 +204,7 @@ Allocation results for tests/test_asymptotic_wilcoxon.py::test_memory_benchmark[
 - tolist:/Users/remydubois/Documents/perso/repos/illico/.venv/lib/python3.13/site-packages/pandas/core/arrays/base.py:2078 -> 1.7MiB
 - _wrapit:/Users/remydubois/Documents/perso/repos/illico/.venv/lib/python3.13/site-packages/numpy/_core/fromnumeric.py:46 -> 1.7MiB
 - encode_and_count_groups:/Users/remydubois/Documents/perso/repos/illico/illico/utils/groups.py:25 -> 1.7MiB
-```
+``` -->
 ## Why illico
 The name *illico* is a wordplay inspired by the R package `presto` (now the Wilcoxon rank-sum test backend in Seurat). Aside from this naming reference, there is no affiliation or intended equivalence between the two. `illico` was developed independently, and although the statistical methodology may be similar, it was not designed to reproduce `presto`’s results.
 