### What if my adata does not fit in memory?
Optimizing this use case is highly non-trivial, because efficiently chunking CSR or CSC matrices is much more complex than running `adata[:, idxs]`. When run on a CSR matrix, this command temporarily loads all of the indices into RAM, resulting in a memory footprint almost equivalent to loading everything at once, on top of being extremely slow.
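To illustrate why column slicing is costly on CSR data, here is a minimal NumPy sketch operating on the raw CSR arrays (`data`, `indices`, `indptr`). It is not how `anndata` or `scipy` implement `adata[:, idxs]`, and `csr_column_slice` is a hypothetical helper; it only shows that selecting even a narrow column range requires testing every stored column index, so the whole `indices` array has to be read.

```python
import numpy as np


def csr_column_slice(data, indices, indptr, start, stop):
    """Extract columns [start, stop) from a CSR matrix given its raw arrays.

    Returns (data, indices, indptr) of the sliced CSR matrix, with column
    indices shifted so that `start` becomes column 0.
    """
    # CSR stores rows contiguously, so the entries of a column range are
    # scattered across all rows: every stored column index must be tested.
    mask = (indices >= start) & (indices < stop)
    new_data = data[mask]
    new_indices = indices[mask] - start
    # Recover the row of each stored entry, then count kept entries per row.
    n_rows = len(indptr) - 1
    row_ids = np.repeat(np.arange(n_rows), np.diff(indptr))
    counts = np.bincount(row_ids[mask], minlength=n_rows)
    # Rebuild the row pointer as a cumulative sum of per-row counts.
    new_indptr = np.concatenate(([0], np.cumsum(counts)))
    return new_data, new_indices, new_indptr


# 3x4 matrix [[1, 0, 2, 0], [0, 3, 0, 0], [4, 0, 0, 5]] in CSR form.
data = np.array([1, 2, 3, 4, 5])
indices = np.array([0, 2, 1, 0, 3])
indptr = np.array([0, 2, 3, 5])
print(csr_column_slice(data, indices, indptr, 1, 3))
```

Note that `mask` has the same length as `indices`, regardless of how few columns are requested.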
1. If your adata holds the expression matrix in a dense array, `illico` should work on it with very little extra work, since it is batch-based by design.
2. If your adata holds the expression matrix in a sparse (CSC or CSR) array, you have no choice but to chunk your array manually before running `illico` on the batches. But, again, in this case I would advise falling back to other solutions such as `rapids-singlecell`.
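For the dense case, manual chunking can be as simple as iterating over column (gene) blocks. Below is a minimal NumPy sketch, assuming the matrix is available as a NumPy array or an `np.memmap` on disk and that batching over genes suits your analysis. `iter_column_batches` is a hypothetical helper, not part of `illico`'s API, and how each block is then passed to `illico` is left to you. On a sparse matrix the same slicing runs into the cost described above, so it does not solve that case.

```python
import numpy as np


def iter_column_batches(X, batch_size):
    """Yield (start, stop, block) for consecutive column batches of a 2D array.

    `X` can be any object supporting 2D slicing, e.g. a NumPy array or an
    `np.memmap` backed by a file on disk.
    """
    n_cols = X.shape[1]
    for start in range(0, n_cols, batch_size):
        stop = min(start + batch_size, n_cols)
        # np.array copies the slice, so with a memmap only this block
        # is held in RAM at a time.
        yield start, stop, np.array(X[:, start:stop])


X = np.arange(12).reshape(3, 4)
for start, stop, block in iter_column_batches(X, batch_size=3):
    print(start, stop, block.shape)
# 0 3 (3, 3)
# 3 4 (3, 1)
```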
## How it works
:bulb: Keep in mind that `pdex` does not implement the *OVR* test.
<figcaption>Runtime comparison for scanpy, pdex and illico on four cell lines.</figcaption></center>
</p>
### Scalability
TODO: this could clearly be improved with a smarter batching strategy
`illico` scales reasonably well with your compute budget. Below are the processing times on the K562-essential dataset for both OVO and OVR tests as the number of threads increases. As before, a benchmark is defined by:
1. The data format (CSR or dense) used to store the expression matrix.
2. The test performed: OVO (`reference="non-targeting"`) or OVR (`reference=None`).
The example below shows that using 8 threads instead of 1 brings a 7-fold speedup on the K562 cell line.
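To put the 7-fold figure in perspective, here is a back-of-the-envelope sketch. A 7x speedup on 8 threads is a parallel efficiency of 7/8 = 87.5%. If one further assumes that Amdahl's law, $S = 1 / ((1 - p) + p/n)$, describes the workload (an assumption for illustration; the benchmark does not claim it, and effects such as memory bandwidth are ignored), solving for $p$ gives the fraction of runtime that would have to be parallelizable, about 98% here. Both function names are hypothetical.

```python
def parallel_efficiency(speedup, n_threads):
    """Fraction of ideal linear scaling achieved: speedup / n_threads."""
    return speedup / n_threads


def amdahl_parallel_fraction(speedup, n_threads):
    """Parallel fraction p implied by Amdahl's law, S = 1 / ((1 - p) + p / n).

    Rearranging 1/S = 1 - p * (1 - 1/n) gives p = (1 - 1/S) / (1 - 1/n).
    """
    return (1 - 1 / speedup) / (1 - 1 / n_threads)


print(parallel_efficiency(7, 8))                 # 0.875
print(round(amdahl_parallel_fraction(7, 8), 3))  # 0.98
```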
The name *illico* is a wordplay inspired by the R package `presto` (now the Wilcoxon rank-sum test backend in Seurat). Aside from this naming reference, there is no affiliation or intended equivalence between the two. `illico` was developed independently, and although the statistical methodology may be similar, it was not designed to reproduce `presto`’s results.