2 | 2 |
3 | 3 | *illico* is a Python library that performs blazing-fast asymptotic Wilcoxon rank-sum tests (same as `scanpy.tl.rank_genes_groups(…, method="wilcoxon")`), useful for analyzing and processing single-cell RNA-seq data. `illico`'s features are: |
4 | 4 |
5 | | -1. 🚀 Blazing fast: on the K562 (essential) dataset (~300k cells, 8k genes, 2k perturbations), `illico` computes DE genes (with `reference="non-targeting"`) in a mere 20 seconds. That's more than 100 times faster than both `pdex` and `scanpy` with the same compute resources (8 CPUs). |
| 5 | +1. 🚀 Blazing fast: on the K562 (essential) dataset (~300k cells, 8k genes, 2k perturbations), `illico` computes DE genes (with `reference="non-targeting"`) in a mere 15 seconds. That's more than 100 times faster than both `pdex` and `scanpy` with the same compute resources (8 CPUs). |
6 | 6 | 2. 💠 No compromise: on synthetic data, `illico`'s p-values match `scipy.stats.mannwhitneyu` to within a relative difference of 1e-12, with an absolute tolerance of 0. |
7 | 7 | 3. ⚡ Thread-first: `illico` optionally parallelizes processing (when the user requests it) over **threads**, never processes. This saves you all the fixed costs of multiprocessing, such as spawning processes, duplicating data across processes, and inter-process communication. |
8 | 8 | 4. 🐞 Data format agnostic: whether your data is dense, sparse along rows, or sparse along columns, `illico` handles it without ever converting the whole dataset to a more convenient format. |
9 | 9 | 5. 🪶 Lightweight: `illico` processes the input data in batches, so every memory allocation made along the way is much smaller than if the whole dataset were processed at once. |
10 | | -6. 📈 Scalable: Because it is thread-first and batch-based, `illico` scales reasonably with your compute budget. Tests showed that spawning 8 threads brings a 7-fold speedup over a single thread. |
11 | | -7. 💾 Out-of-core: `illico` supports h5-based, on-disk-backed, dense and CSC datasets natively. |
| 10 | +6. 📈 Scalable: Because it is thread-first and batch-based, `illico` scales reasonably with your compute budget. Tests showed that spawning 16 threads brings a 14-fold speedup over a single thread. |
| 11 | +7. 💾 Out-of-core: `illico` supports h5-based, on-disk-backed, dense, CSC and CSR datasets natively. |
12 | 12 | 8. 🎆 All-purpose: `illico` performs both one-versus-reference (useful for perturbation analyses) and one-versus-rest (useful for clustering analyses) Wilcoxon rank-sum tests, both equally optimized and fast. |
13 | 13 |
14 | 14 | Approximate speed benchmarks run on K562-essential can be found in the Benchmarks section. All the code used to generate those numbers can be found in `tests/test_asymptotic_wilcoxon.py::test_speed_benchmark`. |
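
For readers who want to reproduce the kind of reference check described in feature 2, here is a minimal sketch of a baseline built only on NumPy and SciPy. It does not call `illico` and makes no claim about how `illico` is implemented: the helper name `reference_pvalues`, the batch size, the thread count and the synthetic Poisson counts are all assumptions chosen for illustration. It computes per-gene asymptotic Mann-Whitney U p-values over batches of gene columns dispatched to a thread pool, then compares them with a single one-shot call.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from scipy.stats import mannwhitneyu


def reference_pvalues(group, reference, batch_size=64, n_threads=4):
    """Hypothetical helper: per-gene asymptotic two-sided p-values.

    Genes (columns) are processed in batches, one batch per thread task,
    so each call only allocates temporaries for `batch_size` genes.
    """
    n_genes = group.shape[1]

    def run_batch(start):
        stop = min(start + batch_size, n_genes)
        result = mannwhitneyu(
            group[:, start:stop],
            reference[:, start:stop],
            axis=0,                   # test each gene column independently
            method="asymptotic",      # normal approximation with tie correction
            alternative="two-sided",
        )
        return result.pvalue

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        chunks = list(pool.map(run_batch, range(0, n_genes, batch_size)))
    return np.concatenate(chunks)


# Synthetic counts (assumption): 200 perturbed cells, 500 reference cells, 300 genes.
rng = np.random.default_rng(0)
group = rng.poisson(2.0, size=(200, 300)).astype(float)
reference = rng.poisson(2.5, size=(500, 300)).astype(float)

batched = reference_pvalues(group, reference)
one_shot = mannwhitneyu(group, reference, axis=0, method="asymptotic").pvalue

# Relative-only comparison, in the spirit of the tolerance quoted in feature 2.
agree = np.allclose(batched, one_shot, rtol=1e-9, atol=0.0)
```

The same comparison pattern (relative tolerance, zero absolute tolerance) can be pointed at any other implementation's p-values by swapping out `batched`.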