243 | 243 | <h1>Overview<a class="headerlink" href="#overview" title="Link to this heading">¶</a></h1> |
244 | 244 | <p><em>illico</em> is a Python library that performs blazing-fast asymptotic Wilcoxon rank-sum tests (the same test as <code class="docutils literal notranslate"><span class="pre">scanpy.tl.rank_genes_groups(…,</span> <span class="pre">method="wilcoxon")</span></code>), useful for analyzing and processing single-cell RNA-seq data. <code class="docutils literal notranslate"><span class="pre">illico</span></code>’s features are:</p> |
245 | 245 | <ol class="arabic simple"> |
246 | | -<li><p>🚀 Blazing fast: On the K562 (essential) dataset (~300k cells, 8k genes, 2k perturbations), <code class="docutils literal notranslate"><span class="pre">illico</span></code> computes DE genes (with <code class="docutils literal notranslate"><span class="pre">reference="non-targeting"</span></code>) in a mere 20 seconds. That’s more than 100 times faster than both <code class="docutils literal notranslate"><span class="pre">pdex</span></code> and <code class="docutils literal notranslate"><span class="pre">scanpy</span></code> with the same compute resources (8 CPUs).</p></li> |
| 246 | +<li><p>🚀 Blazing fast: On the K562 (essential) dataset (~300k cells, 8k genes, 2k perturbations), <code class="docutils literal notranslate"><span class="pre">illico</span></code> computes DE genes (with <code class="docutils literal notranslate"><span class="pre">reference="non-targeting"</span></code>) in a mere 15 seconds. That’s more than 100 times faster than both <code class="docutils literal notranslate"><span class="pre">pdex</span></code> and <code class="docutils literal notranslate"><span class="pre">scanpy</span></code> with the same compute resources (8 CPUs).</p></li> |
247 | 247 | <li><p>💠 No compromise: On synthetic data, <code class="docutils literal notranslate"><span class="pre">illico</span></code>’s p-values matched those of <code class="docutils literal notranslate"><span class="pre">scipy.stats.mannwhitneyu</span></code> up to a relative difference of 1e-12, with an absolute tolerance of 0.</p></li> |
248 | 248 | <li><p>⚡ Thread-first: <code class="docutils literal notranslate"><span class="pre">illico</span></code> optionally parallelizes the processing (if requested by the user) over <strong>threads</strong>, never processes. This saves you the fixed costs of multiprocessing, such as spawning processes, duplicating data across processes, and inter-process communication.</p></li> |
249 | 249 | <li><p>🐞 Data format agnostic: Whether your data is dense, sparse along rows, or sparse along columns, <code class="docutils literal notranslate"><span class="pre">illico</span></code> handles it without ever converting the whole dataset to whichever format would be fastest to process.</p></li> |
250 | 250 | <li><p>🪶 Lightweight: <code class="docutils literal notranslate"><span class="pre">illico</span></code> will process the input data in batches, making any memory allocation needed along the way much smaller than if it processed the whole data at once.</p></li> |
251 | | -<li><p>📈 Scalable: Because it is thread-first and batchable, <code class="docutils literal notranslate"><span class="pre">illico</span></code> scales reasonably with your compute budget. Tests showed that spawning 8 threads brings a 7-fold speedup over a single thread.</p></li> |
252 | | -<li><p>💾 Out-of-core: <code class="docutils literal notranslate"><span class="pre">illico</span></code> supports h5-based, on-disk-backed, dense and CSC datasets natively.</p></li> |
| 251 | +<li><p>📈 Scalable: Because it is thread-first and batchable, <code class="docutils literal notranslate"><span class="pre">illico</span></code> scales reasonably with your compute budget. Tests showed that spawning 16 threads brings a 14-fold speedup over a single thread.</p></li> |
| 252 | +<li><p>💾 Out-of-core: <code class="docutils literal notranslate"><span class="pre">illico</span></code> supports h5-based, on-disk-backed, dense, CSC and CSR datasets natively.</p></li> |
253 | 253 | <li><p>🎆 All-purpose: <code class="docutils literal notranslate"><span class="pre">illico</span></code> performs both one-versus-reference (useful for perturbation analyses) and one-versus-rest (useful for clustering analyses) Wilcoxon rank-sum tests, both equally optimized and fast.</p></li> |
254 | 254 | </ol> |
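<p>To make the "asymptotic Wilcoxon rank-sum test" and the comparison against <code>scipy.stats.mannwhitneyu</code> concrete, here is a minimal sketch of the normal approximation with tie correction. It is not <code>illico</code>'s implementation: the helper name <code>asymptotic_ranksum_pvalue</code>, the Poisson toy data, and the choice to disable the continuity correction are all assumptions made for illustration.</p>

```python
import numpy as np
from scipy import stats


def asymptotic_ranksum_pvalue(x, y):
    """Two-sided asymptotic Mann-Whitney U p-value.

    Hypothetical helper: tie-corrected variance, no continuity correction.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.size, y.size
    n = n1 + n2
    pooled = np.concatenate([x, y])

    # Average ranks over the pooled sample, then the U statistic of x.
    ranks = stats.rankdata(pooled)
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2.0

    # Mean and tie-corrected standard deviation of U under the null.
    mean_u = n1 * n2 / 2.0
    _, counts = np.unique(pooled, return_counts=True)
    tie_term = float((counts**3 - counts).sum())
    sigma = np.sqrt(n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1))))

    z = (u1 - mean_u) / sigma
    return 2.0 * stats.norm.sf(abs(z))


# Toy count data with many ties (an assumption, not a real dataset).
rng = np.random.default_rng(0)
group = rng.poisson(3.0, size=200)
reference = rng.poisson(3.5, size=300)

p_manual = asymptotic_ranksum_pvalue(group, reference)
p_scipy = stats.mannwhitneyu(
    group,
    reference,
    alternative="two-sided",
    method="asymptotic",
    use_continuity=False,
).pvalue

# Relative difference, the quantity the feature list refers to.
rel_diff = abs(p_manual - p_scipy) / p_scipy
print(f"manual={p_manual:.6e} scipy={p_scipy:.6e} rel_diff={rel_diff:.2e}")
```

<p>Comparing with a relative tolerance and an absolute tolerance of 0 is the stricter choice for p-values, because very small p-values would pass any nonzero absolute tolerance trivially.</p>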
255 | 255 | <p>Approximate speed benchmarks run on k562-essential can be found in the Benchmarks section. All the code used to generate those numbers can be found in <code class="docutils literal notranslate"><span class="pre">tests/test_asymptotic_wilcoxon.py::test_speed_benchmark</span></code>.</p> |
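<p>The thread-first, batched design described in the feature list can be sketched as follows. This is only an illustration under stated assumptions, not <code>illico</code>'s code: <code>rank_sums_batch</code>, <code>rank_sums_threaded</code>, the batch size, and the random matrix are hypothetical. Each thread ranks one slice of gene columns, and all threads read the same in-memory array, so nothing is copied between workers. Any actual speedup depends on how much of the work releases the GIL, which this sketch does not measure.</p>

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from scipy import stats


def rank_sums_batch(matrix, mask, start, stop):
    """Rank-sum of the masked cells for gene columns start:stop.

    Only this column slice is ranked, so temporary memory scales with
    the batch size rather than with the full matrix.
    """
    ranks = stats.rankdata(matrix[:, start:stop], axis=0)
    return ranks[mask].sum(axis=0)


def rank_sums_threaded(matrix, mask, batch_size=64, n_threads=4):
    """Hypothetical driver: split genes into batches, one task per batch."""
    n_genes = matrix.shape[1]
    bounds = [
        (start, min(start + batch_size, n_genes))
        for start in range(0, n_genes, batch_size)
    ]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map preserves batch order, so results concatenate correctly.
        parts = pool.map(lambda b: rank_sums_batch(matrix, mask, *b), bounds)
        return np.concatenate(list(parts))


# Toy data: 500 cells x 200 genes, the first 100 cells form the group.
rng = np.random.default_rng(0)
matrix = rng.poisson(2.0, size=(500, 200)).astype(float)
mask = np.zeros(500, dtype=bool)
mask[:100] = True

threaded = rank_sums_threaded(matrix, mask)
single_pass = stats.rankdata(matrix, axis=0)[mask].sum(axis=0)
```

<p>Because ranks are computed independently per gene column, splitting along columns leaves the result unchanged, which is why <code>threaded</code> and <code>single_pass</code> agree.</p>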