Commit 07055a9

Merge pull request #7 from remydubois/fix/readme-and-action
Fixed reference_group, enhanced readme
2 parents d2456df + 99deee3 commit 07055a9

7 files changed

Lines changed: 26 additions & 18 deletions

.github/workflows/publish-package.yaml

Lines changed: 1 addition & 1 deletion
@@ -26,4 +26,4 @@ jobs:
 - name: Build package
 run: python -m poetry build
 - name: Publish package
-run: python -m poetry publish -u __token__ -p ${{ secrets.PYPI_TOKEN }}
+run: python -m poetry publish -u __token__ -p ${{ secrets.PYPI_TOKEN }}

.github/workflows/python-package.yaml

Lines changed: 2 additions & 2 deletions
@@ -4,8 +4,8 @@
 name: Python package

 on:
-push:
-branches: [ "main" ]
+# push:
+# branches: [ "main" ]
 pull_request:
 branches: [ "main" ]

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -10,4 +10,4 @@ Copyright 2025 Rémy Dubois
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
-limitations under the License.
+limitations under the License.

README.md

Lines changed: 10 additions & 2 deletions
@@ -26,7 +26,7 @@ Approximate speed benchmarks ran on k562-essential can be found below. All the c
 4. This package is not intended at running out-of-core single cell data analyses like `rapids-singlecell`.

 ## Installation
-`illico` can be installed via pip, compatible with Python 3.12 and onward:
+`illico` can be installed via pip, compatible with Python 3.11 and onward:
 ```bash
 pip install illico -U
 ```
@@ -75,10 +75,18 @@ scanpy_port_asymptotic_wilcoxon(adata, group_keys="perturbation", reference="non
 `illico` relies on a few optimization tricks to be faster than other existing tools. It is very possible that for some reason, the specific layout of your dataset (very small control population, very low sparsity, very small amount of distinct values) result in those tricks being effect-less, or less effective than observed on the datasets used to develop & benchmark `illico`. It is also very possible that because of those, other solutions end up faster than `illico` ! If this is your case, please open a issue describing your situation.

 ### `illico`'s results (p-values or fold-change) does not match `pdex` or `scanpy`.
+#### Test results (p-values)
 Please open an issue, but before that: make sure that you are running **asymptotic** wilcoxon rank-sum tests as this is the only test exposed by `illico`.
 - `pdex` relies on `scipy.stats.mannwhitneyu` that runs exact (non asymptotic) only when there are 8 values in both groups combined, and no ties.
 - `scanpy` offers the possibility to run non-tie-corrected wilcoxon rank-sum tests, make sure this is disabled by passing `tie_correct=True`.
-- Also, `illico` uses continuity correction which is the best practice.
+- Also, `illico` uses continuity correction by default which is the best practice.
+
+The test suite implemented in the CI and used to develop `illico` targets a precision of 1.e-12 compared to `scipy`, not `scanpy`. Consequently, there **will be** slight disagreement between `scanpy`'s p-values and `illico`'s p-values.
+
+#### Fold-change
+The fold-change computed by illico is the most naive form of the fold-change:
+$$\text{fold-change} = \frac{E[X_{\text{perturbed}}]}{E[X_{\text{control}}]}$$
+If your data underwent log1p transform, `np.expm1` is applied **before** computing the expectations (means). I know many definitions exist, and adding more control over this should not be complicated. If this is your case, please open an issue.

 ### What about normalization and log1p
 1. `illico` does not care about your data being normalized or not, it is up to you to apply the preprocessing of your choice before running the tests. It is expected that `illico` is slower if ran on total-count normalized data by a factor ~2. This is because if applied on non total-count normalized data, sorting relies on radix sort which is faster than the usual quicksort (that is used if testing total-count normalized data).
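
Note on the fold-change definition added to the README in this commit: the sketch below illustrates the stated formula (ratio of per-gene means, with `np.expm1` applied first when the data is log1p-transformed). It is an illustration only, not `illico`'s actual code; the function name `naive_fold_change` and its arguments are hypothetical, and it assumes dense cells-by-genes arrays with a non-zero control mean.

```python
import numpy as np


def naive_fold_change(perturbed, control, is_log1p):
    """Per-gene ratio of means, E[X_perturbed] / E[X_control].

    Hypothetical helper: `perturbed` and `control` are (cells x genes) arrays.
    """
    perturbed = np.asarray(perturbed, dtype=float)
    control = np.asarray(control, dtype=float)
    if is_log1p:
        # Undo log1p first so the means are taken on the original scale.
        perturbed = np.expm1(perturbed)
        control = np.expm1(control)
    # No guard against a zero control mean; a real implementation may need one.
    return perturbed.mean(axis=0) / control.mean(axis=0)


counts_perturbed = np.array([[2.0, 4.0], [6.0, 8.0]])
counts_control = np.array([[1.0, 2.0], [3.0, 4.0]])

# Raw counts and their log1p-transformed version should agree
# once `is_log1p` is set accordingly.
raw = naive_fold_change(counts_perturbed, counts_control, is_log1p=False)
logged = naive_fold_change(
    np.log1p(counts_perturbed), np.log1p(counts_control), is_log1p=True
)
```

Taking the ratio of means of the log1p values directly (without `np.expm1`) would give a different number, which is one likely source of disagreement with tools that use another fold-change definition.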

illico/asymptotic_wilcoxon.py

Lines changed: 6 additions & 6 deletions
@@ -25,7 +25,7 @@ def asymptotic_wilcoxon(
 adata: ad.AnnData,
 is_log1p: bool,
 group_keys: str,
-reference_group: str | None = None,
+reference: str | None = None,
 n_threads: int = 1,
 batch_size: int = 256,
 alternative: str = "two-sided",
@@ -50,7 +50,7 @@ def asymptotic_wilcoxon(
 Whether the data is log1p transformed.
 group_keys
 Key in `adata.obs` specifying the group variable.
-reference_group
+reference
 Name of the reference group for OVO tests. If `None`, OVR tests are performed.
 n_threads
 Number of threads to use for parallel computation.
@@ -92,13 +92,13 @@ def asymptotic_wilcoxon(
 )

 if precompile:
-_precompile(X, reference_group)
+_precompile(X, reference)

 # Process the groups information
 raw_groups = adata.obs[group_keys].tolist()
-unique_raw_groups, group_container = encode_and_count_groups(groups=raw_groups, ref_group=reference_group)
+unique_raw_groups, group_container = encode_and_count_groups(groups=raw_groups, ref_group=reference)
 logger.info(
-f"Found {group_container.counts.size} unique groups (min size: {group_container.counts.min()} cells; max size: {group_container.counts.max()} cells), with reference group: {reference_group}"
+f"Found {group_container.counts.size} unique groups (min size: {group_container.counts.min()} cells; max size: {group_container.counts.max()} cells), with reference group: {reference}"
 )
 _, n_genes = X.shape

@@ -121,7 +121,7 @@ def asymptotic_wilcoxon(
 logger.trace(f"Performing a total of {n_tests:,d} tests.")
 with Parallel(n_threads, prefer="threads", return_as="generator_unordered") as pool:
 with tqdm(total=n_tests, smoothing=0.0, unit="it", unit_scale=True, unit_divisor=1000) as pbar:
-if reference_group is None: # ovr use case
+if reference is None: # ovr use case
 pbar.set_description("Running one-versus-all MannWhitney-U tests")
 op = delayed(lambda *args: (ovr_mwu_over_col_contiguous_chunk(*args), args))
 else: # ovo use case
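
Note on the rename above: `reference_group` becomes `reference` with no transition period, so code written against 0.1.0 that passes `reference_group=` would presumably fail after upgrading. The commit itself contains no compatibility layer. If one were wanted, a small shim along the following lines could accept both spellings for a while; `resolve_reference` is a hypothetical helper, not part of `illico`.

```python
import warnings


def resolve_reference(reference=None, reference_group=None):
    """Return the reference group, accepting the pre-0.1.1 keyword with a warning.

    Hypothetical helper: the commit shown here renames the keyword outright.
    """
    if reference_group is not None:
        if reference is not None:
            raise TypeError("Pass either `reference` or `reference_group`, not both.")
        warnings.warn(
            "`reference_group` was renamed to `reference`.",
            DeprecationWarning,
            stacklevel=2,
        )
        return reference_group
    return reference


# New spelling: returned unchanged, no warning.
ovo_reference = resolve_reference(reference="non-targeting")
# Neither keyword given: None, which the function above treats as the OVR case.
ovr_reference = resolve_reference()
```

The trade-off is one extra keyword to maintain and eventually remove; for a 0.1.x package with few users, the outright rename in this commit is a defensible choice.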

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [project]
 name = "illico"
-version = "0.1.0"
+version = "0.1.1"
 description = "Fast asymptotic mannwhitney-u test"
 authors = [
 {name = "remydubois",email = "remydubois14@gmail.com"}

tests/test_asymptotic_wilcoxon.py

Lines changed: 5 additions & 5 deletions
@@ -118,7 +118,7 @@ def test_asymptotic_wilcoxon(rand_adata, test, use_continuity, alternative):
 adata=rand_adata,
 is_log1p=False,
 group_keys="pert",
-reference_group=reference,
+reference=reference,
 use_continuity=use_continuity,
 n_threads=1,
 batch_size=16,
@@ -177,7 +177,7 @@ def test_unsorted_indices_error(rand_adata):
 adata=rand_adata,
 is_log1p=False,
 group_keys="pert",
-reference_group="non-targeting",
+reference="non-targeting",
 n_threads=1,
 batch_size=16,
 )
@@ -207,7 +207,7 @@ def run():
 data,
 is_log1p=False,
 group_keys="gene",
-reference_group=reference,
+reference=reference,
 n_threads=num_threads,
 batch_size=256,
 )
@@ -242,7 +242,7 @@ def test_speed_benchmark(adata, method, test, num_threads, benchmark, request):

 # Compile
 if method == "illico":
-_precompile(adata.X, reference_group="non-targeting" if test == "ovo" else None)
+_precompile(adata.X, reference="non-targeting" if test == "ovo" else None)

 params = re.match(".*\[(.*)\]", request.node.name).group(1).split("-")
 group_params = [p for i, p in enumerate(params) if i in [0, 1, 4]]
@@ -263,7 +263,7 @@ def test_memory_benchmark(adata, method, test, num_threads, request):

 # Compile outside of the tracker context
 if method == "illico":
-_precompile(adata.X, reference_group="non-targeting" if test == "ovo" else None)
+_precompile(adata.X, reference="non-targeting" if test == "ovo" else None)

 test_params_string = re.match(".*\[(.*)\]", request.node.name).group(1)
 outdir = Path(os.environ.get("MEMRAY_RESULTS_DIR") or Path(__file__).parents[1])
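
Note on what these tests compare: the README change in this commit says results are checked against `scipy` for the asymptotic, tie-corrected test, with continuity correction on by default. For readers unsure what that combination means, here is a pure-Python sketch of the textbook two-sided computation. It is not `illico`'s implementation (which is batched and compiled) and makes no claim about its internals; `asymptotic_mwu` is a hypothetical name, and the formulas are the standard normal approximation that, to my understanding, `scipy.stats.mannwhitneyu(..., method="asymptotic")` also follows.

```python
import math
from collections import Counter


def asymptotic_mwu(x, y, use_continuity=True):
    """Two-sided asymptotic Mann-Whitney U test with tie correction.

    Returns (U statistic of x, p-value). Illustrative sketch only.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted(list(x) + list(y))

    # Midranks: tied values share the average of the ranks they span.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1..j
        i = j

    rank_sum_x = sum(rank_of[v] for v in x)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2

    # Tie-corrected variance of U under the null hypothesis.
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u1, 1.0  # every pooled value is identical

    mean = n1 * n2 / 2
    # Continuity correction shrinks |U - mean| by 0.5 before standardizing.
    diff = abs(u1 - mean) - (0.5 if use_continuity else 0.0)
    z = max(diff, 0.0) / math.sqrt(var)
    p = math.erfc(z / math.sqrt(2))  # two-sided normal tail, 2 * (1 - Phi(z))
    return u1, min(p, 1.0)


u_stat, p_value = asymptotic_mwu([1, 2, 3], [4, 5, 6])
```

Dropping the tie term or the 0.5 shift changes the p-value, which is why comparing against a tool configured differently (for example `scanpy` without `tie_correct=True`) can show mismatches that are not bugs.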
