ClusterAutoK.fit IndexError on non-consecutive n_clusters list #117

@PFoley-Seq

Description

In CellCharter 0.3.7, `cc.tl.ClusterAutoK.fit` raises `IndexError` when `n_clusters` is a non-consecutive integer list (e.g. `[5, 10, 15, 20]`).

Reproduction

```python
import anndata as ad, numpy as np, cellcharter as cc
adata = ad.AnnData(X=np.random.RandomState(0).randn(500, 20).astype(np.float32))
adata.obsm["X_cellcharter"] = adata.X
autok = cc.tl.ClusterAutoK(n_clusters=[5, 10, 15, 20], max_runs=5, model_params={"random_state": 0})
autok.fit(adata, use_rep="X_cellcharter")
```

Triggers (paraphrased):

```
File ".../cellcharter/tl/_autok.py", line 47, in fit
(new_labels[k], self.labels[k + 1][i])
IndexError: list index out of range
```

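The code at `_autok.py:47` accesses `self.labels[k + 1][i]` while iterating `k` over the `n_clusters` list. This assumes `k + 1` is also a key, i.e. that `n_clusters` is consecutive. With `[5, 10, 15, 20]`, the lookup at `k = 5` targets `self.labels[6]`, which is not one of the requested cluster counts.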
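To illustrate the failure mode without CellCharter installed, here is a toy stand-in for the loop. It assumes labels are stored in a `defaultdict(list)` keyed by `k`, which would explain why the error is an `IndexError` rather than a `KeyError`; that storage detail is my guess, not something confirmed from the source. The function names are hypothetical.

```python
from collections import defaultdict


def compare_with_k_plus_one(n_clusters, n_runs=2):
    """Toy version of the failing pattern: pair labels[k] with labels[k + 1]."""
    # Assumption: missing keys silently yield an empty list.
    labels = defaultdict(list)
    for k in n_clusters:
        for run in range(n_runs):
            labels[k].append(f"labels(k={k}, run={run})")

    pairs = []
    for k in n_clusters[:-1]:
        for i in range(n_runs):
            # labels[k + 1] is an empty list when k + 1 was never fitted,
            # so indexing it raises IndexError.
            pairs.append((labels[k][i], labels[k + 1][i]))
    return pairs


def fails_with_index_error(n_clusters):
    """Return True if the toy loop raises IndexError for this list."""
    try:
        compare_with_k_plus_one(n_clusters)
    except IndexError:
        return True
    return False
```

Under these assumptions the toy loop fails for `[5, 10, 15, 20]` and succeeds for `[8, 9, 10, 11, 12]`, matching the behaviour reported above.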

Suggested fix

Either:

  • (a) document that `n_clusters` must be consecutive when `max_runs > 1`, or
  • (b) refactor `fit` to compare each entry of `n_clusters` with the next entry in the list rather than with `k + 1`. Something like:

```python
# Pair each k with its predecessor in the list instead of assuming k + 1 exists
for prev_k, k in zip(self.n_clusters_list[:-1], self.n_clusters_list[1:]):
    ...
    self.labels[k][i]  # already works
    self.labels[prev_k][i]
```
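A standalone sketch of option (b), runnable outside the class. The names `adjacent_pairs` and `compare_adjacent` are hypothetical, and sorting the list first is my assumption; the real `fit` may already normalise `n_clusters` differently.

```python
def adjacent_pairs(n_clusters):
    """Pair each cluster count with the next one in sorted order."""
    ks = sorted(n_clusters)  # assumption: order of the input should not matter
    return list(zip(ks[:-1], ks[1:]))


def compare_adjacent(labels, n_clusters, n_runs):
    """Collect (labels[prev_k][i], labels[k][i]) for every neighbouring pair."""
    pairs = []
    for prev_k, k in adjacent_pairs(n_clusters):
        for i in range(n_runs):
            # Both keys come from n_clusters, so both were populated.
            pairs.append((labels[prev_k][i], labels[k][i]))
    return pairs


# Usage with placeholder labels for a non-consecutive list
ks = [5, 10, 15, 20]
toy_labels = {k: [f"k={k}/run={i}" for i in range(2)] for k in ks}
toy_pairs = compare_adjacent(toy_labels, ks, n_runs=2)
```

For a consecutive list this yields the same pairs as the `k` / `k + 1` scheme, so existing behaviour should be unchanged in that case.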

Workaround

Use a consecutive list (`[8, 9, 10, 11, 12]`), or pass a tuple `(lo, hi)` (CellCharter expands a tuple into a consecutive range internally).
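For anyone scripting around this, a small hypothetical helper can turn a sparse list into bounds for the tuple form. It assumes the upper bound is inclusive, which this issue does not state, and note that the expanded range fits more candidate values of `k` than the sparse list would.

```python
def consecutive_bounds(n_clusters):
    """Return (lo, hi) spanning every requested cluster count."""
    ks = sorted(set(n_clusters))
    if len(ks) < 2:
        raise ValueError("need at least two distinct cluster counts")
    return ks[0], ks[-1]


def extra_candidates(n_clusters):
    """Count the additional k values a consecutive range would fit.

    Assumes the range includes both endpoints.
    """
    lo, hi = consecutive_bounds(n_clusters)
    return (hi - lo + 1) - len(set(n_clusters))


# The resulting tuple would be passed as n_clusters=(lo, hi), per the workaround above.
lo, hi = consecutive_bounds([5, 10, 15, 20])
```

With `[5, 10, 15, 20]` the range spans 5 through 20, so the workaround trades the crash for a noticeably larger search.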

Environment

  • cellcharter 0.3.7
  • Python 3.13.12
  • torchgmm via cellcharter

Happy to send a PR.
