`make_arviz` relies on Tasccoda instance state, causing errors when processing multiple models

Hi,

First of all, thank you for this fantastic tool. I've been using the `Tasccoda` module for differential abundance analysis across multiple datasets (different regions) in a loop. I store the resulting `MuData` objects, which contain the MCMC results, in a dictionary for later analysis and plotting.

**The Problem**

I've encountered an issue when trying to generate ArviZ plots in a separate step after the modeling is complete. The `coda.make_arviz(data)` function fails with a matrix dimension mismatch error (`TypeError: dot_general requires contracting dimensions...`).

After investigating, I realized that `make_arviz` does not solely rely on the `MuData` object passed to it. It also uses the internal state of the `Tasccoda` instance (e.g., `self.mcmc`) which was configured by the last call to `coda.prepare()`.

This leads to a "desynchronization" problem: if I try to extract `arviz` data from a `MuData` object from a previous iteration in the loop, `make_arviz` will use the model configuration from the *last* iteration, causing a shape mismatch if the number of cell types (which usually happens as different regions have different cell populations).

**Example workflow to illustrate:**

```python
coda = pt.tl.Tasccoda()
processed_models = {}
datasets = {"region_A": data_A, "region_B": data_B} # Example datasets

for name, mdata in datasets.items():
    coda.prepare(mdata, ...)
    coda.run_nuts(mdata, ...)
    processed_models[name] = mdata

# This call fails if region_A has a different number of cell types than region_B
arviz_data = coda.make_arviz(data=processed_models["region_A"]) 
```

**Workaround**

The current workaround is to "resynchronize" the `Tasccoda` instance by calling `coda.prepare()` again with the specific `MuData` object right before calling `make_arviz`.

```python
# Workaround
coda.prepare(processed_models["region_A"], ...) # Re-run prepare
arviz_data = coda.make_arviz(data=processed_models["region_A"]) # Now it works
```

While this works, it's not very intuitive and makes the code more error-prone.

**Suggestion for Improvement**

Would it be technically feasible to make `make_arviz` a stateless function, or at least have it derive all necessary model information directly from the provided `MuData` object? The `MuData` object already stores the MCMC results and the parameters in `uns['scCODA_params']`. If the model structure could also be inferred or retrieved from the `MuData` object, it would make the function much more robust and self-contained.

This would align with the expectation that the `MuData` object is the single source of truth for a given analysis, making batch processing and later analysis much more straightforward.

Thank you for considering this suggestion!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

`make_arviz` relies on Tasccoda instance state, causing errors when processing multiple models #812

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

make_arviz relies on Tasccoda instance state, causing errors when processing multiple models #812

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`make_arviz` relies on Tasccoda instance state, causing errors when processing multiple models #812