Skip to content

make_arviz relies on Tasccoda instance state, causing errors when processing multiple models #812

@JuanTejedor

Description

@JuanTejedor

Hi,

First of all, thank you for this fantastic tool. I've been using the Tasccoda module for differential abundance analysis across multiple datasets (different regions) in a loop. I store the resulting MuData objects, which contain the MCMC results, in a dictionary for later analysis and plotting.

The Problem

I've encountered an issue when trying to generate ArviZ plots in a separate step after the modeling is complete. The coda.make_arviz(data) function fails with a matrix dimension mismatch error (TypeError: dot_general requires contracting dimensions...).

After investigating, I realized that make_arviz does not solely rely on the MuData object passed to it. It also uses the internal state of the Tasccoda instance (e.g., self.mcmc) which was configured by the last call to coda.prepare().

This leads to a "desynchronization" problem: if I try to extract arviz data from a MuData object from a previous iteration in the loop, make_arviz will use the model configuration from the last iteration, causing a shape mismatch if the number of cell types (which usually happens as different regions have different cell populations).

Example workflow to illustrate:

coda = pt.tl.Tasccoda()
processed_models = {}
datasets = {"region_A": data_A, "region_B": data_B} # Example datasets

for name, mdata in datasets.items():
    coda.prepare(mdata, ...)
    coda.run_nuts(mdata, ...)
    processed_models[name] = mdata

# This call fails if region_A has a different number of cell types than region_B
arviz_data = coda.make_arviz(data=processed_models["region_A"]) 

Workaround

The current workaround is to "resynchronize" the Tasccoda instance by calling coda.prepare() again with the specific MuData object right before calling make_arviz.

# Workaround
coda.prepare(processed_models["region_A"], ...) # Re-run prepare
arviz_data = coda.make_arviz(data=processed_models["region_A"]) # Now it works

While this works, it's not very intuitive and makes the code more error-prone.

Suggestion for Improvement

Would it be technically feasible to make make_arviz a stateless function, or at least have it derive all necessary model information directly from the provided MuData object? The MuData object already stores the MCMC results and the parameters in uns['scCODA_params']. If the model structure could also be inferred or retrieved from the MuData object, it would make the function much more robust and self-contained.

This would align with the expectation that the MuData object is the single source of truth for a given analysis, making batch processing and later analysis much more straightforward.

Thank you for considering this suggestion!

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions