-
Notifications
You must be signed in to change notification settings - Fork 36
Description
Hi,
First of all, thank you for this fantastic tool. I've been using the Tasccoda
module for differential abundance analysis across multiple datasets (different regions) in a loop. I store the resulting MuData
objects, which contain the MCMC results, in a dictionary for later analysis and plotting.
The Problem
I've encountered an issue when trying to generate ArviZ plots in a separate step after the modeling is complete. The coda.make_arviz(data)
function fails with a matrix dimension mismatch error (TypeError: dot_general requires contracting dimensions...
).
After investigating, I realized that make_arviz
does not solely rely on the MuData
object passed to it. It also uses the internal state of the Tasccoda
instance (e.g., self.mcmc
) which was configured by the last call to coda.prepare()
.
This leads to a "desynchronization" problem: if I try to extract arviz
data from a MuData
object from a previous iteration in the loop, make_arviz
will use the model configuration from the last iteration, causing a shape mismatch if the number of cell types (which usually happens as different regions have different cell populations).
Example workflow to illustrate:
coda = pt.tl.Tasccoda()
processed_models = {}
datasets = {"region_A": data_A, "region_B": data_B} # Example datasets
for name, mdata in datasets.items():
coda.prepare(mdata, ...)
coda.run_nuts(mdata, ...)
processed_models[name] = mdata
# This call fails if region_A has a different number of cell types than region_B
arviz_data = coda.make_arviz(data=processed_models["region_A"])
Workaround
The current workaround is to "resynchronize" the Tasccoda
instance by calling coda.prepare()
again with the specific MuData
object right before calling make_arviz
.
# Workaround
coda.prepare(processed_models["region_A"], ...) # Re-run prepare
arviz_data = coda.make_arviz(data=processed_models["region_A"]) # Now it works
While this works, it's not very intuitive and makes the code more error-prone.
Suggestion for Improvement
Would it be technically feasible to make make_arviz
a stateless function, or at least have it derive all necessary model information directly from the provided MuData
object? The MuData
object already stores the MCMC results and the parameters in uns['scCODA_params']
. If the model structure could also be inferred or retrieved from the MuData
object, it would make the function much more robust and self-contained.
This would align with the expectation that the MuData
object is the single source of truth for a given analysis, making batch processing and later analysis much more straightforward.
Thank you for considering this suggestion!