Support for compositional data (e.g., genomic next generation sequencing counts tables) #450

jolespin · 2025-06-30T17:13:25Z

I'm looking into compositionally-valid causal inference methodologies that can be applied to time series. Have you come across any methods that would be appropriate?

I've found directed acyclic graphs used for compositional data in causal inference networks: https://academic.oup.com/ije/article/49/4/1307/5802547

Also, reciprocal log ratios that strengthen convergent cross mapping:
https://www.biorxiv.org/content/10.1101/2021.01.25.428037v1.full

A method is considered “compositionally-valid” when it is specifically designed to account for the properties of compositional data: data where each observation is a vector of non-negative values whose total is fixed or carries no information of its own, such as next generation sequencing (NGS) count tables. In these tables, the absolute counts of each feature (like genes or microbes) depend on the sequencing depth, which is arbitrary and can vary between samples. What matters biologically is not the raw counts but the relative proportions or abundances of these features. Compositionally-valid methods, such as those based on log-ratio transformations (centered or isometric) or on Aitchison distance, address this by analyzing the data in a transformed space where standard statistical and causal inference techniques can be applied. This prevents the spurious results that can arise with traditional methods, since under the constant-sum constraint a change in one feature's abundance necessarily changes the proportions of all the others. Because the analysis is conducted on relative rather than absolute values, compositionally-valid methods are robust to differences in sequencing depth and support more accurate, biologically meaningful conclusions, which is especially important for time series causal inference across heterogeneous samples.
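To make the sequencing-depth point concrete, here is a minimal sketch using only NumPy. The function names `clr` and `aitchison_distance` and the toy count vectors are illustrative assumptions, not part of any library discussed here. It shows that rescaling a sample (as a deeper sequencing run would, in the idealized case) leaves its centered log-ratio (CLR) coordinates unchanged.

```python
import numpy as np


def clr(x):
    """Centered log-ratio transform of strictly positive values.

    The last axis indexes the features of one composition.
    (Hypothetical helper, written for illustration only.)
    """
    log_x = np.log(np.asarray(x, dtype=float))
    # Subtracting the mean log equals dividing by the geometric mean.
    return log_x - log_x.mean(axis=-1, keepdims=True)


def aitchison_distance(x, y):
    """Euclidean distance between the CLR coordinates of x and y."""
    return float(np.linalg.norm(clr(x) - clr(y)))


# Toy data (assumed): the same community sequenced at two depths,
# plus a sample whose relative abundances genuinely differ.
shallow = np.array([10.0, 30.0, 60.0])
deep = 5 * shallow
different = np.array([20.0, 20.0, 60.0])

print(np.allclose(clr(shallow), clr(deep)))    # depth does not change CLR values
print(aitchison_distance(shallow, deep))       # numerically zero
print(aitchison_distance(shallow, different))  # strictly positive
```

Real count tables contain zeros, which the logarithm cannot handle, so this sketch assumes strictly positive inputs.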

@jakobrunge So far I've been applying a centered log-ratio (CLR) transform to my data before using TIGRAMITE, but ideally there would be a method designed specifically for this type of data, if one is available.
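For reference, the pre-processing step described above might look roughly like the sketch below. It assumes a counts array laid out as (time steps, features), a pseudocount of 0.5 to handle zeros, and simulated Poisson counts in place of real data. The name `clr_timeseries` is hypothetical, and the hand-off to TIGRAMITE is not shown.

```python
import numpy as np


def clr_timeseries(counts, pseudocount=0.5):
    """CLR-transform a (time, features) table of non-negative counts.

    The pseudocount is an assumed, simple way to avoid log(0);
    other zero-replacement strategies exist and may be preferable.
    """
    counts = np.asarray(counts, dtype=float)
    if counts.ndim != 2:
        raise ValueError("expected a 2-D array of shape (time, features)")
    if (counts < 0).any():
        raise ValueError("counts must be non-negative")
    log_x = np.log(counts + pseudocount)
    # Center each time step (row) by its own mean log value.
    return log_x - log_x.mean(axis=1, keepdims=True)


# Simulated stand-in for an NGS count table: 200 time steps, 4 features.
rng = np.random.default_rng(0)
raw_counts = rng.poisson(lam=[50, 20, 5, 1], size=(200, 4))

data = clr_timeseries(raw_counts)
print(data.shape)                        # (200, 4)
print(np.allclose(data.sum(axis=1), 0))  # every row sums to zero
```

One thing to keep in mind with this approach: every CLR-transformed row sums to zero, so the transformed variables are exactly linearly dependent. That may matter for conditional independence tests that condition on many of the other variables at once.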

urmininad · 2025-07-14T13:50:12Z

Thanks for your query and details on the background of your problem!
Indeed, the causal inference literature tailored towards compositional data is sparse, and work that combines it with time series data in particular is nearly non-existent. We look forward to extending tigramite with this functionality in the near future.
Until then, here are a few other sources that might help:

Formalizing and axiomatizing causal inference with constraints: https://proceedings.mlr.press/v213/beckers23a/beckers23a.pdf
Application of compositional data to networks in ecology (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004226): Here the aim is to learn a network graph (not necessarily causal) among compositional variables. This requires transforms of the compositional data that preserve conditional independence structures, and it uses the neighborhood selection framework introduced here: https://projecteuclid.org/journals/annals-of-statistics/volume-34/issue-3/High-dimensional-graphs-and-variable-selection-with-the-Lasso/10.1214/009053606000000281.full.
Finally, for completeness, here are the two sources that you mentioned in your post:
Causal effect estimation in compositional data (https://academic.oup.com/ije/article/49/4/1307/5802547?login=true): Here the authors formalize the compositional data constraint as a collider structure and classify two types of effects (relative and joint) that could be considered.
Dynamical systems and compositional data (https://www.biorxiv.org/content/10.1101/2021.01.25.428037v1.full): Here the aim is to discover the causal structure of the underlying (compositional) time series using convergent cross mapping. The authors propose a heuristic “reciprocal” log ratio transformation of the data as a pre-processing step, although a proper theoretical justification isn't provided.
