This is a repository for all code, analysis and data accompanying the Moon & Ahnert paper.
- Run:
conda env create -f vectorisation.yml
conda activate vectorisation_env
- Download the data folder from [here] and add it to this directory.
- Then run the Jupyter notebooks to perform the analysis and generate figures into the `figures` folder.
Note:
- Requires Graphviz to be installed for the `dot` layout in `networkx` (see here); the `yml` file should handle this.
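networkx's Graphviz-based layouts (for example `nx.nx_agraph.graphviz_layout(G, prog="dot")`, which additionally needs `pygraphviz`) rely on the Graphviz `dot` program. If a figure fails to lay out, a quick standard-library check like the following can confirm whether `dot` is on your `PATH`. The helper name is hypothetical and not part of this repository.

```python
import shutil
import subprocess


def graphviz_dot_available() -> bool:
    """Return True if a Graphviz `dot` executable is found on PATH (hypothetical helper)."""
    return shutil.which("dot") is not None


if __name__ == "__main__":
    if graphviz_dot_available():
        # `dot -V` reports the Graphviz version, normally on stderr.
        result = subprocess.run(["dot", "-V"], capture_output=True, text=True)
        print(result.stderr.strip() or result.stdout.strip())
    else:
        print("Graphviz `dot` not found; re-check the conda environment.")
```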
Please download the data from the Zenodo link above and save it into the repository directory.
- `/gene_association.tair` contains the gene ontology (GO) annotations associated with each gene, downloaded from The Arabidopsis Information Resource (TAIR), Berardini et al., Plant Physiology (2004). Version generated: 2025-01-01.
- `/AtRegNet.csv` is the gene regulatory network of Arabidopsis thaliana, available from AGRIS, Palaniswamy et al., Plant Physiology (2006). Version dated: 2019-03-11.
- `/tf_family.csv` is the transcription factor gene list, providing family annotation metadata from AtTFDB in AGRIS, Palaniswamy et al., Plant Physiology (2006).
- `/ATH_GO_GOSLIM.txt` is the gene ontology annotation explanatory text file, downloaded from TAIR, Berardini et al., Plant Physiology (2004). Version generated: 2025-03-01.
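TAIR normally distributes `gene_association.tair` in the tab-separated GO Annotation File (GAF) format: lines starting with `!` are headers, column 4 holds qualifiers such as `NOT`, and column 5 holds the GO ID. Under that assumption, a minimal sketch for building a gene-to-GO-terms mapping is shown below. `parse_gaf` is a hypothetical helper (not necessarily how `data_prep.py` works), and which column carries the gene identifiers that match `AtRegNet.csv` (the 0-based `id_col`) is an assumption to check against the actual file.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def parse_gaf(lines: Iterable[str], id_col: int = 1) -> Dict[str, Set[str]]:
    """Map gene identifiers to the GO term IDs annotated to them (hypothetical helper).

    Assumes the tab-separated GAF layout: '!' lines are headers, column 4
    (index 3) is the qualifier and column 5 (index 4) is the GO ID.
    `id_col` is the 0-based column holding the gene identifier.
    """
    gene_to_go: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(4, id_col):
            continue  # skip malformed rows
        if "NOT" in fields[3].split("|"):
            continue  # negated annotation: the gene is asserted NOT to have the term
        gene_to_go[fields[id_col]].add(fields[4])
    return dict(gene_to_go)


# Usage with two made-up rows (real code would iterate over the opened file).
sample = [
    "!gaf-version: 2.2",
    "TAIR\tlocus:1\tGENE1\tenables\tGO:0003700\tTAIR:Publication:1\tIDA\t\tF\t\t\tprotein\ttaxon:3702\t20250101\tTAIR",
    "TAIR\tlocus:1\tGENE1\tNOT|enables\tGO:0005515\tTAIR:Publication:1\tIPI\t\tF\t\t\tprotein\ttaxon:3702\t20250101\tTAIR",
]
print(parse_gaf(sample))  # {'locus:1': {'GO:0003700'}}
```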
- `/florida.bare_upto122_fullnames` is a text file of the prey-to-predator edge list, from Heymans et al., Ecological Modelling (2002).
- `florida.bare_upto122_fullnames.newcat.terms` is a text file of the organism and organism-type annotations, from Heymans et al., Ecological Modelling (2002).
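The exact layout of `florida.bare_upto122_fullnames` is not described here. Assuming one prey-predator pair per line, with the prey first and a tab between the two names (a tab is assumed because full names may contain spaces), a minimal sketch for turning the edge list into diet and predator sets could look like this. `read_edge_list` and `diet_and_predators` are hypothetical helpers, and the species names are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str]  # (prey, predator)


def read_edge_list(lines: Iterable[str], sep: Optional[str] = "\t") -> List[Edge]:
    """Parse 'prey<sep>predator' lines; sep=None splits on any whitespace."""
    edges: List[Edge] = []
    for line in lines:
        parts = [p.strip() for p in line.strip().split(sep)]
        if len(parts) >= 2 and parts[0] and parts[1]:
            edges.append((parts[0], parts[1]))
    return edges


def diet_and_predators(edges: Iterable[Edge]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return (predator -> prey it eats, prey -> predators that eat it)."""
    diet: Dict[str, Set[str]] = defaultdict(set)
    predators: Dict[str, Set[str]] = defaultdict(set)
    for prey, predator in edges:
        diet[predator].add(prey)
        predators[prey].add(predator)
    return dict(diet), dict(predators)


# Usage with made-up species names; the blank line is skipped.
edges = read_edge_list(["algae\tsnail", "snail\tfish", "algae\tfish", ""])
diet, predators = diet_and_predators(edges)
print(sorted(diet["fish"]))        # ['algae', 'snail']
print(sorted(predators["algae"]))  # ['fish', 'snail']
```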
- `/cuisine_recipe` is a text file of cuisine types and the recipe IDs that belong to each cuisine, from the allrecipes recipe database, via Ahn et al., Scientific Reports (2011).
- `/recipe_ingredients_bipartite` is a text file containing the edge list of recipe IDs to ingredients, from the allrecipes recipe database, via Ahn et al., Scientific Reports (2011).
- `/foodtype_categorised.csv` is a CSV of the manual categorisation of ingredient types.
- `/cuisine_geo_labels.csv` is a CSV of the manual categorisation of cuisines into geographical regions of origin.
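Together, the cuisine-to-recipe and recipe-to-ingredient files link cuisines to ingredients through recipes. Purely to illustrate that multipartite structure (this is not the repository's vectorisation), the sketch below counts, for each cuisine, how many of its recipes use each ingredient. It assumes both files have already been read into dictionaries; the function name and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def cuisine_ingredient_counts(
    cuisine_recipes: Mapping[str, Iterable[str]],
    recipe_ingredients: Mapping[str, Iterable[str]],
) -> Dict[str, Counter]:
    """For each cuisine, count how many of its recipes contain each ingredient."""
    counts: Dict[str, Counter] = {}
    for cuisine, recipes in cuisine_recipes.items():
        tally: Counter = Counter()
        for recipe in recipes:
            # set() so an ingredient listed twice in one recipe counts once
            tally.update(set(recipe_ingredients.get(recipe, ())))
        counts[cuisine] = tally
    return counts


# Usage with made-up recipe IDs.
cuisine_recipes = {"Italian": ["r1", "r2"], "Japanese": ["r3"]}
recipe_ingredients = {
    "r1": ["tomato", "basil"],
    "r2": ["tomato", "garlic"],
    "r3": ["rice", "soy_sauce"],
}
profiles = cuisine_ingredient_counts(cuisine_recipes, recipe_ingredients)
print(profiles["Italian"].most_common(1))  # [('tomato', 2)]
```

The same pattern extends one level further: grouping cuisines by the regions in `cuisine_geo_labels.csv` and summing their counters gives regional ingredient profiles.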
- The full `.csv` file of the enriched clusters, their TF family labels, and the GO slim descriptors can be found in `grn_GO_enrichment.csv` (also in `data/grn/grn_GO_enrichment.csv`).
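This README does not spell out the enrichment test. Since the supplementary GRN clustering figure name contains "BH", one common approach is a one-sided hypergeometric over-representation test per GO term, followed by Benjamini-Hochberg correction. The standard-library sketch below shows that pattern; it is an assumption for illustration, not necessarily what `enrichement_utils.py` implements, and all names and the toy genes are hypothetical.

```python
from math import comb
from typing import Dict, Iterable, List, Mapping, Sequence, Set, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n in cluster)."""
    top = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def cluster_enrichment(
    cluster: Iterable[str],
    background: Iterable[str],
    gene_to_terms: Mapping[str, Set[str]],
) -> Dict[str, Tuple[float, float]]:
    """One-sided over-representation test per term: {term: (p, BH-adjusted p)}."""
    universe = set(background)
    members = set(cluster) & universe
    N, n = len(universe), len(members)
    terms = sorted({t for g in universe for t in gene_to_terms.get(g, ())})
    pvals = []
    for term in terms:
        annotated = {g for g in universe if term in gene_to_terms.get(g, ())}
        k = len(annotated & members)
        pvals.append(hypergeom_sf(k, N, len(annotated), n))
    return dict(zip(terms, zip(pvals, benjamini_hochberg(pvals))))


# Usage with made-up genes: both GO:A genes fall in the cluster.
background = ["g1", "g2", "g3", "g4", "g5", "g6"]
gene_to_terms = {"g1": {"GO:A"}, "g2": {"GO:A"}, "g3": {"GO:B"}}
result = cluster_enrichment(["g1", "g2"], background, gene_to_terms)
print(result["GO:A"])  # p = 1/15, BH-adjusted p = 2/15
```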
- The `.ipynb` notebooks are here. It is recommended to run them in numerical order.
  - `00_visualisation.ipynb` visualises the multipartite recipe network. Figure 3C.
  - `01_vectorisation.ipynb` performs the connectivity aggregation and vectorisation for the three networks, and optionally saves these vectors into a `processed` subfolder in `data/*/`.
  - `02_clustering.ipynb` clusters the vectors and visualises the clusters as a dendrogram with an associated vector heatmap. Figures 2(A,B), 3(A,B), 4(A), 5, S1, S2, S3. For the gene regulatory network, it also performs the enrichment analysis for the clusters of transcription factors.
  - `03_null_models_nb.ipynb` includes null models for the pairwise Euclidean distances between specialization-diversity entropic vectors of cell types, serial homologues and left-right pairs. Figures 2D, 3D, 4B, S4.
  - `XX_toy_model.ipynb` visualises the vectorisation method on a toy model network. Figure 1.
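The null models themselves are not spelled out in this README. One simple way to ask "are paired nodes (e.g. left-right pairs) closer in vector space than chance?" is to compare the mean Euclidean distance over the real pairs with the mean over the same number of randomly drawn node pairs. The standard-library sketch below does exactly that; the random-pair null is an assumed stand-in, not necessarily what `null_helpers.py` implements, and all names and vectors are hypothetical.

```python
import math
import random
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def mean_pair_distance(vectors: Dict[str, Vector], pairs: Sequence[Tuple[str, str]]) -> float:
    """Mean Euclidean distance between the vectors of each (a, b) pair."""
    return mean(math.dist(vectors[a], vectors[b]) for a, b in pairs)


def random_pair_null(
    vectors: Dict[str, Vector], n_pairs: int, n_samples: int = 1000, seed: int = 0
) -> List[float]:
    """Mean distances over n_pairs random distinct-node pairs, repeated n_samples times."""
    rng = random.Random(seed)
    nodes = sorted(vectors)  # fixed order so a given seed is reproducible
    null = []
    for _ in range(n_samples):
        pairs = [tuple(rng.sample(nodes, 2)) for _ in range(n_pairs)]
        null.append(mean_pair_distance(vectors, pairs))
    return null


def empirical_p_value(observed: float, null: Sequence[float]) -> float:
    """One-sided p-value: how often random pairs are at least as close as the observed pairs."""
    return (sum(x <= observed for x in null) + 1) / (len(null) + 1)


# Usage with made-up 2D vectors: L1/R1 and L2/R2 are near-identical pairs.
vectors = {
    "L1": (0.0, 0.0), "R1": (0.1, 0.0),
    "L2": (5.0, 5.0), "R2": (5.0, 5.1),
    "X": (10.0, 0.0), "Y": (0.0, 10.0),
}
pairs = [("L1", "R1"), ("L2", "R2")]
observed = mean_pair_distance(vectors, pairs)
null = random_pair_null(vectors, n_pairs=len(pairs), n_samples=200, seed=0)
print(observed, empirical_p_value(observed, null))
```

The `+1` in the p-value keeps it strictly positive, the usual convention for Monte Carlo tests with a finite number of null samples.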
- Scripts and functions to run the analysis:
  - `fvec.py`: a fast vectorisation routine.
  - `graphpeeler.py`: helpers for probabilistic topological layer sorting.
  - `enrichement_utils.py`: helpers for performing GO term enrichment analysis.
  - `data_prep.py`: preparation of data from raw sources.
  - `null_helpers.py`: null model wrappers used in `src/03_null_models.ipynb`.
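`graphpeeler.py` is described as probabilistic, and its algorithm is not given here. For intuition only, the sketch below shows the plain deterministic version of topological layer peeling (a Kahn-style sort): nodes with no incoming edges form the first layer, they are removed, and the process repeats, so each node lands one layer after the last of its predecessors. Edges are assumed to point from resource to consumer (e.g. prey to predator). Nodes on or downstream of a cycle are simply returned as leftovers rather than assigned a layer. `peel_layers` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def peel_layers(edges: Iterable[Tuple[str, str]]) -> Tuple[List[List[str]], Set[str]]:
    """Deterministic topological layer peeling (hypothetical helper).

    Returns (layers, leftover): layers[0] holds nodes with no incoming
    edges; leftover holds nodes that sit on or downstream of a cycle.
    Self-loops are ignored.
    """
    successors: Dict[str, Set[str]] = defaultdict(set)
    in_degree: Dict[str, int] = {}
    for u, v in edges:
        in_degree.setdefault(u, 0)
        in_degree.setdefault(v, 0)
        if u != v and v not in successors[u]:  # skip self-loops and duplicate edges
            successors[u].add(v)
            in_degree[v] += 1

    remaining = set(in_degree)
    layers: List[List[str]] = []
    current = sorted(n for n in remaining if in_degree[n] == 0)
    while current:
        layers.append(current)
        remaining.difference_update(current)
        next_layer = set()
        for u in current:
            for v in successors[u]:
                in_degree[v] -= 1
                if in_degree[v] == 0:  # all predecessors have been peeled
                    next_layer.add(v)
        current = sorted(next_layer)
    return layers, remaining


# Usage with a made-up food chain (prey -> predator).
layers, leftover = peel_layers(
    [("algae", "snail"), ("snail", "fish"), ("algae", "fish"), ("fish", "heron")]
)
print(layers)    # [['algae'], ['snail'], ['fish'], ['heron']]
print(leftover)  # set()
```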
- (Sub)figures are organised into their respective network folders.
- Fully annotated clustering figures are found in:
  - `/fw/f3a_dendrogram_full.pdf` for the food web network.
  - `/rn/f4a_dendrogram_full.pdf` for the recipe network.
  - `/grn/s1_TF_GRN_GO_clustering_BH_all_GO_annotated.pdf` for the gene regulatory network.
References
Ahn, Y.-Y., Ahnert, S.E., Bagrow, J.P., Barabási, A.-L. (2011) Flavor network and the principles of food pairing. Scientific Reports 1:196.
Berardini, T.Z., Mundodi, S., Reiser, R., Huala, E., Garcia-Hernandez, M., Zhang, P., Mueller, L.M., Yoon, J., Doyle, A., Lander, G., Moseyko, N., Yoo, D., Xu, I., Zoeckler, B., Montoya, M., Miller, N., Weems, D., Rhee, S.Y. (2004) Functional annotation of the Arabidopsis genome using controlled vocabularies. Plant Physiology 135(2):745–755.
Heymans, J.J., Ulanowicz, R.E., Bondavalli, C. (2002) Network analysis of the South Florida Everglades graminoid marshes and comparison with nearby cypress ecosystems. Ecological Modelling 149(1–2):5–23.
Palaniswamy, S.K., James, S., Sun, H., Lamb, R.S., Davuluri, R.V., Grotewold, E. (2006) AGRIS and AtRegNet: A platform to link cis-regulatory elements and transcription factors into regulatory networks. Plant Physiology 140(3):818–829.