Commit 47f3c4b (parent e0df305): Add new CHAMOIS paper

1 file changed: 41 additions, 0 deletions.

---
layout: paper
title: "Machine learning inference of natural product chemistry across biosynthetic gene cluster types"
nickname: 2025-03-15-larralde-machine-learning-inference
authors: "Larralde M, Zeller G"
year: "2025"
journal: "biorxiv"
volume:
issue:
pages:
is_published: false
image: /assets/images/papers/biorxiv.png
projects: ["secmet", "function"]
tags: ["preprint"]

# Text
fulltext:
pdf:
pdflink:
pmcid:
preprint: https://doi.org/10.1101/2025.03.13.642868
supplement:

# Links
doi: "10.1101/2025.03.13.642868"
pmid:

# Data and code
github: https://github.com/zellerlab/CHAMOIS
neurovault:
openneuro:
figshare:
figshare_names:
osf:
zenodo: 15009032
---
{% include JB/setup %}

# Abstract

With ever-increasing volumes of sequencing data for biosynthetic gene clusters (BGCs), computational methods that accurately predict which secondary metabolites these clusters produce are critically lacking. Here, we present CHAMOIS, a machine learning-based tool that predicts chemical properties of secondary metabolites from the protein domains annotated in the input BGCs. CHAMOIS uses logistic regression to infer 485 chemical properties from the ChemOnt ontology, and it accurately predicts 111 of them (AUPRC > 0.5) in cross-validation against known instances. Although CHAMOIS is not explicitly trained on biosynthetic knowledge, many of the inferred links between protein domains and metabolite properties are consistent with the scientific literature, while others suggest new biochemical functions for uncharacterized biosynthetic domains. Finally, CHAMOIS can pinpoint which BGC within a given genome produces a pre-specified metabolite, ranking the correct BGC among the top 5 candidates in 69% of cases. This holds great potential for prioritising experimental BGC characterisation and the discovery of novel biosynthetic enzymes.
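
As a rough illustration of the approach summarised above, here is a minimal sketch that fits one independent logistic regression per chemical property on binary protein-domain presence/absence features, then ranks the BGCs of a genome against a query metabolite's properties. Everything in it is an assumption made for illustration: the domain names, property labels and toy training data are hypothetical stand-ins (not real ChemOnt classes or CHAMOIS annotations), the plain gradient-descent solver is used only to keep the example dependency-free, and scoring candidates by the log-likelihood of the query's property vector is not necessarily how CHAMOIS ranks BGCs. The actual implementation is in the linked GitHub repository.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical domain vocabulary and property labels, for illustration only;
# real inputs would be protein-domain annotations and ChemOnt classes.
DOMAINS = ["PKS_KS", "PKS_AT", "NRPS_C", "NRPS_A", "Halogenase"]
PROPERTIES = ["polyketide", "peptide", "organohalogen"]

Model = Tuple[List[float], float]  # (weights, bias) of one binary classifier


def featurize(domains: Set[str]) -> List[float]:
    """Binary presence/absence vector over the domain vocabulary."""
    return [1.0 if d in domains else 0.0 for d in DOMAINS]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: List[List[float]], y: List[int], lr: float = 0.5,
                   epochs: int = 500, l2: float = 0.01) -> Model:
    """Batch gradient descent on L2-regularised log loss (bias not penalised)."""
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(m):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_one_vs_rest(bgcs: Sequence[Tuple[Set[str], Set[str]]]) -> Dict[str, Model]:
    """Fit an independent logistic regression for every property label."""
    X = [featurize(domains) for domains, _ in bgcs]
    return {
        prop: train_logistic(X, [1 if prop in props else 0 for _, props in bgcs])
        for prop in PROPERTIES
    }


def predict(models: Dict[str, Model], domains: Set[str]) -> Dict[str, float]:
    """Predicted probability of each property for one BGC."""
    x = featurize(domains)
    return {
        prop: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        for prop, (w, b) in models.items()
    }


def rank_bgcs(models: Dict[str, Model], genome: Dict[str, Set[str]],
              query: Set[str]) -> List[Tuple[str, float]]:
    """Rank BGCs by log-likelihood of the query's binary property vector.

    Assumption: an illustrative scoring rule, not necessarily CHAMOIS's own.
    """
    scores = []
    for name, domains in genome.items():
        ll = 0.0
        for prop, p in predict(models, domains).items():
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            ll += math.log(p if prop in query else 1.0 - p)
        scores.append((name, ll))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy training set: (domains in the BGC, properties of its known product).
training = [
    ({"PKS_KS", "PKS_AT"}, {"polyketide"}),
    ({"PKS_KS", "PKS_AT", "Halogenase"}, {"polyketide", "organohalogen"}),
    ({"NRPS_C", "NRPS_A"}, {"peptide"}),
    ({"NRPS_C", "NRPS_A", "Halogenase"}, {"peptide", "organohalogen"}),
    ({"PKS_KS", "NRPS_A"}, {"polyketide", "peptide"}),
]
models = train_one_vs_rest(training)

genome = {
    "bgc1": {"NRPS_C", "NRPS_A"},
    "bgc2": {"PKS_KS", "PKS_AT", "Halogenase"},
    "bgc3": {"NRPS_C", "NRPS_A", "Halogenase"},
}
ranking = rank_bgcs(models, genome, query={"polyketide", "organohalogen"})
print(ranking[:5])  # at most five top-ranked candidates
```

Because `bgc2` carries the same hypothetical PKS and halogenase domains as the polyketide and organohalogen training example, it ranks first for this query; `ranking[:5]` simply mirrors the top-5 criterion quoted in the abstract. Keeping one classifier per property also makes it easy to inspect which domain weights drive a given prediction, in the spirit of the domain-to-property links the abstract describes.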
