# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: >-
  Fast, flexible gene cluster family delineation with IGUA
message: Please cite this software using these metadata.
type: software
authors:
  - given-names: Martin
    family-names: Larralde
    email: martin.larralde@embl.de
    affiliation: European Molecular Biology Laboratory
    orcid: 'https://orcid.org/0000-0002-3947-4444'
  - given-names: Josefin
    family-names: Blom
    affiliation: Umeå University
    orcid: https://orcid.org/0009-0000-9910-7374
  - given-names: Hadrien
    family-names: Gourlé
    affiliation: Umeå University
    orcid: https://orcid.org/0000-0001-9807-1082
  - given-names: Laura M.
    family-names: Carroll
    orcid: https://orcid.org/0000-0002-3677-0192
    affiliation: Umeå University
  - given-names: Georg
    family-names: Zeller
    affiliation: European Molecular Biology Laboratory
    orcid: https://orcid.org/0000-0003-1429-7485
identifiers:
  - type: doi
    value: 10.1101/2025.05.15.654203
    description: bioRxiv preprint
repository-code: 'https://github.com/zellerlab/IGUA'
abstract: >-
  Prokaryotic genomes harbor a variety of functional elements encoded as
  contiguous multi-gene clusters, with biosynthetic gene clusters (BGCs,
  genetic determinants of secondary metabolite biosynthesis) serving as a
  notable example. In a typical workflow, BGCs are clustered into Gene Cluster
  Families (GCFs), units that group BGCs encoding similar biosynthetic pathways
  together. However, existing methods cannot readily scale to massive datasets and
  cannot be used for GCF delineation tasks beyond BGC clustering. Here, we present
  IGUA (Iterative Gene clUster Analysis; https://github.com/zellerlab/IGUA), a
  scalable, flexible GCF delineation method for genomic segments with multi-gene
  architectures. On a BGC clustering task, IGUA is ≥10x faster than the
  state-of-the-art (BiG-SCAPE/BiG-SLiCE), without sacrificing accuracy. To
  highlight its scalability, we use IGUA to cluster >2.8 million BGCs from ≈1
  million prokaryotic genomes in <18 hours (n = 2,829,071 BGCs to 56,960 GCFs).
  To showcase its utility beyond BGC clustering, we use IGUA to cluster (i)
  secretion systems and (ii) prophages into GCFs (n = 10,576 and 356,776 gene
  clusters to 2,744 and 213,699 GCFs, respectively). Overall, IGUA represents a
  versatile GCF delineation tool with unmatched computational efficiency and
  flexibility, enabling (meta)genomic mining applications at unprecedented
  scales.
keywords:
  - gene cluster family
  - biosynthetic gene cluster
license: GPL-3.0-or-later
preferred-citation:
  type: article
  status: preprint
  authors:
    - given-names: Martin
      family-names: Larralde
      email: martin.larralde@embl.de
      affiliation: European Molecular Biology Laboratory
      orcid: 'https://orcid.org/0000-0002-3947-4444'
    - given-names: Josefin
      family-names: Blom
      affiliation: Umeå University
      orcid: https://orcid.org/0009-0000-9910-7374
    - given-names: Hadrien
      family-names: Gourlé
      affiliation: Umeå University
      orcid: https://orcid.org/0000-0001-9807-1082
    - given-names: Laura M.
      family-names: Carroll
      orcid: https://orcid.org/0000-0002-3677-0192
      affiliation: Umeå University
    - given-names: Georg
      family-names: Zeller
      affiliation: European Molecular Biology Laboratory
      orcid: https://orcid.org/0000-0003-1429-7485
  doi: "10.1101/2025.05.15.654203"
  journal: "bioRxiv"
  month: 5
  title: "Fast, flexible gene cluster family delineation with IGUA"
  year: 2025