This extension performs functional mapping, i.e. mapping metagenomic reads to proteins. The database to map against could be UniRef50, all prokaryotic proteins from KEGG, or a more targeted database, e.g. the bai operon or butyrate-producing genes.
Take the UniRef50 database as an example. First, download the UniRef50 FASTA file (uniref50.fasta.gz) into sunbeam_output/mapping/sbx_gene_family/database/ under your current directory:
mkdir -p sunbeam_output/mapping/sbx_gene_family/database/
wget ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref50/uniref50.fasta.gz -P sunbeam_output/mapping/sbx_gene_family/database/
Second, update the config.yml with the proper path.
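Since UniRef50 is a large download, a quick integrity check can catch a truncated transfer before the path goes into the config. The following is a minimal sketch, not part of the extension: `count_fasta_records` is a hypothetical helper name, and it only assumes that `gzip` and `grep` are available.

```shell
# Hypothetical helper (not provided by the extension): verify a gzip-compressed
# FASTA file and print how many records it contains.
count_fasta_records() {
    # gzip -t tests archive integrity; stop early if the file is truncated or corrupt.
    gzip -t "$1" || return 1
    # Every FASTA record starts with a '>' header line, so count those lines.
    gzip -dc "$1" | grep -c '^>'
}

# Example call, using the download location from the step above:
# count_fasta_records sunbeam_output/mapping/sbx_gene_family/database/uniref50.fasta.gz
```

A non-zero exit status from the `gzip -t` step suggests the download should be repeated.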
With your Sunbeam conda environment activated:
- Clone into your Sunbeam extensions directory:
git clone https://github.com/sunbeam-labs/sbx_gene_clusters
- Add the new config options to your config file:
cat sunbeam/extensions/sbx_gene_clusters/config.yml >> sunbeam_config.yml
- Install the requirements:
conda install --file extensions/sbx_gene_clusters/requirements.txt
- Run the extension:
By default, mapping uses DIAMOND, but this extension also supports using BLAST.
sunbeam run -- --configfile sunbeam_config.yml all_gene_family
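Before launching a full mapping run, it can help to preview what would be executed. The sketch below wraps the command above in a small pre-flight check. `preview_gene_family` is a hypothetical helper name, and the sketch assumes that arguments after `--` are forwarded to Snakemake (as the command above suggests), so that Snakemake's `-n` dry-run flag applies.

```shell
# Hypothetical wrapper (not provided by Sunbeam or this extension).
preview_gene_family() {
    config="${1:-sunbeam_config.yml}"

    # Fail early with a clear message if the config file is missing.
    if [ ! -f "$config" ]; then
        echo "config file not found: $config" >&2
        return 1
    fi

    # Fail early if the Sunbeam environment is not activated.
    if ! command -v sunbeam >/dev/null 2>&1; then
        echo "sunbeam not found on PATH; activate the conda environment first" >&2
        return 1
    fi

    # Assumption: everything after '--' is passed to Snakemake, where -n requests
    # a dry run that lists the planned jobs without executing them.
    sunbeam run -- --configfile "$config" all_gene_family -n
}

# Example call:
# preview_gene_family sunbeam_config.yml
```

If the dry run lists the expected jobs, the same command without `-n` starts the actual mapping.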