Metagenome mode predicts genes 70000 bp long #439

@micro-hg-cpg

Hello,

I ran Bakta on my seawater metagenomic samples and ended up with some 'genes' that are 70,000 bp long. I know that this is impossible, of course, but does anyone know why this would happen? And is there a way to fix it?

Are there any improvements to my script that would optimize the use of Bakta on metagenomic samples?

#!/bin/bash
#
#SBATCH -o logs/bakta/bakta.log_%A_%a.out
#SBATCH -e logs/bakta/bakta.log_%A_%a.err
#SBATCH --mail-type END
#SBATCH --array=1-19%5
#SBATCH --cpus-per-task 48
#SBATCH --mem 150GB
#SBATCH -p long

cd /shared/projects/mags_margo/FL
module load bakta/1.12.0   
module load seqtk/1.3
module load seqkit/2.9.0


INPUT=output/nf_core_mag/Assembly/MEGAHIT
OUTPUT=output/03.1_bakta

SAMPLEID=$(awk "NR==${SLURM_ARRAY_TASK_ID}" information/samplelist.txt)

#Pre-filtering the smaller contigs (<500 bp) 

seqtk seq -L 500 ${INPUT}/MEGAHIT-${SAMPLEID}.contigs.fa > ${INPUT}/MEGAHIT-${SAMPLEID}_500bp.contigs.fa

#creating the basic stats for the assembly 

seqkit stats ${INPUT}/MEGAHIT-${SAMPLEID}_500bp.contigs.fa > ${INPUT}/QC/MEGAHIT-${SAMPLEID}_500bp.txt

#Actual annotation step 

bakta -d /shared/bank/bakta/6.0/full/db/ \
	--meta \
	-o ${OUTPUT}/${SAMPLEID} \
	-p ${SAMPLEID} \
	-t 48 \
	--skip-plot \
	--locus-tag ${SAMPLEID} \
	--force \
	${INPUT}/MEGAHIT-${SAMPLEID}_500bp.contigs.fa
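
To make the problem easier to reproduce, the overlong features could be listed from the tab-separated annotation table with something like the sketch below. The helper name `list_long_features` is hypothetical, and the column layout (sequence ID, type, start, stop, strand, locus tag, with header lines starting with `#`) is an assumption about the TSV output that should be checked against the actual file.

```shell
# Hypothetical helper: print features at or above a length threshold
# from a Bakta-style TSV annotation table.
# Assumed columns (tab-separated): 1 = sequence ID, 2 = type, 3 = start,
# 4 = stop, 5 = strand, 6 = locus tag. Lines starting with '#' are skipped.
list_long_features() {
    local tsv="$1"
    local min_len="${2:-10000}"   # default threshold in bp, chosen arbitrarily
    awk -F'\t' -v OFS='\t' -v min="$min_len" '
        /^#/ { next }                      # skip header and comment lines
        {
            len = $4 - $3 + 1              # inclusive coordinates, length in bp
            if (len >= min) print $1, $2, $6, len
        }' "$tsv"
}
```

Assuming the table is written as `<prefix>.tsv` in the output directory, it could be called as `list_long_features "${OUTPUT}/${SAMPLEID}/${SAMPLEID}.tsv" 10000` to print the contig, feature type, locus tag and length of every feature of 10,000 bp or more.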

Thank you in advance for any help!
