Metagenome mode predicts genes 70000 bp long

Hello, 

I ran Bakta on my seawater metagenomic samples and ended up with some 'genes' that are 70,000 bp long. I know this is impossible, of course, but does anyone know why this would happen? And is there a way to fix it?
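One way to see which features are affected is to list every entry in Bakta's tab-separated output whose length exceeds a threshold, together with its type and locus tag. This is a minimal sketch. It assumes the `<prefix>.tsv` file has the columns Sequence Id, Type, Start, Stop, Strand, Locus Tag, Gene, Product, DbXrefs, and that its header lines start with `#`. The function name `list_long_features` and the 10,000 bp default threshold are hypothetical choices made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print Bakta features longer than a threshold (in bp).
# Assumed .tsv layout: Sequence Id, Type, Start, Stop, Strand, Locus Tag, ...
# Output columns: Sequence Id, Type, Start, Stop, Locus Tag, length.
list_long_features() {
    local tsv="$1"
    local min_len="${2:-10000}"   # hypothetical default threshold
    awk -F '\t' -v min="$min_len" -v OFS='\t' '
        /^#/ { next }                      # skip Bakta header/comment lines
        {
            len = $4 - $3                  # Stop minus Start
            if (len < 0) len = -len        # tolerate Start > Stop
            len = len + 1                  # coordinates are inclusive
            if (len > min + 0) print $1, $2, $3, $4, $6, len
        }' "$tsv"
}
```

For example, `list_long_features output/03.1_bakta/SAMPLE/SAMPLE.tsv 10000` would print one line per feature longer than 10 kb. Here `SAMPLE` is a placeholder for one of the sample IDs, because the prefix is set with `-p`.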

Are there any improvements to my script that would optimize the use of Bakta on metagenomic samples?

```
#!/bin/bash
#
#SBATCH -o logs/bakta/bakta.log_%A_%a.out
#SBATCH -e logs/bakta/bakta.log_%A_%a.err
#SBATCH --mail-type END
#SBATCH --array=1-19%5
#SBATCH --cpus-per-task 48
#SBATCH --mem 150GB
#SBATCH -p long
# Note: logs/bakta/ must exist before running sbatch; SLURM will not create it.

# Stop at the first failing command, including failures inside pipes
set -eo pipefail

cd /shared/projects/mags_margo/FL
module load bakta/1.12.0
module load seqtk/1.3
module load seqkit/2.9.0

INPUT=output/nf_core_mag/Assembly/MEGAHIT
OUTPUT=output/03.1_bakta
THREADS="${SLURM_CPUS_PER_TASK:-48}"   # keep in sync with --cpus-per-task

# Pick the sample for this array task (one sample ID per line)
SAMPLEID=$(awk -v n="${SLURM_ARRAY_TASK_ID}" 'NR == n' information/samplelist.txt)
if [ -z "${SAMPLEID}" ]; then
	echo "No sample ID on line ${SLURM_ARRAY_TASK_ID} of samplelist.txt" >&2
	exit 1
fi

FILTERED="${INPUT}/MEGAHIT-${SAMPLEID}_500bp.contigs.fa"
mkdir -p "${INPUT}/QC" "${OUTPUT}"

# Pre-filter the smaller contigs (<500 bp)
seqtk seq -L 500 "${INPUT}/MEGAHIT-${SAMPLEID}.contigs.fa" > "${FILTERED}"

# Basic stats for the filtered assembly
seqkit stats "${FILTERED}" > "${INPUT}/QC/MEGAHIT-${SAMPLEID}_500bp.txt"

# Actual annotation step (metagenome mode)
bakta -d /shared/bank/bakta/6.0/full/db/ \
	--meta \
	-o "${OUTPUT}/${SAMPLEID}" \
	-p "${SAMPLEID}" \
	-t "${THREADS}" \
	--skip-plot \
	--locus-tag "${SAMPLEID}" \
	--force \
	"${FILTERED}"
```

Thank you in advance for any help!


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Metagenome mode predicts genes 70000 bp long #439
