Hello,
I ran Bakta on my seawater metagenomic samples and ended up with some 'genes' that are 70,000 bp long. I know that this is impossible, of course, but does anyone know why this would happen? And is there a way to fix it?
Are there any improvements I could make to my code to optimize the use of Bakta on metagenomic samples?
#!/bin/bash
#
#SBATCH -o logs/bakta/bakta.log_%A_%a.out
#SBATCH -e logs/bakta/bakta.log_%A_%a.err
#SBATCH --mail-type END
#SBATCH --array=1-19%5
#SBATCH --cpus-per-task 48
#SBATCH --mem 150GB
#SBATCH -p long
cd /shared/projects/mags_margo/FL
module load bakta/1.12.0
module load seqtk/1.3
module load seqkit/2.9.0
INPUT=output/nf_core_mag/Assembly/MEGAHIT
OUTPUT=output/03.1_bakta
SAMPLEID=$(awk "NR==${SLURM_ARRAY_TASK_ID}" information/samplelist.txt)
# Pre-filter: drop contigs shorter than 500 bp
seqtk seq -L 500 "${INPUT}/MEGAHIT-${SAMPLEID}.contigs.fa" > "${INPUT}/MEGAHIT-${SAMPLEID}_500bp.contigs.fa"
# Basic assembly statistics for the filtered contigs
seqkit stats "${INPUT}/MEGAHIT-${SAMPLEID}_500bp.contigs.fa" > "${INPUT}/QC/MEGAHIT-${SAMPLEID}_500bp.txt"
# Annotation step
bakta -d /shared/bank/bakta/6.0/full/db/ \
--meta \
-o ${OUTPUT}/${SAMPLEID} \
-p ${SAMPLEID} \
-t 48 \
--skip-plot \
--locus-tag ${SAMPLEID} \
--force \
${INPUT}/MEGAHIT-${SAMPLEID}_500bp.contigs.fa
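In case it helps with diagnosing this, here is a rough sketch of how the overly long features can be listed from a per-sample TSV. The function name `long_features` is just illustrative, and the column layout (1 = sequence ID, 2 = type, 3 = start, 4 = stop, 6 = locus tag, with `#` header lines) is an assumption that should be checked against the header of the actual file.

```shell
# Sketch: list features at or above a length cutoff from a Bakta-style TSV.
# Assumed columns (verify against your own file's header):
#   1 = sequence id, 2 = type, 3 = start, 4 = stop, 5 = strand, 6 = locus tag
long_features() {
    tsv="$1"
    min_len="${2:-10000}"          # default cutoff of 10 kb (arbitrary choice)
    tab="$(printf '\t')"
    awk -F'\t' -v OFS='\t' -v min="$min_len" '
        /^#/ { next }              # skip comment and header lines
        {
            len = $4 - $3 + 1      # feature length in bp
            if (len >= min) print $1, $2, $6, len
        }' "$tsv" | sort -t "$tab" -k4,4nr   # longest features first
}
```

For example, `long_features output/03.1_bakta/<sample>/<sample>.tsv 10000` would print the contig, feature type, locus tag and length of every feature of at least 10 kb, which makes it easier to look up those contigs in the assembly.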
Thank you in advance for any help!