Set your variable variable and export them for your environment
THREADS=3
DATADIR=~ /Share/GenomeAnnWorkshopDATA
SWISSPROTDB=$DATADIR /uniprot_sprot
- These are the variables that needed to be exported to annotate the Arabidopsis taliana genome, I suggest to not change them.
SPECIES=Atal
BUSCODIR=$DATADIR
DIR=Tutorial.Arabidopsis
srSRAS=' SRR5956435 SRR5956436'
longREADS=' SRR23291376 SRR23291378'
LONGTYPE=pacbio
ASSEMBLY=$DATADIR /GCF_000001735.4_TAIR10.1_genomic.fna
TARGETCHR=' NC_003070.9'
CHRNAME=chr1
CHR=$DATADIR /$SPECIES .$CHRNAME .fasta
ANNREF=$DATADIR /$SPECIES .$CHRNAME .gene.ncbi.gff
BRAKERGTF=$DATADIR /$SPECIES .braker_utr.$CHRNAME .gff3
STRAND=' --rf'
LIBTYPE=' first'
BUSCODB=embryophyta_odb12
MIKADOSCORE=$DATADIR /plants.yaml
These are the BUSCO profiles for the chromosome you're going to annotate:
Results from dataset embryophyta_odb10: proteome
C:27.1%[S:14.4%,D:12.7%],F:0.3%,M:72.6%,n:1614
437 Complete BUSCOs (C)
232 Complete and single-copy BUSCOs (S)
205 Complete and duplicated BUSCOs (D)
5 Fragmented BUSCOs (F)
1172 Missing BUSCOs (M)
1614 Total BUSCO groups searched
And this is for the reference annotation:
Results from dataset embryophyta_odb10: Chr1
S:27.01%,D:0.19%,F:0.93%,I:0.00%,M:71.87%,n:1614
436 Complete and single-copy BUSCOs (S)
3 Complete and duplicated BUSCOs (D)
0 Incomplete (I)
15 Fragmented BUSCOs (F)
1160 Missing BUSCOs (M)
1614 Total BUSCO groups searched
You can use these stats as a comparison to evaluate the annotations you're going to generate.
- These are the variables that needed to be exported to annotate the Entomobrya proxima genome not to change.
SPECIES=Epro
BUSCODIR=$DATADIR
DIR=Tutorial.Springtail
srSRAS=' SRR15910090'
longREADS=' SRR15910089'
LONGTYPE=pacbio
ASSEMBLY=$DATADIR /GCA_029691765.1_ASM2969176v1_genomic.fna
TARGETCHR=' CM056168.1'
CHRNAME=chr1
CHR=$DATADIR /$SPECIES .$CHRNAME .fasta
ANNREF=$DATADIR /$SPECIES .$CHRNAME .gene.maker.gff
BRAKERGTF=$DATADIR /$SPECIES .braker_utr.$CHRNAME .gff3
STRAND=' --rf'
LIBTYPE=' first'
BUSCODB=arthropoda_odb12
MIKADOSCORE=$DATADIR /insects.yaml
Results from dataset arthropoda_odb10: proteome
C:32.1%[S:28.9%,D:3.2%],F:1.9%,M:66.0%,n:1013
325 Complete BUSCOs (C)
293 Complete and single-copy BUSCOs (S)
32 Complete and duplicated BUSCOs (D)
19 Fragmented BUSCOs (F)
669 Missing BUSCOs (M)
1013 Total BUSCO groups searched
Results from dataset arthropoda_odb10: Chr1
S:30.40%,D:0.10%,F:1.09%,I:0.00%,M:68.41%,n:1013
308 Complete and single-copy BUSCOs (S)
1 Complete and duplicated BUSCOs (D)
0 Incomplete (I)
11 Fragmented BUSCOs (F)
693 Missing BUSCOs (M)
1013 Total BUSCO groups searched
- These are the variables that needed to be exported to annotate the Schistocerca gregaria genome not to change.
SPECIES=Sgre
BUSCODIR=$DATADIR
DIR=Tutorial.DesertLocust
srSRAS=' SRR15423966 SRR15423964'
longREADS=' SRR17978912'
LONGTYPE=pacbio
ASSEMBLY=$DATADIR /GCF_023897955.1_iqSchGreg1.2_genomic.fna
TARGETCHR=' NC_064930.1'
CHRNAME=chr11
CHR=$DATADIR /$SPECIES .$CHRNAME .fasta
ANNREF=$DATADIR /$SPECIES .$CHRNAME .gene.ncbi.gff
BRAKERGTF=$DATADIR /$SPECIES .braker_utr.$CHRNAME .gff3
STRAND=' --rf'
LIBTYPE=' first'
BUSCODB=insecta_odb12
MIKADOSCORE=$DATADIR /insects.yaml
Results from dataset insecta_odb10: proteome
C:1.7%[S:1.0%,D:0.7%],F:0.2%,M:98.1%,n:1367
22 Complete BUSCOs (C)
13 Complete and single-copy BUSCOs (S)
9 Complete and duplicated BUSCOs (D)
3 Fragmented BUSCOs (F)
1342 Missing BUSCOs (M)
1367 Total BUSCO groups searched
Results from dataset insecta_odb10: Chr11
S:1.10%,D:0.00%,F:0.29%,I:0.00%,M:98.61%,n:1367
15 Complete and single-copy BUSCOs (S)
0 Complete and duplicated BUSCOs (D)
0 Incomplete (I)
4 Fragmented BUSCOs (F)
1348 Missing BUSCOs (M)
1367 Total BUSCO groups searched