Hi,
I have different InterProScan results between running GAAS and running InterProScan manually, with the same input files.
I do not see any difference in the input arguments. As you know the backend very well, maybe you could help me identify what causes these differences.
My questions are:
- Why can I not see the IPR, Pfam and GO codes/IDs in the merged GFF file?
- Why are genes annotated differently between InterProScan (local install or web) and the install in GAAS?
Running InterProScan within GAAS:
[doutree@plop] $ module load Nextflow
[doutree@plop] $ cat ~/workspace/GFF/functional_annotation_param_chr01.yml
subworkflow: 'functional_annotation'
genome: '~/input/DAUCA_Kuroda_chr01.fa'
gff_annotation: '~/input/Daucus_carota.gene_chr_AGAT_chr01.gff'
blast_db_fasta: '~/input/uniprot_sprot.fasta'
outdir: '~/output/20240408_chr01'
[doutree@plop] $ cat ~/workspace/GFF/custom_config_chr01.txt
process {
withName: 'INTERPROSCAN' {
cpus = 20
memory = 300.GB
ext.args = [
'--iprlookup',
'--goterms',
'-t p',
'-dra',
'-appl TIGRFAM,FunFam,SFLD,PANTHER,Gene3D,Hamap,Coils,SMART,CDD,PRINTS,PIRSR,AntiFam,Pfam'
].join(" ").trim()
}
withName: 'BLAST_BLASTP' {
ext.args = '-max_target_seqs 1 -evalue 1e-6 -outfmt 6'
}
}
[doutree@plop] $ nextflow run NBISweden/pipelines-nextflow -profile conda -params-file functional_annotation_param_chr01.yml -c custom_config_chr01.txt
N E X T F L O W ~ version 22.10.1
Launching `https://github.com/NBISweden/pipelines-nextflow` [maniac_spence] DSL2 - revision: 5f66ae3cf2
[master]
_ _ ___ ___ ___
| \| | _ )_ _/ __|
| .` | _ \| |\__ \
|_|\_|___/___|___/ Annotation Service
Functional annotation workflow
===================================================
[f9/834f23] process > FUNCTIONAL_ANNOTATION:BLAST... [100%] 1 of 1 ✔
[15/7856b3] process > FUNCTIONAL_ANNOTATION:GFF2P... [100%] 1 of 1 ✔
[3a/fbbf8e] process > FUNCTIONAL_ANNOTATION:BLAST... [100%] 6 of 6 ✔
[49/81bd45] process > FUNCTIONAL_ANNOTATION:INTER... [100%] 6 of 6 ✔
[b0/e757df] process > FUNCTIONAL_ANNOTATION:MERGE... [100%] 1 of 1 ✔
Workflow completed successfully.
Thank you for using our workflow.
Results are located in the folder: ~/output/20240408_chr01
Completed at: 08-Apr-2024 16:57:29
Duration : 10m 10s
CPU hours : 5.9
Succeeded : 15
[doutree@plop] $ head Daucus_carota.gene_chr_AGAT_chr01.gff
##gff-version 3
chr01 maker gene 24795 31012 . - . ID=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010
chr01 maker mRNA 24795 31012 . - . ID=NBISM00000000001;Parent=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010.1;product=CMP-sialic acid transporter 1;uniprot_id=Q654D9
chr01 maker exon 24795 24945 . - . ID=NBISE00000000001;Parent=NBISM00000000001;makerName=nbis-exon-1
chr01 maker exon 26435 26604 . - . ID=NBISE00000000002;Parent=NBISM00000000001;makerName=nbis-exon-2
chr01 maker exon 27851 27929 . - . ID=NBISE00000000003;Parent=NBISM00000000001;makerName=nbis-exon-3
chr01 maker exon 28302 28423 . - . ID=NBISE00000000004;Parent=NBISM00000000001;makerName=nbis-exon-4
chr01 maker exon 30953 31012 . - . ID=NBISE00000000005;Parent=NBISM00000000001;makerName=nbis-exon-5
chr01 maker CDS 24795 24945 . - 1 ID=NBISC00000000001;Parent=NBISM00000000001;makerName=cds-5
chr01 maker CDS 26435 26604 . - 0 ID=NBISC00000000001;Parent=NBISM00000000001;makerName=cds-4
[doutree@plop] $ grep mRNA Daucus_carota.gene_chr_AGAT_chr01.gff | head
chr01 maker mRNA 24795 31012 . - . ID=NBISM00000000001;Parent=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010.1;product=CMP-sialic acid transporter 1;uniprot_id=Q654D9
chr01 maker mRNA 33922 37446 . + . ID=NBISM00000000002;Parent=NBISG00000000002;makerName=DcarChr1G00000020.1;product=hypothetical protein
chr01 exonerate mRNA 45536 50728 . - . ID=NBISM00000000003;Parent=NBISG00000000003;Name=At5g41760;makerName=DcarChr1G00000030.1;product=CMP-sialic acid transporter 1;uniprot_id=Q8LGE9
chr01 exonerate mRNA 90633 141688 . - . ID=NBISM00000000004;Parent=NBISG00000000004;Name=GIP;makerName=DcarChr1G00000040.1;product=Copia protein;uniprot_id=P04146
chr01 maker mRNA 145015 147063 . + . ID=NBISM00000000005;Parent=NBISG00000000005;Name=NAKR2;makerName=DcarChr1G00000050.1;product=Protein SODIUM POTASSIUM ROOT DEFECTIVE 2;uniprot_id=Q58FZ0
chr01 exonerate mRNA 164172 286235 . + . ID=NBISM00000000006;Parent=NBISG00000000006;Name=GIP;makerName=DcarChr1G00000060.1;product=Copia protein;uniprot_id=P04146
chr01 exonerate mRNA 395432 509234 . - . ID=NBISM00000000007;Parent=NBISG00000000007;Name=GIP;makerName=DcarChr1G00000070.1;product=Copia protein;uniprot_id=P04146
chr01 exonerate mRNA 534035 534211 . - . ID=NBISM00000000008;Parent=NBISG00000000008;makerName=DcarChr1G00000080.1;product=hypothetical protein
chr01 maker mRNA 639615 642189 . + . ID=NBISM00000000009;Parent=NBISG00000000009;makerName=DcarChr1G00000090.1;product=hypothetical protein
chr01 transdecoder mRNA 655131 661114 . + . ID=NBISM00000000010;Parent=NBISG00000000010;Name=GLYR1;makerName=DcarChr1G00000100.1;product=Glyoxylate/succinic semialdehyde reductase 1;uniprot_id=Q9LSV0
Running InterProScan manually:
# Create protein FASTA sequence
[doutree@plop] ~ $ module load AGAT
[doutree@plop] ~ $ agat_sp_extract_sequences.pl -p -cfs -cis -ct 1 --g ~/input/Daucus_carota.gene_chr_AGAT_chr01.gff -f ~/input/DAUCA_Kuroda_chr01.fa -o ~/input/Daucus_carota.gene_chr_AGAT_chr01_proteins.fasta
# Run InterProScan
[doutree@plop] ~ $ module load InterProScan
[doutree@plop] ~ $ interproscan.sh -version
InterProScan version 5.62-94.0
InterProScan 64-Bit build (requires Java 11)
[doutree@plop] ~ $ interproscan.sh -i ~/input/Daucus_carota.gene_chr_AGAT_chr01_proteins.fasta -f TSV -b ~/output/Daucus_carota.gene_chr_prot.fasta_interpro
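One way to make the manual run directly comparable with the GAAS run would be to give it the same options that are listed under ext.args in custom_config_chr01.txt. A minimal sketch, assuming the pipeline appends ext.args unchanged to the interproscan.sh call (this is an assumption, not verified against the pipeline code); build_command is a hypothetical helper, and only flags already shown in this issue are used:

```python
import shlex

# Options copied from the ext.args list in custom_config_chr01.txt.
EXT_ARGS = [
    "--iprlookup",
    "--goterms",
    "-t p",
    "-dra",
    "-appl TIGRFAM,FunFam,SFLD,PANTHER,Gene3D,Hamap,Coils,SMART,CDD,PRINTS,PIRSR,AntiFam,Pfam",
]


def build_command(fasta, out_base, ext_args=EXT_ARGS):
    """Return the manual interproscan.sh call as an argument list.

    The -i/-f/-b part mirrors the manual command above; each ext.args
    entry is split into separate tokens ("-t p" becomes "-t", "p").
    """
    cmd = ["interproscan.sh", "-i", fasta, "-f", "TSV", "-b", out_base]
    for arg in ext_args:
        cmd.extend(shlex.split(arg))
    return cmd


if __name__ == "__main__":
    # Paths taken from the manual run above; they contain no spaces,
    # so a plain join is enough to print a pasteable command line.
    print(" ".join(build_command(
        "~/input/Daucus_carota.gene_chr_AGAT_chr01_proteins.fasta",
        "~/output/Daucus_carota.gene_chr_prot.fasta_interpro",
    )))
```

Running both setups with an identical option list would rule out the selected applications as the source of the difference.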
# Merge annotation
[doutree@plop] ~ $ ipr_update_gff ~/input/Daucus_carota.gene_chr_AGAT_chr01.gff ~/output/Daucus_carota.gene_chr_prot.fasta_interpro.tsv > ~/output/Daucus_carota.gene_chr_AGAT_chr01_IPS.gff
[doutree@plop] ~ $ head ~/output/Daucus_carota.gene_chr_AGAT_chr01_IPS.gff
##gff-version 3
chr01 maker mRNA 24795 31012 . - . ID=DcarChr1G00000010_1;Parent=DcarChr1G00000010;Dbxref=InterPro:IPR007271,PFAM:PF04142;Note=Similar to CSTLP1: CMP-sialic acid transporter 1 (Oryza sativa subsp. japonica)
chr01 maker gene 24795 31012 . - . ID=DcarChr1G00000010;Name=DcarChr1G00000010;Dbxref=InterPro:IPR007271,PFAM:PF04142;Note=Similar to CSTLP1: CMP-sialic acid transporter 1 (Oryza sativa subsp. japonica)
chr01 maker CDS 24795 24945 . - 1 ID=cds-5;Parent=DcarChr1G00000010_1
chr01 maker exon 24795 24945 . - . ID=nbis-exon-1;Parent=DcarChr1G00000010_1
chr01 maker exon 26435 26604 . - . ID=nbis-exon-2;Parent=DcarChr1G00000010_1
chr01 maker CDS 26435 26604 . - 0 ID=cds-4;Parent=DcarChr1G00000010_1
chr01 maker exon 27851 27929 . - . ID=nbis-exon-3;Parent=DcarChr1G00000010_1
chr01 maker CDS 27851 27929 . - 1 ID=cds-3;Parent=DcarChr1G00000010_1
chr01 maker exon 28302 28423 . - . ID=nbis-exon-4;Parent=DcarChr1G00000010_1
# Please note that we store functional annotation at the gene level, hence the slight difference here
[doutree@plop] ~ $ grep gene ~/output/Daucus_carota.gene_chr_AGAT_chr01_IPS.gff | head
chr01 maker gene 24795 31012 . - . ID=DcarChr1G00000010;Name=DcarChr1G00000010;Dbxref=InterPro:IPR007271,PFAM:PF04142;Note=Similar to CSTLP1: CMP-sialic acid transporter 1 (Oryza sativa subsp. japonica)
chr01 maker gene 33922 37446 . + . ID=DcarChr1G00000020;Name=DcarChr1G00000020;Note=Protein of unknown function
chr01 exonerate gene 45536 50728 . - . ID=DcarChr1G00000030;Name=DcarChr1G00000030;Dbxref=InterPro:IPR007271,PFAM:PF04142,SUPERFAMILY:SSF103481,TIGRFAM:TIGR00803;Note=Similar to At5g41760: CMP-sialic acid transporter 1 (Arabidopsis thaliana)
chr01 exonerate gene 90633 141688 . - . ID=DcarChr1G00000040;Name=DcarChr1G00000040;Dbxref=InterPro:IPR001584,InterPro:IPR012337,InterPro:IPR013103,InterPro:IPR025724,PFAM:PF00665,PFAM:PF07727,PFAM:PF13976,PROSITE:PS50994,SUPERFAMILY:SSF53098,SUPERFAMILY:SSF56672;Note=Similar to GIP: Copia protein (Drosophila melanogaster)
chr01 maker gene 145015 147063 . + . ID=DcarChr1G00000050;Name=DcarChr1G00000050;Dbxref=InterPro:IPR006121,InterPro:IPR036163,PFAM:PF00403,PROSITE:PS50846,SUPERFAMILY:SSF55008;Note=Similar to NAKR2: Protein SODIUM POTASSIUM ROOT DEFECTIVE 2 (Arabidopsis thaliana)
chr01 exonerate gene 164172 286235 . + . ID=DcarChr1G00000060;Name=DcarChr1G00000060;Dbxref=InterPro:IPR001584,InterPro:IPR012337,InterPro:IPR013103,InterPro:IPR025724,PFAM:PF00665,PFAM:PF07727,PFAM:PF13976,PROSITE:PS50994,SUPERFAMILY:SSF53098,SUPERFAMILY:SSF56672;Note=Similar to GIP: Copia protein (Drosophila melanogaster)
chr01 exonerate gene 395432 509234 . - . ID=DcarChr1G00000070;Name=DcarChr1G00000070;Dbxref=InterPro:IPR001584,InterPro:IPR012337,InterPro:IPR013103,InterPro:IPR025724,PFAM:PF00665,PFAM:PF07727,PFAM:PF13976,PROSITE:PS50994,SUPERFAMILY:SSF53098,SUPERFAMILY:SSF56672;Note=Similar to GIP: Copia protein (Drosophila melanogaster)
chr01 exonerate gene 534035 534211 . - . ID=DcarChr1G00000080;Name=DcarChr1G00000080;Note=Protein of unknown function
chr01 maker gene 639615 642189 . + . ID=DcarChr1G00000090;Name=DcarChr1G00000090;Note=Protein of unknown function
chr01 transdecoder gene 655131 661114 . + . ID=DcarChr1G00000100;Name=DcarChr1G00000100;Dbxref=InterPro:IPR006115,InterPro:IPR008927,InterPro:IPR029154,InterPro:IPR036291,PFAM:PF03446,PFAM:PF14833,SUPERFAMILY:SSF48179,SUPERFAMILY:SSF51735;Note=Similar to GLYR1: Glyoxylate/succinic semialdehyde reductase 1 (Arabidopsis thaliana)
I am using one chromosome as a test (chr01) from a public source, a carrot reference genome. I can provide the input files if that helps to identify the discordance.
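To pin down which genes differ, the gene-level Dbxref values of the two GFF files could be compared directly. A minimal sketch, assuming both files are tab-separated GFF3 and that the GAAS file keeps the original gene name in makerName while the manual file uses it as ID (as in the excerpts above); the function names are hypothetical and the two file paths are taken from the command line:

```python
import sys


def parse_attributes(column9):
    """Split a GFF3 attribute column into a key -> value dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def gene_dbxrefs(gff_lines, key="ID"):
    """Map each gene (named by attribute `key`) to its set of Dbxref values."""
    result = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = parse_attributes(cols[8])
        name = attrs.get(key)
        if name is None:
            continue
        dbxref = attrs.get("Dbxref", "")
        result[name] = set(dbxref.split(",")) if dbxref else set()
    return result


def compare(gaas, manual):
    """Return {gene: (only_in_gaas, only_in_manual)} for genes that differ."""
    report = {}
    for gene in sorted(set(gaas) | set(manual)):
        a = gaas.get(gene, set())
        b = manual.get(gene, set())
        if a != b:
            report[gene] = (sorted(a - b), sorted(b - a))
    return report


if __name__ == "__main__":
    # Usage: python compare_dbxref.py <GAAS merged GFF> <manual GFF>
    gaas_path, manual_path = sys.argv[1], sys.argv[2]
    with open(gaas_path) as fh:
        gaas = gene_dbxrefs(fh, key="makerName")
    with open(manual_path) as fh:
        manual = gene_dbxrefs(fh, key="ID")
    for gene, (only_gaas, only_manual) in compare(gaas, manual).items():
        print(gene, "GAAS only:", only_gaas, "manual only:", only_manual)
```

Keying the GAAS file on makerName is what makes the two files joinable, since the IDs themselves were renamed (NBISG... versus DcarChr1G...).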
I have run the first gene sequence through the web version of InterProScan and here are the results:
DcarChr1G00000010.1 8ec93874987bfa42f413060a1245db03 193 PANTHER PTHR10231 NUCLEOTIDE-SUGAR TRANSMEMBRANE TRANSPORTER 61 152 5.8E-21 T 09-04-2024 IPR007271 Nucleotide-sugar transporter GO:0000139(InterPro)|GO:0015136(PANTHER)|GO:0015165(InterPro)|GO:0016020(InterPro)|GO:0030173(PANTHER)|GO:0090481(InterPro) Reactome:R-BTA-727802|Reactome:R-CEL-4085001|Reactome:R-CEL-727802|Reactome:R-CFA-727802|Reactome:R-HSA-4085001|Reactome:R-HSA-5619037|Reactome:R-HSA-5619072|Reactome:R-HSA-5619083|Reactome:R-HSA-5663020|Reactome:R-HSA-727802|Reactome:R-MMU-4085001|Reactome:R-MMU-727802|Reactome:R-RNO-727802|Reactome:R-SPO-727802
DcarChr1G00000010.1 8ec93874987bfa42f413060a1245db03 193 Phobius TRANSMEMBRANE Region of a membrane-bound protein predicted to be embedded in the membrane. 71 88 - T 09-04-2024 - - - -
DcarChr1G00000010.1 8ec93874987bfa42f413060a1245db03 193 Phobius TRANSMEMBRANE Region of a membrane-bound protein predicted to be embedded in the membrane. 108 128 - T 09-04-2024 - - - -
DcarChr1G00000010.1 8ec93874987bfa42f413060a1245db03 193 Pfam PF04142 Nucleotide-sugar transporter 61 153 2.6E-9 T 09-04-2024 IPR007271 Nucleotide-sugar transporter GO:0000139(InterPro)|GO:0015165(InterPro)|GO:0016020(InterPro)|GO:0090481(InterPro) Reactome:R-BTA-727802|Reactome:R-CEL-4085001|Reactome:R-CEL-727802|Reactome:R-CFA-727802|Reactome:R-HSA-4085001|Reactome:R-HSA-5619037|Reactome:R-HSA-5619072|Reactome:R-HSA-5619083|Reactome:R-HSA-5663020|Reactome:R-HSA-727802|Reactome:R-MMU-4085001|Reactome:R-MMU-727802|Reactome:R-RNO-727802|Reactome:R-SPO-727802
DcarChr1G00000010.1 8ec93874987bfa42f413060a1245db03 193 Phobius NON_CYTOPLASMIC_DOMAIN Region of a membrane-bound protein predicted to be outside the membrane, in the extracellular region. 89 107 - T 09-04-2024 - - - -
DcarChr1G00000010.1 8ec93874987bfa42f413060a1245db03 193 Phobius CYTOPLASMIC_DOMAIN Region of a membrane-bound protein predicted to be outside the membrane, in the cytoplasm. 129 193 - T 09-04-2024 - - - -
DcarChr1G00000010.1 8ec93874987bfa42f413060a1245db03 193 Phobius CYTOPLASMIC_DOMAIN Region of a membrane-bound protein predicted to be outside the membrane, in the cytoplasm. 1 70 - T 09-04-2024 - - - -
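The TSV rows above can be condensed into the IPR, GO and signature IDs that would be expected in the merged GFF. A minimal sketch, assuming the file is tab-separated with the standard InterProScan TSV column order (protein accession, MD5, length, analysis, signature accession, signature description, start, stop, score, status, date, InterPro accession, InterPro description, GO terms, pathways); summarize_tsv is a hypothetical helper name:

```python
from collections import defaultdict


def summarize_tsv(lines):
    """Collect InterPro, GO and signature IDs per protein from TSV rows."""
    summary = defaultdict(
        lambda: {"interpro": set(), "go": set(), "signatures": set()}
    )
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 11:
            continue  # not a complete result row
        entry = summary[cols[0]]
        # Analysis name plus signature accession, e.g. "Pfam:PF04142".
        entry["signatures"].add(f"{cols[3]}:{cols[4]}")
        # Column 12: InterPro accession ("-" when there is none).
        if len(cols) > 11 and cols[11] not in ("", "-"):
            entry["interpro"].add(cols[11])
        # Column 14: GO terms separated by "|", each optionally followed
        # by its source in parentheses, e.g. "GO:0000139(InterPro)".
        if len(cols) > 13 and cols[13] not in ("", "-"):
            for term in cols[13].split("|"):
                entry["go"].add(term.split("(", 1)[0])
    return dict(summary)


if __name__ == "__main__":
    import sys

    # Usage: python summarize_ipr.py <InterProScan TSV file>
    with open(sys.argv[1]) as fh:
        for protein, ids in summarize_tsv(fh).items():
            print(protein, sorted(ids["interpro"]), sorted(ids["go"]))
```

Running this on the TSV from each setup would show whether the IPR and GO IDs are already missing from the InterProScan output or are lost later, at the merge step.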
Sequence used:
>DcarChr1G00000010.1
MPMEECKAANHDEYFDGEIDGILTTLSQSDGSYKYDYATAPFLAEIFKVLNISRCPVSIDRLFLRRKLSN
LQWMAIFPLAIGTTTSQVKGCGEASCDSLFSSPISGYMLGVLSSCLSALAGIYTEFWLKKNNDDLYWKNV
QLYTCCIPSKTVLDFLLEEKTTKRLVFNQDTMPMEECKAANHDKYFDGEIDVA
Thank you for your cooperation.
Kind regards,
Emilie