Inconsistent InterProScan results between GAAS and manual run #110

@EmilieSmeets22

Hi,

I get different InterProScan results when running it through GAAS than when running InterProScan manually on the same input files.
I do not see any difference in the input arguments. Since you know the backend very well, maybe you could help me identify what causes these differences.

My questions are:

  • Why can I not see the IPR, Pfam and GO codes/IDs in the merged GFF file?
  • Why are genes annotated differently between InterProScan (local install or web) and the install in GAAS?
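
A minimal sketch of how the first question could be checked, assuming the merged GFF is tab-separated GFF3 and that cross-references, if present, are stored under the standard `Dbxref` and `Ontology_term` attribute keys (whether GAAS uses exactly these keys is an assumption). The function names and the two example lines are hypothetical.

```python
from collections import Counter


def parse_attributes(column9: str) -> dict:
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def count_annotated(gff_lines, feature_types=("gene", "mRNA"),
                    keys=("Dbxref", "Ontology_term")):
    """Count, per feature type, how many features carry each attribute key."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in feature_types:
            continue
        counts[(cols[2], "total")] += 1
        attrs = parse_attributes(cols[8])
        for key in keys:
            if key in attrs:
                counts[(cols[2], key)] += 1
    return counts


# Hypothetical example lines (not taken from a real output file).
example = [
    "chr01\tmaker\tgene\t24795\t31012\t.\t-\t.\tID=g1;Dbxref=InterPro:IPR007271,PFAM:PF04142\n",
    "chr01\tmaker\tmRNA\t24795\t31012\t.\t-\t.\tID=m1;Parent=g1;product=CMP-sialic acid transporter 1\n",
]
for (feature, key), n in sorted(count_annotated(example).items()):
    print(feature, key, n)
```

With a real file, `count_annotated(open(path))` would show at a glance whether any gene or mRNA feature carries the cross-references at all, or whether they were attached to a different feature level.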

Running InterProScan within GAAS:

[doutree@plop] $ module load Nextflow
[doutree@plop] $ cat ~/workspace/GFF/functional_annotation_param_chr01.yml
subworkflow: 'functional_annotation'
genome: '~/input/DAUCA_Kuroda_chr01.fa'
gff_annotation: '~/input/Daucus_carota.gene_chr_AGAT_chr01.gff'
blast_db_fasta: '~/input/uniprot_sprot.fasta'
outdir: '~/output/20240408_chr01'
[doutree@plop] $ cat ~/workspace/GFF/custom_config_chr01.txt
process {
    withName: 'INTERPROSCAN' {
        cpus     = 20
        memory   = 300.GB
        ext.args = [
            '--iprlookup',
            '--goterms',
            '-t p',
            '-dra',
            '-appl TIGRFAM,FunFam,SFLD,PANTHER,Gene3D,Hamap,Coils,SMART,CDD,PRINTS,PIRSR,AntiFam,Pfam'
        ].join(" ").trim()
    }
    withName: 'BLAST_BLASTP' {
        ext.args = '-max_target_seqs 1 -evalue 1e-6 -outfmt 6'
    }
}
[doutree@plop] $ nextflow run NBISweden/pipelines-nextflow -profile conda -params-file functional_annotation_param_chr01.yml -c custom_config_chr01.txt

N E X T F L O W  ~  version 22.10.1
Launching `https://github.com/NBISweden/pipelines-nextflow` [maniac_spence] DSL2 - revision: 5f66ae3cf2
[master]

         _  _ ___ ___ ___
        | \| | _ )_ _/ __|
        | .` | _ \| |\__ \
        |_|\_|___/___|___/ Annotation Service



        Functional annotation workflow
        ===================================================
[f9/834f23] process > FUNCTIONAL_ANNOTATION:BLAST... [100%] 1 of 1 ✔
[15/7856b3] process > FUNCTIONAL_ANNOTATION:GFF2P... [100%] 1 of 1 ✔
[3a/fbbf8e] process > FUNCTIONAL_ANNOTATION:BLAST... [100%] 6 of 6 ✔
[49/81bd45] process > FUNCTIONAL_ANNOTATION:INTER... [100%] 6 of 6 ✔
[b0/e757df] process > FUNCTIONAL_ANNOTATION:MERGE... [100%] 1 of 1 ✔

        Workflow completed successfully.

        Thank you for using our workflow.
        Results are located in the folder: ~/output/20240408_chr01

Completed at: 08-Apr-2024 16:57:29
Duration    : 10m 10s
CPU hours   : 5.9
Succeeded   : 15

[doutree@plop] $ head Daucus_carota.gene_chr_AGAT_chr01.gff
##gff-version 3
chr01   maker   gene    24795   31012   .       -       .       ID=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010
chr01   maker   mRNA    24795   31012   .       -       .       ID=NBISM00000000001;Parent=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010.1;product=CMP-sialic acid transporter 1;uniprot_id=Q654D9
chr01   maker   exon    24795   24945   .       -       .       ID=NBISE00000000001;Parent=NBISM00000000001;makerName=nbis-exon-1
chr01   maker   exon    26435   26604   .       -       .       ID=NBISE00000000002;Parent=NBISM00000000001;makerName=nbis-exon-2
chr01   maker   exon    27851   27929   .       -       .       ID=NBISE00000000003;Parent=NBISM00000000001;makerName=nbis-exon-3
chr01   maker   exon    28302   28423   .       -       .       ID=NBISE00000000004;Parent=NBISM00000000001;makerName=nbis-exon-4
chr01   maker   exon    30953   31012   .       -       .       ID=NBISE00000000005;Parent=NBISM00000000001;makerName=nbis-exon-5
chr01   maker   CDS     24795   24945   .       -       1       ID=NBISC00000000001;Parent=NBISM00000000001;makerName=cds-5
chr01   maker   CDS     26435   26604   .       -       0       ID=NBISC00000000001;Parent=NBISM00000000001;makerName=cds-4
[doutree@plop] $ grep mRNA Daucus_carota.gene_chr_AGAT_chr01.gff | head
chr01   maker   mRNA    24795   31012   .       -       .       ID=NBISM00000000001;Parent=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010.1;product=CMP-sialic acid transporter 1;uniprot_id=Q654D9
chr01   maker   mRNA    33922   37446   .       +       .       ID=NBISM00000000002;Parent=NBISG00000000002;makerName=DcarChr1G00000020.1;product=hypothetical protein
chr01   exonerate       mRNA    45536   50728   .       -       .       ID=NBISM00000000003;Parent=NBISG00000000003;Name=At5g41760;makerName=DcarChr1G00000030.1;product=CMP-sialic acid transporter 1;uniprot_id=Q8LGE9
chr01   exonerate       mRNA    90633   141688  .       -       .       ID=NBISM00000000004;Parent=NBISG00000000004;Name=GIP;makerName=DcarChr1G00000040.1;product=Copia protein;uniprot_id=P04146
chr01   maker   mRNA    145015  147063  .       +       .       ID=NBISM00000000005;Parent=NBISG00000000005;Name=NAKR2;makerName=DcarChr1G00000050.1;product=Protein SODIUM POTASSIUM ROOT DEFECTIVE 2;uniprot_id=Q58FZ0
chr01   exonerate       mRNA    164172  286235  .       +       .       ID=NBISM00000000006;Parent=NBISG00000000006;Name=GIP;makerName=DcarChr1G00000060.1;product=Copia protein;uniprot_id=P04146
chr01   exonerate       mRNA    395432  509234  .       -       .       ID=NBISM00000000007;Parent=NBISG00000000007;Name=GIP;makerName=DcarChr1G00000070.1;product=Copia protein;uniprot_id=P04146
chr01   exonerate       mRNA    534035  534211  .       -       .       ID=NBISM00000000008;Parent=NBISG00000000008;makerName=DcarChr1G00000080.1;product=hypothetical protein
chr01   maker   mRNA    639615  642189  .       +       .       ID=NBISM00000000009;Parent=NBISG00000000009;makerName=DcarChr1G00000090.1;product=hypothetical protein
chr01   transdecoder    mRNA    655131  661114  .       +       .       ID=NBISM00000000010;Parent=NBISG00000000010;Name=GLYR1;makerName=DcarChr1G00000100.1;product=Glyoxylate/succinic semialdehyde reductase 1;uniprot_id=Q9LSV0

Running InterProScan manually:

# Create protein FASTA sequence
[doutree@plop] ~ $ module load AGAT
[doutree@plop] ~ $ agat_sp_extract_sequences.pl -p -cfs -cis -ct 1 --g ~/input/Daucus_carota.gene_chr_AGAT_chr01.gff -f ~/input/DAUCA_Kuroda_chr01.fa -o ~/input/Daucus_carota.gene_chr_AGAT_chr01_proteins.fasta
# Run InterProScan
[doutree@plop] ~ $ module load InterProScan
[doutree@plop] ~ $ interproscan.sh -version
InterProScan version 5.62-94.0
InterProScan 64-Bit build  (requires Java 11)
[doutree@plop] ~ $ interproscan.sh -i ~/input/Daucus_carota.gene_chr_AGAT_chr01_proteins.fasta -f TSV -b ~/output/Daucus_carota.gene_chr_prot.fasta_interpro
# Merge annotation
[doutree@plop] ~ $ ipr_update_gff ~/input/Daucus_carota.gene_chr_AGAT_chr01.gff ~/output/Daucus_carota.gene_chr_prot.fasta_interpro.tsv > ~/output/Daucus_carota.gene_chr_AGAT_chr01_IPS.gff
[doutree@plop] ~ $ head ~/output/Daucus_carota.gene_chr_AGAT_chr01_IPS.gff
##gff-version 3
chr01   maker   mRNA    24795   31012   .       -       .       ID=DcarChr1G00000010_1;Parent=DcarChr1G00000010;Dbxref=InterPro:IPR007271,PFAM:PF04142;Note=Similar to CSTLP1: CMP-sialic acid transporter 1 (Oryza sativa subsp. japonica)
chr01   maker   gene    24795   31012   .       -       .       ID=DcarChr1G00000010;Name=DcarChr1G00000010;Dbxref=InterPro:IPR007271,PFAM:PF04142;Note=Similar to CSTLP1: CMP-sialic acid transporter 1 (Oryza sativa subsp. japonica)
chr01   maker   CDS     24795   24945   .       -       1       ID=cds-5;Parent=DcarChr1G00000010_1
chr01   maker   exon    24795   24945   .       -       .       ID=nbis-exon-1;Parent=DcarChr1G00000010_1
chr01   maker   exon    26435   26604   .       -       .       ID=nbis-exon-2;Parent=DcarChr1G00000010_1
chr01   maker   CDS     26435   26604   .       -       0       ID=cds-4;Parent=DcarChr1G00000010_1
chr01   maker   exon    27851   27929   .       -       .       ID=nbis-exon-3;Parent=DcarChr1G00000010_1
chr01   maker   CDS     27851   27929   .       -       1       ID=cds-3;Parent=DcarChr1G00000010_1
chr01   maker   exon    28302   28423   .       -       .       ID=nbis-exon-4;Parent=DcarChr1G00000010_1
# Please note that we store functional annotation at the gene level, so there is a slight difference here
[doutree@plop] ~ $ grep gene ~/output/Daucus_carota.gene_chr_AGAT_chr01_IPS.gff | head
chr01   maker   gene    24795   31012   .       -       .       ID=DcarChr1G00000010;Name=DcarChr1G00000010;Dbxref=InterPro:IPR007271,PFAM:PF04142;Note=Similar to CSTLP1: CMP-sialic acid transporter 1 (Oryza sativa subsp. japonica)
chr01   maker   gene    33922   37446   .       +       .       ID=DcarChr1G00000020;Name=DcarChr1G00000020;Note=Protein of unknown function
chr01   exonerate       gene    45536   50728   .       -       .       ID=DcarChr1G00000030;Name=DcarChr1G00000030;Dbxref=InterPro:IPR007271,PFAM:PF04142,SUPERFAMILY:SSF103481,TIGRFAM:TIGR00803;Note=Similar to At5g41760: CMP-sialic acid transporter 1 (Arabidopsis thaliana)
chr01   exonerate       gene    90633   141688  .       -       .       ID=DcarChr1G00000040;Name=DcarChr1G00000040;Dbxref=InterPro:IPR001584,InterPro:IPR012337,InterPro:IPR013103,InterPro:IPR025724,PFAM:PF00665,PFAM:PF07727,PFAM:PF13976,PROSITE:PS50994,SUPERFAMILY:SSF53098,SUPERFAMILY:SSF56672;Note=Similar to GIP: Copia protein (Drosophila melanogaster)
chr01   maker   gene    145015  147063  .       +       .       ID=DcarChr1G00000050;Name=DcarChr1G00000050;Dbxref=InterPro:IPR006121,InterPro:IPR036163,PFAM:PF00403,PROSITE:PS50846,SUPERFAMILY:SSF55008;Note=Similar to NAKR2: Protein SODIUM POTASSIUM ROOT DEFECTIVE 2 (Arabidopsis thaliana)
chr01   exonerate       gene    164172  286235  .       +       .       ID=DcarChr1G00000060;Name=DcarChr1G00000060;Dbxref=InterPro:IPR001584,InterPro:IPR012337,InterPro:IPR013103,InterPro:IPR025724,PFAM:PF00665,PFAM:PF07727,PFAM:PF13976,PROSITE:PS50994,SUPERFAMILY:SSF53098,SUPERFAMILY:SSF56672;Note=Similar to GIP: Copia protein (Drosophila melanogaster)
chr01   exonerate       gene    395432  509234  .       -       .       ID=DcarChr1G00000070;Name=DcarChr1G00000070;Dbxref=InterPro:IPR001584,InterPro:IPR012337,InterPro:IPR013103,InterPro:IPR025724,PFAM:PF00665,PFAM:PF07727,PFAM:PF13976,PROSITE:PS50994,SUPERFAMILY:SSF53098,SUPERFAMILY:SSF56672;Note=Similar to GIP: Copia protein (Drosophila melanogaster)
chr01   exonerate       gene    534035  534211  .       -       .       ID=DcarChr1G00000080;Name=DcarChr1G00000080;Note=Protein of unknown function
chr01   maker   gene    639615  642189  .       +       .       ID=DcarChr1G00000090;Name=DcarChr1G00000090;Note=Protein of unknown function
chr01   transdecoder    gene    655131  661114  .       +       .       ID=DcarChr1G00000100;Name=DcarChr1G00000100;Dbxref=InterPro:IPR006115,InterPro:IPR008927,InterPro:IPR029154,InterPro:IPR036291,PFAM:PF03446,PFAM:PF14833,SUPERFAMILY:SSF48179,SUPERFAMILY:SSF51735;Note=Similar to GLYR1: Glyoxylate/succinic semialdehyde reductase 1 (Arabidopsis thaliana)

I am using one chromosome (chr01) of a public carrot reference genome as a test. I can provide the input files if that helps to identify the discrepancy.
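
Because the two GFF files use different feature IDs (`NBISG...` in the GAAS input, `DcarChr1G...` in the manually merged file), a direct comparison by ID is not possible. A rough sketch of a comparison keyed on coordinates instead, assuming tab-separated GFF3 and that cross-references are stored under `Dbxref`; the function names are hypothetical and the two example lines are abridged from the listings above.

```python
def dbxref_by_location(gff_lines, feature_type="gene"):
    """Map (seqid, start, end, strand) to the set of Dbxref values of a feature."""
    result = {}
    for line in gff_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        key = (cols[0], int(cols[3]), int(cols[4]), cols[6])
        xrefs = set()
        for field in cols[8].split(";"):
            if field.startswith("Dbxref="):
                xrefs.update(field[len("Dbxref="):].split(","))
        result[key] = xrefs
    return result


def compare_dbxref(gff_a, gff_b, feature_type="gene"):
    """Report features present in both files whose Dbxref sets differ."""
    a = dbxref_by_location(gff_a, feature_type)
    b = dbxref_by_location(gff_b, feature_type)
    diffs = {}
    for key in sorted(a.keys() & b.keys()):
        if a[key] != b[key]:
            diffs[key] = {"only_a": a[key] - b[key], "only_b": b[key] - a[key]}
    return diffs


# Abridged example lines: one gene without and one with cross-references.
gaas_like = ["chr01\tmaker\tgene\t24795\t31012\t.\t-\t.\tID=NBISG00000000001;Name=CSTLP1\n"]
manual = ["chr01\tmaker\tgene\t24795\t31012\t.\t-\t.\tID=DcarChr1G00000010;Dbxref=InterPro:IPR007271,PFAM:PF04142\n"]
print(compare_dbxref(gaas_like, manual))
```

Since one workflow may attach annotation to genes and the other to mRNAs, running the comparison once with `feature_type="gene"` and once with `feature_type="mRNA"` would cover both cases.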

I ran the protein sequence of the first gene through the web version of InterProScan; here are the results:

DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	PANTHER	PTHR10231	NUCLEOTIDE-SUGAR TRANSMEMBRANE TRANSPORTER	61	152	5.8E-21	T	09-04-2024	IPR007271	Nucleotide-sugar transporter	GO:0000139(InterPro)|GO:0015136(PANTHER)|GO:0015165(InterPro)|GO:0016020(InterPro)|GO:0030173(PANTHER)|GO:0090481(InterPro)	Reactome:R-BTA-727802|Reactome:R-CEL-4085001|Reactome:R-CEL-727802|Reactome:R-CFA-727802|Reactome:R-HSA-4085001|Reactome:R-HSA-5619037|Reactome:R-HSA-5619072|Reactome:R-HSA-5619083|Reactome:R-HSA-5663020|Reactome:R-HSA-727802|Reactome:R-MMU-4085001|Reactome:R-MMU-727802|Reactome:R-RNO-727802|Reactome:R-SPO-727802
DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	Phobius	TRANSMEMBRANE	Region of a membrane-bound protein predicted to be embedded in the membrane.	71	88	-	T	09-04-2024	-	-	-	-
DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	Phobius	TRANSMEMBRANE	Region of a membrane-bound protein predicted to be embedded in the membrane.	108	128	-	T	09-04-2024	-	-	-	-
DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	Pfam	PF04142	Nucleotide-sugar transporter	61	153	2.6E-9	T	09-04-2024	IPR007271	Nucleotide-sugar transporter	GO:0000139(InterPro)|GO:0015165(InterPro)|GO:0016020(InterPro)|GO:0090481(InterPro)	Reactome:R-BTA-727802|Reactome:R-CEL-4085001|Reactome:R-CEL-727802|Reactome:R-CFA-727802|Reactome:R-HSA-4085001|Reactome:R-HSA-5619037|Reactome:R-HSA-5619072|Reactome:R-HSA-5619083|Reactome:R-HSA-5663020|Reactome:R-HSA-727802|Reactome:R-MMU-4085001|Reactome:R-MMU-727802|Reactome:R-RNO-727802|Reactome:R-SPO-727802
DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	Phobius	NON_CYTOPLASMIC_DOMAIN	Region of a membrane-bound protein predicted to be outside the membrane, in the extracellular region.	89	107	-	T	09-04-2024	-	-	-	-
DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	Phobius	CYTOPLASMIC_DOMAIN	Region of a membrane-bound protein predicted to be outside the membrane, in the cytoplasm.	129	193	-	T	09-04-2024	-	-	-	-
DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	Phobius	CYTOPLASMIC_DOMAIN	Region of a membrane-bound protein predicted to be outside the membrane, in the cytoplasm.	1	70	-	T	09-04-2024	-	-	-	-
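
To compare such TSV output between runs, the identifiers can be collected per protein. A minimal sketch, assuming the standard InterProScan 5 TSV column order (protein accession, MD5, length, analysis, signature accession, signature description, start, stop, score, status, date, InterPro accession, InterPro description, GO terms, pathways); the function name is hypothetical and the example rows are abridged from the results above (pathways column dropped).

```python
from collections import defaultdict


def summarize_interproscan_tsv(lines):
    """Collect InterPro accessions, signatures and GO terms per protein."""
    summary = defaultdict(
        lambda: {"interpro": set(), "signatures": set(), "go": set()}
    )
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # not a result row
        entry = summary[cols[0]]
        entry["signatures"].add(f"{cols[3]}:{cols[4]}")
        # Column 12: InterPro accession ("-" or absent when there is none).
        if len(cols) > 11 and cols[11] not in ("", "-"):
            entry["interpro"].add(cols[11])
        # Column 14: GO terms separated by "|"; drop a "(source)" suffix if present.
        if len(cols) > 13 and cols[13] not in ("", "-"):
            for term in cols[13].split("|"):
                entry["go"].add(term.split("(")[0])
    return summary


rows = [
    "DcarChr1G00000010.1\t8ec93874987bfa42f413060a1245db03\t193\tPfam\tPF04142\t"
    "Nucleotide-sugar transporter\t61\t153\t2.6E-9\tT\t09-04-2024\tIPR007271\t"
    "Nucleotide-sugar transporter\t"
    "GO:0000139(InterPro)|GO:0015165(InterPro)|GO:0016020(InterPro)|GO:0090481(InterPro)",
    "DcarChr1G00000010.1\t8ec93874987bfa42f413060a1245db03\t193\tPhobius\tTRANSMEMBRANE\t"
    "Region of a membrane-bound protein predicted to be embedded in the membrane.\t"
    "71\t88\t-\tT\t09-04-2024\t-\t-\t-\t-",
]
for protein, found in summarize_interproscan_tsv(rows).items():
    print(protein, sorted(found["interpro"]), sorted(found["go"]))
```

Running the same summary on the TSV produced inside the GAAS work directory and on the manually produced TSV would show whether the differences already exist in the raw InterProScan output or only appear after the merge step.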

Sequence used:

>DcarChr1G00000010.1
MPMEECKAANHDEYFDGEIDGILTTLSQSDGSYKYDYATAPFLAEIFKVLNISRCPVSIDRLFLRRKLSN
LQWMAIFPLAIGTTTSQVKGCGEASCDSLFSSPISGYMLGVLSSCLSALAGIYTEFWLKKNNDDLYWKNV
QLYTCCIPSKTVLDFLLEEKTTKRLVFNQDTMPMEECKAANHDKYFDGEIDVA

Thank you for your cooperation.

Kind regards,
Emilie

Labels: bug
