Inconsistent InterProScan results between GAAS and manual run #110

Open
@EmilieSmeets22

Hi,

I get different InterProScan results when running it through GAAS and when running it manually, using the same input files.
I do not see any difference in the input arguments. Since you know the backend very well, maybe you could help me identify what causes these differences.

My questions are:

  • Why can I not see the IPR, Pfam and GO IDs in the merged GFF file?
  • Why are genes annotated differently between InterProScan (local install or web) and the install in GAAS?

Running InterProScan within GAAS:

[doutree@plop] $ module load Nextflow
[doutree@plop] $ cat ~/workspace/GFF/functional_annotation_param_chr01.yml
subworkflow: 'functional_annotation'
genome: '~/input/DAUCA_Kuroda_chr01.fa'
gff_annotation: '~/input/Daucus_carota.gene_chr_AGAT_chr01.gff'
blast_db_fasta: '~/input/uniprot_sprot.fasta'
outdir: '~/output/20240408_chr01'
[doutree@plop] $ cat ~/workspace/GFF/custom_config_chr01.txt
process {
    withName: 'INTERPROSCAN' {
        cpus     = 20
        memory   = 300.GB
        ext.args = [
            '--iprlookup',
            '--goterms',
            '-t p',
            '-dra',
            '-appl TIGRFAM,FunFam,SFLD,PANTHER,Gene3D,Hamap,Coils,SMART,CDD,PRINTS,PIRSR,AntiFam,Pfam'
        ].join(" ").trim()
    }
    withName: 'BLAST_BLASTP' {
        ext.args = '-max_target_seqs 1 -evalue 1e-6 -outfmt 6'
    }
}
[doutree@plop] $ nextflow run NBISweden/pipelines-nextflow -profile conda -params-file functional_annotation_param_chr01.yml -c custom_config_chr01.txt

N E X T F L O W  ~  version 22.10.1
Launching `https://github.com/NBISweden/pipelines-nextflow` [maniac_spence] DSL2 - revision: 5f66ae3cf2
[master]

         _  _ ___ ___ ___
        | \| | _ )_ _/ __|
        | .` | _ \| |\__ \
        |_|\_|___/___|___/ Annotation Service



        Functional annotation workflow
        ===================================================
[f9/834f23] process > FUNCTIONAL_ANNOTATION:BLAST... [100%] 1 of 1 ✔
[15/7856b3] process > FUNCTIONAL_ANNOTATION:GFF2P... [100%] 1 of 1 ✔
[3a/fbbf8e] process > FUNCTIONAL_ANNOTATION:BLAST... [100%] 6 of 6 ✔
[49/81bd45] process > FUNCTIONAL_ANNOTATION:INTER... [100%] 6 of 6 ✔
[b0/e757df] process > FUNCTIONAL_ANNOTATION:MERGE... [100%] 1 of 1 ✔

        Workflow completed successfully.

        Thank you for using our workflow.
        Results are located in the folder: ~/output/20240408_chr01

Completed at: 08-Apr-2024 16:57:29
Duration    : 10m 10s
CPU hours   : 5.9
Succeeded   : 15

[doutree@plop] $ head Daucus_carota.gene_chr_AGAT_chr01.gff
##gff-version 3
chr01   maker   gene    24795   31012   .       -       .       ID=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010
chr01   maker   mRNA    24795   31012   .       -       .       ID=NBISM00000000001;Parent=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010.1;product=CMP-sialic acid transporter 1;uniprot_id=Q654D9
chr01   maker   exon    24795   24945   .       -       .       ID=NBISE00000000001;Parent=NBISM00000000001;makerName=nbis-exon-1
chr01   maker   exon    26435   26604   .       -       .       ID=NBISE00000000002;Parent=NBISM00000000001;makerName=nbis-exon-2
chr01   maker   exon    27851   27929   .       -       .       ID=NBISE00000000003;Parent=NBISM00000000001;makerName=nbis-exon-3
chr01   maker   exon    28302   28423   .       -       .       ID=NBISE00000000004;Parent=NBISM00000000001;makerName=nbis-exon-4
chr01   maker   exon    30953   31012   .       -       .       ID=NBISE00000000005;Parent=NBISM00000000001;makerName=nbis-exon-5
chr01   maker   CDS     24795   24945   .       -       1       ID=NBISC00000000001;Parent=NBISM00000000001;makerName=cds-5
chr01   maker   CDS     26435   26604   .       -       0       ID=NBISC00000000001;Parent=NBISM00000000001;makerName=cds-4
[doutree@plop] $ grep mRNA Daucus_carota.gene_chr_AGAT_chr01.gff | head
chr01   maker   mRNA    24795   31012   .       -       .       ID=NBISM00000000001;Parent=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010.1;product=CMP-sialic acid transporter 1;uniprot_id=Q654D9
chr01   maker   mRNA    33922   37446   .       +       .       ID=NBISM00000000002;Parent=NBISG00000000002;makerName=DcarChr1G00000020.1;product=hypothetical protein
chr01   exonerate       mRNA    45536   50728   .       -       .       ID=NBISM00000000003;Parent=NBISG00000000003;Name=At5g41760;makerName=DcarChr1G00000030.1;product=CMP-sialic acid transporter 1;uniprot_id=Q8LGE9
chr01   exonerate       mRNA    90633   141688  .       -       .       ID=NBISM00000000004;Parent=NBISG00000000004;Name=GIP;makerName=DcarChr1G00000040.1;product=Copia protein;uniprot_id=P04146
chr01   maker   mRNA    145015  147063  .       +       .       ID=NBISM00000000005;Parent=NBISG00000000005;Name=NAKR2;makerName=DcarChr1G00000050.1;product=Protein SODIUM POTASSIUM ROOT DEFECTIVE 2;uniprot_id=Q58FZ0
chr01   exonerate       mRNA    164172  286235  .       +       .       ID=NBISM00000000006;Parent=NBISG00000000006;Name=GIP;makerName=DcarChr1G00000060.1;product=Copia protein;uniprot_id=P04146
chr01   exonerate       mRNA    395432  509234  .       -       .       ID=NBISM00000000007;Parent=NBISG00000000007;Name=GIP;makerName=DcarChr1G00000070.1;product=Copia protein;uniprot_id=P04146
chr01   exonerate       mRNA    534035  534211  .       -       .       ID=NBISM00000000008;Parent=NBISG00000000008;makerName=DcarChr1G00000080.1;product=hypothetical protein
chr01   maker   mRNA    639615  642189  .       +       .       ID=NBISM00000000009;Parent=NBISG00000000009;makerName=DcarChr1G00000090.1;product=hypothetical protein
chr01   transdecoder    mRNA    655131  661114  .       +       .       ID=NBISM00000000010;Parent=NBISG00000000010;Name=GLYR1;makerName=DcarChr1G00000100.1;product=Glyoxylate/succinic semialdehyde reductase 1;uniprot_id=Q9LSV0
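
To put a number on the first question, a small check like the one below can count how many features of a given type carry a given attribute in each GFF. This is only a sketch: `parse_attributes` and `count_with_attribute` are hypothetical helper names, and `Dbxref` is assumed to be the attribute where the IPR and Pfam IDs should end up (it is the one used in the manually merged file shown further down). The two sample records are the first mRNA line of the GAAS file above and the first mRNA line of the manually merged file.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def count_with_attribute(gff_lines, feature_type, key):
    """Return (features carrying `key`, total features) for one feature type."""
    hits = total = 0
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and directives/comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        total += 1
        if key in parse_attributes(cols[8]):
            hits += 1
    return hits, total


# First mRNA record of each file, copied from the listings in this issue.
gaas_mrna = "\t".join([
    "chr01", "maker", "mRNA", "24795", "31012", ".", "-", ".",
    "ID=NBISM00000000001;Parent=NBISG00000000001;Name=CSTLP1;"
    "makerName=DcarChr1G00000010.1;product=CMP-sialic acid transporter 1;"
    "uniprot_id=Q654D9",
])
manual_mrna = "\t".join([
    "chr01", "maker", "mRNA", "24795", "31012", ".", "-", ".",
    "ID=DcarChr1G00000010_1;Parent=DcarChr1G00000010;"
    "Dbxref=InterPro:IPR007271,PFAM:PF04142;"
    "Note=Similar to CSTLP1: CMP-sialic acid transporter 1 "
    "(Oryza sativa subsp. japonica)",
])

gaas_counts = count_with_attribute([gaas_mrna], "mRNA", "Dbxref")
manual_counts = count_with_attribute([manual_mrna], "mRNA", "Dbxref")
print("GAAS:", gaas_counts, "manual:", manual_counts)
```

On the real files, an open file handle can be passed instead of the one-line lists, for example `count_with_attribute(open(path), "mRNA", "Dbxref")`.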

Running InterProScan manually:

# Create protein FASTA sequence
[doutree@plop] ~ $ module load AGAT
[doutree@plop] ~ $ agat_sp_extract_sequences.pl -p -cfs -cis -ct 1 --g ~/input/Daucus_carota.gene_chr_AGAT_chr01.gff -f ~/input/DAUCA_Kuroda_chr01.fa -o ~/input/Daucus_carota.gene_chr_AGAT_chr01_proteins.fasta
# Run InterProScan
[doutree@plop] ~ $ module load InterProScan
[doutree@plop] ~ $ interproscan.sh -version
InterProScan version 5.62-94.0
InterProScan 64-Bit build  (requires Java 11)
[doutree@plop] ~ $ interproscan.sh -i ~/input/Daucus_carota.gene_chr_AGAT_chr01_proteins.fasta -f TSV -b ~/output/Daucus_carota.gene_chr_prot.fasta_interpro
# Merge annotation
[doutree@plop] ~ $ ipr_update_gff ~/input/Daucus_carota.gene_chr_AGAT_chr01.gff ~/output/Daucus_carota.gene_chr_prot.fasta_interpro.tsv > ~/output/Daucus_carota.gene_chr_AGAT_chr01_IPS.gff
[doutree@plop] ~ $ head ~/output/Daucus_carota.gene_chr_AGAT_chr01_IPS.gff
##gff-version 3
chr01   maker   mRNA    24795   31012   .       -       .       ID=DcarChr1G00000010_1;Parent=DcarChr1G00000010;Dbxref=InterPro:IPR007271,PFAM:PF04142;Note=Similar to CSTLP1: CMP-sialic acid transporter 1 (Oryza sativa subsp. japonica)
chr01   maker   gene    24795   31012   .       -       .       ID=DcarChr1G00000010;Name=DcarChr1G00000010;Dbxref=InterPro:IPR007271,PFAM:PF04142;Note=Similar to CSTLP1: CMP-sialic acid transporter 1 (Oryza sativa subsp. japonica)
chr01   maker   CDS     24795   24945   .       -       1       ID=cds-5;Parent=DcarChr1G00000010_1
chr01   maker   exon    24795   24945   .       -       .       ID=nbis-exon-1;Parent=DcarChr1G00000010_1
chr01   maker   exon    26435   26604   .       -       .       ID=nbis-exon-2;Parent=DcarChr1G00000010_1
chr01   maker   CDS     26435   26604   .       -       0       ID=cds-4;Parent=DcarChr1G00000010_1
chr01   maker   exon    27851   27929   .       -       .       ID=nbis-exon-3;Parent=DcarChr1G00000010_1
chr01   maker   CDS     27851   27929   .       -       1       ID=cds-3;Parent=DcarChr1G00000010_1
chr01   maker   exon    28302   28423   .       -       .       ID=nbis-exon-4;Parent=DcarChr1G00000010_1
# Note: we store the functional annotation at the gene level, hence the slight difference here (grep on gene instead of mRNA)
[doutree@plop] ~ $ grep gene ~/output/Daucus_carota.gene_chr_AGAT_chr01_IPS.gff | head
chr01   maker   gene    24795   31012   .       -       .       ID=DcarChr1G00000010;Name=DcarChr1G00000010;Dbxref=InterPro:IPR007271,PFAM:PF04142;Note=Similar to CSTLP1: CMP-sialic acid transporter 1 (Oryza sativa subsp. japonica)
chr01   maker   gene    33922   37446   .       +       .       ID=DcarChr1G00000020;Name=DcarChr1G00000020;Note=Protein of unknown function
chr01   exonerate       gene    45536   50728   .       -       .       ID=DcarChr1G00000030;Name=DcarChr1G00000030;Dbxref=InterPro:IPR007271,PFAM:PF04142,SUPERFAMILY:SSF103481,TIGRFAM:TIGR00803;Note=Similar to At5g41760: CMP-sialic acid transporter 1 (Arabidopsis thaliana)
chr01   exonerate       gene    90633   141688  .       -       .       ID=DcarChr1G00000040;Name=DcarChr1G00000040;Dbxref=InterPro:IPR001584,InterPro:IPR012337,InterPro:IPR013103,InterPro:IPR025724,PFAM:PF00665,PFAM:PF07727,PFAM:PF13976,PROSITE:PS50994,SUPERFAMILY:SSF53098,SUPERFAMILY:SSF56672;Note=Similar to GIP: Copia protein (Drosophila melanogaster)
chr01   maker   gene    145015  147063  .       +       .       ID=DcarChr1G00000050;Name=DcarChr1G00000050;Dbxref=InterPro:IPR006121,InterPro:IPR036163,PFAM:PF00403,PROSITE:PS50846,SUPERFAMILY:SSF55008;Note=Similar to NAKR2: Protein SODIUM POTASSIUM ROOT DEFECTIVE 2 (Arabidopsis thaliana)
chr01   exonerate       gene    164172  286235  .       +       .       ID=DcarChr1G00000060;Name=DcarChr1G00000060;Dbxref=InterPro:IPR001584,InterPro:IPR012337,InterPro:IPR013103,InterPro:IPR025724,PFAM:PF00665,PFAM:PF07727,PFAM:PF13976,PROSITE:PS50994,SUPERFAMILY:SSF53098,SUPERFAMILY:SSF56672;Note=Similar to GIP: Copia protein (Drosophila melanogaster)
chr01   exonerate       gene    395432  509234  .       -       .       ID=DcarChr1G00000070;Name=DcarChr1G00000070;Dbxref=InterPro:IPR001584,InterPro:IPR012337,InterPro:IPR013103,InterPro:IPR025724,PFAM:PF00665,PFAM:PF07727,PFAM:PF13976,PROSITE:PS50994,SUPERFAMILY:SSF53098,SUPERFAMILY:SSF56672;Note=Similar to GIP: Copia protein (Drosophila melanogaster)
chr01   exonerate       gene    534035  534211  .       -       .       ID=DcarChr1G00000080;Name=DcarChr1G00000080;Note=Protein of unknown function
chr01   maker   gene    639615  642189  .       +       .       ID=DcarChr1G00000090;Name=DcarChr1G00000090;Note=Protein of unknown function
chr01   transdecoder    gene    655131  661114  .       +       .       ID=DcarChr1G00000100;Name=DcarChr1G00000100;Dbxref=InterPro:IPR006115,InterPro:IPR008927,InterPro:IPR029154,InterPro:IPR036291,PFAM:PF03446,PFAM:PF14833,SUPERFAMILY:SSF48179,SUPERFAMILY:SSF51735;Note=Similar to GLYR1: Glyoxylate/succinic semialdehyde reductase 1 (Arabidopsis thaliana)
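
As a sanity check on the arguments, the options written out in this issue can be compared mechanically. The sketch below uses a hypothetical `parse_options` helper and treats `--opt` and `-opt` as the same option. It only sees what is shown above: whatever the GAAS `INTERPROSCAN` process adds on its own (input file, output format and so on) is not visible in `ext.args`, so it is not part of the comparison.

```python
import shlex


def parse_options(arg_string):
    """Map each option to its value (or None for a bare flag).

    Simplification: a token following an option is taken as its value
    unless that token itself starts with '-'.
    """
    tokens = shlex.split(arg_string)
    options = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("-"):
            name = "-" + token.lstrip("-")  # normalise --goterms to -goterms
            if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
                options[name] = tokens[i + 1]
                i += 2
                continue
            options[name] = None
        i += 1
    return options


# ext.args from custom_config_chr01.txt, joined with spaces as in the config.
gaas_args = (
    "--iprlookup --goterms -t p -dra "
    "-appl TIGRFAM,FunFam,SFLD,PANTHER,Gene3D,Hamap,Coils,SMART,CDD,"
    "PRINTS,PIRSR,AntiFam,Pfam"
)
# Arguments of the manual interproscan.sh call.
manual_args = (
    "-i ~/input/Daucus_carota.gene_chr_AGAT_chr01_proteins.fasta "
    "-f TSV -b ~/output/Daucus_carota.gene_chr_prot.fasta_interpro"
)

gaas = parse_options(gaas_args)
manual = parse_options(manual_args)
only_gaas = sorted(set(gaas) - set(manual))
only_manual = sorted(set(manual) - set(gaas))
print("only in GAAS ext.args:", only_gaas)
print("only in manual run:   ", only_manual)
```

With the strings above, `-appl`, `-dra`, `-goterms`, `-iprlookup` and `-t` appear only on the GAAS side. Without `-appl`, InterProScan runs all available analyses, which would be consistent with the SUPERFAMILY and PROSITE entries in the manual Dbxref lists, since neither is in the GAAS `-appl` list.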

I am using one chromosome (chr01) of a publicly available carrot reference genome as a test. I can provide the input files if that helps to identify the discrepancy.

I also ran the first gene's protein sequence through the InterProScan web service; here are the results:

DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	PANTHER	PTHR10231	NUCLEOTIDE-SUGAR TRANSMEMBRANE TRANSPORTER	61	152	5.8E-21	T	09-04-2024	IPR007271	Nucleotide-sugar transporter	GO:0000139(InterPro)|GO:0015136(PANTHER)|GO:0015165(InterPro)|GO:0016020(InterPro)|GO:0030173(PANTHER)|GO:0090481(InterPro)	Reactome:R-BTA-727802|Reactome:R-CEL-4085001|Reactome:R-CEL-727802|Reactome:R-CFA-727802|Reactome:R-HSA-4085001|Reactome:R-HSA-5619037|Reactome:R-HSA-5619072|Reactome:R-HSA-5619083|Reactome:R-HSA-5663020|Reactome:R-HSA-727802|Reactome:R-MMU-4085001|Reactome:R-MMU-727802|Reactome:R-RNO-727802|Reactome:R-SPO-727802
DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	Phobius	TRANSMEMBRANE	Region of a membrane-bound protein predicted to be embedded in the membrane.	71	88	-	T	09-04-2024	-	-	-	-
DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	Phobius	TRANSMEMBRANE	Region of a membrane-bound protein predicted to be embedded in the membrane.	108	128	-	T	09-04-2024	-	-	-	-
DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	Pfam	PF04142	Nucleotide-sugar transporter	61	153	2.6E-9	T	09-04-2024	IPR007271	Nucleotide-sugar transporter	GO:0000139(InterPro)|GO:0015165(InterPro)|GO:0016020(InterPro)|GO:0090481(InterPro)	Reactome:R-BTA-727802|Reactome:R-CEL-4085001|Reactome:R-CEL-727802|Reactome:R-CFA-727802|Reactome:R-HSA-4085001|Reactome:R-HSA-5619037|Reactome:R-HSA-5619072|Reactome:R-HSA-5619083|Reactome:R-HSA-5663020|Reactome:R-HSA-727802|Reactome:R-MMU-4085001|Reactome:R-MMU-727802|Reactome:R-RNO-727802|Reactome:R-SPO-727802
DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	Phobius	NON_CYTOPLASMIC_DOMAIN	Region of a membrane-bound protein predicted to be outside the membrane, in the extracellular region.	89	107	-	T	09-04-2024	-	-	-	-
DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	Phobius	CYTOPLASMIC_DOMAIN	Region of a membrane-bound protein predicted to be outside the membrane, in the cytoplasm.	129	193	-	T	09-04-2024	-	-	-	-
DcarChr1G00000010.1	8ec93874987bfa42f413060a1245db03	193	Phobius	CYTOPLASMIC_DOMAIN	Region of a membrane-bound protein predicted to be outside the membrane, in the cytoplasm.	1	70	-	T	09-04-2024	-	-	-	-
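
To compare runs, the IPR, Pfam and GO IDs can be pulled out of such a TSV per protein. A minimal sketch, assuming the standard InterProScan TSV column order (protein accession in column 1, analysis in column 4, signature accession in column 5, InterPro accession in column 12, GO terms in column 14); `summarise_tsv` is a hypothetical helper name. The sample rows are the Pfam row and the first Phobius row from above, with the pathway column shortened to a single Reactome entry.

```python
import re
from collections import defaultdict


def summarise_tsv(lines):
    """Collect InterPro, Pfam and GO IDs per protein from InterProScan TSV rows."""
    summary = defaultdict(lambda: {"InterPro": set(), "Pfam": set(), "GO": set()})
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 11:
            continue  # not a result row
        # Runs without InterPro/GO lookup emit fewer columns; pad with "-".
        cols += ["-"] * (15 - len(cols))
        entry = summary[cols[0]]
        if cols[3] == "Pfam":
            entry["Pfam"].add(cols[4])
        if cols[11] != "-":
            entry["InterPro"].add(cols[11])
        # GO terms look like "GO:0000139(InterPro)|GO:0015165(InterPro)".
        entry["GO"].update(re.findall(r"GO:\d{7}", cols[13]))
    return summary


protein = "DcarChr1G00000010.1"
md5 = "8ec93874987bfa42f413060a1245db03"
pfam_row = "\t".join([
    protein, md5, "193", "Pfam", "PF04142", "Nucleotide-sugar transporter",
    "61", "153", "2.6E-9", "T", "09-04-2024",
    "IPR007271", "Nucleotide-sugar transporter",
    "GO:0000139(InterPro)|GO:0015165(InterPro)|GO:0016020(InterPro)|GO:0090481(InterPro)",
    "Reactome:R-HSA-727802",  # pathway column shortened for this example
])
phobius_row = "\t".join([
    protein, md5, "193", "Phobius", "TRANSMEMBRANE",
    "Region of a membrane-bound protein predicted to be embedded in the membrane.",
    "71", "88", "-", "T", "09-04-2024", "-", "-", "-", "-",
])

result = summarise_tsv([pfam_row, phobius_row])
for name, ids in result[protein].items():
    print(name, sorted(ids))
```

Running the same function on the TSV from the manual run and on the TSV files in the GAAS output directory should show whether the IDs are already missing from the InterProScan output or are lost in the merge step.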

Sequence used:

>DcarChr1G00000010.1
MPMEECKAANHDEYFDGEIDGILTTLSQSDGSYKYDYATAPFLAEIFKVLNISRCPVSIDRLFLRRKLSN
LQWMAIFPLAIGTTTSQVKGCGEASCDSLFSSPISGYMLGVLSSCLSALAGIYTEFWLKKNNDDLYWKNV
QLYTCCIPSKTVLDFLLEEKTTKRLVFNQDTMPMEECKAANHDKYFDGEIDVA

Thank you for your cooperation.

Kind regards,
Emilie
