
Filtering by domain E-value in Jackhmmer #340

@Augustin-Zidek


Dear Sean and Nick,

I am running Jackhmmer 3.4 with the query IQNNERETLVNSIKTAIQYSKIQAIHLGHPIYLLPFGSNENWSRGMVLAKLNQTTNKTELIHQWQWSSNSWNINWKGVDSNHRIIISNIPNRAMSNGKFILNNKRTNEKVVVTLNRLGRVRVGGN against the BFD first_non_consensus_sequences database from AlphaFold 3:

jackhmmer \
  -N 1 \
  -A output.a3m \
  --tblout tblout.txt \
  --noali \
  -E 0.0001 \
  --incE 0.0001 \
  --incdomE 0.01 \
  --F1 0.0005 \
  --F2 5e-05 \
  --F3 5e-07 \
  -Z 65984053 \
  query.fasta bfd-first_non_consensus_sequences.fasta
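To reproduce the three runs compared further down side by side, here is a minimal Python sketch that assembles each jackhmmer command line. The function name `build_cmd`, the run labels and the per-run output file names are hypothetical; the flags and values are the ones from the command above. It assumes the `--domZ` run keeps `--incdomE 0.01` from the original command. It only prints the commands; each list could be passed to `subprocess.run(cmd, check=True)` to execute it.

```python
import shlex

Z = 65984053  # -Z value from the command above

def build_cmd(tag, incdomE=None, domZ=None,
              query="query.fasta",
              db="bfd-first_non_consensus_sequences.fasta"):
    """Return a jackhmmer argv list; optional flags are appended only when given."""
    args = [
        "jackhmmer", "-N", "1",
        # hypothetical per-run output names so the runs don't overwrite each other
        "-A", f"{tag}.a3m", "--tblout", f"{tag}.tblout.txt", "--noali",
        "-E", "0.0001", "--incE", "0.0001",
        "--F1", "0.0005", "--F2", "5e-05", "--F3", "5e-07",
        "-Z", str(Z),
    ]
    if incdomE is not None:
        args += ["--incdomE", str(incdomE)]  # omitted -> jackhmmer default (0.001)
    if domZ is not None:
        args += ["--domZ", str(domZ)]
    return args + [query, db]

runs = {
    "incdomE_0.01": build_cmd("incdomE_0.01", incdomE=0.01),
    "default_incdomE": build_cmd("default_incdomE"),
    "incdomE_0.01_domZ": build_cmd("incdomE_0.01_domZ", incdomE=0.01, domZ=Z),
}
for cmd in runs.values():
    print(shlex.join(cmd))
```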

I am setting --incdomE to 0.01, yet SRR5665647_2208059 and ERR1700754_51786 are included in the output a3m even though their best-domain E-values in the tblout are 8.2e+03 (8,200) and 1.2e+04 (12,000), respectively. I thought that the condition for including a hit in the MSA is that the full-sequence E-value is <= the sequence inclusion threshold (--incE) AND the domain E-value is <= the domain inclusion threshold (--incdomE).
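The expected inclusion rule described above can be written as a two-part check. This is a sketch of that expectation only, not of HMMER's actual code; the function name is hypothetical and the E-values are copied from the tblout below. Under this rule, neither of the two hits should contribute any domain to the MSA.

```python
INC_E = 1e-4      # --incE from the command
INC_DOM_E = 1e-2  # --incdomE from the command

# (full-sequence E-value, best-1-domain E-value), copied from the tblout below
hits = {
    "A0A140J420_LEGPN": (2.7e-61, 3.3e-61),
    "SRR5665647_2208059": (6.6e-20, 8.2e3),
    "ERR1700754_51786": (3.8e-12, 1.2e4),
}

def included_under_expected_rule(full_e, dom_e):
    """Expected rule: both the sequence and the domain thresholds must be met."""
    return full_e <= INC_E and dom_e <= INC_DOM_E

for name, (full_e, dom_e) in hits.items():
    print(name, included_under_expected_rule(full_e, dom_e))
```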

Interestingly:

  1. If I don't set --incdomE and it is set to the default 0.001, domains from these two hits are not included in the MSA.
  2. If I set --domZ=65984053 to match the -Z flag (the number of sequences in the database), domains from these two hits are not included in the MSA. Note, however, that the reported domain E-values don't change.
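One hypothesis that would fit both observations, offered as an assumption rather than a confirmed diagnosis: the HMMER user guide describes per-domain thresholds as applying to a conditional E-value, and --domZ as replacing the default search-space size for that calculation (the number of targets that passed the reporting thresholds) instead of -Z. Since an E-value is a P-value multiplied by the search-space size, the tblout value can be rescaled. The sketch assumes the best-1-domain E-value in the tblout is computed with -Z (consistent with it not changing under --domZ) and assumes `DOMZ_DEFAULT = 11`, the number of targets listed in the tblout below. Under these assumptions both hits land between 0.001 and 0.01: included at --incdomE 0.01, excluded at the default 0.001, and excluded again when domZ equals Z.

```python
Z = 65_984_053    # -Z from the command line
DOMZ_DEFAULT = 11  # ASSUMPTION: number of targets reported in the tblout

def rescale(evalue, old_z, new_z):
    """Convert an E-value computed with search-space size old_z to new_z."""
    pvalue = evalue / old_z
    return pvalue * new_z

best_domain_e = {"SRR5665647_2208059": 8.2e3, "ERR1700754_51786": 1.2e4}
for name, e in best_domain_e.items():
    cond = rescale(e, Z, DOMZ_DEFAULT)  # hypothetical conditional E-value
    print(f"{name}: ~{cond:.2g}, "
          f"passes 0.01: {cond <= 0.01}, passes 0.001: {cond <= 0.001}")
```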

Why does this happen? Is this a bug in Jackhmmer, or am I misunderstanding domain E-values or how they are filtered?

#                                                               --- full sequence ---- --- best 1 domain ---- --- domain number estimation ----
# target name        accession  query name           accession    E-value  score  bias   E-value  score  bias   exp reg clu  ov env dom rep inc description of target
#------------------- ---------- -------------------- ---------- --------- ------ ----- --------- ------ -----   --- --- --- --- --- --- --- --- ---------------------
A0A140J420_LEGPN     -          query                -            2.7e-61  216.4   7.4   3.3e-61  216.2   7.4   1.1   1   0   0   1   1   1   1 -
SRR5665647_2208059   -          query                -            6.6e-20   82.8 146.6   8.2e+03    8.2   0.4  25.0   6   4  18  25  25  25  25 -
SRR5690554_1041284   -          query                -            1.6e-12   59.0   0.0   3.9e+05    2.8   0.0  13.4  14   0   0  14  14  14   0 -
ERR1700754_51786     -          query                -            3.8e-12   57.7  46.9   1.2e+04    7.6   0.1  13.0  12   1   1  13  13  13  11 -
SRR3990167_11214105  -          query                -            5.5e-12   57.2   0.1   7.2e-12   56.8   0.1   1.1   1   0   0   1   1   1   1 -
SRR6201984_2660209   -          query                -            1.3e-11   56.0 122.7   1.6e+05    4.0   0.2  34.1   1   1  46  49  49  49   0 -
A0A0W0RE50_9GAMM     -          query                -              3e-11   54.9   0.0   3.7e-11   54.5   0.0   1.1   1   0   0   1   1   1   1 -
SRR3990167_4442378   -          query                -            4.2e-11   54.4   0.5   5.3e-11   54.0   0.5   1.1   1   0   0   1   1   1   1 -
A0A0W0XTZ5_9GAMM     -          query                -              7e-10   50.4   0.0   8.9e-10   50.1   0.0   1.1   1   0   0   1   1   1   1 -
SRR3990167_9143593   -          query                -            9.8e-09   46.7   0.8   1.4e-08   46.2   0.8   1.1   1   0   0   1   1   1   1 -
SRR6266487_2122007   -          query                -              9e-07   40.4  10.2   1.6e+05    4.1   0.0   9.8  10   0   0  10  10  10   0 -
#
# Program:         jackhmmer
# Version:         3.4 (Aug 2023)
# Pipeline mode:   SEARCH
# Query file:      query.fasta
# Target file:     bfd-first_non_consensus_sequences.fasta
# Option settings: jackhmmer -N 1 -A output.a3m --tblout tblout.txt --noali -E 0.0001 --incE 0.0001 --incdomE 0.01 --F1 0.0005 --F2 5e-05 --F3 5e-07 -Z 65984053 --cpu 12 query.fasta bfd-first_non_consensus_sequences.fasta 
# Current dir:     .
# Date:            Tue Aug 12 16:22:59 2025
# [ok]
>query
IQNNERETLVNSIKTAIQYSKIQAIHLGHPIYLLPFGSNENWSRGMVLAKLNQTTNKTELIHQWQWSSNSWNINWKGVDSNHRIIISNIPNRAMSNGKFILNNKRTNEKVVVTLNRLGRVRVGGN
>A0A140J420_LEGPN/38-161 [subseq from] A0A140J420_LEGPN
IQKNERETLINNIKTAVQYSKIEAIHFGHPIYLIPLGSNENWSKGMVLAQFDQKSNKIELIHQWHWSSNSWNINWRGVDSTNRIIISNTPYRAMSNGKFILDNRRTNERVEVTLNRLGRVKVGN-
>SRR5665647_2208059/35-62 [subseq from] SRR5665647_2208059
------------------------------------------------------DGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/85-112 [subseq from] SRR5665647_2208059
------------------------------------------------------DGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/122-161 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSI--------------------------------------------
>SRR5665647_2208059/185-212 [subseq from] SRR5665647_2208059
------------------------------------------------------DGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/222-262 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/284-312 [subseq from] SRR5665647_2208059
-----------------------------------------------------GDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/322-362 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/383-412 [subseq from] SRR5665647_2208059
----------------------------------------------------NGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/435-462 [subseq from] SRR5665647_2208059
------------------------------------------------------DGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/483-512 [subseq from] SRR5665647_2208059
----------------------------------------------------NGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/535-562 [subseq from] SRR5665647_2208059
------------------------------------------------------DGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/585-612 [subseq from] SRR5665647_2208059
------------------------------------------------------DGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/633-662 [subseq from] SRR5665647_2208059
----------------------------------------------------NGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/683-712 [subseq from] SRR5665647_2208059
----------------------------------------------------NGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/722-762 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/782-812 [subseq from] SRR5665647_2208059
---------------------------------------------------ANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/822-862 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/882-912 [subseq from] SRR5665647_2208059
---------------------------------------------------ANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/923-962 [subseq from] SRR5665647_2208059
------------------------------------------TDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/983-1012 [subseq from] SRR5665647_2208059
----------------------------------------------------NGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/1022-1062 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/1072-1112 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/1122-1161 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSI--------------------------------------------
>SRR5665647_2208059/1172-1212 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/1222-1261 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSI--------------------------------------------
>ERR1700754_51786/59-82 [subseq from] ERR1700754_51786
-----------------------------------------------------GDGRTDLIHRWDLGVNTWISNGDG------------------------------------------------
>ERR1700754_51786/109-132 [subseq from] ERR1700754_51786
-----------------------------------------------------GDGRTDLIHRWDLGVNTWISNGDG------------------------------------------------
>ERR1700754_51786/158-182 [subseq from] ERR1700754_51786
----------------------------------------------------NGDGKTDLIHRWDLGVNTWISNGDG------------------------------------------------
>ERR1700754_51786/197-232 [subseq from] ERR1700754_51786
-----------------------------------------YAQGMWFAADVNGDGKSDLIHRWDLGVNTWISSGDG------------------------------------------------
>ERR1700754_51786/309-332 [subseq from] ERR1700754_51786
-----------------------------------------------------GDGRTDLIHRWDLGVNTWISNGDG------------------------------------------------
>ERR1700754_51786/359-382 [subseq from] ERR1700754_51786
-----------------------------------------------------GDGRTDLIHRWDLGVNTWISNGDG------------------------------------------------
>ERR1700754_51786/409-432 [subseq from] ERR1700754_51786
-----------------------------------------------------GDGRTDLIHRWDLGVNTWISNGDG------------------------------------------------
>ERR1700754_51786/459-482 [subseq from] ERR1700754_51786
-----------------------------------------------------GDGRTDLIHRWDLGVNTWISNGDG------------------------------------------------
>ERR1700754_51786/509-532 [subseq from] ERR1700754_51786
-----------------------------------------------------GDGRTDLIHRWDLGVNTWISNGDG------------------------------------------------
>ERR1700754_51786/559-582 [subseq from] ERR1700754_51786
-----------------------------------------------------GDGKTDLIHRWDLGVNTWISNGDG------------------------------------------------
>ERR1700754_51786/609-632 [subseq from] ERR1700754_51786
-----------------------------------------------------GDGKTDLIHRWDLGVNTWLSNGDG------------------------------------------------
>SRR3990167_11214105/58-177 [subseq from] SRR3990167_11214105
----EHKRCINALKSLLQYARMQAFLRGETLVLAPQNNDKNWSHGVYLFVQeGRTLppkNKEE-LYVWHWQHSGIQVSWHGFQSNDYLIIDAQLSRLALNGYFLIDDGVSNPEK-ITVSRFGQMDV---
>A0A0W0RE50_9GAMM/52-161 [subseq from] A0A0W0RE50_9GAMM
------------LILALHFARNQALLSGKPLALRAEPDSGDWSKGMVLFFDNASHQfETNLLqHQWHWNCRNIAIKWHGFQSSQYLVFAATPMQAVASGRFELSSET--QGIDVIINRLGRIRD---
>SRR3990167_4442378/54-174 [subseq from] SRR3990167_4442378
---NKIDILVSQVINSIHYSRNMALISGQDVTLNPIGASGDWSAGMILFVDNPTHHYTKLdkfIYNWQWQQSSqLKLVWRGFKSTEYLTFAKTLRRSTVNGHFVILQDGVEVRRI-VVNRLGRIK----
>A0A0W0XTZ5_9GAMM/55-166 [subseq from] A0A0W0XTZ5_9GAMM
---------QNQLIQALHFARNQAFLSGKPMILQADPASDDWTRGMVLLTDTpDHRYETSLLqHQWSWNCRNVLIKWQGFLSDKFLVFAANPTQAASSGRFRLFAGDSYSDVI--INRLGRIR----
>SRR3990167_9143593/99-210 [subseq from] SRR3990167_9143593
--------VTDDIKTAIKLAKVESEVRKERLVLSSIDEN-DWSHGMRLYSIDSKGIVNEIIKEWYWQKYDIAVVWSGFQSDKSLVFMPEINKSTINGKFVISSPSFNITN-VIINRLGRVMV---
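In the a3m above, SRR5665647_2208059 contributes 25 records and ERR1700754_51786 contributes 11, matching their `inc` values in the tblout (25 and 11). A minimal Python sketch for checking this per target, assuming every record header has the form `name/start-end ...` as above (the function name is hypothetical):

```python
from collections import Counter

def count_domains_per_target(a3m_text):
    """Count a3m records per target, using the name before the '/start-end' suffix."""
    counts = Counter()
    for line in a3m_text.splitlines():
        if not line.startswith(">"):
            continue
        record_id = line[1:].split()[0]       # e.g. "SRR5665647_2208059/35-62"
        target = record_id.rsplit("/", 1)[0]  # -> "SRR5665647_2208059"
        counts[target] += 1
    return counts
```

Usage: `count_domains_per_target(open("output.a3m").read())`, then compare each count with the `inc` column.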

Thanks,
Augustin
