Dear Sean and Nick,
I am running Jackhmmer 3.4 with the query IQNNERETLVNSIKTAIQYSKIQAIHLGHPIYLLPFGSNENWSRGMVLAKLNQTTNKTELIHQWQWSSNSWNINWKGVDSNHRIIISNIPNRAMSNGKFILNNKRTNEKVVVTLNRLGRVRVGGN against the BFD first_non_consensus_sequences database from AlphaFold 3:
jackhmmer \
-N 1 \
-A output.a3m \
--tblout tblout.txt \
--noali \
-E 0.0001 \
--incE 0.0001 \
--incdomE 0.01 \
--F1 0.0005 \
--F2 5e-05 \
--F3 5e-07 \
-Z 65984053 \
query.fasta bfd-first_non_consensus_sequences.fasta

I am setting --incdomE to 0.01, yet SRR5665647_2208059 and ERR1700754_51786 are included in the output a3m even though their best-domain E-values are 8.2e+03 (8,200) and 1.2e+04 (12,000), respectively. I thought that the condition for including a hit in the MSA is that the whole-sequence E-value is <= the sequence inclusion threshold (--incE) AND the domain E-value is <= the domain inclusion threshold (--incdomE).
Interestingly:
- If I don't set --incdomE, so that it stays at the default of 0.001, domains from these two hits are not included in the MSA.
- If I set --domZ=65984053 to match the -Z flag and the number of sequences in the database, domains from these two hits are not included in the MSA. Note, however, that the reported domain E-values don't change.
Why is this? Is this a bug in Jackhmmer, or am I misunderstanding domain E-values or how they are filtered?
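To make the expectation concrete, here is a minimal Python sketch that applies the rule as I understood it (sequence E-value <= --incE AND best-domain E-value <= --incdomE) to rows of the tblout and lists the hits whose `inc` count is non-zero despite failing that rule. The names `TbloutHit`, `parse_tblout`, `unexpected_inclusions` and `rescale_evalue` are hypothetical helpers, not part of HMMER. The rule it encodes is my assumption about the filter, not Jackhmmer's confirmed logic. `rescale_evalue` only relies on an E-value being a P-value multiplied by the search-space size, so it shows what a reported value would become under a different effective size. Whether domain inclusion actually uses a different size than the one behind the reported column is not something I have verified.

```python
from typing import List, NamedTuple


class TbloutHit(NamedTuple):
    target: str
    seq_evalue: float       # "full sequence" E-value column
    best_dom_evalue: float  # "best 1 domain" E-value column
    n_included: int         # "inc" column: number of included domains


def parse_tblout(text: str) -> List[TbloutHit]:
    """Parse whitespace-separated tblout rows, skipping '#' comment lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split()
        # Field positions: 0 target, 4 full-sequence E-value,
        # 7 best-domain E-value, 17 inc.
        hits.append(TbloutHit(f[0], float(f[4]), float(f[7]), int(f[17])))
    return hits


def unexpected_inclusions(hits: List[TbloutHit], inc_e: float,
                          incdom_e: float) -> List[TbloutHit]:
    """Hits with included domains that fail the ASSUMED rule:
    seq E-value <= inc_e AND best-domain E-value <= incdom_e."""
    return [h for h in hits
            if h.n_included > 0
            and (h.seq_evalue > inc_e or h.best_dom_evalue > incdom_e)]


def rescale_evalue(evalue: float, z_reported: float, z_other: float) -> float:
    """E = P * Z, so the same P-value under another search-space size."""
    return evalue * z_other / z_reported


# Four rows copied from the tblout in this report.
rows = """\
A0A140J420_LEGPN - query - 2.7e-61 216.4 7.4 3.3e-61 216.2 7.4 1.1 1 0 0 1 1 1 1 -
SRR5665647_2208059 - query - 6.6e-20 82.8 146.6 8.2e+03 8.2 0.4 25.0 6 4 18 25 25 25 25 -
SRR5690554_1041284 - query - 1.6e-12 59.0 0.0 3.9e+05 2.8 0.0 13.4 14 0 0 14 14 14 0 -
ERR1700754_51786 - query - 3.8e-12 57.7 46.9 1.2e+04 7.6 0.1 13.0 12 1 1 13 13 13 11 -
"""

for hit in unexpected_inclusions(parse_tblout(rows), inc_e=1e-4, incdom_e=1e-2):
    print(hit.target, hit.best_dom_evalue, hit.n_included)
```

With the thresholds from my command, this flags exactly the two hits in question. Note that the tblout only carries the best domain's E-value per target, so the check says nothing about the other domains of a multi-domain hit.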
# --- full sequence ---- --- best 1 domain ---- --- domain number estimation ----
# target name accession query name accession E-value score bias E-value score bias exp reg clu ov env dom rep inc description of target
#------------------- ---------- -------------------- ---------- --------- ------ ----- --------- ------ ----- --- --- --- --- --- --- --- --- ---------------------
A0A140J420_LEGPN - query - 2.7e-61 216.4 7.4 3.3e-61 216.2 7.4 1.1 1 0 0 1 1 1 1 -
SRR5665647_2208059 - query - 6.6e-20 82.8 146.6 8.2e+03 8.2 0.4 25.0 6 4 18 25 25 25 25 -
SRR5690554_1041284 - query - 1.6e-12 59.0 0.0 3.9e+05 2.8 0.0 13.4 14 0 0 14 14 14 0 -
ERR1700754_51786 - query - 3.8e-12 57.7 46.9 1.2e+04 7.6 0.1 13.0 12 1 1 13 13 13 11 -
SRR3990167_11214105 - query - 5.5e-12 57.2 0.1 7.2e-12 56.8 0.1 1.1 1 0 0 1 1 1 1 -
SRR6201984_2660209 - query - 1.3e-11 56.0 122.7 1.6e+05 4.0 0.2 34.1 1 1 46 49 49 49 0 -
A0A0W0RE50_9GAMM - query - 3e-11 54.9 0.0 3.7e-11 54.5 0.0 1.1 1 0 0 1 1 1 1 -
SRR3990167_4442378 - query - 4.2e-11 54.4 0.5 5.3e-11 54.0 0.5 1.1 1 0 0 1 1 1 1 -
A0A0W0XTZ5_9GAMM - query - 7e-10 50.4 0.0 8.9e-10 50.1 0.0 1.1 1 0 0 1 1 1 1 -
SRR3990167_9143593 - query - 9.8e-09 46.7 0.8 1.4e-08 46.2 0.8 1.1 1 0 0 1 1 1 1 -
SRR6266487_2122007 - query - 9e-07 40.4 10.2 1.6e+05 4.1 0.0 9.8 10 0 0 10 10 10 0 -
#
# Program: jackhmmer
# Version: 3.4 (Aug 2023)
# Pipeline mode: SEARCH
# Query file: query.fasta
# Target file: bfd-first_non_consensus_sequences.fasta
# Option settings: jackhmmer -N 1 -A output.a3m --tblout tblout.txt --noali -E 0.0001 --incE 0.0001 --incdomE 0.01 --F1 0.0005 --F2 5e-05 --F3 5e-07 -Z 65984053 --cpu 12 query.fasta bfd-first_non_consensus_sequences.fasta
# Current dir: .
# Date: Tue Aug 12 16:22:59 2025
# [ok]

>query
IQNNERETLVNSIKTAIQYSKIQAIHLGHPIYLLPFGSNENWSRGMVLAKLNQTTNKTELIHQWQWSSNSWNINWKGVDSNHRIIISNIPNRAMSNGKFILNNKRTNEKVVVTLNRLGRVRVGGN
>A0A140J420_LEGPN/38-161 [subseq from] A0A140J420_LEGPN
IQKNERETLINNIKTAVQYSKIEAIHFGHPIYLIPLGSNENWSKGMVLAQFDQKSNKIELIHQWHWSSNSWNINWRGVDSTNRIIISNTPYRAMSNGKFILDNRRTNERVEVTLNRLGRVKVGN-
>SRR5665647_2208059/35-62 [subseq from] SRR5665647_2208059
------------------------------------------------------DGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/85-112 [subseq from] SRR5665647_2208059
------------------------------------------------------DGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/122-161 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSI--------------------------------------------
>SRR5665647_2208059/185-212 [subseq from] SRR5665647_2208059
------------------------------------------------------DGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/222-262 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/284-312 [subseq from] SRR5665647_2208059
-----------------------------------------------------GDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/322-362 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/383-412 [subseq from] SRR5665647_2208059
----------------------------------------------------NGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/435-462 [subseq from] SRR5665647_2208059
------------------------------------------------------DGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/483-512 [subseq from] SRR5665647_2208059
----------------------------------------------------NGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/535-562 [subseq from] SRR5665647_2208059
------------------------------------------------------DGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/585-612 [subseq from] SRR5665647_2208059
------------------------------------------------------DGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/633-662 [subseq from] SRR5665647_2208059
----------------------------------------------------NGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/683-712 [subseq from] SRR5665647_2208059
----------------------------------------------------NGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/722-762 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/782-812 [subseq from] SRR5665647_2208059
---------------------------------------------------ANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/822-862 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/882-912 [subseq from] SRR5665647_2208059
---------------------------------------------------ANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/923-962 [subseq from] SRR5665647_2208059
------------------------------------------TDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/983-1012 [subseq from] SRR5665647_2208059
----------------------------------------------------NGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/1022-1062 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/1072-1112 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/1122-1161 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSI--------------------------------------------
>SRR5665647_2208059/1172-1212 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSIH-------------------------------------------
>SRR5665647_2208059/1222-1261 [subseq from] SRR5665647_2208059
-----------------------------------------YTDGVWLAADANGDGKTDLVHRWSGGVNTWLSNGDGGYSI--------------------------------------------
>ERR1700754_51786/59-82 [subseq from] ERR1700754_51786
-----------------------------------------------------GDGRTDLIHRWDLGVNTWISNGDG------------------------------------------------
>ERR1700754_51786/109-132 [subseq from] ERR1700754_51786
-----------------------------------------------------GDGRTDLIHRWDLGVNTWISNGDG------------------------------------------------
>ERR1700754_51786/158-182 [subseq from] ERR1700754_51786
----------------------------------------------------NGDGKTDLIHRWDLGVNTWISNGDG------------------------------------------------
>ERR1700754_51786/197-232 [subseq from] ERR1700754_51786
-----------------------------------------YAQGMWFAADVNGDGKSDLIHRWDLGVNTWISSGDG------------------------------------------------
>ERR1700754_51786/309-332 [subseq from] ERR1700754_51786
-----------------------------------------------------GDGRTDLIHRWDLGVNTWISNGDG------------------------------------------------
>ERR1700754_51786/359-382 [subseq from] ERR1700754_51786
-----------------------------------------------------GDGRTDLIHRWDLGVNTWISNGDG------------------------------------------------
>ERR1700754_51786/409-432 [subseq from] ERR1700754_51786
-----------------------------------------------------GDGRTDLIHRWDLGVNTWISNGDG------------------------------------------------
>ERR1700754_51786/459-482 [subseq from] ERR1700754_51786
-----------------------------------------------------GDGRTDLIHRWDLGVNTWISNGDG------------------------------------------------
>ERR1700754_51786/509-532 [subseq from] ERR1700754_51786
-----------------------------------------------------GDGRTDLIHRWDLGVNTWISNGDG------------------------------------------------
>ERR1700754_51786/559-582 [subseq from] ERR1700754_51786
-----------------------------------------------------GDGKTDLIHRWDLGVNTWISNGDG------------------------------------------------
>ERR1700754_51786/609-632 [subseq from] ERR1700754_51786
-----------------------------------------------------GDGKTDLIHRWDLGVNTWLSNGDG------------------------------------------------
>SRR3990167_11214105/58-177 [subseq from] SRR3990167_11214105
----EHKRCINALKSLLQYARMQAFLRGETLVLAPQNNDKNWSHGVYLFVQeGRTLppkNKEE-LYVWHWQHSGIQVSWHGFQSNDYLIIDAQLSRLALNGYFLIDDGVSNPEK-ITVSRFGQMDV---
>A0A0W0RE50_9GAMM/52-161 [subseq from] A0A0W0RE50_9GAMM
------------LILALHFARNQALLSGKPLALRAEPDSGDWSKGMVLFFDNASHQfETNLLqHQWHWNCRNIAIKWHGFQSSQYLVFAATPMQAVASGRFELSSET--QGIDVIINRLGRIRD---
>SRR3990167_4442378/54-174 [subseq from] SRR3990167_4442378
---NKIDILVSQVINSIHYSRNMALISGQDVTLNPIGASGDWSAGMILFVDNPTHHYTKLdkfIYNWQWQQSSqLKLVWRGFKSTEYLTFAKTLRRSTVNGHFVILQDGVEVRRI-VVNRLGRIK----
>A0A0W0XTZ5_9GAMM/55-166 [subseq from] A0A0W0XTZ5_9GAMM
---------QNQLIQALHFARNQAFLSGKPMILQADPASDDWTRGMVLLTDTpDHRYETSLLqHQWSWNCRNVLIKWQGFLSDKFLVFAANPTQAASSGRFRLFAGDSYSDVI--INRLGRIR----
>SRR3990167_9143593/99-210 [subseq from] SRR3990167_9143593
--------VTDDIKTAIKLAKVESEVRKERLVLSSIDEN-DWSHGMRLYSIDSKGIVNEIIKEWYWQKYDIAVVWSGFQSDKSLVFMPEINKSTINGKFVISSPSFNITN-VIINRLGRVMV---
Thanks,
Augustin