@diegozea
Owner

Summary

  • add a PSSMResult container and pssm API to compute log-odds PSSMs from MSAs
  • validate background distributions, gap handling, and zero-handling policies while using MIToS probability estimators
  • cover the new functionality with targeted tests for shapes, correctness, zero policies, and all-gap columns
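
For readers unfamiliar with log-odds PSSMs, the idea in the summary above can be sketched in plain Julia. This is an illustrative sketch only, not the code added by this PR: it makes no MIToS calls, and the names `toy_pssm`, `ALPHABET`, and `pseudocount`, as well as the choice to mark all-gap columns with `NaN`, are assumptions made for the example. It assumes ASCII sequences of equal length.

```julia
# Hypothetical, self-contained sketch of a log-odds PSSM (not the MIToS API).
const ALPHABET = collect("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids

function toy_pssm(seqs::Vector{String};
                  background::Vector{Float64} = fill(1 / length(ALPHABET), length(ALPHABET)),
                  pseudocount::Float64 = 1.0)
    # Validate the inputs before the background is used as a denominator.
    isempty(seqs) && throw(ArgumentError("at least one sequence is required"))
    length(background) == length(ALPHABET) ||
        throw(ArgumentError("background must have one entry per residue"))
    all(>(0), background) ||
        throw(ArgumentError("background probabilities must be positive"))
    isapprox(sum(background), 1.0; atol = 1e-8) ||
        throw(ArgumentError("background must sum to 1"))
    pseudocount >= 0 || throw(ArgumentError("pseudocount must be non-negative"))

    ncol = length(first(seqs))
    all(s -> length(s) == ncol, seqs) ||
        throw(ArgumentError("all sequences must have the same length"))

    index = Dict(c => i for (i, c) in enumerate(ALPHABET))
    scores = fill(NaN, length(ALPHABET), ncol)  # residues x columns

    for j in 1:ncol
        counts = fill(pseudocount, length(ALPHABET))
        nobserved = 0
        for s in seqs
            i = get(index, s[j], 0)  # gaps and unknown symbols map to 0
            if i != 0
                counts[i] += 1
                nobserved += 1
            end
        end
        # All-gap column: keep NaN rather than scoring pseudocounts alone.
        nobserved == 0 && continue
        probs = counts ./ sum(counts)
        # With pseudocount = 0, unobserved residues score -Inf here.
        scores[:, j] .= log2.(probs ./ background)
    end
    return scores
end

m = toy_pssm(["AC-", "AD-", "AC-"])
println(size(m))
```

In this sketch a positive score means the residue is more frequent in the column than the background predicts. How zeros, gaps, and all-gap columns are actually handled in the PR is defined by its own policies, which may differ from the choices made here.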

Testing

  • julia --project -e 'using Test; using MIToS; using MIToS.MSA; using MIToS.Information; include("test/Information/PSSM.jl")'

Codex Task

@codecov

codecov bot commented Dec 18, 2025

Codecov Report

❌ Patch coverage is 88.26291% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.60%. Comparing base (2cdce44) to head (f2ddaac).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/Information/PSSM.jl | 88.20% | 25 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #193      +/-   ##
==========================================
- Coverage   96.97%   96.60%   -0.37%     
==========================================
  Files          64       65       +1     
  Lines        4861     5073     +212     
==========================================
+ Hits         4714     4901     +187     
- Misses        147      172      +25     
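
The coverage percentages in the diff can be checked from its Hits and Lines rows. The snippet below is a small sketch of that arithmetic; the helper name `coverage_pct` is hypothetical, and truncating to two decimals (rather than rounding) is an assumption chosen because it matches the figures shown in the report.

```julia
# Hypothetical helper: coverage as a percentage, truncated to two decimals.
coverage_pct(hits, lines) = floor(100 * hits / lines; digits = 2)

base = coverage_pct(4714, 4861)  # master (2cdce44)
head = coverage_pct(4901, 5073)  # this PR (f2ddaac)

println("base = ", base, "%, head = ", head, "%")
println("change = ", round(head - base; digits = 2), " percentage points")
```

Under that assumption the two calls give 96.97 and 96.60, a change of -0.37 percentage points, in line with the Coverage row above.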

@github-actions
Contributor

github-actions bot commented Dec 18, 2025

Benchmark Results (Julia v1)

Time benchmarks
| Benchmark | master | f2ddaac... | master / f2ddaac... |
|---|---|---|---|
| Information/CorrectedMutualInformation/buslje09/msa | 0.891 ± 0.0078 s | 0.89 ± 0.0085 s | 1 ± 0.013 |
| Information/CorrectedMutualInformation/buslje09/msa_large | 0.0378 ± 0.0011 s | 0.0377 ± 0.0011 s | 1 ± 0.041 |
| Information/CorrectedMutualInformation/buslje09/msa_wide | 0.805 ± 0.0066 s | 0.798 ± 0.0089 s | 1.01 ± 0.014 |
| Information/MIp/PF09645 | 9.94 ± 0.37 ms | 9.82 ± 0.34 ms | 1.01 ± 0.051 |
| Information/frequencies!/1 | 0.29 ± 0.039 μs | 0.29 ± 0.04 μs | 1 ± 0.19 |
| Information/frequencies!/2 | 1.56 ± 0.06 μs | 1.56 ± 0.05 μs | 1 ± 0.05 |
| Information/highlevel/BLMI | 0.0629 ± 0.0012 s | 0.0644 ± 0.00099 s | 0.976 ± 0.024 |
| Information/highlevel/buslje09 | 12.3 ± 0.27 ms | 11.9 ± 0.29 ms | 1.03 ± 0.034 |
| Information/shannon_entropy/PF09645 | 23.1 ± 1.5 μs | 22 ± 2 μs | 1.05 ± 0.12 |
| MSA/Annotations/filtercolumns/boolean mask | 12.6 ± 1.1 μs | 12.5 ± 0.82 μs | 1 ± 0.11 |
| MSA/Annotations/filtercolumns/index array | 4.73 ± 0.98 μs | 5.05 ± 0.49 μs | 0.936 ± 0.21 |
| MSA/Base.vcat/annotated | 7.58 ± 3.4 μs | 8.33 ± 1.3 μs | 0.911 ± 0.43 |
| MSA/Base.vcat/unannotated | 3.04 ± 0.84 μs | 2.88 ± 0.69 μs | 1.06 ± 0.39 |
| MSA/Residue conversions/char2res | 0.506 ± 0.078 ms | 0.504 ± 0.92 ms | 1 ± 1.8 |
| MSA/Residue conversions/int2res | 0.424 ± 0.1 ms | 0.413 ± 0.13 ms | 1.02 ± 0.41 |
| MSA/Residue conversions/res2char | 0.304 ± 0.032 ms | 0.298 ± 0.044 ms | 1.02 ± 0.18 |
| MSA/Residue conversions/res2int | 0.492 ± 0.19 ms | 0.461 ± 0.22 ms | 1.07 ± 0.65 |
| MSA/hobohmI/pid20 | 1.19 ± 0.61 μs | 1.27 ± 0.55 μs | 0.937 ± 0.63 |
| MSA/hobohmI/pid62 | 1.14 ± 0.72 μs | 1.32 ± 0.51 μs | 0.864 ± 0.64 |
| MSA/hobohmI/pid80 | 1.06 ± 0.62 μs | 1.43 ± 0.57 μs | 0.738 ± 0.52 |
| MSA/hobohmI/pid99 | 1.52 ± 0.57 μs | 1.49 ± 0.65 μs | 1.02 ± 0.59 |
| MSA/identity/matrix_Float64 | 22 ± 4.4 μs | 22.3 ± 2 μs | 0.983 ± 0.22 |
| MSA/identity/mean | 0.127 ± 0.013 ms | 0.14 ± 0.012 ms | 0.907 ± 0.12 |
| MSA/read/Clustal | 0.0365 ± 0.0069 ms | 0.0344 ± 0.0066 ms | 1.06 ± 0.29 |
| MSA/read/Clustal_num | 0.0362 ± 0.0066 ms | 0.036 ± 0.0068 ms | 1.01 ± 0.26 |
| MSA/read/FASTA | 0.0577 ± 0.0099 ms | 0.0588 ± 0.0096 ms | 0.98 ± 0.23 |
| MSA/read/FASTA.gz | 0.057 ± 0.0072 ms | 0.0573 ± 0.0075 ms | 0.995 ± 0.18 |
| MSA/read/FASTA.gz_annotated | 0.0664 ± 0.0076 ms | 0.0656 ± 0.007 ms | 1.01 ± 0.16 |
| MSA/read/FASTA_deletefullgaps | 7.64 ± 2.8 ms | 7.27 ± 3.2 ms | 1.05 ± 0.6 |
| MSA/read/FASTA_deletefullgaps_mapping | 0.107 ± 0.016 s | 0.105 ± 0.012 s | 1.01 ± 0.19 |
| MSA/read/Stockholm | 0.0431 ± 0.009 ms | 0.0425 ± 0.0091 ms | 1.01 ± 0.3 |
| MSA/read/Stockholm.gz | 0.0721 ± 0.011 ms | 0.0712 ± 0.011 ms | 1.01 ± 0.22 |
| MSA/read/Stockholm_annotated | 0.0567 ± 0.012 ms | 0.0585 ± 0.011 ms | 0.969 ± 0.28 |
| MSA/read/Stockholm_mapping | 0.215 ± 0.047 ms | 0.216 ± 0.044 ms | 0.996 ± 0.3 |
| MSA/read/Stockholm_mapping_coords | 0.137 ± 0.034 ms | 0.136 ± 0.033 ms | 1.01 ± 0.35 |
| MSA/write/FASTA | 0.248 ± 0.045 ms | 0.234 ± 0.043 ms | 1.06 ± 0.27 |
| PDB/_generate_interaction_keys/defaults | 0.0528 ± 0.015 ms | 0.0532 ± 0.017 ms | 0.992 ± 0.42 |
| PDB/_get_matched_Cαs/hemoglobin | 0.0455 ± 0.011 ms | 0.0505 ± 0.0088 ms | 0.901 ± 0.27 |
| PDB/_pdbresidues_to_mmcifdict/2vqc | 0.668 ± 0.071 ms | 0.709 ± 0.081 ms | 0.943 ± 0.15 |
| PDB/contact/1CBN_20_30_CB | 0.401 ± 0.21 μs | 0.451 ± 0.17 μs | 0.889 ± 0.57 |
| PDB/contact/1CBN_20_30_heavy | 0.511 ± 0.19 μs | 0.531 ± 0.19 μs | 0.962 ± 0.5 |
| PDB/count_alanine/1CBN | 0.341 ± 0.069 μs | 0.341 ± 0.01 μs | 1 ± 0.2 |
| PDB/distance/1CBN_20_30 | 0.14 ± 0.01 μs | 0.14 ± 0.02 μs | 1 ± 0.16 |
| PDB/read/MMCIFFile | 3.25 ± 0.26 ms | 3.27 ± 0.23 ms | 0.995 ± 0.11 |
| PDB/squared_distance/1CBN_20_30_CB | 0.391 ± 0.26 μs | 0.511 ± 0.17 μs | 0.765 ± 0.57 |
| PDB/squared_distance/1CBN_20_30_heavy | 0.381 ± 0.18 μs | 0.521 ± 0.21 μs | 0.731 ± 0.46 |
| Pfam/accession mapping/acc2seqnames | 0.218 ± 0.02 ms | 0.213 ± 0.018 ms | 1.02 ± 0.13 |
| SIFTS/ResidueDetails/_get_details | 2.94 ± 1.9 μs | 2.23 ± 0.7 μs | 1.32 ± 0.96 |
| SIFTS/ResidueDetails/_is_missing | 2.54 ± 0.97 μs | 2.71 ± 1.2 μs | 0.934 ± 0.54 |
| SIFTS/SIFTSResidue/18gs | 0.12 ± 0.15 μs | 0.111 ± 0.15 μs | 1.08 ± 2 |
| SIFTS/siftsmapping/2vqc | 2.57 ± 0.14 ms | 2.51 ± 0.12 ms | 1.02 ± 0.076 |
| Utils/get_n_words/ascii | 0.141 ± 0.1 μs | 0.3 ± 0.18 μs | 0.47 ± 0.44 |
| Utils/get_n_words/utf8 | 0.131 ± 0.12 μs | 0.241 ± 0.17 μs | 0.544 ± 0.63 |
| Utils/hascoordinates/invalid | 0.09 ± 0.01 μs | 0.081 ± 0.01 μs | 1.11 ± 0.18 |
| Utils/hascoordinates/valid | 0.13 ± 0.01 μs | 0.131 ± 0.01 μs | 0.992 ± 0.11 |
| Utils/list2matrix/upper | 0.341 ± 0.044 ms | 0.293 ± 0.16 ms | 1.17 ± 0.65 |
| Utils/list2matrix/upper_diagonal | 0.512 ± 0.13 ms | 0.482 ± 0.19 ms | 1.06 ± 0.49 |
| Utils/matrix2list/upper | 0.145 ± 0.081 ms | 0.14 ± 0.066 ms | 1.04 ± 0.76 |
| Utils/matrix2list/upper_diagonal | 0.162 ± 0.065 ms | 0.151 ± 0.071 ms | 1.07 ± 0.66 |
| time_to_load | 0.953 ± 0.015 s | 0.934 ± 0.01 s | 1.02 ± 0.019 |

Memory benchmarks
| Benchmark | master | f2ddaac... | master / f2ddaac... |
|---|---|---|---|
| Information/CorrectedMutualInformation/buslje09/msa | 0.766 M allocs: 0.032 GB | 0.766 M allocs: 0.032 GB | 1 |
| Information/CorrectedMutualInformation/buslje09/msa_large | 0.0901 M allocs: 5.03 MB | 0.0901 M allocs: 5.03 MB | 1 |
| Information/CorrectedMutualInformation/buslje09/msa_wide | 0.742 M allocs: 30.3 MB | 0.742 M allocs: 30.3 MB | 1 |
| Information/MIp/PF09645 | 20.3 k allocs: 0.819 MB | 20.3 k allocs: 0.819 MB | 1 |
| Information/frequencies!/1 | 0 allocs: 0 B | 0 allocs: 0 B | |
| Information/frequencies!/2 | 0 allocs: 0 B | 0 allocs: 0 B | |
| Information/highlevel/BLMI | 19.9 k allocs: 1.19 MB | 19.9 k allocs: 1.19 MB | 1 |
| Information/highlevel/buslje09 | 0.0377 M allocs: 2.3 MB | 0.0377 M allocs: 2.3 MB | 1 |
| Information/shannon_entropy/PF09645 | 0.047 k allocs: 12.2 kB | 0.047 k allocs: 12.2 kB | 1 |
| MSA/Annotations/filtercolumns/boolean mask | 18 allocs: 5.22 kB | 18 allocs: 5.22 kB | 1 |
| MSA/Annotations/filtercolumns/index array | 16 allocs: 1.62 kB | 16 allocs: 1.62 kB | 1 |
| MSA/Base.vcat/annotated | 0.143 k allocs: 6.58 kB | 0.143 k allocs: 6.58 kB | 1 |
| MSA/Base.vcat/unannotated | 0.064 k allocs: 2.7 kB | 0.064 k allocs: 2.7 kB | 1 |
| MSA/Residue conversions/char2res | 3 allocs: 4.1 MB | 3 allocs: 4.1 MB | 1 |
| MSA/Residue conversions/int2res | 3 allocs: 4.1 MB | 3 allocs: 4.1 MB | 1 |
| MSA/Residue conversions/res2char | 3 allocs: 2.05 MB | 3 allocs: 2.05 MB | 1 |
| MSA/Residue conversions/res2int | 3 allocs: 4.1 MB | 3 allocs: 4.1 MB | 1 |
| MSA/hobohmI/pid20 | 31 allocs: 1.77 kB | 31 allocs: 1.77 kB | 1 |
| MSA/hobohmI/pid62 | 31 allocs: 1.77 kB | 31 allocs: 1.77 kB | 1 |
| MSA/hobohmI/pid80 | 31 allocs: 1.77 kB | 31 allocs: 1.77 kB | 1 |
| MSA/hobohmI/pid99 | 31 allocs: 1.77 kB | 31 allocs: 1.77 kB | 1 |
| MSA/identity/matrix_Float64 | 0.249 k allocs: 11.8 kB | 0.249 k allocs: 11.8 kB | 1 |
| MSA/identity/mean | 1.23 k allocs: 0.0517 MB | 1.23 k allocs: 0.0517 MB | 1 |
| MSA/read/Clustal | 0.394 k allocs: 24.3 kB | 0.394 k allocs: 24.3 kB | 1 |
| MSA/read/Clustal_num | 0.394 k allocs: 24.3 kB | 0.394 k allocs: 24.3 kB | 1 |
| MSA/read/FASTA | 0.406 k allocs: 0.044 MB | 0.406 k allocs: 0.044 MB | 1 |
| MSA/read/FASTA.gz | 0.443 k allocs: 0.0752 MB | 0.443 k allocs: 0.0752 MB | 1 |
| MSA/read/FASTA.gz_annotated | 0.533 k allocs: 0.0794 MB | 0.533 k allocs: 0.0793 MB | 1 |
| MSA/read/FASTA_deletefullgaps | 13.6 k allocs: 17.4 MB | 13.6 k allocs: 17.4 MB | 1 |
| MSA/read/FASTA_deletefullgaps_mapping | 1.64 M allocs: 0.0795 GB | 1.64 M allocs: 0.0795 GB | 1 |
| MSA/read/Stockholm | 0.402 k allocs: 0.033 MB | 0.402 k allocs: 0.033 MB | 1 |
| MSA/read/Stockholm.gz | 0.479 k allocs: 0.0754 MB | 0.479 k allocs: 0.0754 MB | 1 |
| MSA/read/Stockholm_annotated | 0.559 k allocs: 0.0413 MB | 0.559 k allocs: 0.0413 MB | 1 |
| MSA/read/Stockholm_mapping | 2.08 k allocs: 0.104 MB | 2.08 k allocs: 0.104 MB | 1 |
| MSA/read/Stockholm_mapping_coords | 1.64 k allocs: 0.0812 MB | 1.64 k allocs: 0.0812 MB | 1 |
| MSA/write/FASTA | 0.303 k allocs: 14.1 kB | 0.303 k allocs: 14.1 kB | 1 |
| PDB/_generate_interaction_keys/defaults | 0.497 k allocs: 0.0581 MB | 0.497 k allocs: 0.0581 MB | 1 |
| PDB/_get_matched_Cαs/hemoglobin | 0.584 k allocs: 0.0438 MB | 0.584 k allocs: 0.0438 MB | 1 |
| PDB/_pdbresidues_to_mmcifdict/2vqc | 8.56 k allocs: 1.12 MB | 8.56 k allocs: 1.12 MB | 1 |
| PDB/contact/1CBN_20_30_CB | 4 allocs: 0.281 kB | 4 allocs: 0.281 kB | 1 |
| PDB/contact/1CBN_20_30_heavy | 4 allocs: 0.281 kB | 4 allocs: 0.281 kB | 1 |
| PDB/count_alanine/1CBN | 0 allocs: 0 B | 0 allocs: 0 B | |
| PDB/distance/1CBN_20_30 | 0 allocs: 0 B | 0 allocs: 0 B | |
| PDB/read/MMCIFFile | 0.039 M allocs: 2.9 MB | 0.039 M allocs: 2.9 MB | 1 |
| PDB/squared_distance/1CBN_20_30_CB | 4 allocs: 0.281 kB | 4 allocs: 0.281 kB | 1 |
| PDB/squared_distance/1CBN_20_30_heavy | 4 allocs: 0.281 kB | 4 allocs: 0.281 kB | 1 |
| Pfam/accession mapping/acc2seqnames | 4.32 k allocs: 0.319 MB | 4.32 k allocs: 0.319 MB | 1 |
| SIFTS/ResidueDetails/_get_details | 25 allocs: 1.45 kB | 25 allocs: 1.45 kB | 1 |
| SIFTS/ResidueDetails/_is_missing | 25 allocs: 1.45 kB | 25 allocs: 1.45 kB | 1 |
| SIFTS/SIFTSResidue/18gs | 4 allocs: 0.125 kB | 4 allocs: 0.125 kB | 1 |
| SIFTS/siftsmapping/2vqc | 5.94 k allocs: 0.88 MB | 5.94 k allocs: 0.88 MB | 1 |
| Utils/get_n_words/ascii | 5 allocs: 0.203 kB | 5 allocs: 0.203 kB | 1 |
| Utils/get_n_words/utf8 | 5 allocs: 0.219 kB | 5 allocs: 0.219 kB | 1 |
| Utils/hascoordinates/invalid | 0 allocs: 0 B | 0 allocs: 0 B | |
| Utils/hascoordinates/valid | 0 allocs: 0 B | 0 allocs: 0 B | |
| Utils/list2matrix/upper | 3 allocs: 1.91 MB | 3 allocs: 1.91 MB | 1 |
| Utils/list2matrix/upper_diagonal | 6 allocs: 2.86 MB | 6 allocs: 2.86 MB | 1 |
| Utils/matrix2list/upper | 3 allocs: 0.952 MB | 3 allocs: 0.952 MB | 1 |
| Utils/matrix2list/upper_diagonal | 3 allocs: 0.956 MB | 3 allocs: 0.956 MB | 1 |
| time_to_load | 0.149 k allocs: 11.1 kB | 0.149 k allocs: 11.1 kB | 1 |

@coveralls

coveralls commented Dec 18, 2025

Coverage Status

coverage: 96.759% (-0.4%) from 97.156%
when pulling 4892fb5 on codex/implement-pssm-function-for-msa
into 2cdce44 on master.

3 participants