Expose mmseqs --db-load-mode in precompute_alignments_mmseqs.py #579
Open
XunOuyang wants to merge 1 commit into
…qs.py

The db-load-mode forwarded to colabfold_search.sh (and onward to every mmseqs call) was hardcoded to "0", so users had no way to change how databases are loaded from the precompute_alignments_mmseqs.py entry point.

Expose a --db_load_mode argument (0: auto, 1: fread, 2: mmap, 3: mmap+touch) and forward it to the search wrapper. The default remains 0 to preserve the current behavior.

When a large input FASTA is processed in chunks against the same precomputed index, keeping the databases resident in memory (mode 2 or 3) avoids re-reading them from disk on every chunk and can substantially speed up the search.

Addresses aqlaboratory#190

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Addresses #190.
`scripts/precompute_alignments_mmseqs.py` forwards a fixed set of positional arguments to `scripts/colabfold_search.sh`, the last of which is the `db-load-mode` value used by every underlying `mmseqs` call (`search`, `expandaln`, `align`, `filterresult`, `result2msa`, ...). That value was hardcoded to `"0"`, so there was no way to change it from the Python entry point, which is exactly the problem reported in #190.
This PR exposes it as a `--db_load_mode` CLI argument, forwards it through to the search wrapper, and replaces the hardcoded literal with `str(args.db_load_mode)`.

Why it helps
`--db-load-mode` controls how `mmseqs` loads databases (`0`: auto, `1`: fread, `2`: mmap, `3`: mmap+touch). When a large multi-sequence FASTA is processed in chunks (`--fasta_chunk_size`) against the same precomputed index, the wrapper is invoked once per chunk. With mode `0`, the databases are re-read from disk every time; with mode `2`/`3` they stay resident in memory, which can substantially cut wall-clock time at the cost of higher memory use. This is the same knob ColabFold exposes.
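
The per-chunk forwarding described above can be sketched roughly as follows. This is an illustration only: `build_search_command`, the placeholder positional arguments, and the chunk file names are hypothetical and do not reproduce the script's actual argument list. The only details taken from this PR are that the db-load-mode value is the last positional argument and that it is passed as a string.

```python
from typing import List, Sequence


def build_search_command(
    wrapper: str,
    positional_args: Sequence[str],
    db_load_mode: int = 0,
) -> List[str]:
    """Build the argv list for one chunk, with db-load-mode as the final value.

    Hypothetical helper: the real script assembles its own argument list.
    """
    if db_load_mode not in (0, 1, 2, 3):
        raise ValueError(f"db_load_mode must be 0, 1, 2 or 3, got {db_load_mode}")
    # The mode goes last and is stringified, mirroring str(args.db_load_mode).
    return [wrapper, *positional_args, str(db_load_mode)]


# Hypothetical chunk files; one wrapper invocation is built per chunk.
chunks = ["chunk_0.fasta", "chunk_1.fasta"]
commands = [
    build_search_command(
        "scripts/colabfold_search.sh",
        [chunk, "out_dir"],  # placeholder positional arguments
        db_load_mode=2,
    )
    for chunk in chunks
]
```

Each entry in `commands` could then be handed to `subprocess.run`, so every chunk reuses the same load mode.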
Behavior / compatibility
The default remains `0`, so existing invocations are unchanged. `choices=[0, 1, 2, 3]` rejects invalid values early with a clear argparse error.

Note on the CPU question in the issue
The issue also asks how to maximize CPU usage. `mmseqs` already uses all available cores by default (it has its own `--threads` default), so no change is needed for that; users who want to limit threads can set the `MMSEQS_NUM_THREADS` environment variable. I kept this PR focused on the `db-load-mode` ask in the title.

Test plan
`python -m py_compile scripts/precompute_alignments_mmseqs.py` passes.
The default parses to `0`, `--db_load_mode 2` parses to `2`, and out-of-range values (e.g. `5`) are rejected.
The `colabfold_search.sh` invocation is unchanged except for the now-configurable final value.
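
The parsing checks in the test plan can be reproduced in isolation with a minimal sketch like the one below. It assumes an argument defined with `type=int`, `default=0`, and `choices=[0, 1, 2, 3]`; the help text and the `make_parser` helper are illustrative, and the real script's parser carries many other arguments that are omitted here.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Stand-in parser with only the new option (illustrative, not the real one)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--db_load_mode",
        type=int,
        default=0,
        choices=[0, 1, 2, 3],
        help="mmseqs database load mode (0: auto, 1: fread, 2: mmap, 3: mmap+touch)",
    )
    return parser


parser = make_parser()

# No flag given: the default keeps the previous hardcoded behavior.
default_mode = parser.parse_args([]).db_load_mode

# Explicit value: parsed as an int, later forwarded as a string.
explicit_mode = parser.parse_args(["--db_load_mode", "2"]).db_load_mode
forwarded_value = str(explicit_mode)

# Out-of-range value: argparse prints an error and raises SystemExit.
try:
    parser.parse_args(["--db_load_mode", "5"])
    rejected = False
except SystemExit:
    rejected = True
```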