Expose mmseqs --db-load-mode in precompute_alignments_mmseqs.py by XunOuyang · Pull Request #579 · aqlaboratory/openfold

XunOuyang · 2026-06-07T22:43:49Z

Summary

Addresses #190.

scripts/precompute_alignments_mmseqs.py forwards a fixed set of positional
arguments to scripts/colabfold_search.sh, the last of which is the
db-load-mode value used by every underlying mmseqs call
(search, expandaln, align, filterresult, result2msa, ...). That value
was hardcoded to "0", so there was no way to change it from the Python
entry point — exactly the problem reported in #190.

This PR exposes it as a CLI argument and forwards it through:

parser.add_argument(
    "--db_load_mode", type=int, default=0, choices=[0, 1, 2, 3],
    help="mmseqs database preload mode, forwarded as --db-load-mode ...",
)

and replaces the hardcoded literal with str(args.db_load_mode).
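
As a rough sketch of the change (not the exact diff), the snippet below parses the new flag and appends it as the final positional value of the wrapper command. The helper `build_search_cmd`, the `leading_args` placeholder, and the help text are illustrative assumptions; the real script passes its own positional arguments ahead of this value.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser containing only the new option (the real script defines more)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--db_load_mode", type=int, default=0, choices=[0, 1, 2, 3],
        # Help text written for this sketch, not copied from the PR.
        help="mmseqs database preload mode (0: auto, 1: fread, 2: mmap, 3: mmap+touch)",
    )
    return parser


def build_search_cmd(args: argparse.Namespace, leading_args: list) -> list:
    """Hypothetical helper: append db_load_mode as the last positional value.

    leading_args stands in for the script path and the other positional
    arguments, which are not reproduced here.
    """
    return [*leading_args, str(args.db_load_mode)]


args = build_parser().parse_args(["--db_load_mode", "2"])
cmd = build_search_cmd(args, ["scripts/colabfold_search.sh", "<other positional args>"])
print(cmd[-1])
```

Converting with `str(...)` keeps every element of the command list a string, which is what `subprocess` expects.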

Why it helps

--db-load-mode controls how mmseqs loads databases (0: auto, 1: fread,
2: mmap, 3: mmap+touch). When a large multi-sequence FASTA is processed in
chunks (--fasta_chunk_size) against the same precomputed index, the wrapper
is invoked once per chunk. With mode 0, the databases can end up being re-read
from disk on every invocation; with mode 2 or 3 they are memory-mapped, so pages
already held in the OS page cache are reused across invocations, which can
substantially cut wall-clock time at the cost of higher memory use. This is the
same knob ColabFold exposes.
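
To make the per-chunk cost concrete, here is a minimal sketch of how chunking multiplies wrapper invocations. The names `chunked` and `plan_invocations` are hypothetical and do not come from the script; the point is only that every chunk triggers a separate search against the same databases with the same load mode.

```python
from typing import Dict, Iterator, List, Sequence


def chunked(items: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most chunk_size items."""
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])


def plan_invocations(seq_ids: Sequence[str], chunk_size: int,
                     db_load_mode: int) -> List[Dict[str, object]]:
    """Hypothetical planner: one wrapper invocation per chunk, same load mode."""
    return [
        {"chunk": chunk, "db_load_mode": str(db_load_mode)}
        for chunk in chunked(seq_ids, chunk_size)
    ]


# 10 sequences in chunks of 4 means the databases are opened 3 times.
plan = plan_invocations([f"seq{i}" for i in range(10)], chunk_size=4, db_load_mode=2)
print(len(plan))  # 3
```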

Behavior / compatibility

Default is 0, so existing invocations are unchanged.
choices=[0, 1, 2, 3] rejects invalid values early with a clear argparse error.

Note on the CPU question in the issue

The issue also asks how to maximize CPU usage. mmseqs already uses all
available cores by default (it has its own --threads default), so no change is
needed for that; users who want to limit threads can set the MMSEQS_NUM_THREADS
environment variable. I kept this PR focused on the db-load-mode ask in the title.
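
For anyone who does want to cap threads, a minimal sketch of passing MMSEQS_NUM_THREADS to a child process follows. The helper `mmseqs_env` is an illustrative assumption and is not part of this PR.

```python
import os
from typing import Dict, Optional


def mmseqs_env(num_threads: Optional[int] = None,
               base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment, optionally capping mmseqs threads."""
    env = dict(os.environ if base_env is None else base_env)
    if num_threads is not None:
        # Environment values must be strings.
        env["MMSEQS_NUM_THREADS"] = str(num_threads)
    return env


env = mmseqs_env(num_threads=8)
print(env["MMSEQS_NUM_THREADS"])
# Intended use (not executed here):
#   subprocess.run(cmd, env=env, check=True)
```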

Test plan

python -m py_compile scripts/precompute_alignments_mmseqs.py passes.
Argparse verified: default resolves to 0, --db_load_mode 2 parses to 2,
and out-of-range values (e.g. 5) are rejected.
The forwarded positional ordering into colabfold_search.sh is unchanged
except for the now-configurable final value.
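
The argparse checks above can be reproduced with a short standalone script along these lines. It rebuilds only the new option, so `make_parser` and `check_argparse` are stand-ins written for this sketch rather than code from the script.

```python
import argparse
import contextlib
import io


def make_parser() -> argparse.ArgumentParser:
    """Stand-in parser with only the option added in this PR."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--db_load_mode", type=int, default=0, choices=[0, 1, 2, 3])
    return parser


def check_argparse():
    """Run the three checks; return the exit code and message of the rejection."""
    parser = make_parser()
    assert parser.parse_args([]).db_load_mode == 0
    assert parser.parse_args(["--db_load_mode", "2"]).db_load_mode == 2
    # argparse reports invalid choices on stderr and raises SystemExit.
    with contextlib.redirect_stderr(io.StringIO()) as err:
        try:
            parser.parse_args(["--db_load_mode", "5"])
        except SystemExit as exc:
            return exc.code, err.getvalue()
    raise AssertionError("out-of-range value was accepted")


exit_code, message = check_argparse()
print(exit_code)
```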

…qs.py

The db-load-mode forwarded to colabfold_search.sh (and onward to every mmseqs call) was hardcoded to "0", so users had no way to change how databases are loaded from the precompute_alignments_mmseqs.py entry point.

Expose a --db_load_mode argument (0: auto, 1: fread, 2: mmap, 3: mmap+touch) and forward it to the search wrapper. The default remains 0 to preserve the current behavior.

When a large input FASTA is processed in chunks against the same precomputed index, keeping the databases resident in memory (mode 2 or 3) avoids re-reading them from disk on every chunk and can substantially speed up the search.

Addresses aqlaboratory#190

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

XunOuyang wants to merge 1 commit into aqlaboratory:main from XunOuyang:feat/190-mmseqs-db-load-mode
