An end-to-end pKa prediction pipeline using state-of-the-art ESM embeddings and ensemble MLP inference. Supports single‑sequence, PDB, UniProt and multi‑FASTA inputs, with configurable channels.
KaML‑ESM runs:
- Embedding extraction (ESM2 or ESMC)
- Ensemble inference (AcidicMLP & BasicMLP)
- Structure folding (ESM3-medium)
- Conformer-based inference (KaML-CBTree)
- Results export (CSV, beta-factor-labeled PDB, logs)
- Inputs:
  • --seq (raw sequence)
  • --pdb (local PDB file)
  • --pdbid (fetch PDB by ID)
  • --uniprot (fetch UniProt sequence)
  • --fasta (multi‑FASTA batch)
- Channels:
  • Acidic: --acidic (esm2 or esmC)
  • Basic: --basic (esm2 or esmC)
- Structure:
  • Default: Forge folding* via ESM_FORGE_TOKEN
  • --nofold (do not fold a structure; disables CBTree unless --pdbid or --pdb is provided)
  * In our tests, folding takes approximately 20 s on average.
- Safety:
  • --skip_safety to bypass the ESM safety filter (permission required)
- CBTREE:
  • Disabled with --nocbtree, and with --nofold unless a PDB file or PDB ID is provided
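The option rules above can be summarized in a small Python helper that assembles a kaml-esm argument list and reports whether CBTree would run. This is only a sketch: `build_kaml_command` and `cbtree_enabled` are hypothetical names, not part of KaML‑ESM, and the helper assumes one input flag per run, which this README does not state explicitly. Only the flags themselves are taken from the list above.

```python
from typing import Optional

# Input flags listed in this README (without the leading dashes).
INPUT_FLAGS = ("seq", "pdb", "pdbid", "uniprot", "fasta")


def build_kaml_command(
    input_kind: str,
    input_value: str,
    acidic: Optional[str] = None,
    basic: Optional[str] = None,
    nofold: bool = False,
    nocbtree: bool = False,
) -> list[str]:
    """Assemble an argument list for kaml-esm (hypothetical helper)."""
    if input_kind not in INPUT_FLAGS:
        raise ValueError(f"unknown input type: {input_kind!r}")
    cmd = ["kaml-esm", f"--{input_kind}", input_value]
    # Channel values are passed through unchanged; this sketch does not
    # validate them against what the CLI actually accepts.
    if acidic is not None:
        cmd += ["--acidic", acidic]
    if basic is not None:
        cmd += ["--basic", basic]
    if nofold:
        cmd.append("--nofold")
    if nocbtree:
        cmd.append("--nocbtree")
    return cmd


def cbtree_enabled(input_kind: str, nofold: bool, nocbtree: bool) -> bool:
    """Mirror the rule above: CBTree needs a structure, folded or supplied."""
    if nocbtree:
        return False
    has_structure = input_kind in ("pdb", "pdbid")
    return has_structure or not nofold
```

The returned list can be handed to `subprocess.run` once kaml-esm is on your PATH. For instance, `cbtree_enabled("seq", nofold=True, nocbtree=False)` is `False`, whereas the same call with `"pdbid"` is `True`.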
- Python 3.10+
- Internet (for Forge API, UniProt, PDB fetch)
- Python packages described in: env/KaML-ESM_env.txt
Use the script in env/ to create a virtual environment (venv):
cd env
./setup_envs.sh venv
This creates a Python 3.10.12 environment, installs dependencies, and
adds bin/ to your PATH.
KaML‑ESM requires ESM_FORGE_TOKEN for Forge and ESMC:
export ESM_FORGE_TOKEN=$(cat path/to/forge_token.txt)
Keep this token private.
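If you launch kaml-esm from a Python script rather than a shell, the same variable can be set inside the process before spawning the command. The sketch below is one possible way to do that. `load_forge_token` is a hypothetical helper, not part of KaML‑ESM, and the token file path is whatever you chose when saving the token.

```python
import os
from pathlib import Path


def load_forge_token(token_file: str) -> str:
    """Read a token file, strip whitespace, and export ESM_FORGE_TOKEN."""
    token = Path(token_file).read_text().strip()
    if not token:
        raise ValueError(f"{token_file} is empty")
    # Child processes started afterwards (e.g. via subprocess) inherit this.
    os.environ["ESM_FORGE_TOKEN"] = token
    return token
```

Avoid printing or logging the returned value, and keep the token file out of version control.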
Single sequence:
kaml-esm --seq "MEEPQSDPSV..." --outdir results/seq1
Fetch by UniProt:
kaml-esm --uniprot P04637 --outdir results/p53
Fetch PDB:
kaml-esm --pdbid 1CRN --outdir results/1crn
Multi‑FASTA:
kaml-esm --fasta proteins.fasta --nproc 4 --outdir results/all
Skip safety filter (requires permission):
kaml-esm --seq "MEEPQSDPSV..." --skip_safety
Skip structure folding:
kaml-esm --seq "MEEPQSDPSV..." --nofold
Disable CBTREE:
kaml-esm --seq "MEEPQSDPSV..." --nocbtree
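For the multi‑FASTA example above, the input file can be generated from sequences you already hold in memory. The following sketch writes standard FASTA text (a ">" header line followed by wrapped sequence lines). `to_multi_fasta` is a hypothetical helper, and how KaML‑ESM uses the header text (for example, to name per‑sequence subfolders) is not documented here, so keeping identifiers short and free of spaces is an assumption rather than a requirement.

```python
def to_multi_fasta(records: dict[str, str], width: int = 60) -> str:
    """Format {identifier: sequence} pairs as multi-FASTA text."""
    blocks = []
    for name, seq in records.items():
        # Drop any whitespace inside the sequence and normalize the case.
        seq = "".join(seq.split()).upper()
        if not seq:
            raise ValueError(f"empty sequence for {name!r}")
        # Wrap the sequence at a fixed line width.
        lines = [seq[i:i + width] for i in range(0, len(seq), width)]
        blocks.append(f">{name}\n" + "\n".join(lines))
    return "\n".join(blocks) + "\n"
```

Writing the result with `Path("proteins.fasta").write_text(...)` yields a file that can be passed to `--fasta`.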
Default --outdir is output/, containing:
- predictions.csv: per‑residue pKa, shift, error, optional CBTREE
- predicted_structure.pdb: updated B‑factors in PDB
- pipeline.log: debug/info log
- subfolders for multi‑FASTA runs
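Because the output PDB carries its labels in the B‑factor field, the values can be read back without extra dependencies using the fixed‑column PDB layout (B‑factor in columns 61 to 66). The sketch below collects one value per residue from the Cα atoms. `read_ca_bfactors` is a hypothetical helper. Which atoms KaML‑ESM labels, and what value it assigns to non‑titratable residues, are not specified in this README, so treat the Cα choice as an assumption.

```python
def read_ca_bfactors(pdb_text: str) -> dict[tuple[str, int, str], float]:
    """Map (chain, residue number, residue name) to the CA B-factor."""
    values = {}
    for line in pdb_text.splitlines():
        # ATOM records only; atom name occupies columns 13-16.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]                # column 22
            resnum = int(line[22:26])       # columns 23-26
            resname = line[17:20].strip()   # columns 18-20
            values[(chain, resnum, resname)] = float(line[60:66])
    return values
```

With the default output directory, this could be called as `read_ca_bfactors(open("output/predicted_structure.pdb").read())`.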
bin/kaml-esm # main CLI script
bin/kaml-cbtree # KaML-CBTree helper script
bin/rida # rida binary (required by CBTree)
bin/mkdssp # dssp binary (required by CBTree)
env/setup_envs.sh # env setup for Python virtual environment (venv)
src/plmpg/esm2 # vendored ESM2 code
wts/ # pretrained weights
README.md # this file
If you use KaML‑ESM, please cite:
Protein Electrostatic Properties are Fine‑Tuned Through Evolution
Shen M., Dayhoff II G.W., Shen J. 2025 (In‑Review)
MIT License