Biosynth Agent is a computational workbench designed for biologists studying plant specialized metabolism. It automates the cognitive process of retrosynthesis—working backward from a complex plant molecule to identify the biosynthetic gene clusters (BGCs) likely responsible for its creation.
Plants are the world's greatest chemists, producing hundreds of thousands of specialized metabolites (terpenes, alkaloids, phenolics). However, the "logic" of how these molecules are assembled is often buried in complex genomes.
This tool helps you learn and predict that logic by treating biosynthesis as a structured language:
- The Words: Chemical motifs (amides, double bonds, cyclizations).
- The Grammar: Enzymatic transformations (acyl-adenylation, desaturation, decarboxylation).
- The Speaker: The specific plant or host proteome executing these steps.
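To make the "words" concrete, here is a minimal sketch of motif counting with RDKit SMARTS patterns. The `MOTIFS` dictionary and `motif_counts` function are hypothetical illustrations, not the contents of `rdkit_motifs.py`. The vocabulary covers only amides, C=C and C≡C bonds, and uses the ring count as a rough proxy for cyclizations.

```python
from rdkit import Chem

# Hypothetical motif vocabulary ("words"), compiled once from SMARTS.
MOTIFS = {
    name: Chem.MolFromSmarts(smarts)
    for name, smarts in {
        "amide": "[CX3](=[OX1])[NX3]",
        "C=C double bond": "[CX3]=[CX3]",
        "C#C triple bond": "[CX2]#[CX2]",
    }.items()
}


def motif_counts(smiles: str) -> dict[str, int]:
    """Count each motif in a molecule, plus its number of rings."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles!r}")
    # GetSubstructMatches de-duplicates symmetric matches of the same atoms,
    # so each bond is counted once.
    counts = {name: len(mol.GetSubstructMatches(patt)) for name, patt in MOTIFS.items()}
    # Ring count as a rough stand-in for cyclization events.
    counts["rings"] = mol.GetRingInfo().NumRings()
    return counts


print(motif_counts(r"CC#CC#CCC/C=C/C=C\C(=O)NCC(C)C"))
```

For the Echinacea alkamide used in the walkthrough, this reports one amide, two C=C bonds, two C≡C bonds, and no rings.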
This pipeline is built on the core logical frameworks of scientific inquiry:
- Premise: A molecule exists (e.g., an Echinacea alkamide).
- Rule: If an amide bond is present, it must have been formed by condensing an amine and an acid.
- Execution: The agent represents the molecule as a graph (via RDKit) and computationally "cuts" it at these logical seams, identifying the precursor modules.
- Observation: The precursor modules (e.g., a branched-chain amine) resemble known biological substrates.
- Inference: Therefore, a specific family of enzymes (e.g., PLP-dependent decarboxylases) is likely required.
- Execution: The agent maps each chemical module to candidate enzyme families, each weighted by a probability, stored in a Knowledge Base.
- Hypothesis: The host organism possesses genes encoding these functions.
- Test: The agent mines the proteome using Hidden Markov Models (HMMs) to find and rank physical protein candidates.
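The Premise, Rule and Execution steps can be sketched with RDKit. The sketch finds C(=O)-N bonds with a SMARTS pattern and breaks them with `Chem.FragmentOnBonds`. The `split_amides` function is a hypothetical illustration of the single amide rule, not the repository's `fragmentation.py`. It marks cut points with dummy atoms (`*`) and does not convert the pieces into the free acid and amine.

```python
from rdkit import Chem

# Amide pattern: carbonyl carbon, its oxygen, and the nitrogen it is bonded to.
AMIDE = Chem.MolFromSmarts("[CX3](=[OX1])[NX3]")


def split_amides(smiles: str) -> list[str]:
    """Cut every amide C-N bond and return the fragments as SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles!r}")
    # Collect the index of the C-N bond in each amide match.
    bond_ids = sorted({
        mol.GetBondBetweenAtoms(c, n).GetIdx()
        for c, _o, n in mol.GetSubstructMatches(AMIDE)
    })
    if not bond_ids:
        return [Chem.MolToSmiles(mol)]  # no amide: nothing to cut
    # Break the bonds; dummy atoms (*) mark where each cut was made.
    pieces = Chem.FragmentOnBonds(mol, bond_ids, addDummies=True)
    return [Chem.MolToSmiles(frag) for frag in Chem.GetMolFrags(pieces, asMols=True)]


for fragment in split_amides(r"CC#CC#CCC/C=C/C=C\C(=O)NCC(C)C"):
    print(fragment)
```

For the alkamide in the walkthrough, this yields two fragments: the acyl chain and the isobutylamine unit.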
The codebase is structured to mirror the flow of biological information:
```text
src/biosynth_agent/
├── chemistry/              # The "Chemist" (RDKit)
│   ├── fragmentation.py    # Deductive logic: Splitting molecules into precursors
│   └── rdkit_motifs.py     # Feature extraction: Identifying functional groups
├── planning/               # The "Architect" (Search Algorithms)
│   ├── beam.py             # Beam Search: Ranking the most likely pathways
│   ├── enzyme_mapping.py   # Knowledge Base: Linking chemistry to enzymes
│   └── search.py           # Scoring: Evaluating host feasibility (µ - ασ)
├── genomics/               # The "Geneticist" (HMMER)
│   └── gene_candidates.py  # Mining proteomes for specific gene sequences
└── cli.py                  # Orchestrator for the design phase
```

Requirements:
- Python 3.10+
- HMMER (required for the genomic mining step):
  - Linux: `sudo apt-get install hmmer`
  - macOS: `brew install hmmer`
Installation:

```bash
# Clone the repository
git clone https://github.com/YOUR_USERNAME/biosynth-agent.git
cd biosynth-agent

# Set up environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -e .
```

Let's analyze a bioactive alkamide (found in plants such as Spilanthes and Echinacea). We want to understand how a host organism could synthesize it.
We start with the SMILES string. The agent will fragment it and propose a pathway.
```bash
python -m biosynth_agent.cli \
  --smiles "CC#CC#CCC/C=C/C=C\C(=O)NCC(C)C" \
  --prefix echinacea_target \
  --hosts "cyanobacteria,yeast" \
  --beam 5
```

- What happens: The agent applies deductive rules to split the amide bond, identifying an acyl chain (dodeca-2,4-diene-8,10-diynoic acid) and an amine (isobutylamine). It then uses beam search to rank the enzyme families most likely to carry out this coupling.
- Output: `results/echinacea_target_designpack.json`
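Here is a minimal sketch of the beam search idea behind `--beam`. After each biosynthetic step, only the `beam_width` highest-scoring partial pathways (by summed log-probability) are kept. The step list, enzyme families, and probabilities below are illustrative assumptions, not values from `beam.py` or `data/enzyme_kb.json`.

```python
import heapq
import math

# Hypothetical candidates per biosynthetic step: (enzyme family, prior probability).
STEPS = [
    [("PLP-dependent decarboxylase", 0.7), ("aminotransferase", 0.3)],  # amine formation
    [("acyl-CoA ligase", 0.6), ("adenylate-forming enzyme", 0.4)],      # acid activation
    [("N-acyltransferase", 0.55), ("amide synthetase", 0.45)],           # amide coupling
]


def beam_search(steps, beam_width):
    """Keep only the `beam_width` best partial pathways after each step."""
    beam = [(0.0, ())]  # (summed log-probability, enzyme families chosen so far)
    for candidates in steps:
        # Extend every surviving pathway with every candidate for this step.
        expanded = [
            (logp + math.log(p), path + (family,))
            for logp, path in beam
            for family, p in candidates
        ]
        # Prune back down to the top-scoring pathways.
        beam = heapq.nlargest(beam_width, expanded, key=lambda item: item[0])
    # Convert log-probabilities back to probabilities for readability.
    return [(math.exp(logp), list(path)) for logp, path in beam]


for prob, path in beam_search(STEPS, beam_width=2):
    print(f"{prob:.3f}", " -> ".join(path))
```

With independent step probabilities like these, the beam simply recovers the top-scoring products. The pruning pays off once a step's score depends on earlier choices (for example, host compatibility of a chosen enzyme pair) and the full space is too large to enumerate.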
Now, we ask: "Does my specific host (e.g., a cyanobacterium, serving as a model for the chloroplast) have the genes to do this?"
```bash
python -m biosynth_agent.gene_cli \
  --designpack results/echinacea_target_designpack.json \
  --host cyanobacteria \
  --proteome data/proteomes/syn6803.faa \
  --hmm_dir data/hmms \
  --out results/echinacea_genes.json
```

- What happens: The agent takes the "Enzyme Hypotheses" from the design pack produced in the previous step and scans the proteome with HMM profiles. It uses HMMER bit scores to separate high-confidence candidates from noise.
- Output: `results/echinacea_genes.json`
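One way to do the ranking step is to run HMMER's `hmmsearch` with `--tblout` and sort the per-sequence hits by bit score. This sketch assumes that table format: whitespace-separated columns, `#` comment lines, and the full-sequence E-value and bit score in the fifth and sixth columns. The `Hit` class, `rank_candidates`, and the 25-bit cutoff are hypothetical, not the agent's actual code or thresholds.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    target: str      # protein ID from the proteome FASTA
    query: str       # HMM profile name
    evalue: float    # full-sequence E-value
    bitscore: float  # full-sequence bit score


def parse_tblout(text: str) -> list[Hit]:
    """Parse per-sequence hits from `hmmsearch --tblout` output."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        hits.append(Hit(target=fields[0], query=fields[2],
                        evalue=float(fields[4]), bitscore=float(fields[5])))
    return hits


def rank_candidates(hits: list[Hit], min_bitscore: float = 25.0) -> list[Hit]:
    """Drop hits below a (hypothetical) bit-score cutoff, best first."""
    kept = [h for h in hits if h.bitscore >= min_bitscore]
    return sorted(kept, key=lambda h: h.bitscore, reverse=True)
```

To produce the table, a command such as `hmmsearch --tblout hits.tbl my_profile.hmm data/proteomes/syn6803.faa` would work, where `my_profile.hmm` is a placeholder for one of the profiles in `data/hmms`. Within a single search, bit scores suit ranking because, unlike E-values, they do not depend on database size.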
As a computational biologist, you are encouraged to modify the "Brain" of the agent:
- Refine the Logic: Edit `src/biosynth_agent/chemistry/fragmentation.py` to add new rules for plant-specific bonds (e.g., ester linkages in rosmarinic acid).
- Tune the Scoring: Adjust the `HostProfile` in `src/biosynth_agent/planning/search.py` to model plant-specific constraints (e.g., cytosolic vs. plastidial localization penalties).
- Expand the Knowledge: Add new enzyme families to `data/enzyme_kb.json` to cover more diverse specialized metabolites.
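As a starting point for tuning, here is a minimal sketch of a risk-averse µ - ασ feasibility score. It takes the mean (µ) of per-step scores minus α times their spread (σ), so a pathway with one weak step is penalized even if its average looks good. The `HostProfile` fields (`alpha`, `penalties`), the use of the population standard deviation, and the subtraction of per-family localization penalties are all assumptions. This is not the repository's actual `HostProfile` in `search.py`.

```python
import statistics
from dataclasses import dataclass, field


@dataclass
class HostProfile:
    """Hypothetical host model: risk aversion plus per-family penalties."""
    name: str
    alpha: float = 1.0  # weight on the spread term (higher = more risk-averse)
    # e.g. {"acyl-CoA ligase": 0.2} for an enzyme mislocalized in this host
    penalties: dict[str, float] = field(default_factory=dict)


def feasibility(step_scores: dict[str, float], host: HostProfile) -> float:
    """Return µ - α·σ over per-step scores after host-specific penalties."""
    adjusted = [score - host.penalties.get(family, 0.0)
                for family, score in step_scores.items()]
    mu = statistics.fmean(adjusted)
    sigma = statistics.pstdev(adjusted)  # population standard deviation (assumed)
    return mu - host.alpha * sigma


plastid_host = HostProfile("cyanobacteria", alpha=1.0, penalties={"amide synthetase": 0.2})
print(feasibility({"acyl-CoA ligase": 0.8, "amide synthetase": 0.6}, plastid_host))
```

Raising `alpha` favors pathways whose steps are uniformly plausible over pathways that mix very strong and very weak steps.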
MIT License.