Biosynth Agent: Reverse Engineering Plant Secondary Metabolites

Biosynth Agent is a computational workbench designed for biologists studying plant specialized metabolism. It automates the cognitive process of retrosynthesis—working backward from a complex plant molecule to identify the biosynthetic gene clusters (BGCs) likely responsible for its creation.

🌿 Why This Matters for Plant Biology

Plants are the world's greatest chemists, producing hundreds of thousands of specialized metabolites (terpenes, alkaloids, phenolics). However, the "logic" of how these molecules are assembled is often buried in complex genomes.

This tool helps you learn and predict that logic by treating biosynthesis as a structured language:

The Words: Chemical motifs (amides, double bonds, rings).
The Grammar: Enzymatic transformations (acyl-adenylation, desaturation, decarboxylation).
The Speaker: The specific plant or host proteome executing these steps.

🧠 Scientific Principles: Logic in Biosynthesis

This pipeline is built on the core logical frameworks of scientific inquiry:

1. Deduction (Chemical Deconstruction)

Premise: A molecule exists (e.g., an Echinacea alkamide).
Rule: If an amide bond is present, it was formed by condensing an amine with a carboxylic acid (in cells, usually an activated acyl donor).
Execution: The agent uses Graph Theory (via RDKit) to computationally "cut" the molecule at these logical seams, identifying the precursor modules.
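The "cut at the seam" idea can be sketched without any dependencies. The example below is a minimal illustration, not the code in fragmentation.py: it assumes a hand-built heavy-atom graph for N-isobutylacetamide (a shortened stand-in with the same amide and isobutylamine head as the alkamide used later), whereas the agent obtains the graph from a SMILES string via RDKit. The names `find_amide_bonds` and `cut_and_split` are hypothetical.

```python
from collections import defaultdict

# Hand-built heavy-atom graph for N-isobutylacetamide, CC(=O)NCC(C)C.
# Assumption: in the real pipeline RDKit would build this from SMILES.
ATOMS = ["C", "C", "O", "N", "C", "C", "C", "C"]
BONDS = [  # (atom_i, atom_j, bond_order)
    (0, 1, 1), (1, 2, 2), (1, 3, 1), (3, 4, 1), (4, 5, 1), (5, 6, 1), (5, 7, 1),
]


def find_amide_bonds(atoms, bonds):
    """Return (carbonyl_C, N) pairs: a C-N single bond whose C also carries C=O."""
    carbonyl_carbons = set()
    for i, j, order in bonds:
        if order == 2 and {atoms[i], atoms[j]} == {"C", "O"}:
            carbonyl_carbons.add(i if atoms[i] == "C" else j)
    hits = []
    for i, j, order in bonds:
        if order != 1:
            continue
        for c, n in ((i, j), (j, i)):
            if atoms[c] == "C" and atoms[n] == "N" and c in carbonyl_carbons:
                hits.append((c, n))
    return hits


def cut_and_split(atoms, bonds, cut):
    """Remove one bond and return the connected components as sorted index lists."""
    neighbours = defaultdict(set)
    for i, j, _ in bonds:
        if {i, j} != set(cut):
            neighbours[i].add(j)
            neighbours[j].add(i)
    seen, fragments = set(), []
    for start in range(len(atoms)):
        if start in seen:
            continue
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.append(node)
            stack.extend(neighbours[node] - seen)
        fragments.append(sorted(component))
    return fragments


amide = find_amide_bonds(ATOMS, BONDS)[0]
acyl, amine = cut_and_split(ATOMS, BONDS, amide)
print("acyl module atoms: ", [ATOMS[i] for i in acyl])
print("amine module atoms:", [ATOMS[i] for i in amine])
```

Cutting the single amide bond leaves two connected components: the acyl side (carbonyl carbon, its oxygen, and the chain) and the amine side (nitrogen plus the isobutyl group).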

2. Induction (Enzyme Hypothesis)

Observation: The precursor modules (e.g., a branched-chain amine) mimic known biological substrates.
Inference: Therefore, a specific family of enzymes (e.g., PLP-dependent decarboxylases) is likely required.
Execution: The agent maps these chemical modules to probabilistic enzyme families stored in a Knowledge Base.
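As a rough picture of what such a mapping might look like, the sketch below uses a plain dictionary as the knowledge base. Everything here is an assumption for illustration: the module keys, the `EnzymeHypothesis` class, the `hypothesize` function, and especially the weights, which are placeholders rather than measured probabilities. The actual schema of the agent's Knowledge Base may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnzymeHypothesis:
    family: str
    weight: float  # placeholder plausibility weight, NOT a measured probability


# Hypothetical knowledge base keyed by precursor-module type.
# Family names are taken from the examples in this README.
KNOWLEDGE_BASE = {
    "branched_chain_amine": [
        EnzymeHypothesis("PLP-dependent decarboxylase", 0.7),
    ],
    "unsaturated_acyl_chain": [
        EnzymeHypothesis("acyl-adenylating enzyme", 0.4),
        EnzymeHypothesis("desaturase", 0.6),
    ],
}


def hypothesize(module_type: str) -> list[EnzymeHypothesis]:
    """Return candidate enzyme families for a module, most plausible first."""
    candidates = KNOWLEDGE_BASE.get(module_type, [])
    return sorted(candidates, key=lambda h: h.weight, reverse=True)


for module in ("branched_chain_amine", "unsaturated_acyl_chain", "unknown_module"):
    families = [h.family for h in hypothesize(module)]
    print(f"{module}: {families}")
```

A module type that is missing from the dictionary simply yields an empty list, which keeps "no hypothesis" distinct from "low-weight hypothesis".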

3. Verification (Genomic Realization)

Hypothesis: The host organism possesses genes encoding these functions.
Test: The agent mines the proteome using Hidden Markov Models (HMMs) to find and rank physical protein candidates.
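One way to picture the ranking step is to parse the tabular report that HMMER's hmmsearch writes with `--tblout` and sort hits by full-sequence bit score. This is a sketch under assumptions, not the logic in gene_candidates.py: the protein IDs, profile name, and scores in `SAMPLE_TBLOUT` are invented, and the 50-bit cutoff is an arbitrary illustrative threshold.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    protein: str
    profile: str
    evalue: float
    bitscore: float


# Invented example rows in hmmsearch --tblout layout:
# target, accession, query, accession, full-sequence E-value / score / bias, ...
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias  ...
prot_A - PLP_deC - 1e-50 170.2 0.1 2e-50 169.5 0.1 1.0 1 0 0 1 1 1 1 hypothetical protein
prot_B - PLP_deC - 3e-05 22.4 0.0 4e-05 21.9 0.0 1.2 1 0 0 1 1 1 1 hypothetical protein
prot_C - PLP_deC - 1e-20 75.0 0.2 2e-20 74.1 0.2 1.1 1 0 0 1 1 1 1 hypothetical protein
"""


def parse_tblout(text):
    """Parse target name, query name, and full-sequence E-value and bit score."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split(maxsplit=18)  # field 18 is the free-text description
        hits.append(Hit(fields[0], fields[2], float(fields[4]), float(fields[5])))
    return hits


def rank_candidates(hits, min_bitscore=50.0):
    """Keep hits at or above the cutoff and sort best-first by bit score."""
    kept = [h for h in hits if h.bitscore >= min_bitscore]
    return sorted(kept, key=lambda h: h.bitscore, reverse=True)


ranked = rank_candidates(parse_tblout(SAMPLE_TBLOUT))
for hit in ranked:
    print(f"{hit.protein}\t{hit.profile}\t{hit.bitscore:.1f}\t{hit.evalue:.0e}")
```

In practice a per-profile cutoff (for example a curated gathering threshold, where the profile provides one) is usually more meaningful than a single global number.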

📂 Repository Architecture

The codebase is structured to mirror the flow of biological information:

Plaintext

src/biosynth_agent/
├── chemistry/                # The "Chemist" (RDKit)
│   ├── fragmentation.py      # Deductive logic: Splitting molecules into precursors
│   └── rdkit_motifs.py       # Feature extraction: Identifying functional groups
├── planning/                 # The "Architect" (Search Algorithms)
│   ├── beam.py               # Beam Search: Ranking the most likely pathways
│   ├── enzyme_mapping.py     # Knowledge Base: Linking chemistry to enzymes
│   └── search.py             # Scoring: Evaluating host feasibility (µ - ασ)
├── genomics/                 # The "Geneticist" (HMMER)
│   └── gene_candidates.py    # Mining proteomes for specific gene sequences
└── cli.py                    # Orchestrator for the design phase
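The tree annotates search.py with the expression µ - ασ. One plausible reading, and it is only an assumption here, is a risk-adjusted score: the mean of some feasibility estimates minus α times their standard deviation, so that uncertain pathways are penalized. The function name and sample values below are hypothetical.

```python
from statistics import fmean, pstdev


def risk_adjusted_score(samples, alpha=1.0):
    """Mean minus alpha standard deviations (mu - alpha * sigma)."""
    return fmean(samples) - alpha * pstdev(samples)


# Two hypothetical sets of feasibility estimates with the same mean (0.70).
steady = [0.70, 0.70, 0.70]    # consistent estimates, sigma = 0
volatile = [0.95, 0.70, 0.45]  # same mean, much wider spread

steady_score = risk_adjusted_score(steady)
volatile_score = risk_adjusted_score(volatile)
print(f"steady:   {steady_score:.3f}")
print(f"volatile: {volatile_score:.3f}")
```

With equal means, the option with the wider spread scores lower; raising α makes the ranking more conservative.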

🛠️ Installation

Prerequisites

Python 3.10+
HMMER: Required for the genomic mining step.
- Linux: sudo apt-get install hmmer
- macOS: brew install hmmer

Setup

Bash

# Clone the repository
git clone https://github.com/YOUR_USERNAME/biosynth-agent.git
cd biosynth-agent

# Set up environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -e .

⚡ Usage: A Plant Study Example

Let's analyze a bioactive alkamide (often found in plants like Spilanthes or Echinacea). We want to understand how a host organism could synthesize it.

Step 1: Deconstruct the Molecule (Logic Layer)

We start with the SMILES string. The agent will fragment it and propose a pathway.

Bash

python -m biosynth_agent.cli \
  --smiles "CC#CC#CCC/C=C/C=C\C(=O)NCC(C)C" \
  --prefix echinacea_target \
  --hosts "cyanobacteria,yeast" \
  --beam 5

What happens: The agent applies deductive rules to split the amide bond, identifying an acyl chain and an amine. It then uses Beam Search to rank the best enzyme families to perform this coupling.
Output: results/echinacea_target_designpack.json
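For readers unfamiliar with beam search, the sketch below shows the general technique on a three-step pathway. It is not the code in beam.py: the step structure, the "alternative ... (placeholder)" entries, and all weights are assumptions made up for illustration. At each step every partial pathway is extended with every option, and only the `beam_width` highest-scoring partial pathways survive.

```python
import math


def beam_search(steps, beam_width):
    """steps: list of option lists, each option a (family, probability) pair.

    Returns up to beam_width (log_probability, pathway) tuples, best first.
    """
    beams = [(0.0, [])]  # (cumulative log-probability, families chosen so far)
    for options in steps:
        expanded = [
            (logp + math.log(p), path + [family])
            for logp, path in beams
            for family, p in options
        ]
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:beam_width]  # prune to the best partial pathways
    return beams


# Hypothetical steps with placeholder weights (not measured values).
STEPS = [
    [("PLP-dependent decarboxylase", 0.7), ("alternative amine route (placeholder)", 0.3)],
    [("desaturase", 0.6), ("alternative chain tailoring (placeholder)", 0.4)],
    [("acyl-adenylating enzyme", 0.8), ("alternative coupling (placeholder)", 0.2)],
]

pathways = beam_search(STEPS, beam_width=2)
for logp, path in pathways:
    print(f"{math.exp(logp):.3f}  " + " -> ".join(path))
```

Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow on long pathways. A larger `--beam` value explores more alternatives at the cost of more candidates to score.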

Step 2: Find the Genes (Physical Layer)

Now, we ask: "Does my specific host (e.g., a cyanobacterium used as a model for the chloroplast) have the genes to do this?"

Bash

python -m biosynth_agent.gene_cli \
  --designpack results/echinacea_target_designpack.json \
  --host cyanobacteria \
  --proteome data/proteomes/syn6803.faa \
  --hmm_dir data/hmms \
  --out results/echinacea_genes.json

What happens: The agent takes the "Enzyme Hypotheses" from Step 1 and scans the proteome with HMM profiles. It uses the resulting bit scores to separate high-probability candidates from noise.
Output: results/echinacea_genes.json
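If you want to reproduce the scan by hand, a directory of profiles can be turned into one hmmsearch call per profile. The sketch below assumes, without confirmation from the source, that each file in the HMM directory is a separate `.hmm` profile. `build_hmmsearch_commands` and the profile file names are hypothetical, while `--tblout` and `--noali` are standard hmmsearch options. The demo only builds and prints the commands, so it runs without HMMER installed.

```python
import tempfile
from pathlib import Path


def build_hmmsearch_commands(hmm_dir, proteome, out_dir):
    """One hmmsearch invocation per *.hmm profile, each writing a tabular report."""
    commands = []
    for profile in sorted(Path(hmm_dir).glob("*.hmm")):
        table = Path(out_dir) / f"{profile.stem}.tbl"
        commands.append(
            ["hmmsearch", "--noali", "--tblout", str(table), str(profile), str(proteome)]
        )
    return commands


# Demo with empty placeholder profiles (hypothetical names) in a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("decarboxylase.hmm", "desaturase.hmm"):
        (Path(tmp) / name).touch()
    demo_commands = build_hmmsearch_commands(tmp, "data/proteomes/syn6803.faa", "results")

for cmd in demo_commands:
    print(" ".join(cmd))
    # To actually run a search with real profiles: subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a shell string avoids quoting problems with unusual file names.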

🧪 Critical Thinking & Customization

As a computational biologist, you are encouraged to modify the "Brain" of the agent:

Refine the Logic: Edit src/biosynth_agent/chemistry/fragmentation.py to add new rules for plant-specific bonds (e.g., ester linkages in rosmarinic acid).
Tune the Scoring: Adjust the HostProfile in src/biosynth_agent/planning/search.py to model plant-specific constraints (e.g., cytosolic vs. plastidial localization penalties).
Expand the Knowledge: Add new enzyme families to data/enzyme_kb.json to cover more diverse specialized metabolites.

📄 License

MIT License.
