GetElementsOfReaction
Parses reaction SMILES to extract the reactants, amino acid sequence, and products. This lets you break a biocatalytic reaction into its individual components for detailed analysis.
- Reaction SMILES must follow this structure: `substrate SMILES | amino acid sequence >> product SMILES`
- Example: `CC(=O)Cc1ccccc1|MTENALVR>>CC(O)Cc1ccccc1`
- Substrate SMILES: Chemical structure of the starting material(s) in SMILES notation.
  - Multiple substrates should be separated by `.`
- Amino Acid Sequence: Protein sequence in single-letter code.
  - Must be separated from the substrates by `|`
  - Valid characters are the standard amino acid letters (A-Z)
- Product SMILES: Chemical structure of the product(s) in SMILES notation.
  - Separated from the previous components by `>>`
  - Multiple products should be separated by `.`
Returns a formatted string with three components:
`Reactants: [substrate SMILES], AA Sequence: [protein sequence], Products: [product SMILES]`
- All SMILES must be valid chemical structures.
- Spaces around the separators (`|` and `>>`) are optional.
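A minimal sketch of the splitting rules listed above, in plain Python. The function name `get_elements_of_reaction` is hypothetical and this is not the tool's implementation. It checks only the separators and the sequence characters. It does not check whether the SMILES describe real molecules, which would need a cheminformatics library such as RDKit.

```python
def get_elements_of_reaction(reaction_smiles: str) -> str:
    """Split 'substrates|sequence>>products' into its three components.

    Hypothetical helper: validates separators and sequence characters only,
    not the chemistry encoded in the SMILES strings.
    """
    if reaction_smiles.count(">>") != 1:
        raise ValueError("expected exactly one '>>' separating reactants from products")
    left, products = (part.strip() for part in reaction_smiles.split(">>"))
    if left.count("|") != 1:
        raise ValueError("expected exactly one '|' separating substrates from the sequence")
    substrates, sequence = (part.strip() for part in left.split("|"))

    # Single-letter amino acid codes: uppercase letters A-Z only.
    if not sequence or not all("A" <= ch <= "Z" for ch in sequence):
        raise ValueError(f"invalid amino acid sequence: {sequence!r}")

    # Multiple substrates/products are '.'-separated; reject empty entries.
    for label, smiles in (("substrate", substrates), ("product", products)):
        if not smiles or any(not s for s in smiles.split(".")):
            raise ValueError(f"empty {label} SMILES in {smiles!r}")

    return f"Reactants: {substrates}, AA Sequence: {sequence}, Products: {products}"


print(get_elements_of_reaction("CC(=O)Cc1ccccc1|MTENALVR>>CC(O)Cc1ccccc1"))
# Reactants: CC(=O)Cc1ccccc1, AA Sequence: MTENALVR, Products: CC(O)Cc1ccccc1
```

Because the spaces around `|` and `>>` are stripped, `CCO.O | MKT >> CC=O` parses the same way as its compact form.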
ExtractBindingSites
Uses RXNAAMapper to extract binding sites from reaction SMILES strings. The identified sites help explain how the enzyme works and are candidate positions for mutations aimed at enhancing catalytic activity or optimizing a user-specified fitness function.
- Reaction SMILES must follow this structure: `substrate SMILES | amino acid sequence >> product SMILES`
- Example: `CC(=O)Cc1ccccc1|MTENALVR>>CC(O)Cc1ccccc1`
Returns a string containing the extracted binding sites in the format:
`The binding sites are: start-end, start-end, ...`
Example: `The binding sites are: 0-1, 20-24, 34-36`
- The tool checks for the existence of required files and directories before execution.
- If the input is invalid or extraction fails, an error message is returned.
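Because the binding sites come back as text, a caller that wants to reuse them as mutation intervals (for example, for the `intervals` parameter of OptimizeEnzymeSequences) has to parse them first. The sketch below assumes the exact `The binding sites are: start-end, ...` format shown above and treats any other output as an error message. The function name is hypothetical. This section does not say whether the indices are 0- or 1-based or whether `end` is inclusive, so check that before reusing them.

```python
import re
from typing import List

SITE_PATTERN = re.compile(r"(\d+)\s*-\s*(\d+)")
PREFIX = "The binding sites are:"


def parse_binding_sites(message: str) -> List[List[int]]:
    """Turn 'The binding sites are: 0-1, 20-24' into [[0, 1], [20, 24]]."""
    message = message.strip()
    if not message.startswith(PREFIX):
        # Anything else is assumed to be the tool's error message.
        raise ValueError(f"unexpected tool output: {message!r}")
    body = message[len(PREFIX):]
    intervals = [[int(start), int(end)] for start, end in SITE_PATTERN.findall(body)]
    for start, end in intervals:
        if start > end:
            raise ValueError(f"malformed interval {start}-{end}")
    return intervals


print(parse_binding_sites("The binding sites are: 0-1, 20-24, 34-36"))
# [[0, 1], [20, 24], [34, 36]]
```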
OptimizeEnzymeSequences
Optimizes enzyme sequences for biocatalytic reactions using Enzeptional. The tool supports multiple optimization iterations based on substrate and product SMILES, customizable scoring models, and mutations restricted to specified intervals. It uses a genetic algorithm to explore the sequence space and identify enzyme variants with improved predicted catalytic properties. The output is a ranked list of optimized sequences that can be prioritized for experimental validation, which speeds up the enzyme engineering process.
- `substrate_smiles` (str): SMILES representation of the reactant molecule.
- `product_smiles` (str): SMILES representation of the desired product molecule.
- `protein_sequence` (str): Amino acid sequence of the enzyme to optimize.
- `scorer_type` (str): The scoring model type, either `'feasibility'` (default) or `'kcat'`.
- `intervals` (List[List[int]], optional): List of regions (start, end) to focus mutations on (e.g., important sites). Example: `[[1, 4], [20, 21], [50, 56]]`.
- `number_of_results` (int, optional): Number of optimized sequences to return. Default: 10.
Returns a tabulated string of optimized enzyme sequences ranked by predicted performance for the given reaction.
Example output:
+--------+-----------------------------+--------+
| Index  | Sequence                    | Score  |
+--------+-----------------------------+--------+
| 1      | MVLSPADKTNVKAA...           | 0.9200 |
| 2      | MVLAPADKTNVKAA...           | 0.8900 |
| 3      | MVLSPADRTNVKAA...           | 0.8750 |
+--------+-----------------------------+--------+
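To show how interval-restricted mutations and a genetic algorithm fit together, here is a toy sketch in plain Python. It is not Enzeptional's implementation or API. The function names, the population settings, the 0-based half-open reading of `intervals`, and the scoring function are all illustrative assumptions. The scorer simply counts tryptophans and stands in for a feasibility or kcat model.

```python
import random
from typing import Callable, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutable_positions(intervals: Sequence[Sequence[int]], length: int) -> List[int]:
    """Positions allowed to mutate; intervals are assumed 0-based and half-open."""
    positions = set()
    for start, end in intervals:
        positions.update(range(max(start, 0), min(end, length)))
    return sorted(positions)


def mutate(seq: str, positions: List[int], rng: random.Random, rate: float) -> str:
    # Point mutations, applied only inside the allowed positions.
    chars = list(seq)
    for i in positions:
        if rng.random() < rate:
            chars[i] = rng.choice(AMINO_ACIDS)
    return "".join(chars)


def crossover(a: str, b: str, positions: List[int], rng: random.Random) -> str:
    # Uniform crossover, also restricted to the allowed positions.
    chars = list(a)
    for i in positions:
        if rng.random() < 0.5:
            chars[i] = b[i]
    return "".join(chars)


def optimize(
    seed_seq: str,
    score: Callable[[str], float],
    intervals: Sequence[Sequence[int]],
    number_of_results: int = 10,
    population_size: int = 50,
    generations: int = 20,
    seed: int = 0,
) -> List[Tuple[str, float]]:
    rng = random.Random(seed)
    positions = mutable_positions(intervals, len(seed_seq))

    def rank(pop: List[str]) -> List[str]:
        # Best score first; ties broken alphabetically so runs are reproducible.
        return sorted(set(pop), key=lambda s: (-score(s), s))

    population = [seed_seq] + [mutate(seed_seq, positions, rng, 0.3) for _ in range(population_size)]
    for _ in range(generations):
        parents = rank(population)[: max(2, population_size // 5)]  # elitism: keep the best
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), positions, rng), positions, rng, 0.1)
            for _ in range(population_size)
        ]
        population = parents + children
    return [(s, score(s)) for s in rank(population)[:number_of_results]]


def toy_score(seq: str) -> float:
    # Stand-in for a learned scorer: fraction of tryptophan residues.
    return seq.count("W") / len(seq)


for rank_no, (seq, value) in enumerate(optimize("MVLSPADKTNVKAAWGKVGA", toy_score, [[2, 5], [10, 12]], 3), 1):
    print(rank_no, seq, f"{value:.4f}")
```

Keeping the parents in each new population (elitism) ensures the best score never decreases between generations. Both mutation and crossover are restricted to the allowed positions, so residues outside the intervals stay fixed.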
Blastp
Performs BLASTP (Basic Local Alignment Search Tool for Proteins) searches using NCBI BLAST to identify protein sequences similar to a given query. Key search parameters can be customized, and the output includes aligned sequences, descriptions, and alignment statistics. The results help identify evolutionarily related proteins, predict functional similarities, and locate conserved domains. They can guide further experiments and provide insight into protein structure-function relationships.
- `query` (str): The protein sequence as a string.
- `experiment_id` (str, optional): A unique identifier for this run. If not provided, a default ID is generated.
- `database_name` (str, optional): The name of the BLAST database to use (default: `"swissprot"`).
- `evalue` (float, optional): The E-value threshold for reporting matches (default: `1e-5`).
- `outfmt` (str, optional): The output format (default: `"6 sseqid pident evalue bitscore stitle sseq"`).
- `max_target_seqs` (int, optional): The maximum number of aligned sequences to keep (default: `10`).
Returns a formatted message with result details, including:
- The database used.
- The experiment ID (if provided).
- A tabulated list of top matches with columns such as Accession, Identity (%), E-Value, Bit Score, and Description.
- The total number of matches found.
- The location of saved results (query FASTA file and BLAST output file).
Example output:
BLASTP Search Completed Successfully!
Results saved in: `.cache_dir/blast/blast_logs/experiment_id/experiment_folder`
- Query FASTA File: `.cache_dir/blast/blast_logs/experiment_id/query.fasta`
- BLAST Output File: `.cache_dir/blast/blast_logs/experiment_id/blast_output.txt`
Top Matches:
+------------+------------+---------+-----------+-----------------------------------+
| Accession  | Identity   | E-Value | Bit Score | Description                       |
+------------+------------+---------+-----------+-----------------------------------+
| XXX        | XX.X       | X       | X         | X                                 |
| XXX        | XX.X       | X       | X         | X                                 |
| XXX        | XX.X       | X       | X         | X                                 |
+------------+------------+---------+-----------+-----------------------------------+
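The sketch below shows what such a search can look like with the NCBI BLAST+ command-line program `blastp`. It also includes a parser for the default tabular format: `-outfmt 6` is tab-separated, with one hit per line and the columns given in the format string. The function names and file paths are hypothetical, and this is not the tool's own code. Running the command requires a local BLAST+ installation and a formatted `swissprot` database.

```python
import subprocess
from typing import Dict, List

COLUMNS = ["sseqid", "pident", "evalue", "bitscore", "stitle", "sseq"]


def blastp_command(query_fasta: str, out_file: str, database_name: str = "swissprot",
                   evalue: float = 1e-5, max_target_seqs: int = 10) -> List[str]:
    """Build the blastp argument list; mirrors the defaults documented above."""
    outfmt = "6 " + " ".join(COLUMNS)
    return ["blastp", "-query", query_fasta, "-db", database_name,
            "-evalue", str(evalue), "-outfmt", outfmt,
            "-max_target_seqs", str(max_target_seqs), "-out", out_file]


def parse_outfmt6(text: str) -> List[Dict[str, object]]:
    """Parse tab-separated outfmt 6 lines into dicts, best bit score first."""
    hits: List[Dict[str, object]] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}: {line!r}")
        row: Dict[str, object] = dict(zip(COLUMNS, fields))
        for key in ("pident", "evalue", "bitscore"):
            row[key] = float(fields[COLUMNS.index(key)])
        hits.append(row)
    return sorted(hits, key=lambda h: h["bitscore"], reverse=True)


def run_blastp(query_fasta: str, out_file: str, **kwargs) -> List[Dict[str, object]]:
    """Run blastp (requires BLAST+ on PATH) and parse its tabular output."""
    subprocess.run(blastp_command(query_fasta, out_file, **kwargs), check=True)
    with open(out_file) as fh:
        return parse_outfmt6(fh.read())
```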
FindPDBStructure
Finds and retrieves PDB structures using the RCSB Python package. Given a protein sequence, the tool queries the RCSB database for related protein structures (PDB entries, i.e., 3D structures).
- `protein_sequence` (str): The protein sequence as a string.
Returns a string containing the PDB code and entity ID of the matching structure (if found).
Example output: "pdb code 1abc with entity id 1"
If no perfect match is found, returns: "Couldn't find a perfect match"
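For orientation, here is a minimal sketch of a sequence query against the RCSB Search API (v2) using only the Python standard library. It uses a 100% identity cutoff to mirror the "perfect match" behaviour described above. This is an assumption about how such a lookup can be done, not the tool's code, which uses the RCSB Python package. The helper names are hypothetical, and the E-value cutoff of 0.1 is an arbitrary choice.

```python
import json
import urllib.request
from typing import Optional

SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def sequence_query(protein_sequence: str, identity_cutoff: float = 1.0) -> dict:
    """JSON body for a protein sequence search returning polymer entities."""
    return {
        "query": {
            "type": "terminal",
            "service": "sequence",
            "parameters": {
                "evalue_cutoff": 0.1,
                "identity_cutoff": identity_cutoff,
                "sequence_type": "protein",
                "value": protein_sequence,
            },
        },
        "return_type": "polymer_entity",
    }


def format_hit(identifier: Optional[str]) -> str:
    # Polymer-entity identifiers look like "4HHB_1" (PDB code, entity id).
    if not identifier:
        return "Couldn't find a perfect match"
    pdb_code, entity_id = identifier.split("_", 1)
    return f"pdb code {pdb_code.lower()} with entity id {entity_id}"


def find_pdb_structure(protein_sequence: str) -> str:
    request = urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(sequence_query(protein_sequence)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = response.read()
    # Assumption: an empty response body means "no hits".
    results = json.loads(body)["result_set"] if body else []
    return format_hit(results[0]["identifier"] if results else None)
```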
DownloadPDBStructure
Downloads specific PDB structures based on a PDB code using the RCSB Search API. This tool complements the FindPDBStructure functionality by allowing direct retrieval of identified structures. It downloads the corresponding PDB structure file from the RCSB PDB database and saves it in the configured output directory.
- `pdb_code` (str): The PDB code of the structure to download (e.g., `"1abc"`).
Returns a string message indicating the success or failure of the download.
Example output: "Successfully downloaded PDB file: .cache_dir/pdb/1abc.pdb"
If the download fails, returns an error message: "Error: [status_code], Failed to download PDB file for [pdb_code]"
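The following is a minimal download sketch. It assumes files are fetched from the RCSB file server at `https://files.rcsb.org/download/<code>.pdb`. The function names and the output directory are hypothetical, and the tool itself may obtain the file differently. The returned messages mimic the documented success and error strings.

```python
import re
import urllib.error
import urllib.request
from pathlib import Path

# Classic PDB IDs: one digit 1-9 followed by three alphanumeric characters.
PDB_CODE = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def pdb_download_url(pdb_code: str) -> str:
    if not PDB_CODE.match(pdb_code):
        raise ValueError(f"not a 4-character PDB code: {pdb_code!r}")
    return f"https://files.rcsb.org/download/{pdb_code.lower()}.pdb"


def download_pdb_structure(pdb_code: str, out_dir: Path = Path(".cache_dir/pdb")) -> str:
    url = pdb_download_url(pdb_code)
    try:
        with urllib.request.urlopen(url) as response:
            content = response.read()
    except urllib.error.HTTPError as err:
        return f"Error: {err.code}, Failed to download PDB file for {pdb_code}"
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{pdb_code.lower()}.pdb"
    path.write_bytes(content)
    return f"Successfully downloaded PDB file: {path}"
```

Very large entries are distributed only in mmCIF format, so a `.pdb` download can fail for them even when the code is valid.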
Mutagenesis
Uses PyMOL to perform targeted mutations on a protein structure so that it matches a specified target sequence. It can optionally compute the RMSD (Root Mean Square Deviation) between the original and mutated structures to assess structural changes. The tool helps predict the structural consequences of amino acid substitutions and lets researchers visualize potential changes in protein conformation and stability. Through PyMOL, it provides both quantitative (RMSD) and qualitative (visual) insight into how mutations affect protein structure and function.
- `pdb_code` (str): The 4-character PDB code of the protein structure to mutate (e.g., `"1abc"`).
- `target_sequence` (str): The target protein sequence to mutate towards.
- `perform_rmsd` (bool, optional): Whether to perform an RMSD calculation (default: `False`).
Returns a string containing:
- The mutations performed.
- The path to the mutated PDB file.
- If `perform_rmsd=True`, the RMSD value between the original and mutated structures.
Example output:
Mutations performed: A1G, L2V. Mutated structure saved to: .cache_dir/mutagenesis/1abc_mutated.pdb. RMSD between original and mutated structure: 0.1234 Å.
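The mutation labels in the example output use the usual `<original><position><new>` notation with 1-based positions. Below is a minimal sketch of how such labels can be derived from two equal-length sequences, assuming substitutions only. It also includes the plain RMSD formula for atoms that are already paired and superposed. Both helpers are hypothetical. A PyMOL-based RMSD calculation may additionally superpose the structures or restrict the atom selection, so its numbers can differ from this formula.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def mutation_labels(original: str, target: str) -> List[str]:
    """List substitutions as e.g. 'A1G' (1-based positions); lengths must match."""
    if len(original) != len(target):
        raise ValueError("sequences must have the same length (substitutions only)")
    return [f"{a}{i}{b}" for i, (a, b) in enumerate(zip(original, target), start=1) if a != b]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """sqrt(mean squared distance) over paired atoms; no superposition is done."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


print(mutation_labels("ALKT", "GVKT"))  # ['A1G', 'L2V']
```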
MDSimulation
Facilitates Molecular Dynamics simulations using GROMACS. This tool automates the setup and execution of standard MD simulation stages, including Minimization, NVT (constant Number, Volume, Temperature) equilibration, and NPT (constant Number, Pressure, Temperature) equilibration.
- `pdb_file` (Path): Path to the input PDB file containing the protein structure.
- `stages` (List[str], optional): List of simulation stages to run. Default: `["minimization", "nvt", "npt"]`.
- `experiment_id` (str, optional): A unique identifier for this run. If not provided, a default ID is generated.
Returns a string describing the simulation outcome, including the path to the final output file.
Example output: "MD simulation completed successfully. Final output: .cache_dir/molecular_dynamics/output_file.gro"
- The tool runs the stages in the order specified in the `stages` argument.
- Each stage depends on the completion of the previous one (e.g., NVT requires Minimization to complete first).
- Default parameters are provided for each stage, but users can override them using keyword arguments.
- The tool preprocesses the input PDB file to extract only the protein structure before running simulations.
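The stage dependencies can be pictured as a chain of GROMACS `grompp`/`mdrun` calls in which each stage starts from the previous stage's output. The minimal sketch below only builds the command lines. It assumes the file names of the common GROMACS tutorial workflow (`minim.mdp`, `nvt.mdp`, `npt.mdp`, `topol.top`) and an input `.gro` that is already solvated and ionized. These names are illustrative assumptions, as is the rule that `stages` must be a prefix of minimization, nvt, npt. None of them describe the tool's actual behaviour, and the preprocessing steps are skipped entirely.

```python
from typing import List, Optional

CANONICAL_ORDER = ["minimization", "nvt", "npt"]
OUTPUT_PREFIX = {"minimization": "em", "nvt": "nvt", "npt": "npt"}  # hypothetical names
MDP_FILES = {"minimization": "minim.mdp", "nvt": "nvt.mdp", "npt": "npt.mdp"}  # hypothetical names


def validate_stages(stages: List[str]) -> None:
    # Each stage needs the one before it, so only prefixes of the canonical order pass.
    if not stages or stages != CANONICAL_ORDER[: len(stages)]:
        raise ValueError(f"stages must be a prefix of {CANONICAL_ORDER}, got {stages}")


def build_md_plan(stages: List[str], start_gro: str, topology: str = "topol.top") -> List[List[str]]:
    validate_stages(stages)
    commands: List[List[str]] = []
    previous: Optional[str] = None  # output prefix of the previous stage
    for stage in stages:
        name = OUTPUT_PREFIX[stage]
        grompp = ["gmx", "grompp", "-f", MDP_FILES[stage], "-p", topology, "-o", f"{name}.tpr"]
        if previous is None:
            grompp += ["-c", start_gro]
        else:
            # Start from the previous structure; -r gives the position-restraint reference.
            grompp += ["-c", f"{previous}.gro", "-r", f"{previous}.gro"]
            if stage == "npt":
                grompp += ["-t", f"{previous}.cpt"]  # continue from the NVT checkpoint
        commands.append(grompp)
        commands.append(["gmx", "mdrun", "-deffnm", name])
        previous = name
    return commands


for cmd in build_md_plan(["minimization", "nvt", "npt"], "solvated_ions.gro"):
    print(" ".join(cmd))
```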