This skill enables AI agents to help you find patterns, motifs, and subsequences in biological sequences using Biopython. It covers exact matches, regex patterns, IUPAC ambiguity codes, and probabilistic motif searching.
pip install biopython

Tell your AI agent what you want to do:
- "Find all occurrences of GAATTC in this sequence"
- "Search for EcoRI and BamHI restriction sites"
- "Find TATA box variants in the promoter region"
- "Search both strands for this motif"

Other example prompts:
- "Find all positions where GAATTC occurs in my sequence"
- "What restriction enzymes cut this sequence?"
- "Search for GAATTC on both strands"
- "Find all matches to the pattern GATNNTC where N is any base"
- "Search for TATA box variants in the first 500 bp"
- "Find tandem repeats of CAG in this sequence"
- "Find all ATG start codons and their positions"

The agent will:
- Import Bio.Seq and, if needed, Python's re module
- Create appropriate search function for the pattern type
- Search the sequence (and reverse complement if requested)
- Return positions and matched subsequences
- Format results clearly
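
The steps above can be sketched as follows for the exact-match case. This is a minimal illustration, not the skill's actual implementation: the helper names `reverse_complement` and `find_exact`, the 0-based forward-strand coordinates, and the `(start, strand, match)` result shape are all assumptions made for the example. It uses Biopython's `Seq.reverse_complement()` and falls back to a plain-Python version if Biopython is not installed.

```python
try:
    from Bio.Seq import Seq

    def reverse_complement(dna: str) -> str:
        """Reverse complement via Biopython."""
        return str(Seq(dna).reverse_complement())

except ImportError:
    # Fallback without Biopython; complements A/C/G/T only.
    _COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

    def reverse_complement(dna: str) -> str:
        return dna.translate(_COMPLEMENT)[::-1]


def find_exact(sequence: str, motif: str, both_strands: bool = True):
    """Return sorted (start, strand, match) tuples for every occurrence.

    Positions are 0-based and always refer to the forward strand.
    A "-" hit means the motif lies on the reverse strand, so the forward
    strand shows its reverse complement at that position.
    """
    sequence = sequence.upper()
    motif = motif.upper()

    targets = [("+", motif)]
    rc_motif = reverse_complement(motif)
    # Palindromic sites (e.g. GAATTC) equal their own reverse complement,
    # so searching them twice would only duplicate hits.
    if both_strands and rc_motif != motif:
        targets.append(("-", rc_motif))

    hits = []
    for strand, pattern in targets:
        start = sequence.find(pattern)
        while start != -1:
            hits.append((start, strand, sequence[start:start + len(pattern)]))
            # Advance by one base so overlapping occurrences are found too.
            start = sequence.find(pattern, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    for start, strand, match in find_exact("ATGCCCAT", "ATG"):
        print(f"{start}\t{strand}\t{match}")
```

Reporting every hit in forward-strand coordinates keeps the results easy to compare across strands; converting to reverse-strand coordinates, if needed, is a separate step.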

Supported pattern types:
- Exact: Fixed sequence like GAATTC
- IUPAC: With ambiguity codes like GATNNTC (N = any base)
- Regex: Flexible patterns like TATA[AT]A[AT]
- PWM/PSSM: Probabilistic scoring matrices
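
One way to handle the IUPAC case is to translate each ambiguity code into a regex character class and then reuse the regex machinery. The sketch below uses only the standard library; the names `IUPAC_DNA`, `iupac_to_regex`, and `find_iupac` are hypothetical, and the lookahead wrapper is a design choice made here so that overlapping matches are reported.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(pattern: str) -> str:
    """Translate an IUPAC DNA pattern such as GATNNTC into a regex string."""
    try:
        return "".join(IUPAC_DNA[base] for base in pattern.upper())
    except KeyError as err:
        raise ValueError(f"Not an IUPAC DNA code: {err.args[0]!r}") from None


def find_iupac(sequence: str, pattern: str):
    """Return (start, match) tuples, 0-based, including overlapping matches."""
    # A zero-width lookahead with a capture group lets finditer step through
    # every start position instead of skipping past each match.
    regex = re.compile(f"(?=({iupac_to_regex(pattern)}))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    print(iupac_to_regex("GATNNTC"))
    print(find_iupac("AAGATCGTCAA", "GATNNTC"))
```

With this mapping, the regex example TATA[AT]A[AT] is what the IUPAC pattern TATAWAW expands to, so the two notations can be treated as interchangeable for simple motifs.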
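
Tips:
- Search both strands unless the feature is known to be strand-specific; a site on the reverse strand appears on the forward strand as the motif's reverse complement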
- Use IUPAC codes for patterns with known ambiguity
- For regulatory elements, use regex for variant matching
- Consider frame position for coding sequence motifs
- Compile regex patterns once (re.compile) when scanning large sequences or reusing the same pattern across many sequences
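
Two of these tips, frame position and compiled patterns, can be illustrated with a short standard-library sketch. The function names, the convention that frame 0 starts at index 0 of the supplied sequence, and the default minimum of three repeat units are all assumptions chosen for the example, not values taken from the skill.

```python
import re

# Compiled once at import time and reused for every sequence scanned.
START_CODON = re.compile("ATG")


def start_codons_with_frame(sequence: str):
    """Return (position, frame) for each ATG, with frame = position % 3.

    Assumes frame 0 begins at index 0 of the sequence passed in.
    """
    return [(m.start(), m.start() % 3)
            for m in START_CODON.finditer(sequence.upper())]


def cag_repeats(sequence: str, min_units: int = 3):
    """Return (start, unit_count) for each run of at least min_units CAGs."""
    # Quantifiers are greedy, so each run is reported once at full length.
    pattern = re.compile(r"(?:CAG){%d,}" % min_units)
    return [(m.start(), len(m.group(0)) // 3)
            for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(start_codons_with_frame("ATGAATGCCATG"))
    print(cag_repeats("TTCAGCAGCAGCAGTT"))
```

Only start codons that share a frame with the annotated coding sequence are candidates for in-frame motifs, which is why the frame is reported alongside each position.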