This mini-project implements a simple motif search tool in Python, as part of my bioinformatics learning roadmap.
The goal is to practice working with DNA strings, file input/output, and clean, readable code that could fit into a basic bioinformatics pipeline.
Duration: 1 full day (11/16/2025)
    MotifFinding/
        data/
            sequence.txt          # Input DNA sequence
            motif.txt             # Input motif/pattern to search for
        outputs/
            motif_positions.txt   # Output: 1-based positions of the motif in the sequence
        src_perso/
            motif_finding.py      # Main script with functions and examples
        README.md
Given:
- a DNA sequence (for example: GATATATGCATATACTT)
- a motif/pattern (for example: ATAT)

the script finds all 1-based starting positions where the motif occurs in the sequence, including overlapping occurrences, and writes them to a text file.
Example output:
2 4 10
This corresponds to the classical “Finding a Motif in DNA” exercise, widely used in introductory bioinformatics training.
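The example output contains both 2 and 4, so overlapping occurrences are counted. The short check below illustrates why that matters. It is only a sketch using the standard `re` module, which is not necessarily how motif_finding.py searches. A plain regex scan skips the overlapping hit, while a zero-width lookahead reports every start position.

```python
import re

sequence = "GATATATGCATATACTT"
motif = "ATAT"

# Plain scan: after a match the search resumes past the matched text,
# so an occurrence that overlaps the previous one is skipped.
non_overlapping = [m.start() + 1 for m in re.finditer(re.escape(motif), sequence)]

# Zero-width lookahead: each match consumes no characters, so every
# start position is tested, including overlapping ones.
overlapping = [
    m.start() + 1 for m in re.finditer(f"(?={re.escape(motif)})", sequence)
]

print(non_overlapping)  # [2, 10]
print(overlapping)      # [2, 4, 10]
```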
All the core logic is in src_perso/motif_finding.py and is organized into three main functions:
- find_motif(sequence: str, pattern: str) -> list[int]
  - Cleans the inputs (removes spaces/newlines, converts to uppercase)
  - Slides a window across the sequence
  - Returns all 1-based positions where the motif matches
- load_from_files(seq_path: str, motif_path: str) -> tuple[str, str]
  - Reads a DNA sequence and a motif from two text files
  - Returns both as raw strings
- save_positions(positions: list[int], out_path: str)
  - Saves the list of positions to a text file
  - If there are no matches, writes: No occurrences found.
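The three functions described above could be sketched roughly as follows. This is a minimal illustration built from the descriptions in this README, not a copy of motif_finding.py. The use of `pathlib`, the space-separated output format with a trailing newline, and returning an empty list for an empty motif are my assumptions.

```python
from __future__ import annotations  # keeps list[int] annotations valid on Python 3.8

from pathlib import Path


def find_motif(sequence: str, pattern: str) -> list[int]:
    """Return all 1-based start positions of pattern in sequence."""
    # Clean the inputs: drop every whitespace character, then uppercase.
    seq = "".join(sequence.split()).upper()
    pat = "".join(pattern.split()).upper()
    if not pat:
        return []  # assumption: an empty motif yields no matches

    k = len(pat)
    positions = []
    # Slide a window of length k across the sequence, one base at a time,
    # so overlapping occurrences are reported too.
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] == pat:
            positions.append(i + 1)  # convert 0-based index to 1-based
    return positions


def load_from_files(seq_path: str, motif_path: str) -> tuple[str, str]:
    """Read the sequence and the motif as raw strings (cleaning happens later)."""
    return Path(seq_path).read_text(), Path(motif_path).read_text()


def save_positions(positions: list[int], out_path: str) -> None:
    """Write the positions on one line, or a message when there are none."""
    if positions:
        text = " ".join(str(p) for p in positions)
    else:
        text = "No occurrences found."
    Path(out_path).write_text(text + "\n")  # assumption: trailing newline


print(find_motif("GATATATGCATATACTT", "ATAT"))  # [2, 4, 10]
```

Returning raw strings from load_from_files and cleaning inside find_motif means the search behaves the same whether the input comes from a file or from a hard-coded string.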
The if __name__ == "__main__": block contains:
- Example 1: a simple in-memory example with hard-coded strings
- Example 2: a realistic use case that reads its inputs from data/ and writes the result to outputs/
From the project root, change into the src_perso directory and run the script:

    cd src_perso
    python motif_finding.py

You should see in the terminal:
- Example 1: direct test with sequence and motif
- Example 2: results loaded from the files in data/

The positions are saved automatically in ../outputs/motif_positions.txt
data/sequence.txt:
GATATATGCATATACTT
data/motif.txt:
ATAT
With these two input files, the script produces the following output:
2 4 10
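- Python 3.8+ (note: the list[int] and tuple[str, str] annotations in the function signatures need Python 3.9+ unless the script starts with from __future__ import annotations)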
- No external libraries required
This mini-project is part of a bioinformatics learning track where I practice:
- DNA string manipulation in Python
- Reading and writing text files
- Building clean and well-documented functions
- Organizing mini-projects for GitHub (folders, scripts, README)
Possible extensions include:
- FASTA file handling
- Using Biopython
- Searching motifs in larger genomes
- Scanning for multiple motifs
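As a rough idea of what the FASTA and multi-motif extensions might look like without Biopython, here is a small sketch. The helpers `read_fasta` and `scan_motifs` are hypothetical names that do not exist in this project, and the parser assumes a well-formed FASTA file with non-empty, uppercase motifs.

```python
from __future__ import annotations


def read_fasta(lines) -> dict[str, str]:
    """Parse FASTA lines into {header: sequence} (hypothetical helper)."""
    records: dict[str, list[str]] = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # header text without the leading '>'
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in records.items()}


def scan_motifs(sequence: str, motifs: list[str]) -> dict[str, list[int]]:
    """Return 1-based start positions for each motif (hypothetical helper)."""
    hits = {}
    for motif in motifs:
        k = len(motif)
        hits[motif] = [
            i + 1
            for i in range(len(sequence) - k + 1)
            if sequence[i:i + k] == motif
        ]
    return hits


fasta_text = """>seq1 example record
GATATATG
CATATACTT
"""
records = read_fasta(fasta_text.splitlines())
hits = scan_motifs(records["seq1 example record"], ["ATAT", "TATA"])
print(hits)  # {'ATAT': [2, 4, 10], 'TATA': [3, 11]}
```

For larger genomes, this window-by-window comparison would likely become slow, which is where Biopython or a dedicated string-search algorithm could help.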