
A simple Python tool for finding all 1-based positions of a motif in a DNA sequence. Part of my bioinformatics learning roadmap (DNA strings, file I/O, clean Python functions).


yasmina-bioinfo/MotifFinding


MotifFinding – Mini-Project (Bioinformatics Practice)

This mini-project implements a simple motif search tool in Python, as part of my bioinformatics learning roadmap.
The goal is to practice working with DNA strings, file input/output, and clean, readable code that could fit into a basic bioinformatics pipeline.
Duration: 1 full day (11/16/2025)

1. Project structure

MotifFinding/
    data/
        sequence.txt        # Input DNA sequence
        motif.txt           # Input motif/pattern to search for
    outputs/
        motif_positions.txt # Output: 1-based positions of the motif in the sequence
    src_perso/
        motif_finding.py    # Main script with functions and examples
    README.md

2. Core idea

Given:

  • a DNA sequence (for example: GATATATGCATATACTT)
  • a motif/pattern (for example: ATAT)

the script finds every 1-based starting position where the motif occurs in the sequence, including overlapping occurrences, and writes the positions to a text file.

Example output:

2 4 10

This corresponds to the classical “Finding a Motif in DNA” exercise, widely used in introductory bioinformatics training.
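The example output shows that matches may overlap: positions 2 and 4 share the characters "AT". As a minimal sketch of that idea (not the code in motif_finding.py), the loop below uses Python's built-in str.find and restarts the search one character after each hit. The function name overlapping_positions is hypothetical, chosen only for this illustration.

```python
from __future__ import annotations  # keeps list[int] annotations valid on Python 3.8


def overlapping_positions(sequence: str, motif: str) -> list[int]:
    """Return 1-based start positions of motif, counting overlapping hits."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert the 0-based index to 1-based
        # Resume one character later (not len(motif) later) so overlaps are kept.
        start = sequence.find(motif, start + 1)
    return positions


print(*overlapping_positions("GATATATGCATATACTT", "ATAT"))  # prints: 2 4 10
```

Advancing by len(motif) instead of 1 would skip the match at position 4, because it begins inside the match found at position 2.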


3. Implementation details

All the core logic lives in src_perso/motif_finding.py, organized into three main functions:

  • find_motif(sequence: str, pattern: str) -> list[int]

    • Cleans the inputs (removes spaces/newlines, converts to uppercase)
    • Slides a window across the sequence
    • Returns all 1-based positions where the motif matches
  • load_from_files(seq_path: str, motif_path: str) -> tuple[str, str]

    • Reads a DNA sequence and a motif from two text files
    • Returns both as raw strings
  • save_positions(positions: list[int], out_path: str)

    • Saves the list of positions into a text file
    • If there are no matches, writes: No occurrences found.
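The three functions described above could look roughly like the sketch below. It follows the signatures and behaviour listed in this README, but it is an illustration, not a copy of src_perso/motif_finding.py. Several details are assumptions: the use of pathlib, the space-separated output format (inferred from the "2 4 10" example), the trailing newline, and the handling of an empty motif.

```python
from __future__ import annotations  # keeps list[int] annotations valid on Python 3.8

from pathlib import Path


def find_motif(sequence: str, pattern: str) -> list[int]:
    """Return the 1-based start positions of pattern in sequence."""
    # Clean both inputs: drop all whitespace and normalise to uppercase.
    seq = "".join(sequence.split()).upper()
    pat = "".join(pattern.split()).upper()
    if not pat:
        return []  # assumption: an empty motif matches nowhere
    k = len(pat)
    # Slide a window of length k over every valid start index.
    return [i + 1 for i in range(len(seq) - k + 1) if seq[i:i + k] == pat]


def load_from_files(seq_path: str, motif_path: str) -> tuple[str, str]:
    """Read the sequence and the motif as raw strings (cleaning happens later)."""
    return Path(seq_path).read_text(), Path(motif_path).read_text()


def save_positions(positions: list[int], out_path: str) -> None:
    """Write space-separated positions, or a fallback message when empty."""
    text = " ".join(map(str, positions)) if positions else "No occurrences found."
    Path(out_path).write_text(text + "\n")  # assumption: file ends with a newline


# Untidy input is cleaned before matching.
print(find_motif("gatatat gcatatactt\n", " atat\n"))  # prints: [2, 4, 10]
```

Because find_motif does its own cleaning, load_from_files can return the file contents untouched, which matches the "raw strings" behaviour described above.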

The if __name__ == "__main__": block contains:

  • Example 1: simple in-memory example with hard-coded strings
  • Example 2: realistic use case that reads the inputs from data/ and writes the result to outputs/

4. How to run the project

From the project root:

cd src_perso
python motif_finding.py

You should see in the terminal:

  • Example 1: direct test with sequence and motif
  • Example 2: results loaded from the files in data/

The positions will be saved automatically in:

../outputs/motif_positions.txt

5. Example input files

data/sequence.txt

GATATATGCATATACTT

data/motif.txt

ATAT

This produces the following output:

2 4 10

6. Requirements

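  • Python 3.8+ (note: built-in generic annotations such as list[int] and tuple[str, str] need Python 3.9+, or from __future__ import annotations on 3.8)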
  • No external libraries required

7. Learning notes

This mini-project is part of a bioinformatics learning track where I practice:

  • DNA string manipulation in Python
  • Reading and writing text files
  • Building clean and well-documented functions
  • Organizing mini-projects for GitHub (folders, scripts, README)

Possible extensions include:

  • FASTA file handling
  • Using Biopython
  • Searching motifs in larger genomes
  • Scanning for multiple motifs
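As a rough idea of what the first and last extensions might involve, here is a minimal sketch in plain Python with no Biopython. Both read_fasta and scan_motifs are hypothetical helpers that do not exist in this project yet. The parser assumes simple, well-formed FASTA text and uses the first word of each header as the record id.

```python
from __future__ import annotations  # keeps dict[...]/list[...] annotations valid on Python 3.8


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record_id: sequence}. Hypothetical helper."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header: use the first word after '>' as the record id.
            words = line[1:].split()
            current = words[0] if words else "unnamed"
            records[current] = []
        elif current is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            records[current].append(line.upper())
    return {name: "".join(chunks) for name, chunks in records.items()}


def scan_motifs(sequence: str, motifs: list[str]) -> dict[str, list[int]]:
    """Map each motif to its 1-based start positions (overlaps included)."""
    hits = {}
    for motif in motifs:
        k = len(motif)
        hits[motif] = [
            i + 1
            for i in range(len(sequence) - k + 1)
            if k and sequence[i:i + k] == motif
        ]
    return hits


fasta_text = ">seq1 demo record\nGATATATG\nCATATACTT\n"
for name, seq in read_fasta(fasta_text).items():
    print(name, scan_motifs(seq, ["ATAT", "TATA"]))
# prints: seq1 {'ATAT': [2, 4, 10], 'TATA': [3, 11]}
```

For real FASTA files, Biopython's parser would likely be the more robust choice. This sketch only shows how wrapped sequence lines are joined before searching.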
