Skip to content

sequencer

Okerew edited this page Sep 13, 2025 · 1 revision

DNASequencer Class


Overview

The DNASequencer class simulates the process of DNA sequencing, including the introduction of sequencing errors and the assembly of sequenced reads into a complete DNA sequence. It also provides methods for analyzing the quality of the sequencing process.


Class Definition

class DNASequencer:
    def __init__(self, error_rate: float = 0.001, read_length: int = 100):
        """
        Initialize a DNA Sequencer.
        :param error_rate: Probability of a sequencing error (default: 0.001)
        :param read_length: Length of each read (default: 100 nucleotides)
        """
        ...

Attributes

Attribute Type Description
error_rate float Probability of a sequencing error.
read_length int Length of each read in nucleotides.

Methods

Initialization

  • __init__(self, error_rate: float = 0.001, read_length: int = 100) Initializes a new DNASequencer instance with the specified error rate and read length.

DNA Sequencing

  • sequence(self, dna: str) -> List[str] Sequences the given DNA, simulating the physical process with potential errors.

    • Parameters:

      • dna: The DNA sequence to be sequenced.
    • Returns: List of sequenced reads.

    • Details:

      • The DNA is divided into reads of the specified length.
      • Each read may contain sequencing errors.

Error Introduction

  • _introduce_errors(self, read: str) -> str Introduces sequencing errors into a read.

    • Parameters:

      • read: The original read.
    • Returns: The read with potential errors.

    • Details:

      • Errors can be substitutions, insertions, or deletions.
      • The type and position of errors are determined randomly based on the error rate.

Sequence Assembly

  • assemble(self, reads: List[str]) -> str Assembles the sequenced reads into a complete DNA sequence.

    • Parameters:

      • reads: List of sequenced reads.
    • Returns: Assembled DNA sequence.

    • Details:

      • The assembly process starts with the longest read.
      • Subsequent reads are added based on overlaps with the assembled sequence.

Overlap Detection

  • _find_overlap(self, seq1: str, seq2: str) -> int Finds the overlap between two sequences.

    • Parameters:

      • seq1: First sequence.
      • seq2: Second sequence.
    • Returns: Length of the overlap.


Quality Analysis

  • analyze_quality(self, original: str, sequenced: str) -> Tuple[float, float, float] Analyzes the quality of the sequencing by comparing the original and sequenced DNA.

    • Parameters:

      • original: The original DNA sequence.
      • sequenced: The sequenced DNA.
    • Returns: Tuple of (accuracy, coverage, average_read_length).

    • Details:

      • Accuracy: Proportion of matching nucleotides between the original and sequenced DNA.
      • Coverage: Ratio of the length of the sequenced DNA to the original DNA.
      • Average Read Length: Average length of the sequenced reads.

Example Usage

# Initialize a DNA sequencer
sequencer = DNASequencer(error_rate=0.001, read_length=100)

# Define a DNA sequence
dna_sequence = "ATGCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG"

# Sequence the DNA
reads = sequencer.sequence(dna_sequence)
print(f"Sequenced reads: {reads}")

# Assemble the reads
assembled_sequence = sequencer.assemble(reads)
print(f"Assembled sequence: {assembled_sequence}")

# Analyze the quality of the sequencing
accuracy, coverage, avg_read_length = sequencer.analyze_quality(dna_sequence, assembled_sequence)
print(f"Accuracy: {accuracy:.2%}")
print(f"Coverage: {coverage:.2f}")
print(f"Average read length: {avg_read_length:.2f}")

Dependencies

  • random: For simulating random sequencing errors.

Error Handling

  • The class does not explicitly handle errors, but it includes checks to avoid index errors when processing sequences.

Notes

  • The DNASequencer class is designed to simulate the DNA sequencing process.
  • It supports the introduction of sequencing errors, including substitutions, insertions, and deletions.
  • The class provides a simplified assembly process for reconstructing the original sequence from reads.
  • The analyze_quality method allows for evaluating the accuracy, coverage, and read length of the sequencing process.

Clone this wiki locally