This project implements the Longest Common Subsequence (LCS) algorithm, a classic dynamic programming problem used to find the longest subsequence common to two sequences. LCS is widely used in bioinformatics, text comparison, and version control systems.
This repository provides three different implementations of the LCS algorithm:
- **Sequential Solution with Space Optimization**
  A standard dynamic programming approach executed on a single core. It keeps only two rows of the DP matrix in memory during execution, reducing the space complexity from quadratic to linear.
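The two-row optimization can be sketched as follows; the function name and variable names are illustrative, not taken from the repository:

```c
/* Space-optimized LCS length: only rows i-1 and i of the DP matrix
   are kept in memory at any time. */
#include <stdlib.h>
#include <string.h>

int lcs_length(const char *a, const char *b) {
    size_t n = strlen(a), m = strlen(b);
    int *prev = calloc(m + 1, sizeof(int));  /* row i-1 of the DP matrix */
    int *curr = calloc(m + 1, sizeof(int));  /* row i                    */
    for (size_t i = 1; i <= n; i++) {
        for (size_t j = 1; j <= m; j++) {
            if (a[i - 1] == b[j - 1])
                curr[j] = prev[j - 1] + 1;
            else
                curr[j] = prev[j] > curr[j - 1] ? prev[j] : curr[j - 1];
        }
        int *tmp = prev; prev = curr; curr = tmp;  /* swap the two rows */
    }
    int result = prev[m];  /* after the final swap, prev is the last row */
    free(prev); free(curr);
    return result;
}
```

Note that `curr` never needs to be cleared between rows: `curr[0]` stays 0, and every other entry is overwritten before it is read.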
- **Multi-Threading Parallel Solution**
  Uses PThreads and the worksharing design pattern to accelerate computation by dividing the workload among threads.
  This solution adopts a wavefront approach: computation proceeds along the antidiagonals of the DP matrix, whose cells are mutually independent, and each thread processes a subset of the cells on the current antidiagonal.
  Two variants are provided: one that stores the entire matrix in memory, and a space-optimized version that keeps only three antidiagonals at any time.
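To illustrate the wavefront idea, here is a minimal PThreads sketch of the full-matrix variant. The thread count, the barrier-per-diagonal synchronization, and the cyclic splitting of each antidiagonal are assumptions made for the example, not necessarily the repository's exact scheme:

```c
/* Wavefront LCS with PThreads: cells on one antidiagonal are mutually
   independent, so threads share each diagonal and synchronize on a
   barrier before moving to the next one. Names are illustrative. */
#define _POSIX_C_SOURCE 200112L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define NTHREADS 4

static const char *A, *B;
static size_t N, M;              /* strlen(A), strlen(B)               */
static int *dp;                  /* (N+1) x (M+1), row-major, zeroed   */
static pthread_barrier_t bar;

#define DP(i, j) dp[(i) * (M + 1) + (j)]

static void *worker(void *arg) {
    long t = (long)arg;
    /* Antidiagonal d contains the cells (i, j) with i + j == d. */
    for (size_t d = 2; d <= N + M; d++) {
        size_t lo = d > M ? d - M : 1;        /* valid i range on d */
        size_t hi = d - 1 < N ? d - 1 : N;
        for (size_t i = lo + t; i <= hi; i += NTHREADS) {  /* cyclic split */
            size_t j = d - i;
            if (A[i - 1] == B[j - 1])
                DP(i, j) = DP(i - 1, j - 1) + 1;
            else
                DP(i, j) = DP(i - 1, j) > DP(i, j - 1)
                         ? DP(i - 1, j) : DP(i, j - 1);
        }
        pthread_barrier_wait(&bar);  /* finish diagonal d before d + 1 */
    }
    return NULL;
}

int lcs_parallel(const char *a, const char *b) {
    A = a; B = b; N = strlen(a); M = strlen(b);
    dp = calloc((N + 1) * (M + 1), sizeof(int));
    pthread_barrier_init(&bar, NULL, NTHREADS);
    pthread_t th[NTHREADS];
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&th[t], NULL, worker, (void *)t);
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(th[t], NULL);
    pthread_barrier_destroy(&bar);
    int result = DP(N, M);
    free(dp);
    return result;
}
```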
- **Distributed Solution**
  Implements a hybrid approach using MPI, PThreads, and OpenMP to process large sequences efficiently across multiple machines or nodes.
  This solution follows a Manager-Worker paradigm (also known as Distributed Bag of Tasks or Processor Farm) and employs a two-level tiling strategy:
  - The master MPI process divides the DP matrix into tiles and creates three threads using PThreads:
    - Producer thread: generates tasks, each corresponding to a tile.
    - Sender thread: sends tasks to idle workers, receives results, and injects the computed values into the dependencies of tasks still to be sent.
    - Auxiliary sender thread: handles the rare case where the sender tries to inject dependencies into tasks not yet produced by the producer; it waits until these tasks are ready to be sent.
  - Each worker MPI process works on one tile at a time. Upon receiving a tile, the worker:
    - performs a second level of tiling, dividing the tile into sub-tiles;
    - uses OpenMP and the worksharing design pattern to compute the elements of the received tile;
    - specifically, applies a wavefront (antidiagonal) approach at the sub-tile level: it processes antidiagonals of sub-tiles, and for each antidiagonal, each thread works on a subset of the sub-tiles on that diagonal.
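The worker-side computation can be sketched as below: OpenMP worksharing over antidiagonals of sub-tiles within one received tile. The tile and sub-tile sizes, the function names, and the halo convention (row and column 0 hold the dependencies received from neighbouring tiles) are assumptions made for the example:

```c
/* Sub-tile wavefront inside one tile: sub-tiles on the same
   antidiagonal are independent and are shared among OpenMP threads. */
#include <stddef.h>

#define TILE 64   /* tile edge in cells (assumed)                 */
#define SUB  16   /* sub-tile edge (assumed); TILE % SUB == 0     */

/* dp is a (TILE+1) x (TILE+1) block with leading dimension ld whose
   halo row/column 0 is already filled with incoming dependencies. */
static void compute_subtile(int *dp, size_t ld, const char *a, const char *b,
                            size_t i0, size_t j0, size_t rows, size_t cols) {
    for (size_t i = i0; i < i0 + rows; i++)
        for (size_t j = j0; j < j0 + cols; j++) {
            if (a[i - 1] == b[j - 1])
                dp[i * ld + j] = dp[(i - 1) * ld + (j - 1)] + 1;
            else {
                int up = dp[(i - 1) * ld + j], left = dp[i * ld + (j - 1)];
                dp[i * ld + j] = up > left ? up : left;
            }
        }
}

/* Process one tile as antidiagonals of SUB x SUB sub-tiles. */
void compute_tile(int *dp, size_t ld, const char *a, const char *b) {
    size_t nsub = TILE / SUB;
    for (size_t d = 0; d < 2 * nsub - 1; d++) {   /* sub-tile antidiagonals */
        #pragma omp parallel for
        for (size_t si = 0; si < nsub; si++) {
            size_t sj;
            if (d < si || (sj = d - si) >= nsub) continue;  /* not on d */
            /* +1 offsets skip the halo row/column */
            compute_subtile(dp, ld, a, b, si * SUB + 1, sj * SUB + 1, SUB, SUB);
        }
    }
}
```

Without `-fopenmp` the pragma is ignored and the code still produces the same result serially, which makes the dependency structure easy to check.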
For a detailed explanation of the implementation, design choices, execution times, and performance analysis, see the public Canva presentation:
Canva Presentation Link