Needleman-Wunsch Algorithm

Team Members:

Introduction

The Needleman-Wunsch algorithm is a dynamic programming algorithm used in bioinformatics to align DNA or protein sequences. It finds the optimal global alignment of two sequences by dividing the problem into a series of smaller subproblems and combining their solutions.
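
As an illustration of the dynamic programming idea above, here is a minimal sketch of the score computation. The scoring values (+1 match, -1 mismatch, -1 gap) and the function name `nw_score` are assumptions made for this example and are not taken from the repository's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the optimal global alignment score of a and b.
// Scoring values are hypothetical: +1 match, -1 mismatch, -1 gap.
int nw_score(const std::string& a, const std::string& b) {
    constexpr int kMatch = 1, kMismatch = -1, kGap = -1;
    const std::size_t n = a.size(), m = b.size();
    std::vector<std::vector<int>> dp(n + 1, std::vector<int>(m + 1, 0));

    // Borders: aligning a prefix against an empty string costs only gaps.
    for (std::size_t i = 1; i <= n; ++i) dp[i][0] = static_cast<int>(i) * kGap;
    for (std::size_t j = 1; j <= m; ++j) dp[0][j] = static_cast<int>(j) * kGap;

    // Each cell is the best of: diagonal (match/mismatch), up (gap), left (gap).
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const int diag = dp[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? kMatch : kMismatch);
            const int up = dp[i - 1][j] + kGap;
            const int left = dp[i][j - 1] + kGap;
            dp[i][j] = std::max({diag, up, left});
        }
    }
    return dp[n][m];
}
```

Under this assumed scoring, `nw_score("ACGT", "AGT")` evaluates to 2: three matches and one gap.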

Task

We attempt to improve on the baseline performance of the algorithm by parallelizing and optimizing it, aiming to get as close to peak performance as possible.

Performance Evaluation Framework

The main performance metrics were Execution Time and Throughput.
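
A minimal sketch of how these two metrics could be measured is shown below. The helper names `time_seconds` and `cells_per_second` are hypothetical, and defining throughput as DP cells filled per second is an assumption; the exact definition used is in report.pdf.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Runs work() once and returns the elapsed wall-clock time in seconds.
template <typename F>
double time_seconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Assumed throughput definition: DP cells (len_a * len_b) filled per second.
double cells_per_second(std::size_t len_a, std::size_t len_b, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(len_a) * static_cast<double>(len_b) / seconds;
}
```

`std::chrono::steady_clock` is used because it is monotonic, so the measured interval is not affected by system clock adjustments.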

How to Run

The desired version can be compiled using the given Makefile as follows.

make <VERSION>

VERSION can be any of the following:

Brute force: brute_nw
Cache locality: optimized_nw_1
Compiler optimizations: optimized_nw_2
Anti-diagonal parallelization: optimized_nw_3
Submatrix Tiling: optimized_nw_4

To then run the generated executable, run

./<VERSION>

This then generates a text file containing the output of the form

A1 B1
A2 B2
...
...
...

where A1, A2, etc. are the execution times for particular lengths of the strings and B1, B2, etc. are the corresponding lengths of the strings.
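
For readers who want to post-process these files themselves, the following is a minimal sketch of a reader for the two-column format described above. The function name `parse_results` is hypothetical, and the unit of the time column is not assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <utility>
#include <vector>

// Reads whitespace-separated (execution time, string length) pairs until the
// input ends or an entry cannot be parsed.
std::vector<std::pair<double, std::size_t>> parse_results(std::istream& in) {
    std::vector<std::pair<double, std::size_t>> rows;
    double time = 0.0;
    std::size_t length = 0;
    while (in >> time >> length) {
        rows.emplace_back(time, length);
    }
    return rows;
}
```

The function takes any `std::istream`, so it can be fed from a `std::ifstream` opened on an output file or from a `std::istringstream` when testing.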

A plot of execution time vs length of the string can be generated for all the output text files by running

python3 plot.py

Brute Force

The code for the brute force, unoptimized version of the Needleman-Wunsch algorithm is present in brute_nw.cpp.

Optimization Techniques Used

The following are the approaches used to optimize the algorithm. For a detailed report of how the algorithms work, the compute configuration of the system on which the optimizations were done and the results obtained, refer to report.pdf.

1. Exploiting Cache Locality

Accessing data row-wise gave better results than accessing it column-wise. This difference is expected, since 2D matrices are stored in row-major order, so a row-wise traversal touches consecutive memory locations. The code is present in optimized_nw_1.cpp.
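
The difference between the two access patterns can be sketched as follows. This is an illustrative example on a flat row-major buffer, not the code from optimized_nw_1.cpp; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-wise traversal: the inner loop walks consecutive addresses.
long long sum_row_wise(const std::vector<int>& mat, std::size_t rows, std::size_t cols) {
    assert(mat.size() == rows * cols);
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += mat[r * cols + c];
    return total;
}

// Column-wise traversal: the inner loop jumps `cols` elements per step,
// so consecutive accesses tend to land on different cache lines.
long long sum_col_wise(const std::vector<int>& mat, std::size_t rows, std::size_t cols) {
    assert(mat.size() == rows * cols);
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += mat[r * cols + c];
    return total;
}
```

Both functions compute the same result; only the order in which memory is visited differs, which is what affects cache behaviour on large matrices.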

2. Compiler Optimizations

We tested the code with various compiler flags and with different compilers (both ICC and GCC). The code is present in optimized_nw_2.cpp.

3. Parallelization along the Anti-Diagonal

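By analyzing the dependency graph between the states of the dynamic programming problem, we concluded that traversing along anti-diagonals would reduce the number of dependencies at each step: each cell depends only on its left, upper, and upper-left neighbours, so the cells on one anti-diagonal are independent of one another and can be computed in parallel. The code is present in optimized_nw_3.cpp.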
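
A minimal sketch of the anti-diagonal traversal is given below. It is not the code from optimized_nw_3.cpp: the scoring values (+1 match, -1 mismatch, -1 gap), the function name `nw_score_antidiagonal`, and the use of OpenMP for the parallel loop are all assumptions made for illustration. Without OpenMP enabled, the loop simply runs sequentially.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Fills the DP matrix one anti-diagonal (i + j == d) at a time.
// Scoring values are hypothetical: +1 match, -1 mismatch, -1 gap.
int nw_score_antidiagonal(const std::string& a, const std::string& b) {
    constexpr int kMatch = 1, kMismatch = -1, kGap = -1;
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());
    std::vector<std::vector<int>> dp(n + 1, std::vector<int>(m + 1, 0));
    for (int i = 1; i <= n; ++i) dp[i][0] = i * kGap;
    for (int j = 1; j <= m; ++j) dp[0][j] = j * kGap;

    for (int d = 2; d <= n + m; ++d) {
        // Rows i for which j = d - i stays inside [1, m].
        const int i_lo = std::max(1, d - m);
        const int i_hi = std::min(n, d - 1);
        // Cells on this diagonal read only diagonals d - 1 and d - 2,
        // so the iterations are independent of one another.
#ifdef _OPENMP
#pragma omp parallel for
#endif
        for (int i = i_lo; i <= i_hi; ++i) {
            const int j = d - i;
            const int diag = dp[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? kMatch : kMismatch);
            dp[i][j] = std::max({diag, dp[i - 1][j] + kGap, dp[i][j - 1] + kGap});
        }
    }
    return dp[n][m];
}
```

Note that consecutive cells on an anti-diagonal lie in different rows, so this traversal order gives up the contiguous memory access of a row-wise fill.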

4. Tiling using Submatrices

Although the anti-diagonal optimization allows higher parallelism, it doesn't let us exploit cache locality. Thus, we used tiling, a technique wherein the original DP matrix is divided into square submatrices. We fill each submatrix with the corresponding values just as we do for the full matrix, using row-wise traversal. We proceed in this manner through all the submatrices, then fill the leftover cells by brute force, iterating through them and computing each one from the values of the previous cells. This allows us to make use of both parallelism and cache locality. The code is present in optimized_nw_4.cpp.
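
The tiling idea can be sketched as follows. This is an illustrative variant, not the code from optimized_nw_4.cpp: it schedules tiles along tile-level anti-diagonals and handles leftover cells by clamping the edge tiles rather than by a separate brute-force pass. The scoring values (+1 match, -1 mismatch, -1 gap), the function name `nw_score_tiled`, the `tile` parameter, and the use of OpenMP are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Fills the DP matrix tile by tile; each tile is filled row-wise.
// Scoring values are hypothetical: +1 match, -1 mismatch, -1 gap.
int nw_score_tiled(const std::string& a, const std::string& b, int tile) {
    assert(tile > 0);
    constexpr int kMatch = 1, kMismatch = -1, kGap = -1;
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());
    std::vector<std::vector<int>> dp(n + 1, std::vector<int>(m + 1, 0));
    for (int i = 1; i <= n; ++i) dp[i][0] = i * kGap;
    for (int j = 1; j <= m; ++j) dp[0][j] = j * kGap;

    const int tiles_i = (n + tile - 1) / tile;  // tile rows (last may be partial)
    const int tiles_j = (m + tile - 1) / tile;  // tile columns (last may be partial)

    // Tiles with ti + tj == d depend only on tiles from earlier diagonals.
    for (int d = 0; d <= tiles_i + tiles_j - 2; ++d) {
        const int ti_lo = std::max(0, d - (tiles_j - 1));
        const int ti_hi = std::min(tiles_i - 1, d);
#ifdef _OPENMP
#pragma omp parallel for
#endif
        for (int ti = ti_lo; ti <= ti_hi; ++ti) {
            const int tj = d - ti;
            const int i_end = std::min(n, (ti + 1) * tile);
            const int j_end = std::min(m, (tj + 1) * tile);
            // Row-wise fill inside the tile keeps accesses contiguous.
            for (int i = ti * tile + 1; i <= i_end; ++i) {
                for (int j = tj * tile + 1; j <= j_end; ++j) {
                    const int diag = dp[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? kMatch : kMismatch);
                    dp[i][j] = std::max({diag, dp[i - 1][j] + kGap, dp[i][j - 1] + kGap});
                }
            }
        }
    }
    return dp[n][m];
}
```

The tile size trades off the two goals: larger tiles give longer contiguous runs per thread, while smaller tiles expose more independent tiles per diagonal.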


coniferousdyer/Needleman-Wunsch-Optimization
