This repository explores parallel computing paradigms through practical implementations completed as part of the Parallel Systems curriculum at the National Technical University of Athens. The work spans fundamental parallel/distributed computing architectures: shared memory systems, GPU accelerators, and distributed computing environments.
The first module is the classic Game of Life simulation with OpenMP parallelization. The main aspects of the project include:
- Multi-threaded implementation with dynamic thread allocation
- Systematic performance benchmarking
- Visual representation of the results using charts
- Execution time analysis
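The per-generation update can be sketched as below. This is a minimal illustration, not the repo's actual kernel: the grid size, names, and boundary handling (cells outside the grid count as dead) are assumptions.

```c
/* One Game of Life generation on an N x N grid (illustrative sketch).
 * The row loop is shared across OpenMP threads; without -fopenmp the
 * pragma is ignored and the code runs correctly in serial. */
#define N 8

static int alive(const int g[N][N], int i, int j) {
    if (i < 0 || i >= N || j < 0 || j >= N) return 0;  /* outside = dead */
    return g[i][j];
}

void step(const int cur[N][N], int next[N][N]) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            int n = 0;
            for (int di = -1; di <= 1; di++)
                for (int dj = -1; dj <= 1; dj++)
                    if (di || dj) n += alive(cur, i + di, j + dj);
            /* Standard rules: survive with 2-3 neighbours, birth with 3. */
            next[i][j] = cur[i][j] ? (n == 2 || n == 3) : (n == 3);
        }
    }
}
```

Because each `next[i][j]` depends only on the previous generation, the rows can be updated independently, which is what makes the straightforward `parallel for` correct.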
Explored two distinct parallelization strategies for the clustering algorithm:
- Strategy 1: Synchronized shared-cluster approach leveraging OpenMP directives such as `#pragma omp parallel for` for parallel calculation of `NewCluster` and `NewClusterSize`
- Strategy 2: Replication-based design with reduction, with the final calculation assigned to thread 0 (the main thread)
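Strategy 2 can be sketched as follows. The `NewCluster`/`NewClusterSize` names come from the project; everything else (1-D points, function signature) is an illustrative assumption. Each thread accumulates into private copies that OpenMP's array-section reduction merges, after which the main thread finalizes the centroids.

```c
/* Replication-based clustering iteration (sketch): per-thread partial
 * sums are combined by the reduction clause, then centroids are
 * finalized serially. Runs correctly in serial without -fopenmp. */
#define NPTS 6
#define K 2

void cluster_iter(const double pts[NPTS], double centroid[K]) {
    double NewCluster[K] = {0};
    int NewClusterSize[K] = {0};

    #pragma omp parallel for reduction(+:NewCluster[:K]) reduction(+:NewClusterSize[:K])
    for (int i = 0; i < NPTS; i++) {
        int best = 0;                       /* nearest centroid */
        for (int c = 1; c < K; c++)
            if ((pts[i] - centroid[c]) * (pts[i] - centroid[c]) <
                (pts[i] - centroid[best]) * (pts[i] - centroid[best]))
                best = c;
        NewCluster[best] += pts[i];
        NewClusterSize[best]++;
    }

    /* Final calculation on the main thread, outside the parallel loop. */
    for (int c = 0; c < K; c++)
        if (NewClusterSize[c] > 0)
            centroid[c] = NewCluster[c] / NewClusterSize[c];
}
```

Replication trades memory (one accumulator set per thread) for the synchronization that Strategy 1 pays on the shared arrays.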
Also explored thread-affinity configuration via `GOMP_CPU_AFFINITY`, which binds threads to specific cores, along with cache-coherence challenges and memory-locality optimizations through NUMA-aware strategies.
Conducted an empirical study of locking mechanisms:
- Typical implementations: pthread mutexes
pthread_mutex_lockand spinlockspthread_spin_lock - Custom solutions: test-and-set (
tas_lock), test-and-test-and-set (ttas_lock), array-based locks (array_lock), CLH queue locks (clh_lock) - Comparative analysis with OpenMP's directives
#pragma omp criticaland#pragma omp atomic
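The test-and-test-and-set idea can be sketched with C11 atomics. This mirrors what a `ttas_lock` typically does, but the types and function names here are illustrative, not the repo's API.

```c
#include <stdatomic.h>

/* TTAS lock sketch: spin on a plain read until the lock looks free,
 * and only then attempt the atomic exchange. Waiting threads spin on
 * a locally cached copy instead of hammering the bus with writes. */
typedef struct { atomic_int flag; } ttas_lock_t;

void ttas_init(ttas_lock_t *l) { atomic_store(&l->flag, 0); }

void ttas_acquire(ttas_lock_t *l) {
    for (;;) {
        while (atomic_load_explicit(&l->flag, memory_order_relaxed))
            ;                                  /* local spin: read only  */
        if (!atomic_exchange(&l->flag, 1))     /* test-and-set attempt   */
            return;                            /* saw 0 -> 1: we own it  */
    }
}

void ttas_release(ttas_lock_t *l) {
    atomic_store_explicit(&l->flag, 0, memory_order_release);
}
```

Under contention this is the classic improvement over plain test-and-set: the inner read loop keeps the cache line in shared state, so coherence traffic only occurs when the lock is actually released.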
- Parallelization of a recursive implementation with OpenMP tasks
- Performance profiling across various matrix dimensions
- Comparison of tiled and recursive approaches (additional work)
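The OpenMP task pattern behind the recursive formulation can be sketched on a simple divide-and-conquer reduction. This is a generic illustration of the task/taskwait structure, assuming a serial cutoff; the repo's actual recursive kernel operates on matrices.

```c
/* Recursive divide-and-conquer with OpenMP tasks (generic sketch).
 * One half of the range is spawned as a task, the other is computed
 * by the current thread; taskwait joins the child before combining.
 * Without -fopenmp the pragmas are ignored and the code runs serially. */
long rec_sum(const int *a, int lo, int hi) {
    if (hi - lo <= 64) {                 /* serial cutoff for small work */
        long s = 0;
        for (int i = lo; i < hi; i++) s += a[i];
        return s;
    }
    int mid = lo + (hi - lo) / 2;
    long left, right;
    #pragma omp task shared(left)
    left = rec_sum(a, lo, mid);          /* spawned as a child task */
    right = rec_sum(a, mid, hi);         /* computed by this thread */
    #pragma omp taskwait                 /* join before combining   */
    return left + right;
}
```

In a real run the top-level call would sit inside `#pragma omp parallel` with `#pragma omp single`, so one thread seeds the task tree and the rest of the team steals work from it.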
Investigated concurrent linked list implementations through multiple synchronization paradigms:
- Coarse-grain locking
- Fine-grain locking
- Optimistic synchronization
- Lazy synchronization
- Non-blocking synchronization
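The simplest of the five schemes, coarse-grain locking, can be sketched as below. One mutex guards the entire sorted list, so every operation serializes; the finer-grained schemes exist precisely to relax this. Names and signatures here are illustrative.

```c
#include <pthread.h>
#include <stdlib.h>

/* Coarse-grain locked sorted list (sketch): a single mutex protects
 * the whole structure, making correctness trivial at the cost of all
 * concurrency between operations. */
typedef struct node { int key; struct node *next; } node_t;

typedef struct {
    node_t *head;
    pthread_mutex_t lock;
} list_t;

void list_init(list_t *l) {
    l->head = NULL;
    pthread_mutex_init(&l->lock, NULL);
}

int list_insert(list_t *l, int key) {  /* returns 0 if key already present */
    pthread_mutex_lock(&l->lock);
    node_t **p = &l->head;
    while (*p && (*p)->key < key) p = &(*p)->next;
    if (*p && (*p)->key == key) { pthread_mutex_unlock(&l->lock); return 0; }
    node_t *n = malloc(sizeof *n);
    n->key = key;
    n->next = *p;
    *p = n;
    pthread_mutex_unlock(&l->lock);
    return 1;
}

int list_contains(list_t *l, int key) {
    pthread_mutex_lock(&l->lock);
    node_t *p = l->head;
    while (p && p->key < key) p = p->next;
    int found = (p && p->key == key);
    pthread_mutex_unlock(&l->lock);
    return found;
}
```

Fine-grain locking replaces the single mutex with a per-node lock and hand-over-hand traversal; the optimistic and lazy variants go further by searching without locks and validating afterwards.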
Progressively refined GPU implementations demonstrating optimization techniques:
- Naive version: Direct kernel implementation for cluster assignment
- Transpose version: Data transposition for coalesced access patterns
- Shared-memory version: Shared memory utilization for bandwidth reduction
- Full-offload version: Entire computation executed on the GPU, eliminating CPU-GPU transfer overhead
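The layout change behind the transpose version can be shown host-side in plain C. With points stored as `[point][coord]`, consecutive GPU threads (one per point) stride through memory `ndim` apart; transposing to `[coord][point]` makes thread `i` read element `i` of each row, so global-memory accesses coalesce. The function name and signature are illustrative.

```c
/* Transpose a point set from point-major [npoints][ndim] layout to
 * coordinate-major [ndim][npoints] layout, so that one-thread-per-point
 * GPU kernels read consecutive addresses (coalesced access). */
void transpose_points(const double *src, double *dst, int npoints, int ndim) {
    for (int p = 0; p < npoints; p++)
        for (int d = 0; d < ndim; d++)
            dst[d * npoints + p] = src[p * ndim + d];
}
```

The shared-memory version then attacks a different bottleneck, staging frequently reused data (e.g. the centroids) in on-chip memory instead of rereading them from DRAM.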
Developed a message-passing variant using MPI infrastructure:
- Implementation distributed across multiple nodes
- Scalability comparison with shared-memory OpenMP implementation
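The data decomposition an MPI variant typically uses can be sketched without the MPI runtime itself. This helper computes the half-open row range each rank would own; the name and signature are illustrative, not the repo's code.

```c
/* Row-block decomposition per MPI rank (sketch). Remainder rows are
 * spread over the lowest-numbered ranks, so block sizes differ by at
 * most one row; each rank gets the half-open range [*lo, *hi). */
void block_range(int nrows, int nprocs, int rank, int *lo, int *hi) {
    int base = nrows / nprocs;
    int rem  = nrows % nprocs;
    *lo = rank * base + (rank < rem ? rank : rem);
    *hi = *lo + base + (rank < rem ? 1 : 0);
}
```

Each rank then works on its block and exchanges only boundary data with neighbours, which is what the scalability comparison against the shared-memory OpenMP version measures.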
Distributed implementation of iterative solvers for 2D thermal diffusion:
- Method 1: Standard Jacobi iteration
- Method 2: Successive over-relaxation (Gauss-Seidel variant)
- Method 3: Red-Black ordering SOR
Results were visualized for both fixed-iteration and convergence-based termination criteria, with scalability analysis across process counts.
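The core stencil the three methods share can be sketched with a single Jacobi sweep. Grid size and names are illustrative; boundary values are assumed fixed and handled by the caller, and the repo's solver distributes this computation over MPI processes.

```c
/* One Jacobi sweep for 2-D thermal diffusion on a GRID x GRID mesh
 * (sketch): each interior point becomes the average of its four
 * neighbours from the previous iterate. SOR and Red-Black SOR vary
 * the update order and blend in the old value with a relaxation factor. */
#define GRID 6

void jacobi_sweep(const double u[GRID][GRID], double unew[GRID][GRID]) {
    for (int i = 1; i < GRID - 1; i++)
        for (int j = 1; j < GRID - 1; j++)
            unew[i][j] = 0.25 * (u[i-1][j] + u[i+1][j]
                               + u[i][j-1] + u[i][j+1]);
}
```

Jacobi reads only the old grid, so it parallelizes trivially; Gauss-Seidel/SOR converge faster but introduce dependencies, which the Red-Black ordering breaks by updating the two "colors" of the checkerboard in alternating, independent half-sweeps.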
This hands-on experience bridged theoretical concepts with real-world parallel computing challenges, establishing a solid foundation in modern high-performance computing with widely used frameworks and libraries such as OpenMP, CUDA, and MPI.