This project implements various optimizations for sequential and parallel algorithms using OpenMP and AVX2.
Before starting, ensure you have the following tools and libraries installed in your environment:
- g++ (preferably version 9.3 or later)
- WSL (Windows Subsystem for Linux) if you're working on a Windows system.
- Make (optional, if you want to automate the compilation using a Makefile).
- omp.h (included with a compiler that supports OpenMP)
- immintrin.h (included with the compiler for AVX2 instructions)
To verify the availability of the required libraries, compile and run a simple test program:
#include <omp.h>
#include <immintrin.h>
#include <stdio.h>
int main() {
    printf("OpenMP and AVX2 are available!\n");
    return 0;
}
Compile the test program using:
g++ -fopenmp -mavx2 test.c -o test
To compile the project, use the following command:
g++ -O3 -mavx2 -march=native -fopenmp -ffast-math -funroll-loops introPARCO_2024_H1.c all_optimization.h sequential_implementation.c implicit_parallelization.c explicit_parallelization.c -o introPARCO_2024
- -O3: Enables high-level optimizations.
- -mavx2: Enables AVX2 instructions.
- -march=native: Utilizes features specific to your CPU.
- -fopenmp: Enables OpenMP support.
- -ffast-math: Improves the speed of math operations (may slightly reduce precision).
- -funroll-loops: Expands loops for performance improvement.
This will create an executable named introPARCO_2024.
To run the program, use the following command:
./introPARCO_2024
The project consists of the following files:
- introPARCO_2024_H1.c: The main entry point of the program.
- all_optimization.h: Header file containing common definitions and macros.
- sequential_implementation.c: Implementation of the sequential version of the algorithms.
- implicit_parallelization.c: Parallel implementation using an implicit approach (e.g., OpenMP).
- explicit_parallelization.c: Parallel implementation using explicit AVX2 intrinsics.
1. Error: #include <omp.h> not found
   - Ensure you are using a compiler that supports OpenMP (e.g., g++).
   - On Ubuntu, you can install it with:
     sudo apt update
     sudo apt install g++
2. Error: #include <immintrin.h> not found
   - Ensure your compiler supports AVX2 instructions. Check your CPU support using:
     lscpu | grep avx2
     If AVX2 is not listed, your hardware does not support these instructions.
3. Slow performance
   - Ensure you are using the -march=native flag and that your system supports AVX2.
Here is an example of the output generated by the program:
EVALUATING THE PERFORMANCES AND MEASURING TIME TAKEN FOR MATRIX TRANSPOSITION
Theoretical peak performance: 172.031997681 FLOP/s
Theoretical peak memory bandwidth: 102400 MB/s
B = Bandwidth MB
etc...........
COMPUTING SPEEDUP AND EFFICIENCY GAINS FOR THE OPENMP IMPLEMENTATION
S = Speedup E = Efficiency
etc...........