Setting up diagonalization engine

[[TOC]]

ELPA Library

For static calculations, the ELPA library is recommended because it performs better than ScaLAPACK. In particular, ELPA can use GPUs, which significantly speeds up the calculations. To activate ELPA, set the following in predefines.h:

// select diagonalization routine
#define DIAGONALIZATION_ROUTINE ELPA

You also need to carefully review the following part of predefines.h:

// ---------------------- ELPA SETTINGS ---------------------------
// Fill this part only if ELPA library is used for diagonalization

// uncomment it if you want to activate GPU for diagonalizations 
#define ELPA_USE_GPU

// Select ELPA kernels
#define ELPA_USE_SOLVER ELPA_SOLVER_1STAGE
#define ELPA_USE_COMPLEX_KERNEL ELPA_2STAGE_COMPLEX_DEFAULT
#define ELPA_USE_REAL_KERNEL ELPA_2STAGE_REAL_DEFAULT

// Fraction of eigenvectors to be extracted in each cycle.
// 1.0 corresponds to extraction of all eigenvectors (USE IT IF YOU ARE NOT SURE)
// NOTE: the value of this parameter must ensure that all eigenstates below the requested Ec are extracted.
// NOTE: For 3D case this value typically can be set to 0.78
#define ELPA_NEV_FRACTION 1.0
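
To make the meaning of ELPA_NEV_FRACTION concrete, the sketch below converts a fraction into the number of eigenvectors requested for a matrix of dimension N. The helper name `nev_from_fraction` and the round-to-nearest rule are assumptions made for illustration only; the toolkit may compute this number differently.

```c
#include <assert.h>

/* Hypothetical helper (not part of the toolkit): number of eigenvectors
 * to extract from a matrix of dimension n for a given fraction in (0, 1].
 * Rounding to the nearest integer is an assumption of this sketch. */
long nev_from_fraction(long n, double fraction)
{
    assert(n > 0);
    assert(fraction > 0.0 && fraction <= 1.0);

    long nev = (long)(fraction * (double)n + 0.5); /* round to nearest */

    /* Clamp to the valid range [1, n]. */
    if (nev < 1) nev = 1;
    if (nev > n) nev = n;
    return nev;
}
```

For example, `nev_from_fraction(128000, 0.78)` gives 99,840 eigenvectors, while a fraction of 1.0 returns the full matrix dimension. Whatever value is chosen, it must still cover all eigenstates below the requested Ec.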

Documentation

Eigenvalue SoLvers for Petaflop-Applications (ELPA)
Wiki: Eigenvalue SoLvers for Petaflop-Applications (ELPA)
[ELPA installation guide](ELPA installation guide)

Publications about ELPA performance

GPU-Acceleration of the ELPA2 Distributed Eigensolver for Dense Symmetric and Hermitian Eigenproblems

ScaLAPACK library

If the target system does not provide the ELPA library, you can use the standard diagonalization library, ScaLAPACK. The W-SLDA Toolkit supports the following ScaLAPACK diagonalization engines:

#define DIAGONALIZATION_ROUTINE PZHEEVR

or

#define DIAGONALIZATION_ROUTINE PZHEEVD

PZHEEVR is recommended. This engine takes advantage of the fact that typically only a fraction of the eigenstates is extracted. However, in some rare, system-dependent cases this routine does not work correctly; in such cases, use PZHEEVD.

Benchmarks & Scalings

All tests correspond to the extraction of all eigenvectors.

Table

| matrix size | p | q | mb | nb | prec. | routine | system | time [sec] | cost |
|---|---|---|---|---|---|---|---|---|---|
| 32,768 = 2x128^2 | 6 | 8 | 16 | 16 | real | ELPA (2-GPU) | Cygnus | 93 | 0.052 nh |
| 45,000 = 2x150^2 | 6 | 8 | 16 | 16 | real | ELPA (2-GPU) | Cygnus | 217 | 0.12 nh |
| 45,000 = 2x150^2 | 6 | 8 | 16 | 16 | complex | ELPA (2-GPU) | Cygnus | 860 | 0.478 nh |
| 65,536 = 2x32^3 | 24 | 28 | 32 | 32 | complex | ELPA (2-GPU) | Summit | 115 | 0.52 nh |
| 65,536 = 2x32^3 | 24 | 28 | 8 | 8 | complex | ELPA (1-GPU) | Summit | 118 | 0.52 nh |
| 128,000 = 2x40^3 | 24 | 28 | 8 | 8 | complex | ELPA (1-GPU) | Summit | 435 | 1.93 nh |
| 128,000 | 24 | 28 | 32 | 32 | complex | ELPA (2-GPU) | Summit | 511 | 2.27 nh |
| 128,000 = 2x40^3 | 20 | 20 | 32 | 32 | complex | ELPA (1-GPU) | Daint | 220 | 24.4 nh |
| 128,000 | 54 | 64 | 32 | 32 | complex | ELPA (2-CPU) | Daint | 677 | 54.1 nh |
| 128,000 | 54 | 64 | 32 | 32 | complex | PZHEEVR | Daint | 945 | 75.6 nh |
| 147,456 = 4x64x24^2 | 24 | 25 | 32 | 32 | complex | ELPA (1-GPU) | Daint | 375 | 62.5 nh |
| 147,456 = 2x768x96 | 18 | 18 | 16 | 16 | double | ELPA (1-GPU) | Daint | 395 | 35.6 nh |
| 221,184 = 2x48^3 | 46 | 84 | 32 | 32 | complex | ELPA (2-GPU) | Summit | 603 | 15.4 nh |
| 221,184 | 46 | 84 | 16 | 16 | complex | ELPA (1-GPU) | Summit | 736 | 18.8 nh |
| 221,184 | 46 | 84 | 16 | 16 | complex | ELPA (2-GPU) | Summit | 3098 | 79.2 nh |
| 221,184 | 46 | 84 | 16 | 16 | complex | PZHEEVD | Summit | 5995 | 153.2 nh |
| 500,000 = 2x50^2x100 | 96 | 112 | 16 | 16 | complex | ELPA (1-GPU) | Summit | 2,109 | 150.0 nh |
| 524,288 = 2x64^3 | 96 | 112 | 16 | 16 | complex | ELPA (1-GPU) | Summit | 2,217 | 157.7 nh |
| 746,496 = 2x72^3 | 112 | 192 | 16 | 16 | complex | ELPA (1-GPU) | Summit | 3,436 | 488.7 nh |
| 746,496 | 112 | 192 | 64 | 64 | complex | ELPA (2-GPU) | Summit | 3,628 | 516.0 nh |
| 1,769,472 = 2x96^3 | 300 | 560 | 32 | 32 | complex | ELPA (1-GPU) | Summit | 52,024 | 57,804 nh |

(1-GPU): ELPA_SOLVER_1STAGE, ELPA_2STAGE_COMPLEX_GPU or ELPA_2STAGE_REAL_GPU
(1-CPU): ELPA_SOLVER_1STAGE, ELPA_2STAGE_COMPLEX_DEFAULT or ELPA_2STAGE_REAL_DEFAULT
(2-GPU): ELPA_SOLVER_2STAGE, ELPA_2STAGE_COMPLEX_GPU or ELPA_2STAGE_REAL_GPU
(2-CPU): ELPA_SOLVER_2STAGE, ELPA_2STAGE_COMPLEX_DEFAULT or ELPA_2STAGE_REAL_DEFAULT
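
The cost column is not defined explicitly on this page. The values appear consistent with wall time multiplied by the number of nodes, expressed in node-hours (nh). The sketch below assumes that definition; the function name `cost_node_hours` is hypothetical, and the node counts are not listed in the table, so any value passed for `nodes` is an assumption.

```c
#include <assert.h>

/* Hypothetical helper: cost in node-hours of a run that took `seconds`
 * of wall time on `nodes` nodes. Assumes cost = time * nodes, which is
 * an interpretation of the "nh" unit rather than a documented formula. */
double cost_node_hours(double seconds, int nodes)
{
    assert(seconds >= 0.0);
    assert(nodes > 0);
    return seconds * (double)nodes / 3600.0;
}
```

Under this assumption, a 93-second run on 2 nodes costs about 0.052 nh, which matches the first Cygnus row if that run used 2 nodes.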

Plots

These scalings are derived empirically: points correspond to actual measurements on the target system, while the line shows a fit of the ideal scaling for level-3 routines ($\sim N^3$).
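
The ideal $\sim N^3$ scaling can be used for a rough estimate of the time needed for a larger matrix. This is a minimal sketch, assuming the same machine, the same number of processes, and the same solver settings for both matrix sizes; the function name `extrapolate_time` is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: estimate the diagonalization time for a matrix of
 * dimension n_new from a reference measurement (t_ref seconds at n_ref),
 * assuming ideal cubic scaling t ~ N^3 at fixed computing resources. */
double extrapolate_time(double t_ref, double n_ref, double n_new)
{
    assert(t_ref >= 0.0);
    assert(n_ref > 0.0 && n_new > 0.0);

    double ratio = n_new / n_ref;
    return t_ref * ratio * ratio * ratio;
}
```

Doubling the matrix dimension at fixed resources therefore multiplies the expected time by 8. Rows of the benchmark table that use different process grids or block sizes should not be compared directly with this formula.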

Summit

The scaling was derived within ALCC grant Quantum Turbulence in Fermi Superfluids.

Raw data: summit-scaling.txt
Gnuplot script: summit-scaling.gp
