Highly optimized versions of Clifford Layers.
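All of these layers are built around the geometric (Clifford) product. As a minimal illustration only (not code from this repo, and not the layout the C kernels or the Microsoft reference necessarily use), here is a sketch of the product in Cl(2,0) with multivectors stored as [scalar, e1, e2, e12]:

```python
# Geometric product in Cl(2,0): basis [1, e1, e2, e12],
# with e1^2 = e2^2 = +1 and (e1 e2)^2 = -1.
# Illustrative sketch only; the repo's kernels may use a
# different blade ordering and metric signature.

def geometric_product(a, b):
    a0, a1, a2, a12 = a
    b0, b1, b2, b12 = b
    return [
        a0 * b0 + a1 * b1 + a2 * b2 - a12 * b12,   # scalar part
        a0 * b1 + a1 * b0 - a2 * b12 + a12 * b2,   # e1 part
        a0 * b2 + a2 * b0 + a1 * b12 - a12 * b1,   # e2 part
        a0 * b12 + a12 * b0 + a1 * b2 - a2 * b1,   # e12 part
    ]

# e1 * e2 = e12
print(geometric_product([0, 1, 0, 0], [0, 0, 1, 0]))  # [0, 0, 0, 1]
```

A Clifford convolution or linear layer replaces the scalar multiply-accumulate of an ordinary layer with this product, which is what the optimized C kernels below implement.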
Install the package in editable mode with pip install -e .[dev]. Unvectorized versions are tested directly on GitLab CI; cf. .gitlab-ci.yml.
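From the repo root, the setup can be reproduced locally roughly as follows (the `tests/` path for pytest is an assumption based on the test files named below):

```shell
# Editable install with dev extras (quotes guard against shell globbing)
pip install -e ".[dev]"

# Run the test suite that GitLab CI runs (cf. .gitlab-ci.yml);
# "tests/" is assumed from the test-file paths listed in this README
pytest tests/
```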
Go inside clib/convolutional, run make base, make conv2d_opt1, make conv_opt2.
Then run pytest on tests/test_clifford_convolution.py and tests/test_clifford_convolution_opt2.py. The C versions are tested against the Microsoft implementation of Clifford convolutional layers.
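The convolution build-and-test steps above can be run in one go (assuming the repo layout described in this README):

```shell
# Build the convolution kernels (targets as listed above)
cd clib/convolutional
make base
make conv2d_opt1
make conv_opt2

# Test the C versions against the Microsoft reference implementation
cd ../..
pytest tests/test_clifford_convolution.py tests/test_clifford_convolution_opt2.py
```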
Go inside clib/multivector_activation, run make all (icx compiler needed).
Then run pytest on tests/test_multivector_act.py and tests/test_multivector_act_opt.py. The C versions are tested against the Microsoft implementation of multivector activation layers.
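The equivalent one-shot sequence for the activation layer (assuming the repo layout described above):

```shell
# Build the multivector activation kernels (requires the icx compiler)
cd clib/multivector_activation
make all

# Test the C versions against the Microsoft reference implementation
cd ../..
pytest tests/test_multivector_act.py tests/test_multivector_act_opt.py
```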
Go inside clib/matrix_multiplication, run make base/opt1/opt2/opt3 for the version you want to test.
Then run pytest on tests/test_clifford_linear.py. The C versions are tested against the Microsoft implementation of Clifford convolutional layers (filter size = 1).
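And for the linear (matrix-multiplication) kernels, again assuming the repo layout above:

```shell
# Build one matrix-multiplication variant, e.g. opt2
cd clib/matrix_multiplication
make opt2          # or: make base / make opt1 / make opt3

# Test against the Microsoft reference (a Clifford convolution with filter size 1)
cd ../..
pytest tests/test_clifford_linear.py
```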
Go inside benchmarking/convolutional_layer, run bash benchmark_opt2.sh and bash benchmark_base_opt1.sh.
Go inside benchmarking/multivector_activation, run make all and ./multivector_bench_x86. For plotting, run benchmarking/performance_plots/plot_act_layer.py.
Go inside benchmarking/linear_layer, run make base, make opt1, make opt2, and make opt3, then run ./bench_base, ./bench_opt1, ./bench_opt2, and ./bench_opt3.
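Since the four linear-layer variants follow the same make-target/binary naming pattern, a loop covers them all (a sketch assuming the layout above):

```shell
# Build and run all four linear-layer benchmark variants
cd benchmarking/linear_layer
for v in base opt1 opt2 opt3; do
    make "$v"
    ./bench_"$v"
done
```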
Go inside benchmarking/multivector_activation, run make all. Then go inside clib/convolutional and run make conv_opt2.
Then run python benchmarking/with_pytorch/benchmark.py.
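The two prerequisite builds and the benchmark itself can be chained from the repo root (using make's standard `-C` flag to build in a subdirectory):

```shell
# Prerequisites for the PyTorch comparison benchmark
make -C benchmarking/multivector_activation all
make -C clib/convolutional conv_opt2

# Run the benchmark against PyTorch
python benchmarking/with_pytorch/benchmark.py
```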
VTune measurements were mostly performed manually to ensure correct results.
Example command to run manually:
sudo /opt/intel/oneapi/vtune/latest/bin64/vtune -collect uarch-exploration -result-dir /home/intel/vtune/projects/Clifford/Conv_2D_Opt2 --app-working-dir=/home/intel/vtune/projects/Clifford -- /home/Benchmarking/init_benchmark.sh /home/team67/benchmarking/convolutional_layer/benchmark_opt2.out
Parameters used in VTune measurements:
- Multivector Activation: K=8, B=512, C=512, REPEAT=2000
- Convolutional Layer 2D: fsz=17, ch_step=7, ch_stop=30, d1=d2=60, batch=8
- Linear Layer 2D: Batches=128, 32 input and output channels
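As a sanity check on the activation-layer parameters, the input working set is sizeable. Assuming K counts blades per multivector, B the batch size, C the channels, and 4-byte floats (my reading of the parameters; the repo does not spell this out):

```python
# Working-set size of the multivector activation input, assuming:
# K = 8 blades per multivector, B = 512 batch, C = 512 channels, float32.
K, B, C = 8, 512, 512
bytes_per_float = 4

total_bytes = K * B * C * bytes_per_float
print(total_bytes, total_bytes / 2**20)  # 8388608 bytes = 8.0 MiB
```

Under these assumptions the input alone is 8 MiB, larger than a typical per-core L2 cache, which suggests the REPEAT=2000 measurement is at least partly memory-bound.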