Knowledge base: how to set up the ECM model inputs
Prerequisites:
- LIKWID
- LaTeX and TikZ for plotting (Step 3)
Limitations:
- Only for streaming kernels
- No hardware counter measurements included for verification
Step 1: Running the benchmarks
Use the script ./bench_scan_size.sh to run the benchmark and collect the performance results.
The script takes a run configuration file that defines the likwid-bench benchmarks to be run, the result folders, and the hardware settings.
Some sample configuration files can be found in the run_config folder.
The benchmark runs streaming kernels with different stream/array sizes.
For example, to run with the settings in the file run_config/casclakesp2_config.txt, use:
./bench_scan_size.sh run_config/casclakesp2_config.txt
NB: If a basic performance-relevant hardware setting is not configured as described in the config file, the script exits early with the message "Hardware not configured properly".
Step 2: ECM model generation
This step is optional. It generates the ECM model plots corresponding to the benchmarks run in the previous step. ECM generation requires two basic inputs: an application model and a machine model.
2.1 Application model
The application model defines the properties of the benchmark.
It is defined in a file placed in the folder ecm_generator/application_model, which specifies the number and type of streams in the benchmark:
- Read-only specifies the number of streams/arrays that are only read
- Write-only specifies the number of streams/arrays that are only written
- Read-Write specifies the number of streams/arrays that are both read and written
Examples of the application model files can be found in the ecm_generator/application_model folder.
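The stream counts matter because they determine how many cache lines cross each level of the memory hierarchy. The following is a minimal sketch of that bookkeeping, not the logic of ecm.sh itself. The class `StreamCounts` and the function `cache_lines_per_unit` are hypothetical names introduced for illustration and do not reflect the actual file format. It assumes that, with write-allocate, every write-only stream also causes one extra cache-line load.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StreamCounts:
    """Stream counts of a kernel (hypothetical container, not the config file format)."""
    read_only: int
    write_only: int
    read_write: int


def cache_lines_per_unit(streams: StreamCounts, write_allocate: bool) -> Tuple[int, int]:
    """Return (lines loaded, lines evicted) for one cache line of work on every stream."""
    # Every stream that is read contributes one loaded cache line.
    loaded = streams.read_only + streams.read_write
    if write_allocate:
        # Assumption: a write-allocate cache loads the line before it is overwritten.
        loaded += streams.write_only
    # Every stream that is written contributes one evicted cache line.
    evicted = streams.write_only + streams.read_write
    return loaded, evicted


# A copy kernel reads one array and writes another.
copy = StreamCounts(read_only=1, write_only=1, read_write=0)
print(cache_lines_per_unit(copy, write_allocate=True))   # (2, 1)
print(cache_lines_per_unit(copy, write_allocate=False))  # (1, 1)
```

For the copy kernel this gives two loaded lines and one evicted line per unit of work when write-allocate applies, and one of each otherwise.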
2.2 Machine model
The machine model describes the machine capabilities.
It is defined using files written to the ecm_generator/machine_model folder.
The files carry information such as:
- CL_size specifies the cacheline size in bytes
- [cache-name]_read_bw specifies the read bandwidth between the given [cache-name] cache level and its higher hierarchy. For example, L1_read_bw indicates the read bandwidth between L1 and registers
- [cache-name]_write_bw specifies the write bandwidth between the given [cache-name] cache level and its higher hierarchy. For example, L1_write_bw indicates the write bandwidth between L1 and registers
- [cache-name]_shared_bw specifies the bandwidth between the given [cache-name] cache level and its higher hierarchy when a different resource becomes a bottleneck. For example, L1_shared_bw indicates the bandwidth when the common (for both reads and writes) address generation unit (AGU) becomes a bottleneck
- [cache-name]_WA indicates whether write-allocate is applicable for the cache
- [cache-name]_VICTIM indicates whether the cache is a victim cache
- [cache-name]_SIZE indicates the cache size in kB
- ECM_hypothesis indicates the overlap hypothesis of the given hardware under a given setting
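To illustrate how such parameters can feed a runtime estimate, here is a rough sketch that converts cache-line traffic into cycles and combines the contributions. It is not the implementation in ecm.sh. The function names are hypothetical, the bandwidths are assumed to be in bytes per cycle (the unit is not stated above), and the fully non-overlapping combination is only one possible overlap hypothesis. The numbers are illustrative and not taken from any machine file.

```python
def transfer_cycles(lines_loaded, lines_evicted, cl_size, read_bw, write_bw):
    """Cycles needed to move the given cache lines across one level boundary.

    Assumptions: bandwidths are in bytes per cycle, and loads and
    evictions across this boundary do not overlap with each other.
    """
    return lines_loaded * cl_size / read_bw + lines_evicted * cl_size / write_bw


def ecm_no_overlap(t_core_overlap, t_core_data, transfers):
    """Runtime estimate if all data transfers are assumed to be non-overlapping.

    t_core_overlap: core cycles that can overlap with data transfers
    t_core_data:    core cycles that cannot overlap (e.g. load/store work)
    transfers:      per-boundary transfer cycles, e.g. from transfer_cycles()
    """
    return max(t_core_overlap, t_core_data + sum(transfers))


# Illustrative values: 64-byte cache lines, 32 bytes/cycle in each direction,
# two cache lines loaded and one evicted per unit of work.
t_boundary = transfer_cycles(2, 1, cl_size=64, read_bw=32, write_bw=32)
print(t_boundary)                                        # 6.0
print(ecm_no_overlap(4.0, 3.0, [t_boundary, 12.0]))      # 21.0
```

A different ECM_hypothesis value would change how the contributions are combined, for example by taking a maximum over some of the terms instead of their sum.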
2.3 Generating the ECM model
The script ecm_generator/ecm.sh generates the ECM model prediction data.
It takes the application model and the machine model defined above as input.
For example, to generate the ECM model with the application model ecm_generator/application_model/copy.config and the machine model ecm_generator/machine_model/casclakesp2_nps1.config, the following commands can be used:
cd ecm_generator
./ecm.sh "application_model/copy.config" "machine_model/casclakesp2_nps1.config"
Note that, in general, only the application model (2.1) and the machine model (2.2) have to be defined. The ECM model (2.3) does not need to be generated manually as shown here; the plotting script does this automatically. See the next section.
Step 3: Plotting
The script ./generate_all_plots.sh runs the ECM script ecm_generator/ecm.sh (2.3) and gathers the performance measurements collected in Step 1 to generate the final plots.
The script requires the location of the folder where the performance measurements are stored (as specified through the configuration file in Step 1) and the machine file corresponding to the machine model.
For example, to plot the results collected in results/casclakesp2/nps2/avx512/ with the ECM model corresponding to the machine file ecm_generator/machine_model/casclakesp2/nps2/avx512/casclakesp2_nps2_avx512.config, the following command should be used:
./generate_all_plots.sh results/casclakesp2/nps2/avx512/ ecm_generator/machine_model/casclakesp2/nps2/avx512/casclakesp2_nps2_avx512.config
The plots are then generated in a folder called plots inside the directory given as input (results/casclakesp2/nps2/avx512/ in the above example).
Plots (in PDF format) for the different benchmarks, as well as an overall compiled plot called ecm.pdf, can be found in the plots directory.