Commit 15ec6e3

[ARTIFACTS TPDS] initial prepare
1 parent 0914d9e commit 15ec6e3

File tree

8 files changed: +249 additions, -0 deletions

README.md

Lines changed: 13 additions & 0 deletions
@@ -200,6 +200,19 @@ Contributors: D'Arnese Eleonora, Conficconi Davide, Del Sozzo Emanuele, Fusco Lu

If you find this repository useful, please use the following citation(s):

```
@article{faber2022,
  title={Faber: a Hardware/Software Toolchain for Image Registration},
  author={D'Arnese, Eleonora and Conficconi, Davide and Del Sozzo, Emanuele and Fusco, Luigi and Sciuto, Donatella and Santambrogio, Marco D},
  journal={IEEE Transactions on Parallel and Distributed Systems},
  year={2022},
  publisher={IEEE Computer Society},
  address={Los Alamitos, CA, USA},
  pages={To Appear}
}
```

```
@inproceedings{iron2021,
  author = {Conficconi, Davide and D'Arnese, Eleonora and Del Sozzo, Emanuele and Sciuto, Donatella and Santambrogio, Marco D},

artifacts_tpds22_scripts/README.md

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
# Artifacts for the IEEE Transactions on Parallel and Distributed Systems (TPDS) Open Initiative

Paper Title: Faber: a Hardware/Software Toolchain for Image Registration
Authors: Eleonora D'Arnese, Davide Conficconi, Emanuele Del Sozzo, Luigi Fusco, Donatella Sciuto, Marco D. Santambrogio.
Affiliation: Politecnico di Milano

Paper Main Contributions:
* The first open-source HW/SW toolchain to automatically create custom Image Registration (IRG) pipelines exploiting FPGA-based accelerators.
* Three levels of customization hyperparameters to support users in building IRG pipelines.
* A design automation methodology that allows non-FPGA experts to exploit default HW configurations as off-the-shelf SW.
* A latency and resource model to guide HW-expert users during the customization of the HW accelerators.

Faber achieves up to a 54x speedup and a 177x energy efficiency improvement over the State of the Art.

## Artifacts' Objectives

With this repo, all the results of the Faber manuscript can be reproduced:
* HW generation
* Single accelerator testing
* Resource prediction and actual usage extraction
* IRG application execution
* Accuracy extraction
* Latency prediction
* State of the Art comparison

We exclude the biomedical dataset, since it is openly available, and the Matlab and SimpleITK applications, to respect their intellectual property.
Artifact Evaluators: please contact us for more details or for a ready-to-use setup.
## Testing Environment <a name="testing_env"></a>
1. We tested the hardware code generation on two different machines, based on Ubuntu 18.04/20.04 and CentOS 7.6, respectively.
2. We used the Xilinx Vitis Unified Platform and Vivado HLx toolchains, version 2019.2.
3. We used Python 3 with the `argparse`, `numpy`, and `math` packages on the generation machine.
4. a) On the host machines, i.e., the hardware design machines, we used PYNQ 2.5 on the Zynq-based platforms (Pynq-Z2, Ultra96, ZCU104), where we employ the `cv2`, `numpy`, `pandas`, `multiprocessing`, `statistics`, `argparse`, `pydicom`, and `scipy` packages. For a pure SW deployment, the user will also need the `torch` and `kornia` packages.
4. b) We tested the Alveo u200 on a machine with CentOS 7.6, an i7-4770 CPU @ 3.40GHz, and 16 GB of RAM, where we installed PYNQ 2.5.1 following the [instructions by the PYNQ team](https://pynq.readthedocs.io/en/v2.5.1/getting_started/alveo_getting_started.html), with the same packages as in point 4a.
5. [Optional] Possible issues with the locale: `export LANG="en_US.utf8"`.
## Artifact Installation and Deployment Process
1. Make sure you have installed Vitis and Vivado 2019.2, the Alveo u200 and U96 devices, as well as Python 3 and its packages.
2. Follow the [PYNQ team's instructions to set up your devices](https://pynq.readthedocs.io/en/v2.5.1/) (Note: the FPGAs can be in a completely different place than the building system).
3. Clone the repo: `git clone https://github.com/necst/faber_fpga.git -b artifacts_tpds22`
4. Prepare the environment, e.g., `source </my/path/to/xilinx/tools/Vitis/settings64.sh>` and `source /opt/xilinx/xrt/setup.sh`
5. Start with one or all of the following reproduction steps.
6. All the on-board executions refer to [Testing a HW Design](#testing_designs) or [Deployment Example and example of Dataset Structuring](#deploymentexample).
## Artifacts Reproduction
The following subsections provide the instructions to reproduce all the manuscript results.
### Table 2: Performance analysis of the MI similarity metric with and without the HW transformation at 200MHz
Expected time: consider 16 builds (12 Vitis, 4-6 hours each; 4 Vivado, 2-3 hours each), which is extremely machine dependent, plus the time to collect the results (worst case 16\*5\*2 minutes).
1. Prepare the environment, e.g., `source </my/path/to/xilinx/tools/Vitis/settings64.sh>` and `source /opt/xilinx/xrt/setup.sh`
2. Build the accelerators (i.e., `bash metric_w-wo_transform.bash`)
3. Check in `build/ultra96_v2` and `build/alveo_u200` for a CSV with the resources of these builds
4. Deploy your solution according to [Testing a HW Design](#testing_designs)
5. To compute the performance values, extract the run-time values and then apply `=(EXETIME*1000)/((246*512*512)/10^6)/(3*246)` for Powell's method and `=(EXETIME*1000)/((246*512*512)/10^6)/(100*246)` for 1+1, since the two optimizers generally employ different numbers of iterations (i.e., 3 for Powell's and 100 for 1+1).
6. For the energy efficiency, the ZCU104 and the Alveo U200 have example PYNQ-based code in `faber_fpga/src/sw/example_measurements` to collect the overall power consumption, while for the U96 we instrumented the power plug of the device.
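The spreadsheet formulas of step 5 can also be evaluated with a small helper. A minimal sketch (the constants 246 images, 512x512 pixels, and the 3/100 iteration counts come from the formulas above; the function names are illustrative, not part of the repo):

```python
# Sketch of the performance formulas: =(EXETIME*1000)/((246*512*512)/10^6)/(ITERS*246)
# EXETIME is the measured run time extracted from the on-board tests.

N_IMAGES = 246          # images in the dataset
IMG_PIXELS = 512 * 512  # pixels per image

def performance(exetime, iterations):
    """Evaluate the spreadsheet formula for a given iteration count."""
    megapixels = (N_IMAGES * IMG_PIXELS) / 10**6
    return (exetime * 1000) / megapixels / (iterations * N_IMAGES)

def performance_powell(exetime):
    return performance(exetime, 3)    # Powell's method: 3 iterations

def performance_one_plus_one(exetime):
    return performance(exetime, 100)  # 1+1 optimizer: 100 iterations
```

The same helper applies unchanged to the performance steps of the Figure 7/9 and Table 4 reproductions.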
### Figure 6: Resource and FPS Scaling for defaults and image size scaling
Expected time: consider six builds (3 Vitis, 4-6 hours each; 3 Vivado, 2-3 hours each), which is extremely machine dependent, plus the time to collect the results (5-10 minutes).
1. Prepare the environment, e.g., `source </my/path/to/xilinx/tools/Vitis/settings64.sh>` and `source /opt/xilinx/xrt/setup.sh`
2. Build the accelerators (i.e., `bash fps.bash`)
3. Check in `build/ultra96_v2` and `build/alveo_u200` for a CSV with the resources of these builds
4. Deploy your solution according to [Testing a HW Design](#testing_designs)
### Figure 7, Figure 9: Execution Time and Model Accuracy
Expected time: consider 8 builds (4 Vitis, 4-6 hours each; 4 Vivado, 2-3 hours each), which is extremely machine dependent, plus the time to collect the results (worst case 8\*5\*2 minutes).
1. Prepare the environment, e.g., `source </my/path/to/xilinx/tools/Vitis/settings64.sh>` and `source /opt/xilinx/xrt/setup.sh`
2. Build the accelerators (i.e., `bash build_top_bits.bash`)
3. Check in `build/ultra96_v2` and `build/alveo_u200` for a CSV with the resources of these builds
4. Deploy your solution according to [Testing a HW Design](#testing_designs)
5. To compute the performance values, extract the run-time values and then apply `=(EXETIME*1000)/((246*512*512)/10^6)/(3*246)` for Powell's method and `=(EXETIME*1000)/((246*512*512)/10^6)/(100*246)` for 1+1, since the two optimizers generally employ different numbers of iterations (i.e., 3 for Powell's and 100 for 1+1).
6. For the energy efficiency, the ZCU104 and the Alveo U200 have example PYNQ-based code in `faber_fpga/src/sw/example_measurements` to collect the overall power consumption, while for the U96 we instrumented the power plug of the device.
7. For the model evaluation, run `bash evaluate_model.bash` and check the results in that folder (i.e., the `*.log` files)
### Table 4: State of the Art comparison table
Expected time: consider 9 builds (4 Vitis, 4-6 hours each; 5 Vivado, 2-3 hours each), which is extremely machine dependent, plus the time to collect the results for each build (worst case 9\*5\*2 minutes).
1. Prepare the environment, e.g., `source </my/path/to/xilinx/tools/Vitis/settings64.sh>` and `source /opt/xilinx/xrt/setup.sh`
2. Build the accelerators (i.e., `bash build_soa.bash`, or, if the previous step is already done, just `cd ..; make hw_gen PE=1 CORE_NR=3 TARGET=hw CLK_FRQ=200 TRGT_PLATFORM=zcu104 METRIC="mse" TRANSFORM="wax"; cd -`)
3. Check in `build/ultra96_v2` and `build/alveo_u200` for a CSV with the resources of these builds
4. Deploy your solution according to [Testing a HW Design](#testing_designs)
5. To compute the performance values, extract the run-time values and then apply `=(EXETIME*1000)/((246*512*512)/10^6)/(3*246)` for Powell's method and `=(EXETIME*1000)/((246*512*512)/10^6)/(100*246)` for 1+1, since the two optimizers generally employ different numbers of iterations (i.e., 3 for Powell's and 100 for 1+1).
6. For the energy efficiency, the ZCU104 and the Alveo U200 have example PYNQ-based code in `faber_fpga/src/sw/example_measurements` to collect the overall power consumption, while for the U96 we instrumented the power plug of the device.
### Table 3: Accuracy Evaluation
Expected time: consider 4 builds (4 Vitis, 4-6 hours each, or 4 Vivado, 2-3 hours each), which is extremely machine dependent, plus the time to collect the results for each build (worst case 4\*5\*2 minutes).

1. Prepare the environment, e.g., `source </my/path/to/xilinx/tools/Vitis/settings64.sh>` and `source /opt/xilinx/xrt/setup.sh`
2. Build and execute, on any kind of platform, the accelerators for every similarity metric without the transform (e.g., for the Alveo with 1 PE, `bash accuracy_eval.bash`; otherwise, the one of the previous build)
3. Check in `build/ultra96_v2` and `build/alveo_u200` for a CSV with the resources of these builds
4. Deploy your solution according to [Testing a HW Design](#testing_designs)
5. Execute `src/sw/res_extraction.py` to extract the single-image accuracy and then compute the average (e.g., `python3 res_extraction.py -f 0 -rg <path/to/gold/images/folder> -rt <where/to/find/registered/images> -l <optimizer_label> -rp <where/to/store/results>`).
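Step 5 above extracts one accuracy value per image and then averages them. A minimal sketch of that aggregation, assuming the extraction step yields a list of per-image values (the sample numbers below are placeholders, not results from the paper; the real values come from the `res_extraction.py` results folder):

```python
import statistics

# Placeholder per-image accuracy values; in practice these are read from
# the results produced by res_extraction.py.
per_image_accuracy = [0.91, 0.87, 0.94, 0.89]

# The reported accuracy is the average over the whole dataset.
average_accuracy = statistics.mean(per_image_accuracy)
print(f"average accuracy: {average_accuracy:.4f}")
```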
### Testing a HW Design <a name="testing_designs"></a>

1. Complete at least one design in the previous section and prepare the HW design for deployment (i.e., `make resyn_extr_zynq_ultra96_v2` or `make resyn_extr_vts_alveo_u200`; this is done by all the artifact bash scripts).
2. `make pysw` creates a deploy folder for the Python code.
3. `make deploybitstr` or `make deployxclbin` with `BRD_IP=<target_ip> BRD_USR=<user_name_on_remote_host> BRD_DIR=<path_to_copy>` copies the needed files onto the deploy folders.
4. Connect to the remote device, i.e., via ssh: `ssh <user_name_on_remote_host>@<target_ip>`.
5. [Optional] Install all the needed Python packages as above, or the pynq package on the Alveo host machine.
6. Navigate to `<path/where/deployed>/sw_py`.
7.
 * 7a) Launch the script `python_tester_launcher.sh <path/where/deployed>/bitstream_ultra96` (or wherever you transferred the folder of the .bit) for the Ultra96 testing.
 * 7b) Modify the script with PLATFORM=Alveo, and launch the script `python_tester_launcher.sh <path/where/deployed>/xclbn_alveo_u200` (or wherever you transferred the folder of the .xclbin) for the Alveo testing.
 * 7c) The script will automatically detect the accelerator configuration (based on the folder name) and set up the testing of the single accelerator as well as of the Powell's and 1+1 registrations, with a dataset structured as [described here](#dataset_description).
8. Run `python_tester_extractor.sh <path/where/deployed>/bitstream_ultra96 <name for the csv result>` to automatically derive a .csv with most of the useful results of the experimental campaign.

If you wish to run a **single test of the accelerator** for a single bitstream, please follow these steps after previous step 6:
7. Set `BITSTREAM=<path_to_bits>`, `CLK=200`, `CORE_NR=<target_core_numbers>`, `PLATFORM=Alveo|Zynq`, `RES_PATH=path_results`, and source XRT on the Alveo host machine, e.g., `source /opt/xilinx/xrt/setup.sh`.
8. [Optional] `python3 test-single-mi.py --help` gives a complete view of the input parameters.
9. Execute the test: `python3 test-single-mi.py -ol $BITSTREAM -clk $CLK -t $CORE_NR -p $PLATFORM -im 512 -rp $RES_PATH` (if on Zynq, you will need `sudo`).

To execute a **single registration**, follow steps similar to the previous single accelerator test, or [have a look here](#deploymentexample).
`python3 faber-powell-blocked.py --help` will show a complete view of the input parameters for the Powell HW-based registration procedure.
### <a name="deploymentexample"></a> Deployment Example on Ultra96 / Alveo u200 / Pure SW
An example of deployment on the Ultra96 is `make deploybitstr TRGT_PLATFORM=ultra96_v2 BRD_USR=xilinx BRD_IP=<board ip> BRD_DIR=/home/xilinx/faber`.
For an Alveo u200 device: `make deployxclbin TRGT_PLATFORM=alveo_u200 BRD_USR=<machine user> BRD_IP=<server ip> BRD_DIR=<path of where to deploy>`.

Connect to the host machine and go to the target folder.
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
#!/bin/bash
cd ..
make hw_gen METRIC='mi' PE=1 CORE_NR=1 HT='float' FREQ_MHZ=200 TRGT_PLATFORM=alveo_u200;
make hw_gen METRIC='cc' PE=1 CORE_NR=1 FREQ_MHZ=200 TRGT_PLATFORM=alveo_u200;
make hw_gen METRIC='mse' PE=1 CORE_NR=1 FREQ_MHZ=200 TRGT_PLATFORM=alveo_u200;
make hw_gen METRIC='prz' PE=1 CORE_NR=1 HT='float' FREQ_MHZ=200 TRGT_PLATFORM=alveo_u200;
make resyn_extr_vts_alveo_u200

cd -
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
#!/bin/bash
bash build_top_bits.bash
cd ..
make hw_gen PE=1 CORE_NR=3 TARGET=hw CLK_FRQ=200 TRGT_PLATFORM=zcu104 METRIC="mse" TRANSFORM="wax"
make resyn_extr_zynq_zcu104

cd -
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
#!/bin/bash

cd ..
make default_ultra96
make default_alveo_u200

make hw_gen PE=1 CORE_NR=2 TARGET=hw CLK_FRQ=200 TRGT_PLATFORM=ultra96_v2 METRIC="mse" TRANSFORM="wax";
make hw_gen PE=1 CORE_NR=2 TARGET=hw CLK_FRQ=200 TRGT_PLATFORM=ultra96_v2 METRIC="mi" TRANSFORM="wax";
make hw_gen PE=1 CORE_NR=2 TARGET=hw CLK_FRQ=200 TRGT_PLATFORM=ultra96_v2 METRIC="nmi" TRANSFORM="wax";

echo "Ultra96 builds done"

make hw_gen METRIC='cc' PE=32 CORE_NR=1 FREQ_MHZ=300 TRGT_PLATFORM=alveo_u200;
make hw_gen METRIC='mse' PE=32 CORE_NR=1 FREQ_MHZ=300 TRGT_PLATFORM=alveo_u200;
make hw_gen METRIC='prz' PE=16 CORE_NR=1 HT='float' FREQ_MHZ=300 TRGT_PLATFORM=alveo_u200;

make resyn_extr_zynq_ultra96_v2
make resyn_extr_vts_alveo_u200

cd -
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
#!/bin/bash
cd ../src/model

python3 model.py -m cc -n 2 -pe 1 -b 8 -d 512 -p -t -i nn ultra96 > default_ultra96.log
python3 model.py -m mi -n 1 -pe 16 -b 8 -d 512 -p default_alveo_u200 > default_alveo_u200.log

python3 model.py -m cc -n 2 -pe 1 -b 8 -d 512 -p -t -i nn ultra96 > waxmse_u96.log
python3 model.py -m cc -n 2 -pe 1 -b 8 -d 512 -p -t -i nn ultra96 > waxmi_u96.log
python3 model.py -m cc -n 2 -pe 1 -b 8 -d 512 -p -t -i nn ultra96 > waxnmi_u96.log

python3 model.py -m cc -n 1 -pe 32 -b 8 -d 512 -p default_alveo_u200 > cc_alveo.log
python3 model.py -m mse -n 1 -pe 32 -b 8 -d 512 -p default_alveo_u200 > mse_alveo.log
python3 model.py -m nmi -n 1 -pe 16 -b 8 -d 512 -p default_alveo_u200 > nmi_alveo.log

cd -

artifacts_tpds22_scripts/fps.bash

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
#!/bin/bash
cd ../; make default_ultra96; make default_ultra96 D=1024; make default_ultra96 D=2048; cd -
cd ../; make default_alveo_u200; make default_alveo_u200 D=1024; make default_alveo_u200 D=2048; cd -
cd ../; make resyn_extr_zynq_ultra96_v2; make resyn_extr_vts_alveo_u200; cd -
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
#!/bin/bash
cd ..
############## Zynq builds
make hw_gen PE=1 CORE_NR=2 TARGET=hw CLK_FRQ=200 TRGT_PLATFORM=ultra96_v2 METRIC="mi" TRANSFORM="wax";
make hw_gen PE=2 CORE_NR=2 TARGET=hw CLK_FRQ=200 TRGT_PLATFORM=ultra96_v2 METRIC="mi";

make hw_gen PE=1 CORE_NR=2 TARGET=hw CLK_FRQ=200 TRGT_PLATFORM=zcu104 METRIC="mi" TRANSFORM="wax";
make hw_gen PE=2 CORE_NR=2 TARGET=hw CLK_FRQ=200 TRGT_PLATFORM=zcu104 METRIC="mi";

make resyn_extr_zynq_ultra96_v2

############## Alveo builds
PES=(32 16 8 4 2 1)
CORE_NRS=(1)
HTYPS=('float')
PE_ENTROP=(1)
CACHING=(false)
URAM=(false)
FREQZ=200
INTERPOLATIONS=('nearestn')

for p in ${PES[@]}; do
  for cn in ${CORE_NRS[@]}; do
    for h in ${HTYPS[@]}; do
      for pe in ${PE_ENTROP[@]}; do
        for it in ${INTERPOLATIONS[@]}; do
          for c in ${CACHING[@]}; do
            for u in ${URAM[@]}; do
              for wu in ${CACHING[@]}; do
                make hw_gen PE=$p CORE_NR=$cn HT=$h TARGET=hw OPT_LVL=3 \
                  FREQ_MHZ=$FREQZ PE_ENTROP=$pe CACHING=$c URAM=$u TRGT_PLATFORM=alveo_u200 \
                  WAX_URAM=$wu METRIC="mi" INTERP_TYPE=$it TRANSFORM="wax";
              done;
            done;
          done;
        done;
      done;
    done;
  done;
done;

for p in ${PES[@]}; do
  for cn in ${CORE_NRS[@]}; do
    for h in ${HTYPS[@]}; do
      for pe in ${PE_ENTROP[@]}; do
        for c in ${CACHING[@]}; do
          for u in ${URAM[@]}; do
            make hw_gen PE=$p CORE_NR=$cn HT=$h TARGET=hw OPT_LVL=3 \
              FREQ_MHZ=$FREQZ PE_ENTROP=$pe CACHING=$c URAM=$u TRGT_PLATFORM=alveo_u200 METRIC="mi";
          done;
        done;
      done;
    done;
  done;
done;

make resyn_extr_vts_alveo_u200

cd -
