This repository was archived by the owner on Oct 21, 2023. It is now read-only.

Commit 9f7a560

Merge pull request #9 from FWDNXT/2020.2_p
2020.2 p
2 parents 2daaf37 + 0aa2583 commit 9f7a560

19 files changed: +2342 −75 lines

README.md

+79 −25
@@ -21,6 +21,7 @@ docs/C API.md.
 
 [**Examples**](examples/): Example code and Deep Learning tutorial.
 
+[**Pytorch-torchscript**](torch_mdla/README.md): Tutorial on how to add Micron DLA into PyTorch using TorchScript.
 
 ## Table of Contents:
 
@@ -45,17 +46,18 @@ docs/C API.md.
 - [6. Tutorial - PutInput and GetResult](#6-tutorial---putinput-and-getresult) : tutorial for using PutInput and GetOutput
 - [7. Tutorial - Writing tests](#7-tutorial---writing-tests) : Tutorial on running tests
 - [8. Tutorial - Debugging](#8-tutorial---debugging) : Tutorial on debugging and printing
-- [9. Running a model from your favorite deep learning framework](#9-running-a-model-from-your-favorite-deep-learning-framework) : Tutorial on converting models to ONNX
+- [9. Variable Fix Point Quantization](#9-variable-fix-point-quantization) : Tutorial on using variable fix-point
+- [10. Running a model from your favorite deep learning framework](#10-running-a-model-from-your-favorite-deep-learning-framework) : Tutorial on converting models to ONNX
   * [Tensorflow](#tensorflow)
   * [Caffe1](#caffe1)
   * [Keras](#keras)
-- [10. Supported models and layers](#10-supported-models-and-layers) : List of supported layers and models tested on the DLA
+- [11. Supported models and layers](#11-supported-models-and-layers) : List of supported layers and models tested on the DLA
   * [Tested models](#tested-models)
   * [TF-Slim models tested on Micron DLA inference engine](#tf-slim-models-tested-on-microndla-inference-engine)
   * [ONNX model zoo](#onnx-model-zoo)
   * [Keras](#keras)
   * [CNTK](#cntk)
-- [11. Troubleshooting and Q&A](#11-troubleshooting-and-qa) : Troubleshooting common issues and answering common questions
+- [12. Troubleshooting and Q&A](#12-troubleshooting-and-qa) : Troubleshooting common issues and answering common questions
 
# 1. Installation
@@ -282,6 +284,22 @@ For more information about onnx please visit [https://onnx.ai/](https://onnx.ai/)
 
 To convert tensorflow models into ONNX files please reference the section [6. Using with Tensorflow](#6-using-with-tensorflow)
 
+**Loading hardware into FPGA**
+
+When you turn on the system, the FPGA is programmed with a default hardware definition. You need to load the MDLA bitfile only once after turning on the system.
+
+You can load an MDLA bitfile of your choice using:
+
+`python3 loadbitfile.py <bitfile path>`
+
+You can find the MDLA bitfiles in the pico-computing folder:
+
+`/usr/src/picocomputing`
+
+Loading the FPGA takes at most 5 minutes.
+Loading the FPGA fails only when no FPGA cards are available. If you have issues loading the FPGA, check out [Troubleshooting](#12-troubleshooting-and-qa).
+Once loaded, the Micron DLA hardware stays in the FPGA card, so subsequent MDLA runs do not need to load the hardware again.
 
 **Running inference on Micron DLA hardware for one image**
 
 In the SDK folder, there is simpledemo.py, which is a python demo application.
@@ -296,16 +314,7 @@ Its main parts are:
 The user may modify steps 1 and 5 according to the user's needs.
 Check out other possible application programs using Micron DLA hardware [here](http://fwdnxt.com/).
 The example program is located in examples/python/
-You can run the demo using this command:
-
-`python3 simpledemo.py <onnx file> <picture> -c <categories file.txt> -l <bitfile.bit>`
-
-`-l` option will load the hardware into a FPGA card.
 
-
-Loading the FPGA and bringing up the HMC will take at max 5 min.
-Loading the FPGA only fails when there are no FPGA cards available. If you find issues in loading FPGA check out [Troubleshooting](#11-troubleshooting-and-qa).
-After the first run, Micron DLA hardware will be loaded in the FPGA card. The following runs will not need to load the hardware anymore.
 You can run the network on hardware with this command, which will find the FPGA card that was loaded with Micron DLA hardware:
 
 `python3 simpledemo.py <onnx file> <picture> -c <categories file.txt>`
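Taken together with the new **Loading hardware into FPGA** section above, the intended workflow is roughly the following sketch (the angle-bracket placeholders are the same ones used in the commands above):

```
# Load the MDLA bitfile once after power-up
python3 loadbitfile.py <bitfile path>

# Run inference as many times as needed; the demo finds the already-loaded FPGA card
python3 simpledemo.py <onnx file> <picture> -c <categories file.txt>
```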
@@ -346,6 +355,9 @@ The main functions are:
 3) `ie_run`: load input image and execute on the DLA.
 
 Check out other possible application programs using the DLA [here](http://fwdnxt.com/).
+
+Make sure the MDLA bitfile was loaded into the FPGA before running it.
+
 To run the demo, first run the following commands:
 
 ```
@@ -354,15 +366,7 @@ make
 ./compile -m <model.onnx> -i 224x224x3 -o instructions.bin
 ```
 Where `-i` is the input sizes: width x height x channels.
-After creating the `instructions.bin`, you can run the following command to execute it:
-
-`./simpledemo -i <picturefile> -c <categoriesfile> -s ./instructions.bin -b <bitfile.bit>`
-
-`-b` option will load the specified DLA bitfile into a FPGA card.
-Loading the FPGA and bringing up the HMC will take a maximum of five minutes.
-Loading the FPGA only fails when there are no FPGA cards available. If you find issues in loading FPGA check out [Troubleshooting](#11-troubleshooting-and-qa).
-After the first run, the DLA will be loaded in the FPGA card. The following runs will not need to load the DLA bitfile anymore.
-You can run the network on the DLA with this command, which will find the FPGA card that was loaded with the DLA:
+After creating the `instructions.bin`, you can run the network on the DLA with this command, which will find the FPGA card that was loaded with the DLA:
 
 `./simpledemo -i <picturefile> -c <categoriesfile> -s ./instructions.bin`
 
@@ -752,7 +756,52 @@ python3 test_model.py resnet18.onnx 224x224x3
 
 The code also has a profile option, which will execute each layer of the model and print the time measurements into a `.csv` file.
 
-# 9. Running a model from your favorite deep learning framework
+# 9. Variable Fix Point Quantization
+
+Micron DLA uses 16-bit fix-point to represent numbers. The `Compile` function converts the numbers in the ONNX model from 32-bit float into 16-bit fix-point [Q8.8](https://en.wikipedia.org/wiki/Q_(number_format)). The default Micron DLA bitfile runs the model using Q8.8.
+
+A Micron DLA bitfile with variable fix-point support is provided to reduce the discrepancy between the 32-bit float and the Q8.8 representation.
+
+This bitfile allows the software to choose the QX.Y representation that best fits each part of the neural network model.
+
+The SDK provides two options for variable fix-point quantization. **Before you try** these options, make sure to load the bitfile that supports variable fix-point into the FPGA.
+
+**Option 1**: Each layer's weights and biases are converted into their own best-fitting QX.Y representation.
+
+In this case, set 'V' in the options using the `SetFlag` function before `Compile`:
+
+```python
+ie = microndla.MDLA()
+ie.SetFlag('options', 'V')
+# Compile to a file
+swnresults = ie.Compile('224x224x3', 'resnet18.onnx', 'save.bin')
+```
+
+**Option 2**: The variable fix-point representation can also be determined for the input and output of each layer if one or more sample inputs are provided.
+
+You will need to provide a set of sample inputs (calibration data) to the `Quantize` function. In addition to compiling the model, `Quantize` will run the model with the calibration inputs in float32 and save a variable fix-point configuration for each input/output of every layer in the model. `Quantize` will also convert the static data (weights and biases) to the appropriate fix-point representation, so `ie.SetFlag('options', 'V')` is not needed in this case.
+
+Instead of `ie.Compile`, use `Quantize` and pass an array of input data:
+
+```python
+# Load the input image and the calibration images into numpy arrays
+img = LoadImage(args.image, args)
+imgs = []
+for fn in os.listdir(args.imagesdir):
+    x = LoadImage(args.imagesdir + '/' + fn, args)
+    imgs.append(x)
+
+# Create and initialize the Inference Engine object
+ie = microndla.MDLA()
+# Compile to a file
+swnresults = ie.Quantize('224x224x3', 'resnet18.onnx', 'save.bin', imgs)
+```
+
+After that, `Init` and `Run` work as usual, using the saved variable fix-point configuration.
+
+Check out the example [quantize.py](examples/python/quantize.py), which takes the same arguments as `simpledemo.py` plus a directory of images used as calibration data.
+
+# 10. Running a model from your favorite deep learning framework
 
 Micron DLA supports different deep learning frameworks by running models in ONNX format. In order to convert a model from your favorite deep learning framework to ONNX format you should follow the instructions [here](https://github.com/onnx/tutorials). However there are some extra steps you should take with certain frameworks for the best compatibility with Micron DLA and we describe them below.
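As a rough illustration of the Q8.8 representation described in the new section 9 above, here is a minimal sketch of how a 32-bit float maps to 16-bit Q8.8 and back; the helper names are illustrative only and not part of the SDK API:

```python
import numpy as np

def float_to_q88(x):
    # Q8.8: 8 integer bits and 8 fractional bits.
    # Scale by 2**8, round, and clamp to the signed 16-bit range.
    q = np.round(np.asarray(x, dtype=np.float32) * 256.0)
    return np.clip(q, -32768, 32767).astype(np.int16)

def q88_to_float(q):
    # Inverse mapping: divide by 2**8.
    return np.asarray(q, dtype=np.float32) / 256.0

w = np.float32(0.1234567)
q = float_to_q88(w)
print(q, q88_to_float(q))  # 32 0.125, the rounding error that variable fix-point aims to reduce
```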

@@ -847,7 +896,7 @@ onnx_model = onnxmltools.convert_keras(model)
 onnx.save(onnx_model, 'resnet50.onnx')
 ```
 
-# 10. Supported models and layers
+# 11. Supported models and layers
 
 * [Add](examples/tests/test_vectoradd.py)
 * AveragePool
@@ -868,7 +917,6 @@ onnx.save(onnx_model, 'resnet50.onnx')
 * Upsample
 
 ## Tested models
-These models are available [here](http://fwdnxt.com/models/).
 
 * Alexnet OWT (versions without LRN)
 * Resnet 18, 34, 50
@@ -877,6 +925,12 @@ These models are available [here](http://fwdnxt.com/models/).
 * [LightCNN-9](https://arxiv.org/pdf/1511.02683.pdf)
 * [Linknet](https://arxiv.org/pdf/1707.03718.pdf)
 * [Neural Style Transfer Network](https://arxiv.org/pdf/1603.08155.pdf)
+* LSTM
+* GRU
+* [Mobilenet](https://github.com/onnx/models/blob/master/vision/classification/mobilenet/model/mobilenetv2-7.onnx)
+* [Linknet](https://arxiv.org/abs/1707.03718)
+* [Enet](https://arxiv.org/abs/1606.02147)
+* [SqueezeNet](https://arxiv.org/abs/1602.07360)
 
 ## TF-Slim models tested on Micron DLA inference engine
 
@@ -907,7 +961,7 @@ Note: BVLC models, Inception_v1, ZFNet512 are not supported because we do not su
 * [ResNet50](https://www.cntk.ai/Models/CNTK_Pretrained/ResNet50_ImageNet_CNTK.model)
 * [VGG16](https://www.cntk.ai/Models/Caffe_Converted/VGG16_ImageNet_Caffe.model)
 
-# 11. Troubleshooting and Q&A
+# 12. Troubleshooting and Q&A
 
 Q: Where can I find weights for pretrained TF-slim models?
 
examples/python/quantize.py

+85
@@ -0,0 +1,85 @@
#!/usr/bin/python3

import sys
sys.path.insert(0, '../../')
import microndla
import os
import PIL
from PIL import Image
import numpy as np

from argparse import ArgumentParser

# Argument checking
parser = ArgumentParser(description="Micron DLA Categorization Demonstration using Quantize")
_ = parser.add_argument
_('modelpath', type=str, default='', help='Path to the model file')
_('image', type=str, default='', help='An image file used as input')
_('imagesdir', type=str, default='', help='Directory with images for quantization calibration')
_('-r', '--res', type=int, default=[3, 224, 224], nargs='+', help='expected image size (planes, height, width)')
_('-c', '--categories', type=str, default='', help='Categories file')
_('-l', '--load', type=str, default='', help='Load bitfile')

def LoadImage(imagepath, args):

    # Load image into a numpy array
    img = Image.open(imagepath)

    # Resize it to the size expected by the network
    img = img.resize((args.res[2], args.res[1]), resample=PIL.Image.BILINEAR)

    # Convert to numpy float
    img = np.array(img).astype(np.float32) / 255

    # Transpose to plane-major, as required by our API
    img = np.ascontiguousarray(img.transpose(2, 0, 1))

    # Normalize images
    stat_mean = list([0.485, 0.456, 0.406])
    stat_std = list([0.229, 0.224, 0.225])
    for i in range(3):
        img[i] = (img[i] - stat_mean[i]) / stat_std[i]

    return img

args = parser.parse_args()

# Load image into a numpy array
img = LoadImage(args.image, args)
imgs = []
for fn in os.listdir(args.imagesdir):
    x = LoadImage(args.imagesdir + '/' + fn, args)
    imgs.append(x)

# Create and initialize the Inference Engine object
ie = microndla.MDLA()
#ie.SetFlag('debug', 'bw')

# Compile to a file
istr = "{:d}x{:d}x{:d}".format(args.res[2], args.res[1], args.res[0])
swnresults = ie.Quantize(istr, args.modelpath, 'save.bin', imgs)

# Init fpga
nresults = ie.Init('save.bin', args.load)

# Create the storage for the result and run one inference
result = np.ndarray(swnresults, dtype=np.float32)
ie.Run_sw(img, result)

# Convert to numpy and print top-5
idxs = (-result).argsort()

print('')
print('-------------- Results --------------')
if args.categories != '':
    with open(args.categories) as f:
        categories = f.read().splitlines()
        for i in range(5):
            print(categories[idxs[i]], result[idxs[i]])
else:
    for i in range(5):
        print(idxs[i], result[idxs[i]])

# Free
ie.Free()
print('done')
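A hypothetical invocation of this script, assuming a ResNet-18 ONNX model, a single test image, and a directory of calibration images (all file names below are placeholders):

```
python3 quantize.py resnet18.onnx cat.jpg calibration_images/ -c categories.txt
```

The three positional arguments map to `modelpath`, `image`, and `imagesdir` in the argument parser above; `-l <bitfile>` can be added to load a bitfile during `Init`.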

examples/pytorch-ssd/predictor.py

+121
@@ -0,0 +1,121 @@
import sys
import microndla
import numpy as np
import torch
import torch.nn.functional as F
import torch.onnx as onnx

from ..utils import box_utils
from .data_preprocessing import PredictionTransform
from ..utils.misc import Timer


class Predictor:
    def __init__(self, net, size, mean=0.0, std=1.0, nms_method=None,
                 iou_threshold=0.45, filter_threshold=0.01, candidate_size=200, sigma=0.5, device=None):
        self.net = net
        self.transform = PredictionTransform(size, mean, std)
        self.iou_threshold = iou_threshold
        self.filter_threshold = filter_threshold
        self.candidate_size = candidate_size
        self.nms_method = nms_method

        self.sigma = sigma
        if device:
            self.device = device
        else:
            self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

        self.net.to(self.device)
        self.net.eval()

        self.timer = Timer()

        # Micron DLA
        isize = self.net.config.image_size
        image = torch.ones([1, 3, isize, isize]).to(self.device)
        onnx.export(net, image, 'ssd.onnx')
        self.ie = microndla.MDLA()
        self.swnresults = self.ie.Compile("{:d}x{:d}x{:d}".format(isize, isize, 3), 'ssd.onnx', 'ssd.bin')
        bitfile = ''
        self.ie.Init('ssd.bin', bitfile)
        self.result = []
        for i in self.swnresults:
            self.result.append(np.ascontiguousarray(np.zeros(i, dtype=np.float32)))

    def predict(self, image, top_k=-1, prob_threshold=None):
        cpu_device = torch.device("cpu")
        height, width, _ = image.shape
        image = self.transform(image)
        images = image.unsqueeze(0)
        images = images.to(self.device)
        with torch.no_grad():
            self.timer.start()
            scores = []
            boxes = []
            scores_1, boxes_1 = self.net.forward(images)

            img = np.ascontiguousarray(images.cpu().numpy())
            self.ie.Run(img, self.result)
            print("Inference time: ", self.timer.end())
            # microndla
            self.timer.start()
            for i in range(0, len(self.result), 2):
                s = self.result[i].reshape(1, 126, -1)  # TODO: microndla API should return the output shape
                isz = np.sqrt(s.shape[2])
                s = s.reshape(1, 126, int(isz), int(isz))
                s = torch.from_numpy(s).float().to(self.device)
                s = s.permute(0, 2, 3, 1).contiguous()
                s = s.view(s.size(0), -1, self.net.num_classes)
                scores.append(s)

                b = self.result[i+1].reshape(1, 24, -1)
                isz = np.sqrt(b.shape[2])
                b = b.reshape(1, 24, int(isz), int(isz))
                b = torch.from_numpy(b).float().to(self.device)
                b = b.permute(0, 2, 3, 1).contiguous()
                b = b.view(b.size(0), -1, 4)
                boxes.append(b)

            confidences = torch.cat(scores, 1)
            locations = torch.cat(boxes, 1)
            scores = F.softmax(confidences, dim=2)
            boxes = box_utils.convert_locations_to_boxes(
                locations, self.net.priors, self.net.config.center_variance, self.net.config.size_variance
            )
            boxes = box_utils.center_form_to_corner_form(boxes)

        print("Post-processing time: ", self.timer.end())
        boxes = boxes[0]
        scores = scores[0]
        if not prob_threshold:
            prob_threshold = self.filter_threshold
        # this version of nms is slower on GPU, so we move data to CPU.
        boxes = boxes.to(cpu_device)
        scores = scores.to(cpu_device)
        picked_box_probs = []
        picked_labels = []
        for class_index in range(1, scores.size(1)):
            probs = scores[:, class_index]
            mask = probs > prob_threshold
            probs = probs[mask]
            if probs.size(0) == 0:
                continue
            subset_boxes = boxes[mask, :]
            box_probs = torch.cat([subset_boxes, probs.reshape(-1, 1)], dim=1)
            box_probs = box_utils.nms(box_probs, self.nms_method,
                                      score_threshold=prob_threshold,
                                      iou_threshold=self.iou_threshold,
                                      sigma=self.sigma,
                                      top_k=top_k,
                                      candidate_size=self.candidate_size)
            picked_box_probs.append(box_probs)
            picked_labels.extend([class_index] * box_probs.size(0))
        if not picked_box_probs:
            return torch.tensor([]), torch.tensor([]), torch.tensor([])
        picked_box_probs = torch.cat(picked_box_probs)
        picked_box_probs[:, 0] *= width
        picked_box_probs[:, 1] *= height
        picked_box_probs[:, 2] *= width
        picked_box_probs[:, 3] *= height
        return picked_box_probs[:, :4], torch.tensor(picked_labels), picked_box_probs[:, 4]
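For reference, a small standalone sketch of how `predict` reinterprets one flat MDLA score blob from `self.result`. The concrete numbers (126 channels, a 19x19 feature map, 21 classes) are assumptions chosen only to make the shapes concrete; they are not guaranteed by the SDK or by every SSD configuration:

```python
import numpy as np

# One flat score output as it would appear in self.result (illustrative size).
flat = np.zeros(126 * 19 * 19, dtype=np.float32)

s = flat.reshape(1, 126, -1)          # (batch, channels, cells)
side = int(np.sqrt(s.shape[2]))       # recover the square feature-map side: 19
s = s.reshape(1, 126, side, side)     # (batch, channels, H, W)
s = np.transpose(s, (0, 2, 3, 1))     # channels-last, mirroring permute(0, 2, 3, 1)
s = s.reshape(1, -1, 21)              # (batch, priors, num_classes); 126 = 6 anchors * 21 classes
print(s.shape)                        # (1, 2166, 21)
```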
