This repository was archived by the owner on Oct 21, 2023. It is now read-only.

Commit f980919: Merge pull request #13 from FWDNXT/2021.2 (2021.2)

2 parents 19d302a + 7d9eb55

39 files changed: +486 -216 lines

README.md (+93 -45)
````diff
@@ -35,7 +35,7 @@ This SDK folder contains:
 * [System requirements](#system-requirements)
 * [Pico computing](#pico-computing)
 * [Docker Image](#docker-image)
-* [Manual Installation](#manual-installation)
+* [Python package Install](#python-package-install)
 - [2. Getting started with Deep Learning](#2-getting-started-with-deep-learning) : general information about deep learning
 * [Introduction](#introduction)
 * [PyTorch: Deep Learning framework](#pytorch-deep-learning-framework)
@@ -49,6 +49,10 @@ This SDK folder contains:
 * [Multiple FPGAs with different models <a name="two"></a>](#multiple-fpgas-with-different-models)
 * [Multiple Clusters with input batching <a name="three"></a>](#multiple-clusters-with-input-batching)
 * [Multiple Clusters without input batching <a name="four"></a>](#multiple-clusters-without-input-batching)
+* [Multiple Clusters with different models <a name="five"></a>](#multiple-clusters-with-different-models)
+* [All Clusters with different models in sequence <a name="six"></a>](#all-clusters-with-different-models-in-sequence)
+* [Multiple Clusters with even bigger batches <a name="seven"></a>](#multiple-clusters-with-even-bigger-batches)
+* [Batching using MVs <a name="four"></a>](#batching-using-mvs)
 - [6. Tutorial - PutInput and GetResult](#6-tutorial---putinput-and-getresult) : tutorial for using PutInput and GetOutput
 - [7. Tutorial - Writing tests](#7-tutorial---writing-tests) : Tutorial on running tests
 - [8. Tutorial - Debugging](#8-tutorial---debugging) : Tutorial on debugging and printing
````
````diff
@@ -114,8 +118,12 @@ lspci | grep -i pico
 lsmod | grep -i pico
 pico 3493888 12
 ```
+After installing pico-computing, run install.sh to install the MDLA SDK.
 
-## Docker Image (optional)
+
+## Docker Image
+
+This step is optional; use it if you want to run the SDK in a Docker image.
 
 If you want to use MDLA with docker, then you need to install [pico-computing](#pico-computing) and [docker](https://docs.docker.com/get-docker/).
 
````
````diff
@@ -176,6 +184,17 @@ root@d80174ce2995:/home/mdla#
 
 Run the example code provided. Check sections [3](#3-getting-started-inference-on-micron-dla-hardware) and [4](#4-getting-started-inference-on-micron-dla-hardware-with-c).
 
+## Python package Install (optional)
+
+You can also install the SDK as a Python package:
+
+`git clone https://github.com/FWDNXT/SDK`
+
+Then, inside the SDK folder, run:
+
+`python3 setup.py install --user`
+
+
 # 2. Getting started with Deep Learning
 
 ## Introduction
````
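A quick sanity check after installing (a minimal sketch; it assumes only that the `microndla` module and its `MDLA` class, used throughout this README, are importable, not that an FPGA is attached):

```python
# Verify the package import after `python3 setup.py install --user`.
import microndla

ie = microndla.MDLA()
print('microndla loaded:', ie is not None)
```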
````diff
@@ -355,20 +374,16 @@ numclus = 1
 # Create Micron DLA API
 sf = microndla.MDLA()
 # Generate instructions
-sf.SetFlag('nfpgas', str(numfpga))
-sf.SetFlag('nclusters', str(numclus))
-sf.Compile('resnet18.onnx', 'microndla.bin')
-# Init the FPGA cards
-sf.Init('microndla.bin')
+sf.SetFlag({'nfpgas': str(numfpga), 'nclusters': str(numclus)})
+sf.Compile('resnet18.onnx')
 in1 = np.random.rand(2, 3, 224, 224).astype(np.float32)
 input_img = np.ascontiguousarray(in1)
 # Create a location for the output
 output = sf.Run(input_img)
 ```
 
-`sf.Compile` will parse the model from model.onnx and save the generated Micron DLA instructions in microndla.bin. Here numfpga=2, so instructions for two FPGAs are created.
+`sf.Compile` will parse the model from model.onnx and save the generated Micron DLA instructions. Here numfpga=2, so instructions for two FPGAs are created.
 `nresults` is the output size of the model.onnx for one input image (no batching).
-`sf.Init` will initialize the FPGAs. It will send the instructions and model parameters to each FPGA's main memory.
 The expected output size of `sf.Run` is twice `nresults`, because numfpga=2 and two input images are processed. `input_img` is two images concatenated.
 The diagram below shows this type of execution:
 
````
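Since numfpga=2 here, `sf.Run` returns the results for both images concatenated. A minimal sketch of splitting them back apart (it assumes `output` is a flat array of length `2 * nresults`):

```python
import numpy as np

# One result vector per input image.
out0, out1 = np.split(output, 2)
```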
````diff
@@ -387,13 +402,9 @@ sf1 = microndla.MDLA()
 # Create second Micron DLA API
 sf2 = microndla.MDLA()
 # Generate instructions for model1
-sf1.Compile('resnet50.onnx', 'microndla1.bin')
+sf1.Compile('resnet50.onnx')
 # Generate instructions for model2
-sf2.Compile('resnet18.onnx', 'microndla2.bin')
-# Init the FPGA 1 with model 1
-sf1.Init('microndla1.bin')
-# Init the FPGA 2 with model 2
-sf2.Init('microndla2.bin')
+sf2.Compile('resnet18.onnx')
 in1 = np.random.rand(3, 224, 224).astype(np.float32)
 in2 = np.random.rand(3, 224, 224).astype(np.float32)
 input_img1 = np.ascontiguousarray(in1)
````
````diff
@@ -423,9 +434,7 @@ numclus = 2
 sf = microndla.MDLA()
 # Generate instructions
 sf.SetFlag('nclusters', str(numclus))
-sf.Compile('resnet18.onnx', 'microndla.bin')
-# Init the FPGA cards
-sf.Init('microndla.bin')
+sf.Compile('resnet18.onnx')
 in1 = np.random.rand(2, 3, 224, 224).astype(np.float32)
 input_img = np.ascontiguousarray(in1)
 output = sf.Run(input_img)
````
````diff
@@ -447,12 +456,9 @@ numfpga = 1
 numclus = 2
 # Create Micron DLA API
 sf = microndla.MDLA()
-sf.SetFlag('nclusters', str(numclus))
-self.dla.SetFlag('clustersbatchmode', '1')
+sf.SetFlag({'nclusters': str(numclus), 'clustersbatchmode': '1'})
 # Generate instructions
-sf.Compile('resnet18.onnx', 'microndla.bin')
-# Init the FPGA cards
-sf.Init('microndla.bin')
+sf.Compile('resnet18.onnx')
 in1 = np.random.rand(3, 224, 224).astype(np.float32)
 input_img = np.ascontiguousarray(in1)
 output = sf.Run(input_img)
````
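A rough way to see what `clustersbatchmode` buys is to time the call directly. A sketch using plain Python timing (the flag and objects are the ones from the hunk above):

```python
import time

t0 = time.time()
output = sf.Run(input_img)  # both clusters work on the single image
print('latency: %.3f ms' % ((time.time() - t0) * 1e3))
```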
````diff
@@ -465,6 +471,60 @@ The diagram below shows this type of execution:
 
 <img src="docs/pics/2clus1img.png" width="600" height="550"/>
 
+## Multiple Clusters with different models
+The following example shows how to run different models on different clusters in parallel.
+Currently, each model must use the same number of clusters: for example, 3 clusters for one model and 1 cluster for another is not allowed.
+The example code is [here](./examples/python_api/twonetdemo.py).
+
+```python
+import microndla
+import numpy as np
+nclus = 2
+img0 = np.random.rand(3, 224, 224).astype(np.float32)
+img1 = np.random.rand(3, 224, 224).astype(np.float32)
+ie = microndla.MDLA()
+ie2 = microndla.MDLA()
+ie.SetFlag({'nclusters': nclus, 'clustersbatchmode': 1})
+ie2.SetFlag({'nclusters': nclus, 'firstcluster': nclus, 'clustersbatchmode': 1})
+ie.Compile('resnet18.onnx')
+ie2.Compile('alexnet.onnx', MDLA=ie)
+ie.PutInput(img0, None)
+ie2.PutInput(img1, None)
+result0, _ = ie.GetResult()
+result1, _ = ie2.GetResult()
+```
+In the code, you create one MDLA object per model and compile each of them. The first model uses 2 clusters together,
+and the remaining 2 clusters are assigned to the second model. The `firstcluster` flag tells `Compile` the first cluster that a model is going to use:
+in this example, the first model uses clusters 0 and 1 and the second model uses clusters 2 and 3.
+In `Compile`, pass the previous MDLA object to link the objects together so that the models get loaded into memory in one go.
+In this case, you must use the `PutInput` and `GetResult` paradigm (see this [section](#6-tutorial---putinput-and-getresult)); you cannot use `Run`.
+
+<img src="docs/pics/2clus2model.png" width="600" height="550"/>
+
+## All Clusters with different models in sequence
+
+This example shows how to load multiple models and run them in sequence using all clusters. It is similar to the previous example; the only difference is that all clusters are used for each model. It follows the same principle: create a separate MDLA object for each model and link the MDLAs in `Compile`.
+
+```python
+import microndla
+import numpy as np
+nclus = 2
+img0 = np.random.rand(3, 224, 224).astype(np.float32)
+img1 = np.random.rand(3, 224, 224).astype(np.float32)
+ie = microndla.MDLA()
+ie2 = microndla.MDLA()
+ie.SetFlag({'nclusters': nclus, 'clustersbatchmode': 1})
+ie2.SetFlag({'nclusters': nclus, 'clustersbatchmode': 1})
+ie.Compile('resnet18.onnx')
+ie2.Compile('alexnet.onnx', MDLA=ie)
+result0 = ie.Run(img0)
+result1 = ie2.Run(img1)
+```
+
+<img src="docs/pics/2clus2seqmodel.png" width="600" height="550"/>
+
 ## Multiple Clusters with even bigger batches
 
 It's possible to run batches of more than the number of clusters or FPGAs. Each cluster will process multiple images.
````
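The parallel two-model example above issues one input per model and then waits. A natural extension keeps a stream of frames in flight; here is a minimal sketch (it assumes, as the diff suggests, that `PutInput`'s second argument is a user tag that `GetResult` returns alongside the output):

```python
# Stream a few frames through the two linked MDLA objects in parallel.
for i in range(4):
    frame = np.random.rand(3, 224, 224).astype(np.float32)
    ie.PutInput(frame, i)     # tag each input with its frame index
    ie2.PutInput(frame, i)
    result0, tag0 = ie.GetResult()
    result1, tag1 = ie2.GetResult()
```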
````diff
@@ -477,12 +537,9 @@ numfpga = 1
 numclus = 2
 # Create Micron DLA API
 sf = microndla.MDLA()
-sf.SetFlag('nclusters', str(numclus))
-sf.SetFlag('imgs_per_cluster', '16')
+sf.SetFlag({'nclusters': str(numclus), 'imgs_per_cluster': '16'})
 # Generate instructions
-sf.Compile('resnet18.onnx', 'microndla.bin')
-# Init the FPGA cards
-sf.Init('microndla.bin')
+sf.Compile('resnet18.onnx')
 in1 = np.random.rand(32, 3, 224, 224).astype(np.float32)
 input_img = np.ascontiguousarray(in1)
 output = sf.Run(input_img) # Run
````
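The 32-image batch in this hunk is not arbitrary: it is `nclusters * imgs_per_cluster`. A one-line sketch of the relationship (names as in the hunk above):

```python
# 2 clusters x 16 images per cluster -> Run expects a 32-image batch.
assert in1.shape[0] == numclus * 16
```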
````diff
@@ -502,13 +559,9 @@
 numfpga = 1
 numclus = 2
 # Create Micron DLA API
-sf.SetFlag('nclusters', str(numclus))
-sf.SetFlag('imgs_per_cluster', '16')
-sf.SetFlag('mvbatch', '1')
+sf.SetFlag({'nclusters': str(numclus), 'imgs_per_cluster': '16', 'mvbatch': '1'})
 # Generate instructions
-sf.Compile('resnet18.onnx', 'microndla.bin')
-# Init the FPGA cards
-sf.Init('microndla.bin')
+sf.Compile('resnet18.onnx')
 in1 = np.random.rand(32, 3, 224, 224).astype(np.float32)
 input_img = np.ascontiguousarray(in1)
 output = sf.Run(input_img)
````
````diff
@@ -594,8 +647,7 @@ result_pyt = result_pyt.detach().numpy()
 
 Now we need to run this model using the accelerator with the SDK.
 ```python
-sf.Compile('net_conv.onnx', 'net_conv.bin')
-sf.Init("./net_conv.bin")
+sf.Compile('net_conv.onnx')
 in_1 = np.ascontiguousarray(inV)
 result = sf.Run(in_1)
 ```
````
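The tutorial around this hunk computes `result_pyt` with PyTorch and `result` on the accelerator, so a natural follow-up is to compare the two. A minimal sketch (a loose tolerance is assumed, since the DLA computes in fixed point):

```python
import numpy as np

# Compare accelerator output against the PyTorch reference.
diff = np.abs(np.asarray(result).reshape(-1) - result_pyt.reshape(-1))
print('max abs diff: %.4f' % diff.max())
```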
````diff
@@ -630,9 +682,7 @@ A debug option won't affect the compiler, it will only print more information. T
 
 You can use `SetFlag('debug', 'b')` to print the basic prints. The debug code `'b'` stands for basic. Debug codes and option codes are letters (case-sensitive). For a complete list of letters refer to [here](docs/Codes.md).
 
-Always put the `SetFlag()` after creating the Micron DLA object. If will print the information about the run. First, it will list all the layers that it is going to compile from the `net_conv.onnx` and produce a `net_conv.bin`.
-
-Then `Init` will find an FPGA system, AC511 in our case. It will also show how much time it took to send the weights and instructions to the external memory in the `Init` function.
+Always put the `SetFlag()` after creating the Micron DLA object. It will print the information about the run. First, it will list all the layers that it is going to compile from `net_conv.onnx`.
 
 Then `Run` will rearrange the input tensor and load it into the external memory. It will print the time it took and other properties of the run, such as the number of FPGAs and clusters used.
 
````
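A minimal sketch of turning these prints on in a fresh session (the flag value and model name are the ones mentioned in the text above):

```python
import microndla

ie = microndla.MDLA()
ie.SetFlag('debug', 'b')     # 'b' = basic prints; full list in docs/Codes.md
ie.Compile('net_conv.onnx')  # layer listing is printed during compilation
```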
````diff
@@ -706,7 +756,7 @@ In this case, you can set 'V' in the options using `SetFlag` function before `Co
 ie = microndla.MDLA()
 ie.SetFlag('varfp', '1')
 #Compile to a file
-swnresults = ie.Compile('resnet18.onnx', 'save.bin')
+swnresults = ie.Compile('resnet18.onnx')
 ```
 
 **Option 2**: Variable fix-point can be determined for input and output of each layer if one or more sample inputs are provided.
````
````diff
@@ -726,7 +776,7 @@ for fn in os.listdir(args.imagesdir):
 #Create and initialize the Inference Engine object
 ie = microndla.MDLA()
 #Compile to a file
-swnresults = ie.Compile('resnet18.onnx', 'save.bin', samples=imgs)
+swnresults = ie.Compile('resnet18.onnx', samples=imgs)
 ```
 
 After that, `Init` and `Run` run as usual using the saved variable fix-point configuration.
````
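The hunk header shows the calibration loop iterating over `args.imagesdir`, so `imgs` is a list of sample input arrays. A sketch of how such a list might be built (the preprocessing here is hypothetical; it assumes PIL is available and a 3x224x224 float input):

```python
import os
import numpy as np
from PIL import Image

imgs = []
for fn in os.listdir(args.imagesdir):
    im = Image.open(os.path.join(args.imagesdir, fn)).resize((224, 224))
    # HWC uint8 -> CHW float32 in [0, 1]
    imgs.append(np.array(im, dtype=np.float32).transpose(2, 0, 1) / 255.0)
```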
````diff
@@ -784,11 +834,9 @@ mnist = tf.keras.datasets.mnist
 x_train, x_test = x_train / 255.0, x_test / 255.0
 
 ie = microndla.MDLA()
-swnresults = ie.Compile('28x28x1', 'mnist', 'save.bin')
-ie.Init('save.bin', '')
-result = np.ndarray(swnresults, dtype=np.float32)
+ie.Compile('mnist.onnx')
 for i in range(0, 10):
-    ie.Run(x_test[i].astype(np.float32), result)
+    result = ie.Run(x_test[i].astype(np.float32))
     print(y_test[i], np.argmax(result))
 
 ```
````
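As a small follow-up to the loop above, the per-sample predictions can be tallied into an accuracy figure. A sketch reusing the same objects:

```python
# Count correct top-1 predictions over the ten MNIST test samples.
correct = sum(int(np.argmax(ie.Run(x_test[i].astype(np.float32))) == y_test[i])
              for i in range(10))
print('accuracy: %d/10' % correct)
```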

api.h (+4 -4)
````diff
@@ -11,7 +11,7 @@
 #ifndef _IE_API_H_INCLUDED_
 #define _IE_API_H_INCLUDED_
 
-static const char *microndla_version = "2021.1.0";
+static const char *microndla_version = "2021.2.0";
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
@@ -46,7 +46,7 @@ int IECOMPILER_API set_external_wait(void *cmemo, bool (*wait_ext) (int));
 /*!
 Allow to pass externally created thnets net into node list
 */
-void IECOMPILER_API ext_thnets2lst(void *cmemo, void* nett, char* image, int limit, int batch);
+void IECOMPILER_API ext_thnets2lst(void *cmemo, void* nett, char* image, int batch);
 
 /*!
 Create an Inference Engine object
@@ -82,7 +82,7 @@ Run static quantization of inputs, weight and outputs over a calibration dataset
 */
 void IECOMPILER_API *ie_compile_vfp(void *cmemo, const char *modelpath, const char* outbin, const char *inshapes,
 unsigned *noutputs, unsigned **noutdims, uint64_t ***outshapes,
-const float * const *inputs, const uint64_t *input_elements, unsigned ninputs);
+const float * const *inputs, const uint64_t *input_elements, unsigned ninputs, void *cmemp);
 
 /*!
 Compile a network and produce a .bin file with everything that is needed to execute in hardware.
@@ -97,7 +97,7 @@ In this case, ie_compile is necessary, ie_init with a previously generated bin f
 @param outshapes returns a pointer to noutputs pointers to the shapes of each output
 @return context object
 */
-void IECOMPILER_API *ie_compile(void *cmemo, const char *modelpath, const char *outbin, const char *inshapes, unsigned *noutputs, unsigned **noutdims, uint64_t ***outshapes);
+void IECOMPILER_API *ie_compile(void *cmemo, const char *modelpath, const char *outbin, const char *inshapes, unsigned *noutputs, unsigned **noutdims, uint64_t ***outshapes, void *cmemp);
 /*!
 Load a .bin file into the hardware and initialize it
 @param cmemo pointer to an Inference Engine object, may be null
````

docs/C_API.md (+9 -17)
````diff
@@ -37,15 +37,18 @@ Frees the network
 ******
 ## void *ie_compile
 
-Parse an ONNX model and generate Inference Engine instructions
+Parse an ONNX/NNEF model and generate Inference Engine instructions
 
 ***Parameters:***
 
+void IECOMPILER_API *ie_compile(void *cmemo, const char *modelpath, const char *outbin, const char *inshapes, unsigned *noutputs, unsigned **noutdims, uint64_t ***outshapes, void *cmemp);
+
+
 `void *cmemo`: pointer to an Inference Engine object, may be 0
 
 `const char *modelpath`: path to a model file in ONNX format
 
-`const char* outbin`: path to a file where a model in the Inference Engine ready format will be saved
+`const char* outbin`: path to a file where a model in the Inference Engine ready format will be saved. If this parameter is used, an Init call is needed afterwards
 
 `const char *inshapes`: shape of the inputs in the form size0xsize1xsize2...; more inputs are separated by semi-colon; this parameter is optional as the shapes of the inputs can be obtained from the model file
 
@@ -55,6 +58,8 @@ Parse an ONNX model and generate Inference Engine instructions
 
 `uint64_t ***outshapes`: returns a pointer to noutputs pointers to the shapes of each output
 
+`void *cmemp`: MDLA object to link together so that the models can be loaded into memory together
+
 ***Return value:*** pointer to the Inference Engine object or 0 in case of error
 
 ******
@@ -85,22 +90,9 @@ choosing the proper quantization for variable-fixed point, available with the VF
 
 `unsigned ninputs`: number of inputs, must be a multiple of the inputs expected by the network
 
-***Return value:*** pointer to the Inference Engine object or 0 in case of error
-
-******
-## void *ie_loadmulti
-
-Loads multiple bitfiles without initializing hardware
+`void *cmemp`: MDLA object to link together so that the models can be loaded into memory together
 
-***Parameters:***
-
-`void *cmemo`: pointer to an Inference Engine object
-
-`const char* const *inbins`: array of pathnames to the bitfiles to load
-
-`unsigned count`: number of bitfiles to load
-
-***Return value:*** pointer to an Inference Engine object to pass to ie_init
+***Return value:*** pointer to the Inference Engine object or 0 in case of error
 
 ******
 ## void *ie_init
````

docs/Codes.md (+1 -0)
````diff
@@ -147,6 +147,7 @@ following characters:
 
 **no_rearrange**: Skip output rearrangement
 
+**heterogeneous**: Run DLA-unsupported layers on the CPU, including in the middle of the network
 
 *****
 ## GetInfo
````
