Run the example code provided. Check sections [3](#3-getting-started-inference-on-micron-dla-hardware) and [4](#4-getting-started-inference-on-micron-dla-hardware-with-c).
`sf.Compile` will parse the model from model.onnx and save the generated Micron DLA instructions. Here numfpga=2, so instructions for two FPGAs are created.
`nresults` is the output size of the model.onnx for one input image (no batching).
The expected output size of `sf.Run` is twice `nresults`, because numfpga=2 and two input images are processed. `input_img` is two images concatenated.
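Putting these steps together, a minimal sketch of the two-FPGA flow could look like the following. The `microndla.MDLA` object and the `Compile`/`Run` calls follow the SDK's Python examples, but the `'nfpgas'` flag name, the model file name, and the placeholder image shapes are assumptions here, so check the Python API reference for the exact option names.

```python
import numpy as np
import microndla

sf = microndla.MDLA()
sf.SetFlag('nfpgas', '2')         # assumed flag name for requesting numfpga=2
sf.Compile('model.onnx')          # parse the ONNX model and generate instructions for both FPGAs

# Two placeholder images concatenated into one contiguous buffer (shapes are illustrative)
img0 = np.random.rand(3, 224, 224).astype(np.float32)
img1 = np.random.rand(3, 224, 224).astype(np.float32)
input_img = np.ascontiguousarray(np.concatenate((img0, img1)))

result = sf.Run(input_img)        # returns 2 * nresults values, one set per input image
```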
The following example shows how to run different models using different clusters in parallel.
Currently, assigning clusters to each model is supported, but every model must use the same number of clusters; for example, giving 3 clusters to one model and 1 cluster to another is not allowed.
The example code is [here](./examples/python_api/twonetdemo.py).
In the code, you create one MDLA object for each model and compile each of them. For the first model, use 2 clusters together.
For the second model, assign the remaining 2 clusters to it. Use the `firstcluster` flag to tell `Compile` which cluster is the first one it should use.
In this example, the first model uses clusters 0 and 1 and the second model uses clusters 2 and 3.
In `Compile`, pass the previous MDLA object to link them together so that they get loaded into memory in one go.
In this case, you must use the `PutInput` and `GetResult` paradigm (see this [section](#6-tutorial---putinput-and-getresult)); you cannot use `Run`.
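A rough sketch of that flow follows; the authoritative version is [twonetdemo.py](./examples/python_api/twonetdemo.py). The `nclusters`/`firstcluster` flags match the description above, but the exact way the previous MDLA object is passed to `Compile`, and the precise `PutInput`/`GetResult` signatures, are assumptions here.

```python
import numpy as np
import microndla

# First model: 2 clusters, starting at cluster 0
ie1 = microndla.MDLA()
ie1.SetFlag('nclusters', '2')
ie1.Compile('model1.onnx')

# Second model: the remaining 2 clusters, starting at cluster 2,
# linked to the first MDLA object so both models are loaded into memory in one go
ie2 = microndla.MDLA()
ie2.SetFlag('nclusters', '2')
ie2.SetFlag('firstcluster', '2')
ie2.Compile('model2.onnx', ie1)            # assumed: previous MDLA object passed to Compile

# Placeholder inputs for the two networks (shapes are illustrative)
in1 = np.ascontiguousarray(np.random.rand(3, 224, 224).astype(np.float32))
in2 = np.ascontiguousarray(np.random.rand(3, 224, 224).astype(np.float32))

# Linked models require the PutInput/GetResult paradigm instead of Run
ie1.PutInput(in1, None)
ie2.PutInput(in2, None)
out1, _ = ie1.GetResult()                  # assumed to return (output, user object)
out2, _ = ie2.GetResult()
```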
Now we need to run this model using the accelerator with the SDK.
```python
sf.Compile('net_conv.onnx')
in_1 = np.ascontiguousarray(inV)
result = sf.Run(in_1)
```
A debug option won't affect the compiler; it will only print more information.
You can use `SetFlag('debug', 'b')` to enable basic debug prints. The debug code `'b'` stands for basic. Debug codes and option codes are letters (case-sensitive). For a complete list of codes, refer to [here](docs/Codes.md).
Always put `SetFlag()` after creating the Micron DLA object. It will print information about the run. First, it will list all the layers that it is going to compile from `net_conv.onnx`.
Then `Run` will rearrange the input tensor and load it into the external memory. It will print the time this took and other properties of the run, such as the number of FPGAs and clusters used.
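For example, a debug-enabled version of the run above might be set up like this; the key point is that `SetFlag('debug', 'b')` comes right after the object is created and before `Compile`. The `microndla.MDLA` import/class name and the placeholder shape of `inV` are assumptions here.

```python
import numpy as np
import microndla

# Placeholder input; in the tutorial, inV is the tensor fed to net_conv.onnx
inV = np.random.rand(1, 3, 64, 64).astype(np.float32)

sf = microndla.MDLA()
sf.SetFlag('debug', 'b')           # basic debug prints for everything that follows
sf.Compile('net_conv.onnx')        # prints the layers being compiled
in_1 = np.ascontiguousarray(inV)
result = sf.Run(in_1)              # prints input load time, number of FPGAs and clusters used
```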
`void *cmemo`: pointer to an Inference Engine object, may be 0
`const char *modelpath`: path to a model file in ONNX format
`const char* outbin`: path to a file where a model in the Inference Engine ready format will be saved. If this parameter is used, then an `Init` call is needed afterwards (see the sketch after this parameter list)
`const char *inshapes`: shapes of the inputs in the form size0xsize1xsize2...; multiple inputs are separated by semicolons; this parameter is optional, as the shapes of the inputs can be obtained from the model file
`uint64_t ***outshapes`: returns a pointer to noutputs pointers to the shapes of each output
`void *cmemp`: MDLA object to link together so that models can be loaded into memory together
***Return value:*** pointer to the Inference Engine object or 0 in case of error
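As a Python-level illustration of the `outbin` parameter: assuming the Python `Compile`/`Init` wrappers mirror this behaviour, the two modes differ roughly as in the sketch below (file names are just the ones used in the tutorial above).

```python
import microndla

# With an output file: the compiled model is saved to disk and Init must be called afterwards
sf = microndla.MDLA()
sf.Compile('net_conv.onnx', 'net_conv.bin')
sf.Init('./net_conv.bin')

# Without an output file: Compile prepares everything and no Init call is needed
sf2 = microndla.MDLA()
sf2.Compile('net_conv.onnx')
```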
******
`unsigned ninputs`: number of inputs, must be a multiple of the inputs expected by the network
`void *cmemp`: MDLA object to link together so that models can be loaded into memory together
***Return value:*** pointer to the Inference Engine object or 0 in case of error