File: `hw/ip/otbn/doc/programmers_guide.md` (55 additions, 0 deletions)
@@ -353,6 +353,61 @@ The outlined technique can be extended to arbitrary bit widths but requires unro
Code snippets giving examples of 256x256 and 384x384 multiplies can be found in `sw/otbn/code-snippets/mul256.s` and `sw/otbn/code-snippets/mul384.s`.
### Packing and unpacking 24-bit element vectors
The vectorized subset of Bignum instructions enables SIMD computation on 32-bit elements.
However, some PQC algorithms operate on smaller values.
To reduce the memory footprint of such programs, vectors can be compressed and stored in memory in a dense 24-bit format.
The `bn.pack` and `bn.unpk` instructions convert 32-bit vectors into a dense 24-bit representation and vice versa, as described in the [ISA manual](./isa.md).
These packed vectors can then be stored in memory as shown below.

<img src="./packed_format.svg" width="780"/>

To pack vectors, one can use the following snippet:
```
/*
 * Assume we have 4 vectors with 8 32-bit elements currently in WDRs w0-w3
 * which we want to store in the packed format.
 * The color in the image corresponds to the WDRs as follows:
 * w0: Red vector
 * w1: Yellow vector
 * w2: Green vector
 * w3: Blue vector
 */

/* Pack the vectors into temporary WDRs */
bn.pack w10, w1, w0, 64
bn.pack w11, w2, w1, 128
bn.pack w12, w3, w2, 192

/* Store packed vectors to memory */
...
```
The inner workings of the `bn.pack` instruction are visualized in the following figure for the case of `bn.pack w11, w2, w1, <shift>`.
The two vectors are first converted into a dense format (192 bits each) and then concatenated with additional zero bits.
Finally, the 512 bits are shifted to produce the marked 256 bits, which are written to the destination WDR.
This allows one to construct all the required packings.
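
The densify, concatenate, and shift steps described above can be sketched as a small bit-level model in Python.
This is an illustration under stated assumptions, not the ISA definition: the helper names (`to_wdr`, `dense192`, `pack_model`) are hypothetical, element 0 is assumed to sit in the least significant bits, packing is assumed to keep the low 24 bits of each element, and the 128 zero bits are assumed to be split as 64 bits below and 64 bits above the two dense vectors.
That placement was chosen because it makes the shift amounts from the snippet above (64, 128, 192) yield one contiguous stream; the [ISA manual](./isa.md) remains the authoritative description.

```python
MASK24 = (1 << 24) - 1
MASK256 = (1 << 256) - 1


def to_wdr(elems):
    """Build a 256-bit WDR value from eight 32-bit elements (element 0 lowest)."""
    assert len(elems) == 8
    value = 0
    for i, e in enumerate(elems):
        value |= (e & 0xFFFFFFFF) << (32 * i)
    return value


def dense192(wdr):
    """Keep the low 24 bits of each 32-bit element and close the gaps (192 bits)."""
    out = 0
    for i in range(8):
        out |= ((wdr >> (32 * i)) & MASK24) << (24 * i)
    return out


def pack_model(wrs1, wrs2, shift):
    """Hypothetical model of `bn.pack wrd, wrs1, wrs2, shift`."""
    # Assumed 512-bit intermediate, most significant bits first:
    #   64 zero bits | dense(wrs1) | dense(wrs2) | 64 zero bits
    wide = (dense192(wrs1) << 256) | (dense192(wrs2) << 64)
    # Shift right and keep the low 256 bits as the destination WDR value.
    return (wide >> shift) & MASK256


# Four vectors of eight distinct 24-bit values, standing in for w0-w3.
elems = [[0x100000 * (v + 1) + i for i in range(8)] for v in range(4)]
w = [to_wdr(vec) for vec in elems]

# Same operand order and shift amounts as the assembly snippet.
w10 = pack_model(w[1], w[0], 64)
w11 = pack_model(w[2], w[1], 128)
w12 = pack_model(w[3], w[2], 192)

# Three 256-bit words laid out back to back, w10 at the lowest address.
stream = w10 | (w11 << 256) | (w12 << 512)
```

Under these assumptions, `stream` holds all 32 elements as consecutive 24-bit fields, so the four input WDRs (1024 bits) occupy three WDRs (768 bits) once packed.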