Commit d6e2cf1

[otbn,doc] Extend the programmer's guide with bn.pack and bn.unpk explanations
Adds a simple example of how to use the bn.pack and bn.unpk instructions.
Signed-off-by: Pascal Etterli <[email protected]>
1 parent 56335b4 commit d6e2cf1

3 files changed: +63 -0 lines changed


hw/ip/otbn/doc/pack_instruction_shifting.svg

Lines changed: 4 additions & 0 deletions

hw/ip/otbn/doc/packed_format.svg

Lines changed: 4 additions & 0 deletions

hw/ip/otbn/doc/programmers_guide.md

Lines changed: 55 additions & 0 deletions
@@ -353,6 +353,61 @@ The outlined technique can be extended to arbitrary bit widths but requires unro

Code snippets giving examples of 256x256 and 384x384 multiplies can be found in `sw/otbn/code-snippets/mul256.s` and `sw/otbn/code-snippets/mul384.s`.

### Packing and unpacking 24-bit element vectors
The vectorized subset of the Bignum instructions enables SIMD computation on 32-bit elements.
However, some PQC algorithms operate on smaller values.
To reduce the memory footprint of such programs, vectors can be stored in memory in a dense 24-bit format.
The `bn.pack` and `bn.unpk` instructions convert 32-bit vectors into a dense 24-bit representation and vice versa, as described in the [ISA manual](./isa.md).
The packed vectors are laid out in memory as shown below.

<img src="./packed_format.svg" width="780"/>
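
As a rough illustration of the layout, the following Python sketch models the packed format in software; it is not OTBN code.
It assumes that element `i` of a vector sits in the low 24 bits of the `i`-th 32-bit lane of a WDR, that the dense form places element `i` at bit offset `24*i`, and that the first line of the figure holds the least significant 256 bits.
The helper names `make_vector`, `to_dense` and `pack_four` are made up for this example.
Under these assumptions, four 256-bit vectors (1024 bits) shrink to 768 bits, which is three 256-bit words.

```python
LANES = 8          # elements per vector
ELEM_BITS = 24     # bits kept per element in the packed format
LANE_BITS = 32     # bits per element in a WDR
WDR_BITS = 256     # width of one wide data register


def make_vector(elems):
    """Build a 256-bit integer from eight values, one per 32-bit lane."""
    vec = 0
    for i, elem in enumerate(elems):
        vec |= (elem & 0xFFFFFFFF) << (LANE_BITS * i)
    return vec


def to_dense(vec):
    """Keep the low 24 bits of each 32-bit lane and return a 192-bit integer."""
    dense = 0
    for i in range(LANES):
        elem = (vec >> (LANE_BITS * i)) & ((1 << ELEM_BITS) - 1)
        dense |= elem << (ELEM_BITS * i)
    return dense


def pack_four(vectors):
    """Pack four 256-bit vectors into three 256-bit words (assumed layout)."""
    stream = 0
    for n, vec in enumerate(vectors):
        # Each dense vector takes 192 bits, placed back to back.
        stream |= to_dense(vec) << (LANES * ELEM_BITS * n)
    mask = (1 << WDR_BITS) - 1
    return [(stream >> (WDR_BITS * k)) & mask for k in range(3)]


# Example: vector n holds the values 8*n + 1 .. 8*n + 8.
vectors = [make_vector(range(8 * n + 1, 8 * n + 9)) for n in range(4)]
words = pack_four(vectors)
print([hex(w) for w in words])
```

In this model the second vector starts at bit 192 of the first word and spills into the second word, which is why the shift amounts in the snippets of this section are multiples of 64.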

To pack vectors, one can use the following snippet:
```
/*
 * Assume we have 4 vectors with 8 32-bit elements each, currently in WDRs w0-w3,
 * which we want to store in the packed format.
 * The colors in the image correspond to the WDRs as follows:
 * w0: Red vector
 * w1: Yellow vector
 * w2: Green vector
 * w3: Blue vector
 */

/* Pack the vectors into temporary WDRs */
bn.pack w10, w1, w0, 64   /* w10: red vector and the start of the yellow vector */
bn.pack w11, w2, w1, 128  /* w11: rest of the yellow vector and the start of the green vector */
bn.pack w12, w3, w2, 192  /* w12: rest of the green vector and the blue vector */

/* Store packed vectors to memory */
...
```
The inner workings of the `bn.pack` instruction are visualized in the following figure for the case of `bn.pack w11, w2, w1, <shift>`.
The two vectors are first converted into a dense format (192 bits each) and then concatenated with additional zero bits.
Finally, the resulting 512 bits are shifted to produce the marked 256 bits, which are written to the destination WDR.
By varying the shift amount, one can construct all the required packings.

<img src="./pack_instruction_shifting.svg" width="780"/>
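
The shifting step can be mimicked with a small Python model; this is a sketch for building intuition, not a reference for the instruction.
The function `bn_pack_model` and its helpers are hypothetical, and the placement of the zero bits (64 zero bits above and below the two dense vectors) is an assumption, chosen because it reproduces the three packings used in the snippet above.
The [ISA manual](./isa.md) remains the authoritative definition.

```python
MASK_256 = (1 << 256) - 1


def make_vector(elems):
    """Build a 256-bit integer from eight values, one per 32-bit lane."""
    vec = 0
    for i, elem in enumerate(elems):
        vec |= (elem & 0xFFFFFFFF) << (32 * i)
    return vec


def to_dense(vec):
    """Keep the low 24 bits of each 32-bit lane and return a 192-bit integer."""
    dense = 0
    for i in range(8):
        elem = (vec >> (32 * i)) & 0xFFFFFF
        dense |= elem << (24 * i)
    return dense


def bn_pack_model(wrs1, wrs2, shift):
    """Model of `bn.pack wrd, wrs1, wrs2, shift` under the stated assumptions."""
    # Assumed 512-bit intermediate value, from most to least significant:
    # 64 zero bits, dense(wrs1), dense(wrs2), 64 zero bits.
    intermediate = (to_dense(wrs1) << 256) | (to_dense(wrs2) << 64)
    # Shift right and keep the low 256 bits for the destination WDR.
    return (intermediate >> shift) & MASK_256


# Same register roles as in the assembly snippet.
w0, w1, w2, w3 = (make_vector(range(8 * n + 1, 8 * n + 9)) for n in range(4))
w10 = bn_pack_model(w1, w0, 64)
w11 = bn_pack_model(w2, w1, 128)
w12 = bn_pack_model(w3, w2, 192)
print(hex(w10), hex(w11), hex(w12))
```

With this model, `w10`, `w11` and `w12` together hold the four dense vectors back to back, with no gaps between them.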

To unpack vectors, one can use the following approach.
Unpacking works by concatenating two 256-bit strings loaded from memory and shifting the desired bits into the lower 192 bits.
These 192 bits are then expanded to 8 x 32 bits by inserting a zero byte after every 3 bytes.
```
/*
 * Load packed vectors from memory into WDRs w10-w12 such that:
 * w10 corresponds to the 1st line in the first image
 * w11 corresponds to the 2nd line in the first image
 * w12 corresponds to the 3rd line in the first image
 */
...

/* Unpack vectors */
bn.unpk w0, w11, w10, 0    /* unpack the red vector to w0 */
bn.unpk w1, w11, w10, 192  /* unpack the yellow vector to w1 */
bn.unpk w2, w12, w11, 128  /* unpack the green vector to w2 */
bn.unpk w3, wXX, w12, 64   /* unpack the blue vector to w3; wXX means that any WDR can be used */
```
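
The unpacking steps described above (concatenate, shift, take the lower 192 bits, insert zero bytes) can be checked with a short Python model.
This is a sketch under assumptions: `bn_unpk_model` and the helpers are hypothetical names, the first source operand is taken as the upper half of the concatenation, and element `i` of the dense form is assumed to sit at bit offset `24*i`.
The packed words are built directly in Python rather than with a model of `bn.pack`.

```python
MASK_256 = (1 << 256) - 1
MASK_192 = (1 << 192) - 1


def make_vector(elems):
    """Build a 256-bit integer from eight 24-bit values, one per 32-bit lane."""
    vec = 0
    for i, elem in enumerate(elems):
        vec |= (elem & 0xFFFFFF) << (32 * i)
    return vec


def pack_words(vectors):
    """Lay out four vectors as three 256-bit words in the assumed packed format."""
    stream = 0
    for n, vec in enumerate(vectors):
        for i in range(8):
            elem = (vec >> (32 * i)) & 0xFFFFFF
            stream |= elem << (192 * n + 24 * i)
    return [(stream >> (256 * k)) & MASK_256 for k in range(3)]


def bn_unpk_model(wrs1, wrs2, shift):
    """Model of `bn.unpk wrd, wrs1, wrs2, shift` under the stated assumptions."""
    concat = (wrs1 << 256) | wrs2          # 512-bit concatenation
    dense = (concat >> shift) & MASK_192   # desired bits in the lower 192 bits
    vec = 0
    for i in range(8):
        elem = (dense >> (24 * i)) & 0xFFFFFF
        vec |= elem << (32 * i)            # top byte of each lane stays zero
    return vec


originals = [make_vector(range(8 * n + 1, 8 * n + 9)) for n in range(4)]
w10, w11, w12 = pack_words(originals)
w_any = 0  # stands in for wXX; its contents do not reach the result

w0 = bn_unpk_model(w11, w10, 0)
w1 = bn_unpk_model(w11, w10, 192)
w2 = bn_unpk_model(w12, w11, 128)
w3 = bn_unpk_model(w_any, w12, 64)
print([w0, w1, w2, w3] == originals)
```

The last unpack only needs bits 64 to 255 of `w12`, so the upper operand never contributes to the result in this model; this matches the `wXX` remark in the snippet.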

## Device Interface Functions (DIFs)

- [Device Interface Functions](../../../../sw/device/lib/dif/dif_otbn.h)
