Commit 4c51469

Add RealTek AmebaPro2 (RTL8735B) HUK crypto-callback port
Binds wolfCrypt AES (GCM/ECB/CBC/CTR) and HUK-bound ECDSA sign to the RTL8735B silicon Hardware Unique Key via the crypto-callback (CryptoCb) framework. A 256-bit seed runs through the HAL secure HKDF key-ladder against the HUK to land a device-bound working key in a secure key-storage slot; the working key never enters software.

Pure crypto-callback device: no new wolfSSL core API or struct fields, and no changes to shared core files. AES reads its seed from the standard aes->devKey; ECDSA reads a port-defined wc_AmebaPro2_EccKey (the HUK-wrapped scalar + seed) that the caller attaches via the standard ecc_key->devCtx. The HAL needs 32-byte-aligned DMA buffers, so unaligned iv/aad/in/out are bounced through aligned temporaries.

Enabled with WOLFSSL_REALTEK_HUK + WOLF_CRYPTO_CB; --enable-amebapro2 builds a host compile-test against a HAL shim.

Validated on RTL8735B silicon (FreeRTOS SDK and Zephyr): full wolfcrypt_test PASS, the HUK AES modes (incl. unaligned-buffer GCM), and HUK-bound ECDSA (P-256 signature verifies against the original public key). See wolfcrypt/src/port/realtek/README.md.
1 parent cc6887f commit 4c51469

8 files changed

Lines changed: 1186 additions & 1 deletion


configure.ac

Lines changed: 22 additions & 1 deletion
@@ -3212,6 +3212,25 @@ case "$ENABLED_STSAFE" in
 esac
 
 
+# RealTek AmebaPro2 (RTL8735B) HUK crypto-callback port.
+# On-target the application supplies the AmebaPro2 HAL include path. This option
+# is a host compile-test of the port: it swaps the HAL headers for a shim
+# (WOLFSSL_AMEBAPRO2_HOST_TEST) so the cryptocb dispatch and wiring build without
+# the vendor SDK. It forces crypto callbacks on (see the cryptocb block).
+# Example: "./configure --enable-amebapro2"
+ENABLED_AMEBAPRO2="no"
+AC_ARG_ENABLE([amebapro2],
+[AS_HELP_STRING([--enable-amebapro2],
+[Enable RealTek AmebaPro2 (RTL8735B) HUK crypto-callback port (host compile-test).])],
+[ ENABLED_AMEBAPRO2=$enableval ],
+[ ENABLED_AMEBAPRO2=no ])
+
+if test "x$ENABLED_AMEBAPRO2" != "xno"
+then
+AM_CFLAGS="$AM_CFLAGS -DWOLFSSL_REALTEK_HUK -DWOLFSSL_AMEBAPRO2_HOST_TEST -DHAVE_AES_ECB"
+fi
+
+
 # NXP SE050
 # Example: "./configure --with-se050=/home/pi/simw_top"
 ENABLED_SE050="no"
@@ -10680,7 +10699,7 @@ AC_ARG_ENABLE([cryptocb-sw-test],
 [ ENABLED_CRYPTOCB_SW_TEST=yes ]
 )
 
-if test "x$ENABLED_PKCS11" = "xyes" || test "x$ENABLED_WOLFTPM" = "xyes" || test "$ENABLED_CAAM" != "no"
+if test "x$ENABLED_PKCS11" = "xyes" || test "x$ENABLED_WOLFTPM" = "xyes" || test "$ENABLED_CAAM" != "no" || test "x$ENABLED_AMEBAPRO2" != "xno"
 then
 ENABLED_CRYPTOCB=yes
 fi
@@ -12429,6 +12448,7 @@ AM_CONDITIONAL([BUILD_IOTSAFE],[test "x$ENABLED_IOTSAFE" = "xyes"])
 AM_CONDITIONAL([BUILD_IOTSAFE_HWRNG],[test "x$ENABLED_IOTSAFE_HWRNG" = "xyes"])
 AM_CONDITIONAL([BUILD_SE050],[test "x$ENABLED_SE050" = "xyes"])
 AM_CONDITIONAL([BUILD_STSAFE],[test "x$ENABLED_STSAFE" != "xno"])
+AM_CONDITIONAL([BUILD_AMEBAPRO2],[test "x$ENABLED_AMEBAPRO2" != "xno"])
 AM_CONDITIONAL([BUILD_TROPIC01],[test "x$ENABLED_TROPIC01" = "xyes"])
 AM_CONDITIONAL([BUILD_KDF],[test "x$ENABLED_KDF" = "xyes"])
 AM_CONDITIONAL([BUILD_HMAC],[test "x$ENABLED_HMAC" = "xyes"])
@@ -13008,6 +13028,7 @@ echo " * IoT-Safe: $ENABLED_IOTSAFE"
 echo " * IoT-Safe HWRNG: $ENABLED_IOTSAFE_HWRNG"
 echo " * NXP SE050: $ENABLED_SE050"
 echo " * STMicro STSAFE: $ENABLED_STSAFE"
+echo " * RealTek AmebaPro2 HUK: $ENABLED_AMEBAPRO2"
 echo " * TROPIC01: $ENABLED_TROPIC01"
 echo " * Maxim Integrated MAXQ10XX: $ENABLED_MAXQ10XX"
 echo " * PSA: $ENABLED_PSA"

wolfcrypt/src/include.am

Lines changed: 7 additions & 0 deletions
@@ -105,6 +105,9 @@ EXTRA_DIST += wolfcrypt/src/port/ti/ti-aes.c \
 wolfcrypt/src/port/st/README.md \
 wolfcrypt/src/port/st/STM32MP13.md \
 wolfcrypt/src/port/st/STM32MP25.md \
+wolfcrypt/src/port/realtek/amebapro2.c \
+wolfcrypt/src/port/realtek/amebapro2_shim.h \
+wolfcrypt/src/port/realtek/README.md \
 wolfcrypt/src/port/tropicsquare/tropic01.c \
 wolfcrypt/src/port/tropicsquare/README.md \
 wolfcrypt/src/port/af_alg/afalg_aes.c \
@@ -244,6 +247,10 @@ if BUILD_TROPIC01
 src_libwolfssl@LIBSUFFIX@_la_SOURCES += wolfcrypt/src/port/tropicsquare/tropic01.c
 endif
 
+if BUILD_AMEBAPRO2
+src_libwolfssl@LIBSUFFIX@_la_SOURCES += wolfcrypt/src/port/realtek/amebapro2.c
+endif
+
 if BUILD_PSA
 src_libwolfssl@LIBSUFFIX@_la_SOURCES += wolfcrypt/src/port/psa/psa.c
 src_libwolfssl@LIBSUFFIX@_la_SOURCES += wolfcrypt/src/port/psa/psa_hash.c

wolfcrypt/src/port/realtek/README.md

Lines changed: 254 additions & 0 deletions
@@ -0,0 +1,254 @@

# RealTek AmebaPro2 (RTL8735B) HUK Port

Binds wolfCrypt keys to the RTL8735B silicon Hardware Unique Key (HUK) through
the AmebaPro2 HAL crypto engine, via the wolfCrypt crypto-callback (CryptoCb)
framework. A 256-bit "seed" is run through the HAL HKDF key-ladder against the
HUK to land a device-bound working key in a secure key-storage slot; AES
(GCM/ECB/CBC/CTR) then runs from that slot and the working key never enters
software. It is a pure crypto-callback device and adds no wolfSSL core API or
struct fields: AES reads its seed from the standard `aes->devKey`, and ECDSA
reads a `wc_AmebaPro2_EccKey` (the HUK-wrapped scalar + seed) that the caller
attaches via the standard `ecc_key->devCtx`. This is the same device pattern
used by the STM32 DHUK port (`wc_Stm32_DhukRegister`).

## Hardware

RTL8735B / AmebaPro2 security blocks used by this port (from the
`Ameba-AIoT/nuwa_hal_realtek` SDK, `rtl8735b` branch, headers under
`ameba/amebapro2/source/fwlib/rtl8735b/include/`):

- HUK in OTP: `SB_OTP_HIGH_VAL_HUK1` (0x21), `HUK2` (0x22), `HUK_RMA` (0x2F).
- HKDF key-ladder in secure RAM: `hal_hkdf_hmac_sha256_secure_init`,
  `hal_hkdf_extract_secure_all`, `hal_hkdf_expand_secure_all`. These derive the
  HUK into a secure key-storage slot without exposing the key to software.
- AES secure-key ops that reference the derived slot by number:
  `hal_crypto_aes_ecb_sk_init`, `hal_crypto_aes_gcm_sk_init` (the key never
  leaves hardware).
- ECDSA (`hal_ecdsa.h`) and OTP-resident ECDSA keys (`hal_otp_ecdsa_key_*`), for
  the planned follow-ons to the HUK-bound sign path (the current Stage 3 sign
  unwraps a HUK-wrapped scalar and signs in software; see below).
- TRNG (`hal_trng.h`); the `ameba-zephyr-pro2-platform` repo provides a Zephyr
  entropy driver (`entropy_amebapro2.c`, DT `realtek,amebapro2-trng`) that feeds
  wolfCrypt's `wc_GenerateSeed` via `sys_rand_get`.

## Enabling

```c
#define WOLFSSL_REALTEK_HUK   /* enable the AmebaPro2 HUK device */
#define WOLF_CRYPTO_CB        /* required -- HUK routes through crypto callbacks */
```

Set these in `user_settings.h`. The application/board CMake must add the
AmebaPro2 HAL include directory (e.g. `.../fwlib/rtl8735b/include/`) to the
wolfSSL library include path so this port can include `hal_crypto.h` and
`hal_hkdf.h` (plus `hal_ecdsa.h` once signing moves onto the HW ECDSA engine).

Configurable (override in `user_settings.h` before including wolfSSL):

| Macro                          | Default | Meaning                                |
|--------------------------------|---------|----------------------------------------|
| `WC_HUK_DEVID`                 | 809     | CryptoCb device id (STM32 DHUK is 808) |
| `WC_AMEBAPRO2_HUK_SK_IDX`      | 1       | Secure-key slot holding the HUK (HUK1) |
| `WC_AMEBAPRO2_HKDF_PRK_IDX`    | 3       | Intermediate HKDF PRK slot             |
| `WC_AMEBAPRO2_DERIVED_WB_IDX`  | 4       | Derived working-key slot (AES uses it) |
| `WC_AMEBAPRO2_HKDF_CRYPTO_SEL` | 0       | `crypto_sel` for the secure HKDF init  |
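
Overriding a default only works if the port defines each macro when the application has not already done so. A minimal sketch of that pattern, assuming the port guards each default with `#ifndef` (the real header may be organized differently); the first part stands in for `user_settings.h`, and the override value `5` is purely illustrative. The `static_assert` is an extra safety check added here, not something the port is known to contain.

```c
#include <assert.h>   /* static_assert (C11) */

/* ---- user_settings.h (application side) ---- */
#define WOLFSSL_REALTEK_HUK            /* enable the AmebaPro2 HUK device */
#define WOLF_CRYPTO_CB                 /* HUK routes through crypto callbacks */
#define WC_AMEBAPRO2_DERIVED_WB_IDX 5  /* illustrative override of one default */

/* ---- port-side defaults (assumed to be #ifndef-guarded) ---- */
#ifndef WC_HUK_DEVID
    #define WC_HUK_DEVID 809
#endif
#ifndef WC_AMEBAPRO2_HUK_SK_IDX
    #define WC_AMEBAPRO2_HUK_SK_IDX 1
#endif
#ifndef WC_AMEBAPRO2_HKDF_PRK_IDX
    #define WC_AMEBAPRO2_HKDF_PRK_IDX 3
#endif
#ifndef WC_AMEBAPRO2_DERIVED_WB_IDX
    #define WC_AMEBAPRO2_DERIVED_WB_IDX 4
#endif
#ifndef WC_AMEBAPRO2_HKDF_CRYPTO_SEL
    #define WC_AMEBAPRO2_HKDF_CRYPTO_SEL 0
#endif

/* Extra guard (our addition, not known to be in the port): keep the three
 * ladder slots distinct so an override cannot alias two of them. */
static_assert(WC_AMEBAPRO2_HUK_SK_IDX != WC_AMEBAPRO2_HKDF_PRK_IDX &&
              WC_AMEBAPRO2_HKDF_PRK_IDX != WC_AMEBAPRO2_DERIVED_WB_IDX &&
              WC_AMEBAPRO2_HUK_SK_IDX != WC_AMEBAPRO2_DERIVED_WB_IDX,
              "HUK, PRK and working-key slots must be distinct");
```

Only the overridden slot changes; every other macro keeps its table default.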

## API

```c
#include <wolfssl/wolfcrypt/settings.h>
#include <wolfssl/wolfcrypt/aes.h>
#include <wolfssl/wolfcrypt/port/realtek/amebapro2.h>

/* One-time: register the AmebaPro2 HUK crypto-callback device. */
wc_AmebaPro2_HukRegister(WC_HUK_DEVID);

/* AES / GCM: enable via devId at init, then pass the 256-bit seed as the key.
 * The seed is HKDF input that diversifies the HUK -- it is NOT the AES key. */
Aes  aes;
byte seed[32];          /* per-purpose derivation seed (need not be secret) */
byte iv[12];            /* 96-bit IV, unique per message for this seed */
byte pt[64], ct[64];    /* plaintext / ciphertext (sizes are illustrative) */
byte aad[16], tag[16];  /* additional authenticated data / GCM tag */
int  ret;

ret = wc_AesInit(&aes, NULL, WC_HUK_DEVID);
if (ret == 0)
    ret = wc_AesGcmSetKey(&aes, seed, sizeof(seed));
if (ret == 0)           /* full GCM (IV, AAD, tag) */
    ret = wc_AesGcmEncrypt(&aes, ct, pt, sizeof(pt), iv, sizeof(iv),
                           tag, sizeof(tag), aad, sizeof(aad));
wc_AesFree(&aes);

/* AES-ECB / AES-CBC / AES-CTR follow the same pattern: wc_AesInit with
 * devId = WC_HUK_DEVID, wc_AesSetKey with the seed, then wc_AesEcb*,
 * wc_AesCbc* or wc_AesCtrEncrypt. */

wc_AmebaPro2_HukUnRegister(WC_HUK_DEVID);
```

The seed maps to a device-bound working key as:
HUK (slot `WC_AMEBAPRO2_HUK_SK_IDX`) -> `hal_hkdf_extract_secure_all` -> PRK slot
(`WC_AMEBAPRO2_HKDF_PRK_IDX`) -> `hal_hkdf_expand_secure_all` -> working key in
slot `WC_AMEBAPRO2_DERIVED_WB_IDX` -> `hal_crypto_aes_gcm_sk_init` /
`hal_crypto_aes_ecb_sk_init`. The derivation and the AES operation run under a
single hold of the crypto mutex, and the working key never enters software. The
same seed always yields the same working key (deterministic, so GMAC verifies and
AES round-trips); a wrong seed yields a different key, so GCM decrypt returns
`AES_GCM_AUTH_E`.
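
To make the slot flow and the single mutex hold concrete, here is a host-side toy model; it is not the port and not the HAL. A simulated key-storage array plays the secure slots (indices taken from the table defaults), an FNV-style mixing step stands in for each HKDF stage (it is NOT HKDF and not secure), a pthread mutex stands in for the HAL crypto mutex, and a byte-wise XOR stands in for the AES engine. All function names are hypothetical. The point it illustrates: callers pass a seed and get output back, while key bytes stay inside the "slots".

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_LEN 32
enum { HUK_SLOT = 1, PRK_SLOT = 3, WB_SLOT = 4, NUM_SLOTS = 8 }; /* README defaults */

/* Simulated secure key storage: only these helpers touch the bytes. */
static uint8_t key_store[NUM_SLOTS][SLOT_LEN];
static pthread_mutex_t crypto_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Toy mixing step standing in for an HKDF stage (NOT HKDF, not secure):
 * writes slot dst from slot src and the info bytes. */
static void toy_mix(int src, int dst, const uint8_t *info, size_t infoLen)
{
    uint32_t h = 2166136261u;                        /* FNV-1a offset basis */
    for (int i = 0; i < SLOT_LEN; i++) {
        h = (h ^ key_store[src][i]) * 16777619u;     /* FNV-1a prime */
        if (infoLen > 0)
            h = (h ^ info[i % infoLen]) * 16777619u;
        key_store[dst][i] = (uint8_t)(h >> 24);
    }
}

/* Test-only: simulate a HUK fused into its slot. */
static void provision_huk(uint8_t fill)
{
    memset(key_store[HUK_SLOT], fill, SLOT_LEN);
}

/* One HUK operation: derive from the caller's seed, then use the working-key
 * slot, all under a single mutex hold. The caller only ever sees output. */
static void huk_encrypt_block(const uint8_t seed[32], const uint8_t in[16],
                              uint8_t out[16])
{
    assert(seed != NULL && in != NULL && out != NULL);
    pthread_mutex_lock(&crypto_mutex);
    toy_mix(HUK_SLOT, PRK_SLOT, seed, 32);                  /* "extract" */
    toy_mix(PRK_SLOT, WB_SLOT, (const uint8_t *)"wb", 2);   /* "expand"  */
    for (int i = 0; i < 16; i++)              /* toy "cipher": XOR with slot */
        out[i] = in[i] ^ key_store[WB_SLOT][i];
    pthread_mutex_unlock(&crypto_mutex);
}
```

Holding the mutex across both the derivation and the operation is what lets each call derive from its own seed without a shared "current key" global: no other thread can overwrite the working-key slot between the two steps. Calling it twice with the same seed gives identical output, and a different seed gives different output, mirroring the determinism described above.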

HUK-bound ECDSA sign (Stage 3, wrapped scalar): point the key's crypto-callback
context at a `wc_AmebaPro2_EccKey` (the scalar AES-wrapped under a HUK-derived
key, plus its 32-byte seed). No dedicated wolfSSL import API is needed:

```c
#include <wolfssl/wolfcrypt/ecc.h>
#include <wolfssl/wolfcrypt/port/realtek/amebapro2.h>

/* seed, seed length, HUK-wrapped scalar, its length, plain scalar length */
wc_AmebaPro2_EccKey hk = { seed, 32, wrapped, wrappedLen, plainLen };
ecc_key key;
int ret = wc_ecc_init_ex(&key, NULL, WC_HUK_DEVID);
if (ret == 0)
    ret = wc_ecc_set_curve(&key, plainLen, ECC_SECP256R1);
if (ret == 0) {
    key.devCtx = &hk;   /* borrowed; must outlive the key */
    ret = wc_ecc_sign_hash(hash, hashSz, sig, &sigSz, rng, &key);
}
wc_ecc_free(&key);
```

At sign time the port derives the slot key from the seed, ECB-unwraps the scalar
into a short-lived buffer, signs, and scrubs the buffer. The wrapped blob is
device-bound: it only unwraps on the silicon whose HUK produced the slot key. The
scalar is briefly in software during the sign; an OTP-resident model
(`hal_ecdsa_select_prk`, scalar never in software) and routing the sign itself
through the HW ECDSA engine (`hal_ecdsa`) are follow-ons.
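
The part of this sequence that lives in software is the unwrap, sign, scrub pattern: the scalar buffer must be wiped on every exit path, and the wipe must not be optimized away. A minimal sketch, assuming hypothetical `toy_unwrap` and `toy_sign` placeholders for the port's ECB unwrap and wolfCrypt's ECDSA sign (both are trivial XORs, not real crypto). The volatile-pointer wipe follows the same idea as wolfCrypt's `ForceZero`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCALAR_LEN 32

/* Wipe through a volatile pointer so the compiler may not elide the stores
 * (same idea as wolfCrypt's ForceZero). */
static void force_zero(void *mem, size_t len)
{
    volatile uint8_t *p = (volatile uint8_t *)mem;
    while (len--)
        *p++ = 0;
}

/* Placeholder for the ECB unwrap under the HUK-derived slot key. */
static int toy_unwrap(const uint8_t *wrapped, size_t len,
                      uint8_t scalar[SCALAR_LEN])
{
    if (len != SCALAR_LEN)
        return -1;
    for (size_t i = 0; i < len; i++)
        scalar[i] = wrapped[i] ^ 0x5C;        /* NOT real crypto */
    return 0;
}

/* Placeholder for the ECDSA sign over the plaintext scalar. */
static int toy_sign(const uint8_t scalar[SCALAR_LEN],
                    const uint8_t hash[SCALAR_LEN], uint8_t sig[SCALAR_LEN])
{
    for (int i = 0; i < SCALAR_LEN; i++)
        sig[i] = scalar[i] ^ hash[i];         /* NOT a real signature */
    return 0;
}

/* The scalar exists only in this stack buffer and is wiped on every path. */
static int huk_sign(const uint8_t *wrapped, size_t wrappedLen,
                    const uint8_t hash[SCALAR_LEN], uint8_t sig[SCALAR_LEN])
{
    uint8_t scalar[SCALAR_LEN];
    int ret;

    assert(wrapped != NULL && hash != NULL && sig != NULL);
    ret = toy_unwrap(wrapped, wrappedLen, scalar);
    if (ret == 0)
        ret = toy_sign(scalar, hash, sig);
    force_zero(scalar, sizeof(scalar));       /* scrub on success and failure */
    return ret;
}
```

Structuring the function with a single exit after the wipe, rather than returning early from each step, is what guarantees the scrub also runs when the unwrap fails.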

## Notes / limitations

- The HAL GCM path assumes a 96-bit (12-byte) IV (the standard J0 construction).
  A non-12-byte IV returns a hard error rather than falling back to software,
  because a software fallback would key off the seed rather than the
  device-bound key.
- AES-CBC and AES-CTR chain in software over single-block
  `hal_crypto_aes_ecb_sk_*` calls because the HAL exposes no CBC/CTR secure-key
  variant; the key still stays in hardware. CTR maintains the wolfCrypt counter
  state (`aes->reg`/`tmp`/`left`) so partial blocks continue across calls.
- The HAL crypto engine DMAs its buffers on 32-byte (cache-line) boundaries and
  rejects an unaligned GCM iv/aad. The port stages key/iv/aad/tag in aligned
  temporaries and bounces unaligned in/out through aligned buffers, so callers
  need not align.
- Each operation derives the working key from its own `Aes` object's `devKey`
  seed under the crypto mutex (no shared port global), so concurrent `Aes`
  objects are safe.
- `--enable-amebapro2` builds a host compile-test only: it swaps the HAL headers
  for `amebapro2_shim.h` (sentinel stubs, no real crypto) to exercise the
  crypto-callback dispatch and build wiring without the vendor SDK. All
  functional validation requires RTL8735B hardware.
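
For a 96-bit IV, GCM's pre-counter block is J0 = IV || 0^31 || 1 (NIST SP 800-38D); any other IV length requires a GHASH over the IV, which is why the HAL path only accepts 12 bytes. A sketch of that check plus the J0 and `inc32` constructions; the function names and the `-1` error value are hypothetical, not the port's.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GCM_BLOCK  16
#define GCM_STD_IV 12   /* 96-bit IV */

/* J0 for a 96-bit IV: IV || 0^31 || 1 (NIST SP 800-38D). Any other IV
 * length is a hard error (hypothetical -1), never a software fallback. */
static int gcm_build_j0(const uint8_t *iv, size_t ivSz, uint8_t j0[GCM_BLOCK])
{
    assert(j0 != NULL);
    if (iv == NULL || ivSz != GCM_STD_IV)
        return -1;
    memcpy(j0, iv, GCM_STD_IV);
    j0[12] = 0x00;      /* 31 zero bits ...        */
    j0[13] = 0x00;
    j0[14] = 0x00;
    j0[15] = 0x01;      /* ... then a single 1 bit */
    return 0;
}

/* inc32: add 1 to the low 32 bits (big-endian), wrapping mod 2^32.
 * GCM encrypts the tag block with J0 and the data from inc32(J0) onward. */
static void gcm_inc32(uint8_t block[GCM_BLOCK])
{
    for (int i = GCM_BLOCK - 1; i >= GCM_BLOCK - 4; i--)
        if (++block[i] != 0)
            break;
}
```

Note that `inc32` wraps within the low 32 bits and never carries into the IV bytes, unlike the full 128-bit counter increment used by plain CTR mode.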
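
Chaining CBC and CTR in software over a one-block primitive looks roughly like the sketch below. `block_fn` stands in for a single-block `hal_crypto_aes_ecb_sk_*` call (its real signature is not reproduced here), and `ctr_state` mirrors the roles the notes give `aes->reg`/`tmp`/`left` without claiming wolfCrypt's exact layout or code. The toy block function used for host testing is a byte-wise XOR, which is NOT a cipher; it only exercises the chaining logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16

/* Stand-in for one single-block secure-key ECB call; returns 0 on success. */
typedef int (*block_fn)(const uint8_t in[BLK], uint8_t out[BLK]);

/* CBC encrypt: C_i = E(P_i ^ C_{i-1}), C_0 = IV. iv is updated for chaining. */
static int cbc_encrypt(block_fn enc, uint8_t iv[BLK],
                       const uint8_t *in, uint8_t *out, size_t sz)
{
    uint8_t x[BLK];
    if (sz % BLK != 0)
        return -1;                            /* CBC needs whole blocks */
    for (size_t off = 0; off < sz; off += BLK) {
        for (int i = 0; i < BLK; i++)
            x[i] = in[off + i] ^ iv[i];
        if (enc(x, out + off) != 0)
            return -1;
        memcpy(iv, out + off, BLK);           /* next block chains on C_i */
    }
    return 0;
}

/* CBC decrypt: P_i = D(C_i) ^ C_{i-1}. C_i is saved first so in == out works. */
static int cbc_decrypt(block_fn dec, uint8_t iv[BLK],
                       const uint8_t *in, uint8_t *out, size_t sz)
{
    uint8_t c[BLK], p[BLK];
    if (sz % BLK != 0)
        return -1;
    for (size_t off = 0; off < sz; off += BLK) {
        memcpy(c, in + off, BLK);
        if (dec(c, p) != 0)
            return -1;
        for (int i = 0; i < BLK; i++)
            out[off + i] = p[i] ^ iv[i];
        memcpy(iv, c, BLK);
    }
    return 0;
}

/* CTR state in the roles the notes give aes->reg / tmp / left. */
typedef struct {
    uint8_t reg[BLK];   /* next counter block */
    uint8_t tmp[BLK];   /* keystream of the current counter block */
    size_t  left;       /* unused keystream bytes remaining in tmp */
} ctr_state;

static void ctr_inc(uint8_t ctr[BLK])         /* 128-bit big-endian +1 */
{
    for (int i = BLK - 1; i >= 0; i--)
        if (++ctr[i] != 0)
            break;
}

/* CTR encrypt/decrypt (same operation); partial blocks carry over via st. */
static int ctr_crypt(block_fn enc, ctr_state *st,
                     const uint8_t *in, uint8_t *out, size_t sz)
{
    assert(st != NULL && st->left <= BLK);
    while (sz > 0) {
        if (st->left == 0) {                  /* refill keystream */
            if (enc(st->reg, st->tmp) != 0)
                return -1;
            ctr_inc(st->reg);
            st->left = BLK;
        }
        *out++ = *in++ ^ st->tmp[BLK - st->left];
        st->left--;
        sz--;
    }
    return 0;
}

/* Host-test stand-in for the block function: XOR with a fixed pattern.
 * NOT a cipher; it only lets the chaining logic be exercised. */
static int toy_block(const uint8_t in[BLK], uint8_t out[BLK])
{
    for (int i = 0; i < BLK; i++)
        out[i] = in[i] ^ (uint8_t)(0xA0 + i);
    return 0;
}
```

Because the keystream position lives in `ctr_state`, splitting a CTR message across calls (for example 5 bytes, then 32) produces the same output as one 37-byte call, which is the property the counter state exists to preserve.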
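
The bounce-buffer idea from the alignment note can be sketched as follows, assuming a hypothetical `engine_op` that, like the HAL DMA, refuses pointers that are not 32-byte aligned (here it just XORs with `0x3C`). Aligned callers go straight through; otherwise data is staged through an aligned stack buffer in chunks. `BOUNCE_SZ` and all names are illustrative, and any cache maintenance the real HAL performs around DMA is omitted.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DMA_ALIGN 32
#define BOUNCE_SZ 256                 /* staging size, a multiple of 32 */

static int is_aligned(const void *p)
{
    return ((uintptr_t)p % DMA_ALIGN) == 0;
}

/* Stand-in for a DMA engine call: rejects unaligned buffers, then XORs 0x3C. */
static int engine_op(const uint8_t *in, uint8_t *out, size_t sz)
{
    if (!is_aligned(in) || !is_aligned(out))
        return -1;
    for (size_t i = 0; i < sz; i++)
        out[i] = in[i] ^ 0x3C;
    return 0;
}

/* Callers need not align: unaligned data is bounced through `bounce`. */
static int op_any_alignment(const uint8_t *in, uint8_t *out, size_t sz)
{
    alignas(DMA_ALIGN) uint8_t bounce[BOUNCE_SZ];

    assert(in != NULL && out != NULL);
    if (is_aligned(in) && is_aligned(out))
        return engine_op(in, out, sz);        /* fast path: no copies */

    while (sz > 0) {
        size_t n = sz < BOUNCE_SZ ? sz : BOUNCE_SZ;
        memcpy(bounce, in, n);
        if (engine_op(bounce, bounce, n) != 0) /* in place on aligned buffer */
            return -1;
        memcpy(out, bounce, n);
        in += n;
        out += n;
        sz -= n;
    }
    return 0;
}
```

Keeping the staging buffer on the stack, rather than in a static, keeps the helper reentrant at the cost of `BOUNCE_SZ` bytes of stack per call.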

## Status

Validated on RTL8735B silicon (both the RealTek FreeRTOS SDK app and a Zephyr
image): registration, AES-GCM (encrypt / deterministic tag / decrypt-verify /
round-trip / wrong seed -> `AES_GCM_AUTH_E`), AES-ECB, AES-CBC and AES-CTR all
pass.

- Stage 0 (skeleton, build wiring, host compile-test): done.
- Stage 1 (HUK key-ladder + full AES-GCM): done, validated on hardware.
- Stage 2 (AES-ECB / AES-CBC / AES-CTR): done, validated on hardware.
- Stage 3 (HUK-bound ECDSA sign, wrapped scalar): done, validated on RTL8735B
  (the P-256 signature verifies against the original public key; a tampered
  hash fails). OTP-resident keys and HW-ECDSA-engine signing are follow-ons.

## Benchmarks (software crypto baseline)

`wolfcrypt_test` (full self-test, all PASS) and `wolfcrypt_benchmark` were run on
the RTL8735B EVB to validate the core library and toolchain on this target. The
figures below are **pure software wolfCrypt** -- they are NOT the HUK device
(which routes AES through the silicon engine for HUK-derived keys); they serve as
a reference baseline and to size the benefit of hardware offload.

- Target: RTL8735B "KM4" Arm Cortex-M33 (ARMv8-M Mainline, TrustZone + DSP) at
  500 MHz (`CPU_CLK`); DDR at 533 MHz.
- Toolchain / build: RealTek ASDK 10.3.0 (GCC 10.3.0), SDK default `-Os`,
  FreeRTOS, `WOLFCRYPT_ONLY`, `SINGLE_THREADED`, big-integer math via the generic
  `WOLFSSL_SP_MATH_ALL` (portable C, no Cortex-M assembly), `BENCH_EMBEDDED`.
- Build options live with the SDK example (not in the wolfSSL tree):
  `component/example/wolfcrypt_test/{user_settings.h, wolfcrypt_test.cmake,
  main.c}` of the AmebaPro2 FreeRTOS SDK. The RNG is seeded from the SDK
  `rtw_get_random_bytes`; `current_time()` uses `hal_read_systime_us()`.

Symmetric / hash (higher is better):

| Algorithm           | Throughput          |
|---------------------|---------------------|
| AES-128-CBC enc/dec | 9.55 / 9.67 MiB/s   |
| AES-256-CBC enc/dec | 7.25 / 7.02 MiB/s   |
| AES-128-GCM enc/dec | 5.35 / 5.33 MiB/s   |
| AES-256-GCM enc/dec | 4.53 / 4.52 MiB/s   |
| AES-128-CTR         | 9.75 MiB/s          |
| AES-128-ECB enc/dec | 10.42 / 10.56 MiB/s |
| AES-CCM enc/dec     | 4.73 / 4.65 MiB/s   |
| GMAC (4-bit table)  | 13.43 MiB/s         |
| AES-128-CMAC        | 8.84 MiB/s          |
| ChaCha20            | 24.79 MiB/s         |
| ChaCha20-Poly1305   | 15.83 MiB/s         |
| Poly1305            | 64.77 MiB/s         |
| SHA-1               | 29.19 MiB/s         |
| SHA-256             | 10.94 MiB/s         |
| SHA-512             | 7.29 MiB/s          |
| SHA3-256            | 6.61 MiB/s          |
| HMAC-SHA256         | 10.85 MiB/s         |

Public key (higher is better):

| Operation                | Rate                |
|--------------------------|---------------------|
| RSA-2048 public          | 214.7 ops/s         |
| RSA-2048 private         | 6.14 ops/s          |
| RSA-2048 key gen         | 0.40 ops/s          |
| DH-2048 key gen/agree    | 17.67 / 15.23 ops/s |
| ECDSA P-256 sign/verify  | 40.03 / 29.81 ops/s |
| ECDHE P-256 agree        | 40.69 ops/s         |
| Curve25519 key gen/agree | 414.8 / 419.4 ops/s |
| Ed25519 sign/verify      | 788.3 / 397.0 ops/s |

The tables above are the portable-C baseline. The assembly backends below raise
them substantially: roughly 8-13x for P-256 ECC, 2.5-3x for RSA/DH-2048, and
1.3-2.2x for the symmetric and hash algorithms measured. Curve25519/Ed25519
already use the dedicated `curve25519.c`/`ed25519.c` fast code.

## Optimizations (measured on RTL8735B @ 500 MHz, -Os)

Two wolfCrypt assembly backends apply to this Cortex-M33 and were validated on
hardware (both keep `wolfcrypt_test` all-PASS). Neither needs wolfSSL source
changes -- they are build-config selections plus adding the relevant asm files.

### 1. Public key -- `sp_cortexm.c` (Thumb-2/DSP single-precision)

Enable with `WOLFSSL_SP_ARM_CORTEX_M_ASM` + `WOLFSSL_HAVE_SP_RSA` +
`WOLFSSL_HAVE_SP_ECC` + `WOLFSSL_HAVE_SP_DH`, and add `wolfcrypt/src/sp_cortexm.c`
to the build (alongside the generic `sp_int.c` for sizes without an asm path).

| Operation              | Generic C (ops/s) | sp_cortexm (ops/s) | Speedup |
|------------------------|-------------------|--------------------|---------|
| ECC P-256 key gen      | 40.7              | 541.2              | 13.3x   |
| ECDSA P-256 sign       | 40.0              | 427.6              | 10.7x   |
| ECDSA P-256 verify     | 29.8              | 292.7              | 9.8x    |
| ECDHE P-256 agree      | 40.7              | 318.1              | 7.8x    |
| RSA-2048 public        | 214.7             | 618.4              | 2.9x    |
| RSA-2048 private       | 6.14              | 19.0               | 3.1x    |
| DH-2048 agree          | 15.2              | 38.3               | 2.5x    |
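
A rate in ops/s converts to per-operation latency as 1000 / rate milliseconds, and the speedup column is the ratio of the two rates. A small helper, checked against the ECDSA P-256 sign row above:

```c
#include <assert.h>

/* Per-operation latency in milliseconds for a rate given in ops/s. */
static double ms_per_op(double ops_per_sec)
{
    assert(ops_per_sec > 0.0);
    return 1000.0 / ops_per_sec;
}

/* Speedup of the optimized build over the baseline (both in ops/s). */
static double speedup(double baseline_ops, double optimized_ops)
{
    assert(baseline_ops > 0.0);
    return optimized_ops / baseline_ops;
}
```

With the table figures, a P-256 sign drops from 25 ms (40.0 ops/s) to roughly 2.3 ms (427.6 ops/s), the 10.7x shown.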

### 2. Symmetric -- Thumb-2 asm (`port/arm/thumb2-*-asm.S`)

Enable with `WOLFSSL_ARMASM` + `WOLFSSL_ARMASM_THUMB2` +
`WOLFSSL_ARMASM_NO_HW_CRYPTO` + `WOLFSSL_ARMASM_NO_NEON` + `WOLFSSL_ARM_ARCH=7`,
and add `thumb2-aes-asm.S`, `thumb2-sha256-asm.S`, `thumb2-sha512-asm.S`,
`thumb2-sha3-asm.S`, `thumb2-chacha-asm.S` and `thumb2-poly1305-asm.S`.
`WOLFSSL_ARMASM` is a global switch, so provide the `.S` for every covered
module. (Curve25519/Ed25519 also have Thumb-2 asm, but their `ge_operations.c`
integration assumes 64-bit and was left on the C path here.)

| Algorithm           | Generic C (MiB/s) | Thumb-2 asm (MiB/s) | Speedup |
|---------------------|-------------------|---------------------|---------|
| AES-128-CBC enc     | 9.55              | 20.85               | 2.2x    |
| AES-128-ECB enc     | 10.42             | 20.82               | 2.0x    |
| AES-128-CTR         | 9.75              | 20.47               | 2.1x    |
| AES-128-GCM enc     | 5.35              | 10.30               | 1.9x    |
| GMAC                | 13.43             | 20.81               | 1.5x    |
| AES-128-CMAC        | 8.84              | 14.67               | 1.7x    |
| ChaCha20            | 24.79             | 46.44               | 1.9x    |
| ChaCha20-Poly1305   | 15.83             | 25.38               | 1.6x    |
| SHA-256             | 10.94             | 17.83               | 1.6x    |
| SHA3-256            | 6.61              | 8.64                | 1.3x    |
| HMAC-SHA256         | 10.85             | 17.66               | 1.6x    |
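
At a fixed clock, throughput converts to cycles per byte as CPU_CLK / (MiB/s x 2^20). A sketch of the conversion using the 500 MHz `CPU_CLK` stated in the benchmark setup; it assumes the whole measurement is CPU-bound at that clock, so treat the result as an estimate.

```c
#include <assert.h>

#define CPU_HZ 500e6   /* KM4 core clock from the benchmark setup */

/* Cycles spent per byte at a throughput given in MiB/s (1 MiB = 2^20 bytes). */
static double cycles_per_byte(double mib_per_sec)
{
    assert(mib_per_sec > 0.0);
    return CPU_HZ / (mib_per_sec * 1048576.0);
}
```

By this measure the Thumb-2 AES-128-CBC encrypt path spends about 23 cycles per byte (20.85 MiB/s) against about 50 for portable C (9.55 MiB/s).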

### Note on hardware offload

For AES, hashing and ECDSA the RTL8735B has a dedicated crypto engine (the HAL
`hal_crypto_*` blocks this HUK port already uses for HUK-derived keys, plus
`hal_ecdsa`). A general (any-key) HW crypto-callback port over that engine is
expected to beat the Thumb-2 software figures above and is the recommended
production path for symmetric throughput; the Thumb-2 asm is the portable
software fallback. The `sp_cortexm.c` PK speedup is worth taking regardless,
since it needs no silicon support.
