SEMP is a secure communication network overlay for large-scale distributed systems. It runs on the BEAM but replaces the standard distributed Erlang handshake with a custom, DNS-based, non-EPMD distribution overlay. Each RPC uses connect→(token path)→call→close. SEMP comprises two parts: TRUST for stable, long-lived nodes and TEMPUS for ephemeral nodes.
TRUST relies on a node-specific whitelist, a connect-reconnect mechanism, and a permissions dataset to control communication and defeat intrusions. TRUST uses mTLS for secure connections and employs a token-based handshake optimization for faster reconnection. It uses a multi-tiered suspicion system to identify and remove untrustworthy nodes, while still allowing self-healing and graceful degradation.
TEMPUS, a Cyclon-based network overlay, enables ephemeral node discovery and security, using certificates and mTLS for efficient and secure communication. TEMPUS keeps a dynamic network environment robust and secure.
Each SEMP RPC uses a connect→(token path)→call→close pattern to avoid the scaling problems of a fully connected node graph. SEMP provides:
- TLS 1.3 mTLS with client certs,
- Whitelisting via SHA-512 fingerprints of certificate DER,
- Token-accelerated reconnect (short TTL),
- Per-node permissions (modules/functions) plus suspicion levels,
- No error details sent to clients that malicious actors could leverage.
Include SEMP as a dependency in your BEAM application and it will be compiled for you. If you want to help develop SEMP, use rebar3 in your pre-system testing builds.
```shell
rebar3 compile
rebar3 shell
```

TRUST (Trusted, Rapid, Unmodifiable, Secure Topology) is a strict, request-per-connection overlay for BEAM nodes. It replaces the Erlang distribution handshake entirely, does not use EPMD, and requires clients to connect to an explicit host:port (DNS A/AAAA only; no SRV). Every connection executes at most one request (CALL or CAST) and then closes.
TRUST enforces node identity with TLS 1.3 mutual authentication, authorizes peers via an on-disk whitelist (cert-derived fingerprint → permissions), and accelerates reconnects with short-lived, server-issued tokens bound to that fingerprint. All failures are logged and scored locally (suspicion/quarantine); errors are never sent back to remote peers.
- ✅ mTLS identity (TLS 1.3 only, ALPN `trust/1`); client & server certs required
- ✅ Whitelist gate keyed by SHA-512(cert DER); permissioned MFA execution
- ✅ One request per TCP/TLS connection; orderly close after each request
- ✅ Token fast-path: server issues a random, short-lived token per fingerprint (FP); a valid token short-circuits the full post-TLS handshake on reconnect
- ✅ Suspicion/quarantine: adaptive scoring; crossing the limit revokes the token and refuses further requests until health improves
- ❌ No EPMD, no built-in dist handshake, no node auto-connect
- ❌ No long-lived channels; no multiplexing; no 0-RTT RPC
- ❌ No error details returned to the caller (failures are log-only)
- TLS 1.3 only; ALPN advertised/negotiated as `trust/1`
- Server requires a client certificate: `{verify, verify_peer}`, `{fail_if_no_peer_cert, true}`
- TRUST derives FP = SHA-512(client_cert_DER) and treats this fingerprint as the peer identity
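Under these rules, a server listener can be configured roughly as follows. This is a minimal sketch using the standard `ssl` application; the certificate paths are illustrative, and `peer_fp/1` is a hypothetical helper name:

```erlang
%% Minimal sketch of TLS 1.3-only listener options matching the rules above.
%% Certificate paths are illustrative; adapt to your deployment.
listen_opts() ->
    [{versions, ['tlsv1.3']},                     %% TLS 1.3 only
     {alpn_preferred_protocols, [<<"trust/1">>]}, %% negotiate the TRUST ALPN
     {verify, verify_peer},                       %% demand a client cert...
     {fail_if_no_peer_cert, true},                %% ...and refuse without one
     {certfile, "priv/certs/server.pem"},
     {keyfile, "priv/certs/server-key.pem"},
     {cacertfile, "priv/certs/ca.pem"},           %% CA that signed the client certs
     binary, {packet, 4}, {active, false}].       %% 4-byte length-prefixed frames

%% After ssl:handshake/2 succeeds, derive the peer identity:
peer_fp(TlsSock) ->
    {ok, Der} = ssl:peercert(TlsSock),            %% client cert in DER form
    crypto:hash(sha512, Der).                     %% FP = SHA-512(cert DER)
```

`{packet, 4}` makes the ssl stack handle the `<<Len:32/big, Payload/binary>>` framing described below on both send and receive.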
- Source file (example): `priv/trust_whitelist.config` — a map `#{ "client-cert.pem" => Spec, ... }`
- On load, TRUST reads the PEMs from `priv/certs/`, extracts the DER, computes the FP, and inserts `{FP, Spec}` into ETS (table `trust`)

`Spec` forms:
- `none` — nothing permitted (default if omitted/invalid)
- `any` — permit any MFA (still subject to the forbidden guard)
- `#{ Module => all | [{Fun, Arity}], ... }` — per-module control
If a PEM listed in the whitelist is missing, the entry is ignored and a warning is logged.
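The load step above can be sketched like this (a hedged sketch, not the shipped code: the function name and the exact table options are illustrative):

```erlang
%% Sketch of the whitelist load: read each PEM named in the config,
%% compute its SHA-512 fingerprint, and store {FP, Spec} in ETS.
load_whitelist(ConfigPath, CertDir) ->
    {ok, [Map]} = file:consult(ConfigPath),       %% #{ "client-cert.pem" => Spec, ... }
    Tab = ets:new(trust, [set, named_table, {read_concurrency, true}]),
    maps:foreach(
      fun(PemName, Spec) ->
          PemPath = filename:join(CertDir, PemName),
          case file:read_file(PemPath) of
              {ok, Pem} ->
                  [{'Certificate', Der, _} | _] = public_key:pem_decode(Pem),
                  FP = crypto:hash(sha512, Der),  %% FP = SHA-512(cert DER)
                  ets:insert(Tab, {FP, Spec});
              {error, _} ->
                  %% Missing PEM: ignore the entry and warn, as described above
                  logger:warning("whitelist entry ~s missing; ignored", [PemName])
          end
      end, Map),
    Tab.
```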
Before the permissions check, TRUST rejects MFAs that could affect code loading, OS/process control, sockets, tracing, atom creation, OTP behaviors, etc.
Examples (non-exhaustive):
`erlang:list_to_atom/1`, `binary_to_term/1`, `apply/3`, `open_port/2`, process registry ops, timers, tracing, `code:*`, `rpc`/`erpc`, OTP behaviour entry points, socket stacks (`gen_tcp`/`gen_udp`, `ssl`, `inets`/`http*`), any module prefixed `trust_`/`semp_` (and other internal namespaces)
After the forbidden-check:
- `Spec == any` → allow (minus the forbidden set)
- `Spec == #{M => all | [{F,A}]}` → allow the listed MFAs for this FP
- otherwise → deny
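The decision logic above can be expressed as a small pure function. In this sketch, `is_forbidden/3` stands in for the forbidden-MFA guard (not shown here):

```erlang
%% Permission gate sketch: forbidden set first, then the per-FP Spec.
allowed(Spec, M, F, A) ->
    case is_forbidden(M, F, A) of
        true  -> deny;                              %% forbidden set always wins
        false -> check_spec(Spec, M, F, A)
    end.

check_spec(any, _M, _F, _A) -> allow;               %% any: everything minus forbidden
check_spec(Spec, M, F, A) when is_map(Spec) ->
    case maps:get(M, Spec, undefined) of
        all -> allow;                               %% whole module allowed
        Funs when is_list(Funs) ->
            case lists:member({F, A}, Funs) of      %% explicit {Fun, Arity} allowlist
                true  -> allow;
                false -> deny
            end;
        undefined -> deny                           %% module not listed
    end;
check_spec(_NoneOrInvalid, _M, _F, _A) -> deny.     %% none / invalid spec → deny
```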
- What: random, server-generated bytes `Token` (e.g., 48–64 B) stored per FP in ETS with an expiry (TTL). One token per FP (a new token overwrites the old).
- When: after mTLS + whitelist + suspicion pass and the client has no valid token, the server sends `TOKEN_ISSUE` with a fresh `Token`.
- Use: the client presents `TOKEN_PRESENT` on the next connection; the server validates it (constant-time compare + expiry check). If valid → proceed directly to the request.
- Revocation: if suspicion crosses the limit, the server deletes `{FP, Token, Exp}` to force the full path next time.
- Persistence: tokens are in-memory only; a server restart invalidates all tokens (by design).
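The issue/validate pair might look like the sketch below. Assumptions: the 48-byte size and table handle are illustrative, and `crypto:hash_equals/2` (OTP 25+) provides the constant-time comparison:

```erlang
%% Token fast-path sketch: random bytes per fingerprint with a TTL.
-define(TOKEN_BYTES, 48).

issue_token(Tab, FP, TtlMs) ->
    Token = crypto:strong_rand_bytes(?TOKEN_BYTES),
    Exp = erlang:monotonic_time(millisecond) + TtlMs,
    ets:insert(Tab, {FP, Token, Exp}),        %% one token per FP: insert overwrites
    Token.

validate_token(Tab, FP, Presented) ->
    Now = erlang:monotonic_time(millisecond),
    case ets:lookup(Tab, FP) of
        [{FP, Token, Exp}] when Exp > Now,
                                byte_size(Presented) =:= byte_size(Token) ->
            crypto:hash_equals(Token, Presented);  %% constant-time compare
        _ ->
            false                             %% absent, expired, or wrong size → full path
    end.
```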
- Transport: TLS 1.3 stream; no 0-RTT; no compression at the TLS/ETF layer
- Application frames: length-prefixed ETF (`<<Len:32/big, Payload/binary>>`)
- Messages (ETF terms are maps):
  - `#{t := token_issue, token := binary()}`
  - `#{t := token_present, token := binary()}`
  - `#{t := call, ver := 1, req_id := <<12 bytes>>, m := atom(), f := atom(), a := non_neg_integer(), args := [term()], opts := map()}`
  - `#{t := cast, ver := 1, req_id := <<12 bytes>>, m := atom(), f := atom(), a := non_neg_integer(), args := [term()], opts := map()}`
  - `#{t := result, value := term()}` (CALL only)

Strictness: the server requires the `req_id` shape; missing or ill-typed keys are a protocol error.
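Encoding and decoding a frame by hand is a few lines; this sketch assumes the size limits from the timeouts section, and uses `binary_to_term/2` with `[safe]` so decoding cannot create new atoms:

```erlang
%% Length-prefixed ETF framing sketch.
encode_frame(Msg) when is_map(Msg) ->
    Payload = term_to_binary(Msg),
    <<(byte_size(Payload)):32/big, Payload/binary>>.

decode_frame(<<Len:32/big, Payload:Len/binary>>, MaxFrame) when Len =< MaxFrame ->
    {ok, binary_to_term(Payload, [safe])};     %% [safe]: no new atoms, no ext funs
decode_frame(_Bin, _MaxFrame) ->
    {error, protocol_error}.                   %% oversize or malformed: log-only
```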
- Each FP has a suspicion counter (or `quarantined`) in ETS
- Increment on: protocol violations, forbidden/denied MFAs, user code exceptions, timeouts
- Decrement on: successful permitted requests
- Quarantine: if the counter exceeds `suspicion_limit`, set it to `quarantined`, delete the token, and refuse further requests until health improves
- All changes are logged; you can also emit events for metrics/alerts
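The bookkeeping above reduces to two small operations. In this sketch the table layout `{FP, Count | quarantined}` follows the description, and `token_revoke/1` is an illustrative helper for deleting the token:

```erlang
%% Suspicion bookkeeping sketch; Limit corresponds to suspicion_limit.
bump(Tab, FP, Limit) ->
    case ets:lookup(Tab, FP) of
        [{FP, quarantined}] -> quarantined;
        [{FP, N}] when N + 1 > Limit ->
            ets:insert(Tab, {FP, quarantined}),   %% crossing the limit
            token_revoke(FP),                     %% illustrative: drop the token
            quarantined;
        [{FP, N}] -> ets:insert(Tab, {FP, N + 1}), ok;
        []        -> ets:insert(Tab, {FP, 1}), ok
    end.

credit(Tab, FP) ->                                %% successful permitted request
    case ets:lookup(Tab, FP) of
        [{FP, N}] when is_integer(N), N > 0 ->
            ets:insert(Tab, {FP, N - 1});         %% floor at 0
        _ -> ok
    end.
```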
- DNS resolve: 2s per NS; total budget ~5s
- TCP connect: 3s per endpoint; multi-IP with ~250ms stagger
- TLS handshake: 5s
- First frame after TLS (token or request): 2s
- Request execution timeout (`call_timeout_ms`): 5000 ms default
- Max frame: 8 MiB; max args ETF size: 4 MiB (`payload_too_large`)
- Resolve host → A/AAAA (caller supplies the port)
- Connect (iterate IPs)
- TLS 1.3 handshake (ALPN `trust/1`)
- Token path:
  - If a valid cached token exists for the FP → send `TOKEN_PRESENT`
  - Else expect `TOKEN_ISSUE` → cache the token
- (Optional) local pre-filter of the MFA
- Send the request:
  - CALL: send, then wait for `#{t := result, value := Term}`
  - CAST: send and do not wait
- Close TLS
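A client following these steps might look like the sketch below. Hedged assumptions: `cached_token/2` and `cache_token/3` are illustrative helpers, not the shipped API, and `{packet, 4}` lets the ssl stack add/strip the 4-byte length prefix:

```erlang
%% Client-side sketch of one CALL over the TRUST wire format.
call(Host, Port, M, F, Args, Timeout) ->
    {ok, Sock} = ssl:connect(Host, Port,
                             [{versions, ['tlsv1.3']},
                              {alpn_advertised_protocols, [<<"trust/1">>]},
                              binary, {packet, 4}, {active, false}],
                             Timeout),
    %% Token path: present a cached token, or cache a freshly issued one.
    case cached_token(Host, Port) of
        {ok, Token} ->
            ok = ssl:send(Sock, term_to_binary(#{t => token_present, token => Token}));
        none ->
            {ok, Bin} = ssl:recv(Sock, 0, Timeout),
            #{t := token_issue, token := New} = binary_to_term(Bin),
            cache_token(Host, Port, New)
    end,
    Req = #{t => call, ver => 1, req_id => crypto:strong_rand_bytes(12),
            m => M, f => F, a => length(Args), args => Args, opts => #{}},
    ok = ssl:send(Sock, term_to_binary(Req)),
    Result = case ssl:recv(Sock, 0, Timeout) of
                 {ok, RBin} ->
                     #{t := result, value := V} = binary_to_term(RBin),
                     {ok, V};
                 {error, closed} -> {error, closed}  %% deny/error: no body, just close
             end,
    ssl:close(Sock),
    Result.
```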
- Accept TCP; do the TLS 1.3 handshake (ALPN `trust/1`)
- Extract the client cert DER → FP = SHA-512(DER)
- Whitelist: if FP absent → close; log `whitelist_reject`
- Suspicion: if quarantined/over limit → delete the token; close
- Token short-path:
  - If the client sends `TOKEN_PRESENT` and the token is valid → proceed to the request
  - Else issue `TOKEN_ISSUE`, then wait for the request
- Receive the request (requires `req_id`):
  - Forbidden? → log; suspicion++; close
  - Permissions deny? → log; suspicion++; close
- Execute `apply(M, F, Args)` (guarded):
  - Success: CALL → send `#{t := result, value := Ret}`; CAST → send nothing; suspicion-- (floor at 0); close
  - Exception: log; suspicion++; maybe quarantine; close silently
```mermaid
sequenceDiagram
  autonumber
  participant C as Client
  participant DNS as DNS
  participant S as Server
  C->>DNS: A/AAAA lookup (host)
  DNS-->>C: [ip1, ip2, ...] (+ caller-provided port)
  loop try each endpoint
    C->>S: TCP connect
    Note over C,S: TLS 1.3 only, ALPN "trust/1"
    C->>S: ClientHello
    S-->>C: ServerHello + Certificate + CertificateRequest
    C-->>S: Certificate (client)
    S->>S: Verify chain → FP=SHA512(cert DER)
    alt FP whitelisted and not quarantined
      S-->>C: Finished (TLS up)
      C-->>S: Finished (TLS up)
      S->>C: TOKEN_ISSUE {token}
      C->>S: CALL/CAST {req_id, m, f, a, args, ...}
      S->>S: Forbidden & permissions checks
      alt Allowed & ok
        S-->>C: RESULT (CALL only)
        S-->>C: Close
      else Denied or error
        Note over S: log + suspicion++
        S-->>C: Close (no error body)
      end
    else not whitelisted/quarantined
      S-->>C: Close
    end
  end
```
```mermaid
sequenceDiagram
  autonumber
  participant C as Client
  participant S as Server
  C->>S: TCP connect
  Note over C,S: TLS 1.3 (no 0-RTT RPC)
  C->>S: ClientHello
  S-->>C: ServerHello
  C-->>S: Finished (TLS up)
  C->>S: TOKEN_PRESENT {token}
  S->>S: Validate(token, FP) & not expired & not quarantined?
  alt valid
    C->>S: CALL/CAST {req_id, m,f,a, args, ...}
    S-->>C: RESULT (CALL)
    S-->>C: Close
  else invalid/expired
    S-->>C: Close
  end
```
```mermaid
stateDiagram-v2
  [*] --> accepting
  accepting --> tls_handshake: ssl:handshake
  tls_handshake --> id_ok: client cert ok → FP
  tls_handshake --> drop: tls alert / verify error
  id_ok --> whitelist_check
  whitelist_check --> drop: FP not allowed
  whitelist_check --> token_phase
  token_phase --> await_req: issued new token
  token_phase --> await_req: valid token present
  await_req --> eval_req: call/cast frame ok
  await_req --> drop: timeout / invalid frame
  eval_req --> send_result: allowed - user code ok (CALL)
  eval_req --> close: allowed - user code ok (CAST)
  eval_req --> close: forbidden / denied / user error (log + suspicion++)
  send_result --> close
  close --> [*]
```
```erlang
[
  %% Never auto-connect distributed Erlang
  {kernel, [
    {dist_auto_connect, never},
    {logger_level, info}
  ]},
  %% TRUST overlay (lives under the "trust" app env)
  {trust, [
    {port, 6464},
    {tls_opts, [
      %% Server identity (cert+key) and CA that issued *client* certs
      {certfile, "priv/certs/localhost+2.pem"},
      {keyfile, "priv/certs/localhost+2-key.pem"},
      {cacertfile, "priv/certs/ca.pem"}
    ]},
    %% Whitelist file is *always* in the host app's priv/, not the library's
    {whitelist_file, "priv/trust_whitelist.config"},
    {whitelist_default_spec, none},
    %% Execution and policy
    {call_timeout_ms, 5000},
    {suspicion_limit, 3},
    %% Token policy (server-issued random bytes per FP)
    {token_ttl_ms, 300000} %% 5 minutes, example
  ]}
].
```
- Enforces strict mTLS. TLS 1.3 only; `{verify, verify_peer}` and `{fail_if_no_peer_cert, true}`; ALPN must be `<<"trust/1">>`. Does not use anonymous ciphers, insecure options, or 0-RTT for RPC. Always set a trusted client CA via `{cacertfile, ...}` or `{cacerts, ...}`.
- Unique certificate per node. Each non-standalone node must have its own client cert. Sharing a cert collapses identity and breaks suspicion/quarantine accuracy. Track whitelist entries by cert DER → SHA-512 fingerprint, generated at load time from the PEMs in `priv/certs/`.
- Tokens are not identity. Tokens are server-issued random bytes, per-fingerprint, short-lived, stored in memory, compared in constant time, and revoked on quarantine. They only fast-path reconnects and are intentionally invalidated on server restart.
- Strict framing and sizes. All app frames are length-prefixed ETF. Cap the maximum frame/payload (e.g., 8 MiB frame, 4 MiB args). Reject oversize inputs with a protocol error (log-only).
- Defense-in-depth forbidden set. Proactively rejects dangerous MFAs (code loading, tracing, OS/ports, socket/HTTP stacks, atom creation, OTP behaviour entry points, process introspection, timers, internal namespaces like `trust_*`/`semp_*`, etc.) before permission evaluation.
- Does not leak errors to clients. The wire protocol only returns `#{t := result, value := Term}` for successful CALLs. CAST never returns anything. All failures are logged locally and contribute to suspicion.
- Whitelist is authoritative. `any` is allowed but weaker — prefer explicit per-module function allowlists. Missing PEMs in the whitelist are ignored (with warnings). Keep `priv/trust_whitelist.config` under the host app's `priv/`.
- Suspicion/quarantine tuning. Increments on protocol violations, forbidden/denied MFAs, timeouts, and user code errors; decrements on successful permitted requests. Exceeding the limit sets `quarantined` and deletes the token; quarantined peers require an out-of-system reset.
- Operational hardening. Use per-connection timeouts, accept backoff on errors, and consider connection rate limiting. Ensure ETS tables are created with read/write concurrency flags and appropriate privacy (whitelist read-only to normal processes).
- Name resolution. Only A/AAAA; no SRV. Be mindful of local resolver configuration and caching; optionally pin or bound TTLs in callers.
- Whitelist miss (FP not present)
  Server action: close TLS; log `whitelist_reject`. No response body.
  Client result: `{error, connect_failed}` or `{error, closed}` after the handshake.
- Quarantined / suspicion exceeded
  Server action: delete the token for the FP; refuse the request; close. Logged with reason.
  Client result: `{error, closed}`; subsequent reconnects require the fresh full path.
- Token invalid/expired/mismatched
  Server action: close; do not fall back in the same connection.
  Client result: must reconnect; receives `TOKEN_ISSUE` on the full path.
- Protocol violation (bad ETF / wrong shape / missing keys / size limit)
  Server action: log `protocol_error`; suspicion++; close.
  Client result: `{error, closed}` or `{error, protocol_error}` (if detectable client-side).
- Forbidden MFA
  Server action: log `forbidden_mfa`; suspicion++; close (no body).
  Client result: `{error, closed}`.
- Permissions deny
  Server action: log `permission_denied`; suspicion++; close (no body).
  Client result: `{error, closed}`.
- User code exception during `apply/3`
  Server action: log `user_code_error`; suspicion++; close (no body).
  Client result: `{error, closed}`.
- Timeouts (handshake / first frame / request)
  Server action: log with the stage (`tls_timeout`/`first_frame_timeout`/`call_timeout`); suspicion++ (soft); close.
  Client result: `{error, timeout}` or `{error, closed}`.
- ALPN mismatch
  Server action: close; log `alpn_mismatch`.
  Client result: `{error, closed}` from TLS.
- TLS verify failure / missing client cert / expired cert
  Server action: TLS alert; connection terminated; log `tls_verify_failed`.
  Client result: `{error, tls_alert}` or `{error, closed}` depending on the side.
- DNS/connectivity (NXDOMAIN, refused, connect timeout)
  Client action: iterates endpoints; on exhaustion returns `{error, connect_failed}`. The server sees nothing.
- Server restart
  Effect: all in-memory tokens become invalid; the next client attempt fails token validation and is forced through the full path (with a new `TOKEN_ISSUE`).
TEMPUS is a secure peer–sampling and membership layer that uses the Cyclon gossip protocol to keep three independent peer sets fresh and unbiased—within a single process. Each peer set (“type”) has its own Cyclon view, capacity, and shuffle cadence. There is no cross-type shuffle.
The goal: a robustly secure system in which peers are cryptographically verifiable, churn is handled gracefully, and sampling remains uniform within each population.
We track three populations side-by-side:
- Type 1 — `tempus_edge` (ephemeral)
  High-churn, short-lived peers (formerly called `tempus`). These form the bulk of the mesh and do standard Cyclon shuffles among themselves.
- Type 2 — `tempus_bridge` (non-ephemeral)
  Longer-lived peers that bridge to separate TRUST overlays/instances. They shuffle only with other `tempus_bridge` peers. Application logic uses them to reach TRUST — Cyclon state stays isolated.
- Type 3 — `tempus_sentinel` (non-ephemeral; membership guardians)
  Longer-lived peers that ephemeral nodes use to prove producer-authorized membership. They run Cyclon among themselves for availability, and act as admission guardians (see "Identity & Admission").
You may run different counts of each type. The only invariant is no cross-type mixing.
- One TEMPUS server maintains three independent Cyclon stores: `tempus_edge`, `tempus_bridge`, `tempus_sentinel`.
- Each store has a fixed capacity, a shuffle size `l`, an independent tick (periodic timer), and maintains `(peer, age)` entries.
- Within a store, on each tick:
  - Increment the ages.
  - Pick the oldest neighbor as the shuffle partner.
  - Prepare an outgoing buffer (include `{self, age=0}`) and remove the sent entries from the local view.
  - On merge, prefer younger entries; evict the oldest to respect capacity.
- Across stores: there is no exchange of entries; each population remains unbiased relative to itself.
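A single-store tick can be sketched as follows. Hedged assumptions: the view is a non-empty list of `{Peer, Age}` tuples, and `exchange/2` stands in for whatever transport delivers the buffer to the partner and returns its reply:

```erlang
%% One Cyclon shuffle tick for a single store, following the steps above.
%% View is [{Peer, Age}]; L is the shuffle size; Cap is the store capacity.
tick(View0, Self, L, Cap) ->
    View1 = [{P, Age + 1} || {P, Age} <- View0],                 %% 1. age everyone
    [{Partner, _} | _] = lists:reverse(lists:keysort(2, View1)), %% 2. oldest neighbor
    Rest = lists:keydelete(Partner, 1, View1),
    {Sent, Kept} = pick(L - 1, Rest),                            %% 3. sample a buffer...
    Buffer = [{Self, 0} | Sent],                                 %% ...including {self, 0}
    Reply = exchange(Partner, Buffer),                           %% illustrative transport
    merge(Reply, Kept, Cap).                                     %% 4. merge the reply

%% Merge: prefer younger entries per peer, evict the oldest over capacity.
merge(Incoming, View, Cap) ->
    Merged = lists:foldl(
               fun({P, A}, Acc) ->
                   case lists:keyfind(P, 1, Acc) of
                       {P, Old} when Old =< A -> Acc;            %% keep the younger copy
                       {P, _} -> lists:keystore(P, 1, Acc, {P, A});
                       false  -> [{P, A} | Acc]
                   end
               end, View, Incoming),
    lists:sublist(lists:keysort(2, Merged), Cap).                %% youngest Cap entries

%% Random sample of N entries; returns {Sampled, Remainder}.
pick(N, List) ->
    Shuffled = [X || {_, X} <- lists:sort([{rand:uniform(), E} || E <- List])],
    {lists:sublist(Shuffled, N),
     lists:nthtail(min(N, length(Shuffled)), Shuffled)}.
```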
TEMPUS separates sampling (Cyclon) from trust (admission). A peer only enters a store after proving it’s an authorized instance of your product—without any shared secret among peers.
Core elements:
- Per-install asymmetric identity
  Each node generates an Ed25519 keypair on first run and derives a `peer_id` from the public key. No symmetric keys are shared between peers.
- Producer-signed, short-lived tokens
  Your backend ("producer") issues very short-lived tokens (e.g., 2–5 minutes) that bind to the peer's public key (`sub`) and include the role/type (`typ ∈ {tempus_edge, tempus_bridge, tempus_sentinel}`).
- No-overlap key rotation (strict)
  Verifiers pin to the current signing key via `kid` (key id). At rotation, tokens signed with an old key are rejected with a precise "rotated key" error; legitimate peers immediately re-fetch a token under the new key. Tight TTLs plus proactive refresh (at TTL/2) keep recovery fast.
- Proof-of-Possession (POP)
  When presenting a token, the peer signs a fresh challenge with its own private key (matching the public key carried in `sub`). Replay of copied tokens fails without the private key.
- Platform attestation for app-store builds
  On iOS use App Attest (and on Android, Play Integrity) during token issuance so your backend only signs tokens for genuine, unmodified builds on real devices.
Outcome: Only peers with valid, current producer signatures and POP are admitted to the appropriate Cyclon store. Bad actors without both are excluded.
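The POP step maps directly onto Erlang's EdDSA crypto API. In this sketch the token is a random stand-in (a real one would be producer-signed and carry `sub`/`typ`/`kid`), and both roles run in one function for illustration:

```erlang
%% POP sketch: the verifier issues a fresh nonce; the peer signs
%% nonce || token with its Ed25519 private key; the verifier checks
%% the signature against the public key carried in the token's sub.
pop_roundtrip() ->
    {Pub, Priv} = crypto:generate_key(eddsa, ed25519), %% per-install identity
    Token = crypto:strong_rand_bytes(32),              %% stand-in for a real token
    Nonce = crypto:strong_rand_bytes(16),              %% verifier's fresh challenge
    Msg = <<Nonce/binary, Token/binary>>,
    Sig = crypto:sign(eddsa, none, Msg, [Priv, ed25519]),
    %% Verifier side: only the matching private key could have produced Sig,
    %% so a copied token without the private key fails this check.
    crypto:verify(eddsa, none, Msg, Sig, [Pub, ed25519]).
```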
```mermaid
sequenceDiagram
  participant P as Peer (edge / bridge / sentinel)
  participant A as Platform Attestation (App Attest / Play Integrity)
  participant I as Producer Issuer (Token Service)
  participant V as Verifier Peer
  participant C as TEMPUS (Cyclon Store)
  Note over P: Generate Ed25519 keypair → derive peer_id
  P->>A: Request attestation for this app/device
  A-->>I: Forward attestation evidence
  P->>I: {peer_pub, typ} + attestation proof
  Note right of I: Validate attestation and check policy
  I-->>P: Short-lived token (kid=current, sub=peer_pub, typ, nbf/exp)
  Note over P: Build advert {peer_id, peer_pub, token, typ}
  P->>V: Send advert
  V-->>V: Verify token (kid=current, signature, nbf/exp, sub==peer_pub)
  V->>P: POP challenge (fresh nonce)
  P-->>V: POP response = Sign (nonce || token) with peer_priv
  V-->>V: Verify POP using peer_pub (from token.sub)
  V->>C: Admit peer into store by typ (no cross-type mixing)
```
| Dimension | gRPC | SEMP |
|---|---|---|
| Transport / wire | HTTP/2 (“h2”), HPACK, frames, streams | TLS 1.3 over TCP, custom ALPN "trust/1", ETF frames. |
| Interoperability | Excellent: polyglot (all major languages) | In development. Swift first, others follow. |
| AuthN | TLS/mTLS via standard HTTP/2 stack | mTLS + SHA-512 cert fingerprint whitelist + per-node token |
| AuthZ | Per-service/method (app logic) | Built-in per-MFA allowlists + suspicion/quarantine |
| Connection model | Long-lived, multiplexed streams | Per-request connect → RPC → close (token speeds reconnect) |
| Streaming | Native client/server/bidi streaming | Not supported (one request per TLS session); early roadmap |
| Backpressure / flow control | HTTP/2 stream & connection flow control | None yet; sounding-node control is being implemented first for security reasons. |
| Payload format | Protobuf (schema/IDL, compact cross-lang) | Erlang terms (ETF) – zero schema, you call {M,F,A} directly (guarded by whitelist and blacklist) |
| Latency (warm path) | Very good (no handshake per call) | Handshake per request; tokens used to speed reconnect |
| Performance at scale | High throughput via multiplexing (multiple RPC calls per connection) | Many short TLS handshakes mean more CPU/RTT overhead, but the complete-graph problem is avoided. Time-limited multiplexing is on the early roadmap to reduce overhead. |
| Service discovery | DNS, SRV, LB/Envoy/xDS, gRPC naming | DNS name + port; simple |
| Middlebox/LB friendliness | Excellent (Envoy/Nginx support) | Custom ALPN/port (TRUST); TEMPUS load-balances without external tools. Middleboxes allowed. |
| Observability/tooling | Mature (interceptors, tracing, metrics) | Logging built into the source for all SEMP state changes; the app developer sets the log level and destination |
| Security posture | Solid baseline; policy layered in app | Tight, opinionated policy (whitelist, blacklist, suspicion, MFA bans) |
| Access control granularity | Per RPC method (service API) | Very granular per module/function/arity |
| Error semantics | Standard gRPC status codes | Purposefully non-verbose (RESULT frames only; no result on deny/CAST). Failures logged locally. |
| Versioning / evolution | Protobuf compatibility rules | ETF terms. No atoms. |
| Binary size / deps | Heavier (HTTP/2, Protobuf stack) | Light (ssl + small libs) |
| Polyglot clients | Many | In development (Swift first) |
| Suitability for internal BEAM | Fine, but overkill if only BEAM ↔ BEAM | Excellent (matches BEAM terms, deep security model) |
| Governance / control | Constrained by gRPC conventions | Full control (handshake, tokens, policies) |
| Failure isolation | Stream-level reset without reconnect | Each call isolated by connection; small blast-radius limits damage, connection close on failure/error |
| Sounding Clients | no support | Early roadmap implementation |
| Native Code Look and Feel | Define services/messages in .proto, generate stubs; call via generated modules. | Call/cast plain MFAs: trpc:call(Host, Port, {M,F,A}, Args, Opts); no IDL needed. |