Send a message into the future. Encrypt it today, no one can read it until tomorrow β no server, no trust, just math and time.
- π Near-native in the browser β a short, register-only arithmetic core keeps WASM performance close to native
- π GPU-powered encryption β WebGPU parallel computation
- β±οΈ Time-bound decryption β sequential by design, with no known practical shortcut
π Encrypt
- Cost: total iterations required for decryption (in billions)
- GPUs: number of GPU threads. More threads = faster encryption, but larger output
Requires WebGPU support.
π Decrypt
Single-threaded, CPU-only. The wait is the point.
| Device | Encrypt (GPU) | Decrypt (CPU) | Ratio |
|---|---|---|---|
| MacBook Pro M1 (3.2 GHz) | ~65 GH/s | ~100 MH/s | ~600Γ |
| GTX 1660 Ti / i5-9600K (4.6 GHz) | ~210 GH/s | ~140 MH/s | ~1500Γ |
| RTX 5090 / i9-14900K (5.8 GHz) | ~4 TH/s | ~180 MH/s | ~20000Γ |
Example: 5s encrypt β ~1 day decrypt
Rivest, Shamir, and Wagner proposed an RSA-based time-lock decades ago. It's theoretically elegant but relies on big-integer modular exponentiation.
In the browser, big-integer modular exponentiation runs ~3Γ slower than hand-optimized native libraries like GMP, which use assembly-level techniques (ADX/MULX, unrolled Montgomery multiplication) that browsers cannot match. An impatient user would simply use a native program to decrypt faster, defeating the purpose of a browser-based tool.
The same issue extends to hardware: big-integer arithmetic benefits much more from FPGA or ASIC acceleration, creating more room for specialized implementations to pull ahead.
This project takes a different approach: the algorithm uses only 32-bit multiply and XOR β simple operations in a very short, register-only hot loop, allowing WASM performance to stay close to native. This greatly reduces the incentive to leave the browser, and likely also limits the advantage of specialized hardware. The trade-off is that encryption is no longer cheap: it requires about as much total computation as decryption, but parallelizes well on GPUs.
| RSA time-lock | This project | |
|---|---|---|
| Encryption cost |
|
|
| Browser decryption | ~3Γ slower than native | β native speed |
The GPU computes slow_hash for all seeds in parallel, then chains the results to encrypt each subsequent seed. The final key is used to encrypt the plaintext.
seed[1 .. P] = random_data()
-- parallel --
for i = 1 to P
hash[i] = slow_hash(seed[i], cost / P)
-- chain --
key = hash[1]
for i = 2 to P
encrypted_seed[i] = seed[i] XOR key
key = key XOR hash[i]
ciphertext = encrypt(plaintext, key)Share seed[1], encrypted_seed[2..P], and ciphertext.
key = slow_hash(seed[1], cost / P)
for i = 2 to P
seed[i] = encrypted_seed[i] XOR key
hash[i] = slow_hash(seed[i], cost / P)
key = key XOR hash[i]
plaintext = decrypt(ciphertext, key)Starting from seed[1], each subsequent seed must be recovered using the current key, then hashed to derive the next one. This makes decryption sequential by construction.
A simple multiplication-based delay function:
// Default values selected from well-known PRNG/hash constants.
const uint32_t C1 = 0x85EBCA6B; // MurmurHash3
const uint32_t C2 = 0xC2B2AE35; // MurmurHash3
const uint32_t C3 = 0x9E3779B9; // xxHash
const uint32_t C4 = 0x27D4EB2F; // xxHash
const uint32_t C5 = 0xED5AD4BB; // lowbias32
const uint32_t C6 = 0xAC4C1B51; // lowbias32
const uint32_t C7 = 0xBF58476D; // SplitMix
const uint32_t C8 = 0x94D049BB; // SplitMix
uint64_t slow_hash(uint64_t seed, int n) {
uint32_t a = seed;
uint32_t b = seed >> 32;
for (int i = 0; i < n; i++) {
a *= C1; b ^= a;
b *= C2; a ^= b;
a *= C3; b ^= a;
b *= C4; a ^= b;
a *= C5; b ^= a;
b *= C6; a ^= b;
a *= C7; b ^= a;
b *= C8; a ^= b;
}
return (uint64_t)b << 32 | a;
}Strictly speaking, this is a permutation, not a hash β it has no compression and is invertible. An earlier version used WebCrypto PBKDF2-SHA256 as the slow hash. While conservative and well-studied, it has practical drawbacks for time-lock use:
- Browser implementations vary significantly in speed (Chrome is ~2Γ faster than Safari)
- Native libraries like fastpbkdf2 outperform even the fastest browser by ~10%
- CPUs with hardware acceleration (e.g. SHA-NI, ARMv8 Crypto Extensions) gain an unfair advantage over those without
The current approach reduces these gaps: it uses only very simple operations, leaving relatively little room for platform-specific optimization.
Additionally, its simplicity makes it highly efficient on GPU, yielding a much higher GPU-encrypt / CPU-decrypt speed ratio than PBKDF2.
Multiplication tends to mix bits more aggressively than addition or rotation, and for a delay function, a heavier primitive is a feature rather than a drawback.
But strong mixing alone is not enough β a delay function must also resist shortcut evaluation, with no known way to compute
- β Pure multiplication:
$f^n(x) = x \cdot C^n \bmod 2^{32}$ , where$C^n$ is computable in$O(\log n)$ via fast exponentiation β unsuitable for a delay function. - β Affine recurrences (such as LCGs) are similarly reducible via a closed-form expression over
$\mathbb{Z}/2^{32}\mathbb{Z}$ . - π§© Alternating MUL and XOR mixes operations from different algebraic structures: multiplication propagates carries across bit positions (ring
$\mathbb{Z}/2^{32}\mathbb{Z}$ ), while XOR acts independently on each bit over$\mathrm{GF}(2)^{32}$ . This disrupts the simple algebraic structure that makes purely multiplicative or affine recurrences easy to shortcut.
This is design intuition, not a proof of sequential hardness.
Since slow_hash outputs only 64 bits, the final key has limited entropy. The actual implementation derives the AES key as sha256(hash[1..P]), producing a 256-bit key suitable for AES.
The current slow_hash uses two 32-bit state variables with 32-bit multiplications internally. A variant using a single 64-bit state with 64-bit multiplications would offer stronger ASIC resistance:
uint64_t slow_hash_64(uint64_t seed, int n) {
while (n--) {
seed *= 0xD1342543DE82EF95ULL;
seed ^= seed >> 32;
// ...
}
return seed;
}On many modern CPUs, 32-bit and 64-bit integer multiplications have similar latency and often use the same underlying hardware path. However, a 32-bit multiplier has a shorter critical path in silicon, meaning a custom ASIC could clock 32-bit multiply faster β a potential advantage for attackers.
The trade-off is on the encryption side: GPU ALUs are natively 32-bit. Emulating 64-bit multiplication in a shader significantly reduces encryption throughput. Since this project relies on GPU parallelism to make encryption fast, 32-bit is the pragmatic choice β it maximizes the encrypt/decrypt time ratio that makes the time-lock effective.
You can override the default constants by setting CONSTANTS in the browser console before encrypting. They are included in the output, so decryption works automatically.
Constants are inlined into both WebGPU shader and WASM bytecode at runtime β zero overhead compared to hardcoded values.
Beyond constants, future versions may support randomized programs β varying instruction sequences (e.g. bitwise rotations, shifts). Similar to RandomX's approach, this makes the algorithm itself non-fixed, increasing the difficulty of analysis and specialized hardware optimization.
This is an experimental time-lock construction, not a formally proven delay function. Unlike RSA-based time-locks, which are tied to well-studied number-theoretic assumptions, the sequential hardness here is heuristic.
The design is intended to avoid obvious shortcut structures, but no theorem rules out deeper attacks. No practical shortcut is currently known, but none has been proven impossible either.
Custom hardware may achieve constant-factor speedups, and real-world performance varies by CPU microarchitecture and implementation. Unlock times should therefore be treated as approximate rather than exact.
- β° Time capsule β lock a message until a specific date
- π CPU race β first to decrypt wins a prize (compete on hardware, not luck)
- π Delayed key release β publish encrypted content now, key unlocks later
- π Hardware benchmark β measure your CPU and GPU performance

