
Commit 9548b21

feat(tokenization): introduce secure deterministic tokenization (#92)
Resolve the rainbow table attack vulnerability in deterministic tokenization by introducing per-key-version salts and HMAC-SHA256 keyed hashing. This ensures that the same plaintext value produces different hashes across different key versions and system installations.

Key changes:

- Added a `Salt` field to the `TokenizationKey` domain model.
- Refactored `HashService` to use HMAC-SHA256 with 32-byte salts.
- Updated `tokenizationKeyUseCase` to generate random salts for new key versions.
- Added database migrations (PostgreSQL/MySQL) for the `salt` column.
- Updated repository implementations to persist and retrieve the `salt` field.
- Updated the application version to v0.24.0 and documented the changes in CHANGELOG.md.
1 parent c877a54 commit 9548b21

26 files changed

Lines changed: 234 additions & 122 deletions

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -5,6 +5,11 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.24.0] - 2026-03-03
+
+### Changed
+
+- Refactored `tokenization` module to improve security for deterministic mode by adding per-key version salts and HMAC-SHA256 keyed hashing to prevent rainbow table attacks.
+
 ## [0.23.0] - 2026-03-02
 
 ### Added

cmd/app/main.go

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ import (
 
 // Build-time version information (injected via ldflags during build).
 var (
-	version   = "v0.23.0" // Semantic version with "v" prefix (e.g., "v0.12.0")
+	version   = "v0.24.0" // Semantic version with "v" prefix (e.g., "v0.12.0")
 	buildDate = "unknown" // ISO 8601 build timestamp
 	commitSHA = "unknown" // Git commit SHA
 )
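
The comment on this `var` block says the values are injected via ldflags. A minimal sketch of how that usually works with the Go linker's `-X importpath.name=value` flag, assuming the variables live in package `main` as the `cmd/app/main.go` path suggests. A plain `go build` keeps the defaults; a release build could run, for example, `go build -ldflags "-X main.version=v0.24.0 -X main.commitSHA=9548b21" ./cmd/app`, where the flag values are illustrative.

```go
package main

import "fmt"

// Defaults used when the linker does not override them with -X.
// -X only works on package-level string variables like these.
var (
	version   = "v0.24.0"
	buildDate = "unknown"
	commitSHA = "unknown"
)

func main() {
	fmt.Printf("version=%s buildDate=%s commit=%s\n", version, buildDate, commitSHA)
}
```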

docs/concepts/security-model.md

Lines changed: 2 additions & 1 deletion
@@ -20,7 +20,8 @@ Secrets is designed for practical defense-in-depth around secret storage and cry
 ## 🎫 Tokenization security considerations
 
 - Metadata is not encrypted: do not place full PAN, credentials, or regulated payloads in token metadata.
-- Deterministic tokenization leaks equality patterns for identical plaintext under the same active key.
+- Deterministic tokenization now uses per-key version salts and HMAC-SHA256 keyed hashing to prevent rainbow table attacks.
+- Equality patterns for identical plaintext are still visible *under the same active key version* when deterministic mode is enabled.
 - TTL expiration and revocation both invalidate token usage, but neither should replace endpoint authorization.
 - Detokenization is plaintext exposure: isolate clients with `decrypt` capability and avoid shared broad policies.
 - Expired tokens should be cleaned on cadence (`clean-expired-tokens`) to reduce stale sensitive mappings.

docs/engines/tokenization.md

Lines changed: 8 additions & 0 deletions
@@ -101,6 +101,14 @@ Example response (`200 OK`):
 - `GET /v1/tokenization/keys` (Capability: `read`)
 - `DELETE /v1/tokenization/keys/:id` (Capability: `delete`)
 
+## Deterministic Tokenization
+
+When `is_deterministic` is set to `true`, the engine ensures that the same plaintext value always produces the same token *under the same key version*.
+
+- **Security**: To prevent rainbow table attacks, each key version generates a unique random 32-byte salt. The engine uses HMAC-SHA256 with this salt to compute a unique hash for each plaintext.
+- **Equality Matching**: This mode allows for equality matching and duplicate detection within your application without exposing the sensitive plaintext.
+- **Rotation**: When a key is rotated, a new salt is generated. Identical plaintext tokenized under the new version will produce a different token than the previous version.
+
 ## Relevant CLI Commands
 
 - `rewrap-deks`: Rewraps tokenization key DEKs when rotating the KEK.
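
The **Equality Matching** and **Rotation** bullets can be illustrated together. This Go sketch uses a hypothetical `keyVersion` type and helper names, not the engine's real API, and again assumes the salt is the HMAC key. Within one version, equal plaintexts yield equal hashes, so duplicates can be counted by comparing hashes alone; after rotation, the fresh salt makes the same plaintext hash differently.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// keyVersion is a hypothetical, trimmed-down stand-in for one tokenization key version.
type keyVersion struct {
	Version int
	Salt    []byte
}

// newKeyVersion creates a version with a fresh random 32-byte salt.
func newKeyVersion(version int) (keyVersion, error) {
	salt := make([]byte, 32)
	if _, err := rand.Read(salt); err != nil {
		return keyVersion{}, err
	}
	return keyVersion{Version: version, Salt: salt}, nil
}

// rotate produces the next version with its own freshly generated salt.
func rotate(cur keyVersion) (keyVersion, error) {
	return newKeyVersion(cur.Version + 1)
}

// lookupHash is the deterministic value used for equality matching
// (assumption: HMAC-SHA256 keyed with the version's salt).
func lookupHash(kv keyVersion, plaintext string) string {
	mac := hmac.New(sha256.New, kv.Salt)
	mac.Write([]byte(plaintext))
	return hex.EncodeToString(mac.Sum(nil))
}

// countDuplicates counts inputs that repeat an earlier input under kv,
// comparing hashes only.
func countDuplicates(kv keyVersion, values []string) int {
	seen := make(map[string]bool)
	dups := 0
	for _, v := range values {
		h := lookupHash(kv, v)
		if seen[h] {
			dups++
		}
		seen[h] = true
	}
	return dups
}

func main() {
	v1, err := newKeyVersion(1)
	if err != nil {
		panic(err)
	}
	v2, err := rotate(v1)
	if err != nil {
		panic(err)
	}

	values := []string{"alice@example.com", "bob@example.com", "alice@example.com"}
	fmt.Println("duplicates under v1:", countDuplicates(v1, values))                      // 1
	fmt.Println("v1 matches v2:", lookupHash(v1, values[0]) == lookupHash(v2, values[0])) // false
}
```

One consequence of the rotation behavior: tokens issued under different key versions cannot be matched against each other, so equality matching only works within a single version.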

docs/examples/curl.md

Lines changed: 3 additions & 2 deletions
@@ -171,8 +171,9 @@ curl -X POST "$BASE_URL/v1/tokenization/detokenize" \
 
 Deterministic caveat:
 
-- When `is_deterministic=true`, tokenizing the same plaintext with the same active key can return the same token
-- Prefer non-deterministic mode unless you explicitly need equality matching
+- When `is_deterministic=true`, the engine uses per-key version salts and HMAC-SHA256 to prevent rainbow table attacks.
+- Identical plaintext tokenized with the same active key version will return the same token.
+- Prefer non-deterministic mode unless you explicitly need equality matching.
 
 ## Common Mistakes
 

docs/examples/go.md

Lines changed: 2 additions & 1 deletion
@@ -254,7 +254,8 @@ func tokenizationFlow(token string) error {
 
 Deterministic caveat:
 
-- Keys configured as deterministic can emit the same token for the same plaintext under the same active key.
+- Keys configured as deterministic use per-key version salts and HMAC-SHA256 to prevent rainbow table attacks.
+- They can emit the same token for the same plaintext under the same active key version.
 - Use deterministic mode only when your workflow requires equality matching.
 
 Rate-limit note:

docs/examples/javascript.md

Lines changed: 2 additions & 1 deletion
@@ -165,7 +165,8 @@ async function tokenizationFlow(token) {
 
 Deterministic caveat:
 
-- With `is_deterministic: true`, tokenizing the same plaintext with the same active key can produce the same token.
+- With `is_deterministic: true`, the engine uses per-key version salts and HMAC-SHA256 to prevent rainbow table attacks.
+- Identical plaintext tokenized with the same active key version will produce the same token.
 - Prefer non-deterministic mode unless stable equality matching is required.
 
 Rate-limit note:

docs/examples/python.md

Lines changed: 2 additions & 1 deletion
@@ -147,7 +147,8 @@ def tokenize_detokenize(token: str) -> None:
 
 Deterministic caveat:
 
-- If you create a key with `is_deterministic=True`, repeated tokenization of identical plaintext can return the same token.
+- If you create a key with `is_deterministic=True`, the engine uses per-key version salts and HMAC-SHA256 to prevent rainbow table attacks.
+- Repeated tokenization of identical plaintext under the same active key version will return the same token.
 - Use deterministic mode only when equality matching is a functional requirement.
 
 Rate-limit note:

internal/tokenization/domain/errors.go

Lines changed: 8 additions & 1 deletion
@@ -41,7 +41,14 @@ var (
 	// ErrTokenizationKeyNameEmpty indicates the tokenization key name is empty.
 	ErrTokenizationKeyNameEmpty = errors.Wrap(errors.ErrInvalidInput, "tokenization key name cannot be empty")
 
-	// ErrTokenizationKeyVersionInvalid indicates the version is invalid (must be > 0).
+	// ErrTokenizationKeySaltInvalid indicates the tokenization key salt is invalid.
+	ErrTokenizationKeySaltInvalid = errors.Wrap(
+		errors.ErrInvalidInput,
+		"tokenization key salt cannot be empty for deterministic keys",
+	)
+
+	// ErrTokenizationKeyVersionInvalid indicates the tokenization key version is invalid.
+
 	ErrTokenizationKeyVersionInvalid = errors.Wrap(
 		errors.ErrInvalidInput,
 		"tokenization key version must be greater than 0",

internal/tokenization/domain/tokenization_key.go

Lines changed: 6 additions & 0 deletions
@@ -27,6 +27,9 @@ type TokenizationKey struct {
 	// When true, enables efficient duplicate detection; when false, provides better privacy.
 	IsDeterministic bool
 
+	// Salt is used for deterministic tokenization to prevent rainbow table attacks.
+	Salt []byte
+
 	// DekID is the reference to the Data Encryption Key used to encrypt values for this version.
 	DekID uuid.UUID
 
@@ -49,6 +52,9 @@ func (tk *TokenizationKey) Validate() error {
 	if err := tk.FormatType.Validate(); err != nil {
 		return ErrInvalidFormatType
 	}
+	if tk.IsDeterministic && len(tk.Salt) == 0 {
+		return ErrTokenizationKeySaltInvalid
+	}
 	if tk.DekID == uuid.Nil {
 		return ErrTokenizationKeyDekIDInvalid
 	}
