processor/redactionprocessor/README.md
70 additions & 3 deletions
@@ -164,7 +164,74 @@ The value is then masked according to the configuration.
 `hash_function` defines the function for hashing values of matched keys or matches in values
 instead of masking them with a fixed string. By default, no hash function is used
 and masking with a fixed string is performed. The supported hash functions
-are `md5`, `sha1` and `sha3` (SHA-256).
+are `md5`, `sha1`, `sha3` (SHA-256), `hmac-sha256`, and `hmac-sha512`.
+
+### HMAC Hash Functions
+
+For enhanced security, especially when dealing with low-entropy data like IP addresses, HMAC (Hash-based Message Authentication Code) hash functions are recommended over simple hash functions like MD5, SHA1, or SHA3.
+
+**Why HMAC?**
+
+Simple hash functions are vulnerable to rainbow table attacks for low-entropy data:
+- IPv4 address space: only 2^32 ≈ 4.3 billion possible values
+- Attackers can pre-compute all possible IPv4 hashes to reverse the hashing
+
+HMAC uses a secret key, making it practically impossible to:
+- Reverse-engineer the original value without the key
+- Use pre-computed rainbow tables
+- Brute-force the hash even if the algorithm is known
+
+**Benefits:**
+- ✅ Consistency: Same input + same key = same output (required for pattern analysis)
+- ✅ Irreversibility: Cannot reverse without the secret key
+- ✅ Rainbow table resistant: Pre-computed hash tables are useless
[added lines after 187 are missing from this capture; the hunk resumes below]
+- Without the key, personal data cannot be attributed to a specific data subject
+- Provides technical measures to ensure data protection
+- Key and data are stored separately
 
 The `url_sanitizer` configuration enables sanitization of URLs in specified attributes by removing potentially sensitive information like UUIDs, timestamps, and other non-essential path segments. This is particularly useful for reducing cardinality in telemetry data while preserving the essential parts of URLs for troubleshooting.
 
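The rainbow-table argument in the HMAC section of this hunk can be illustrated with a minimal Python sketch. It is not the processor's implementation and makes no claim about the exact bytes the processor emits. The helper names `plain_hash` and `keyed_hash`, the key value, and the `192.0.2.0/24` documentation subnet (a small stand-in for the full 2^32 IPv4 space) are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import ipaddress


def plain_hash(value: str) -> str:
    """Unkeyed SHA-256 digest, standing in for a 'simple' hash function."""
    return hashlib.sha256(value.encode()).hexdigest()


def keyed_hash(value: str, key: bytes) -> str:
    """HMAC-SHA256 digest of the value under a secret key."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


# An attacker who knows only the algorithm can precompute every candidate.
# A /24 keeps the demo fast; the same loop over all of IPv4 is feasible too.
subnet = ipaddress.ip_network("192.0.2.0/24")
table = {plain_hash(str(ip)): str(ip) for ip in subnet}

secret_ip = "192.0.2.57"
key = b"hypothetical-secret-key"  # assumption: any key unknown to the attacker

# The plain hash is reversed by a dictionary lookup.
recovered = table.get(plain_hash(secret_ip))

# The keyed hash does not appear in the attacker's table.
missed = table.get(keyed_hash(secret_ip, key))

print("plain hash reversed to:", recovered)
print("keyed hash lookup:", missed)
```

The lookup on the unkeyed digest returns the original address, while the keyed digest is absent from the table. Calling `keyed_hash` twice with the same key gives the same digest, which is the consistency property listed under the benefits.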
@@ -200,7 +267,7 @@ Example configuration with database sanitization:
 processors:
   redaction:
     # ... other redaction settings ...
-
+
     # Database sanitization configuration
     db_sanitizer:
       # sanitize_span_name controls whether span names should be sanitized for database queries (default: true)