30 changes: 30 additions & 0 deletions .chloggen/redaction-hmac-support.yaml
@@ -0,0 +1,30 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: "enhancement"

# The name of the component, or a single word describing the area of concern, (e.g. receiver/filelog)
component: "processor/redaction"

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: "Add HMAC hash functions (`hmac-sha256` and `hmac-sha512`) for GDPR-compliant pseudonymization of sensitive data like IP addresses"

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [45715]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
  HMAC functions provide rainbow-table-resistant hashing by using a secret key, making it impractical to recover original values without the key.
  This enables pseudonymization as defined in GDPR Article 4(5) while keeping hashes consistent for pattern analysis.
  Configure with `hash_function: hmac-sha256` (or `hmac-sha512`) and `hmac_key: "${env:REDACTION_SECRET_KEY}"`.

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [user]
84 changes: 81 additions & 3 deletions processor/redactionprocessor/README.md
@@ -164,7 +164,85 @@ The value is then masked according to the configuration.
`hash_function` defines the function for hashing values of matched keys or matches in values
instead of masking them with a fixed string. By default, no hash function is used
and masking with a fixed string is performed. The supported hash functions
are `md5`, `sha1` and `sha3` (SHA-256).
are `md5`, `sha1`, `sha3` (SHA3-256), `hmac-sha256`, and `hmac-sha512`.

### HMAC Hash Functions

For enhanced security, especially when dealing with low-entropy data like IP addresses, HMAC (Hash-based Message Authentication Code) hash functions are recommended over simple hash functions like MD5, SHA1, or SHA3.

**Why HMAC?**

Simple hash functions are vulnerable to rainbow table attacks for low-entropy data:
- IPv4 address space: only 2^32 ≈ 4.3 billion possible values
- Attackers can pre-compute all possible IPv4 hashes to reverse the hashing
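
The enumeration risk can be illustrated with a minimal Go sketch. It assumes an attacker who knows the hashed value is an address in `10.0.0.0/16`; the helper names `sha256Hex` and `recoverIP` are hypothetical and are not part of the processor.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sha256Hex returns the hex-encoded, unkeyed SHA-256 digest of s.
func sha256Hex(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// recoverIP is a hypothetical helper: it enumerates 10.0.0.0/16 and returns
// the address whose unkeyed hash equals target, if there is one.
func recoverIP(target string) (string, bool) {
	for c := 0; c < 256; c++ {
		for d := 0; d < 256; d++ {
			ip := fmt.Sprintf("10.0.%d.%d", c, d)
			if sha256Hex(ip) == target {
				return ip, true
			}
		}
	}
	return "", false
}

func main() {
	// The value an unkeyed hash would leave in telemetry.
	leaked := sha256Hex("10.0.42.7")

	ip, ok := recoverIP(leaked)
	fmt.Println(ip, ok) // prints "10.0.42.7 true"
}
```

The loop covers only 65,536 candidates. The full IPv4 space is larger, but the same approach applies to it.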

HMAC uses a secret key, making it practically impossible to:
- Reverse-engineer the original value without the key
- Use pre-computed rainbow tables
- Brute-force the hash even if the algorithm is known

**Benefits:**
- ✅ Consistency: Same input + same key = same output (required for pattern analysis)
- ✅ Irreversibility: Cannot reverse without the secret key
- ✅ Rainbow table resistant: Pre-computed hash tables are useless
- ✅ GDPR pseudonymization: Supports pseudonymization as defined in Article 4(5), provided the key is stored separately from the data
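
The consistency and key-dependence properties can be checked with a short Go sketch built on the standard `crypto/hmac` package. The helper name `hmacHex` and both keys are placeholders chosen for illustration; real keys should be random and loaded from a secret store.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hmacHex returns the hex-encoded HMAC-SHA256 of input under key.
func hmacHex(key, input string) string {
	mac := hmac.New(sha256.New, []byte(key))
	mac.Write([]byte(input))
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	// Placeholder 32-byte keys for illustration only.
	keyA := "0123456789abcdef0123456789abcdef"
	keyB := "fedcba9876543210fedcba9876543210"
	ip := "192.0.2.10"

	// Same input and same key give the same output.
	fmt.Println(hmacHex(keyA, ip) == hmacHex(keyA, ip)) // true

	// A different key gives a different output for the same input.
	fmt.Println(hmacHex(keyA, ip) == hmacHex(keyB, ip)) // false

	// HMAC-SHA256 produces 32 bytes, which is 64 hex characters.
	fmt.Println(len(hmacHex(keyA, ip))) // 64
}
```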

**Configuration Example:**

```yaml
processors:
  redaction:
    allow_all_keys: true
    blocked_values:
      - "(?:[0-9]{1,3}\\.){3}[0-9]{1,3}" # IPv4 addresses
      - "(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}" # IPv6 addresses
    hash_function: hmac-sha256 # or hmac-sha512
    hmac_key: "${env:REDACTION_SECRET_KEY}" # Load from environment variable
    summary: silent
```

**Key Management:**

```bash
# Generate a strong random key: 32 random bytes, printed as 64 hex characters (do this once and store securely)
export REDACTION_SECRET_KEY=$(openssl rand -hex 32)

# Use the key when running the collector
./otelcol-contrib --config=config.yaml

# For production, store keys in:
# - Kubernetes Secrets
# - HashiCorp Vault
# - AWS Secrets Manager
# - Azure Key Vault
# Never commit keys to version control!
```
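
For setups where `openssl` is not available, the same kind of key can be produced from Go's `crypto/rand`. This is a minimal sketch; the function name `newHexKey` is hypothetical. The processor's validation shown in this change compares `len(key)` of the configured string, so a hex-encoded 32-byte key counts as 64 bytes there, although it carries 256 bits of randomness.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newHexKey reads n random bytes from the operating system's CSPRNG and
// returns them hex-encoded, so the result is 2*n characters long.
func newHexKey(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	key, err := newHexKey(32)
	if err != nil {
		panic(err)
	}
	// Print only the length; the key itself should never be logged.
	fmt.Println(len(key)) // 64
}
```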

**Security Notes:**
- Use at least 256-bit (32-byte) random keys for HMAC-SHA256
- Use at least 512-bit (64-byte) random keys for HMAC-SHA512
- Store keys separately from log data
- Rotate keys periodically according to your security policy
- Document which key version was used for each time period
- HMAC-SHA256 provides sufficient security for most use cases
- HMAC-SHA512 offers additional security margin at a modest performance cost (estimated at ~10-20% CPU overhead vs simple hashes; benchmark on your own workload)

**Key Validation:**

The processor automatically validates HMAC keys at startup:
- HMAC-SHA256 requires keys of at least 32 bytes (256 bits)
- HMAC-SHA512 requires keys of at least 64 bytes (512 bits)
- Empty keys are not allowed when HMAC hash functions are configured
- Configuration will fail if the key doesn't meet minimum requirements

This ensures that weak keys cannot be used accidentally, maintaining the security guarantees of HMAC hashing.

**GDPR Compliance:**

HMAC hashing supports pseudonymization as defined in GDPR Article 4(5), provided that:
- The key is stored separately from the hashed data
- Technical and organizational measures protect the key from unauthorized access

Under those conditions, hashed values cannot be attributed to a specific data subject without the key.

The `url_sanitizer` configuration enables sanitization of URLs in specified attributes by removing potentially sensitive information like UUIDs, timestamps, and other non-essential path segments. This is particularly useful for reducing cardinality in telemetry data while preserving the essential parts of URLs for troubleshooting.

@@ -200,7 +278,7 @@ Example configuration with database sanitization:
processors:
redaction:
# ... other redaction settings ...

# Database sanitization configuration
db_sanitizer:
# sanitize_span_name controls whether span names should be sanitized for database queries (default: true)
@@ -215,7 +293,7 @@ processors:
attributes: ["db.statement", "redis.command"]
memcached:
enabled: true
attributes: ["db.statement", "memcached.command"]
attributes: ["db.statement", "memcached.command"]
mongo:
enabled: true
attributes: ["db.statement", "mongodb.query"]
48 changes: 43 additions & 5 deletions processor/redactionprocessor/config.go
@@ -9,6 +9,8 @@ import (
"fmt"
"strings"

"go.opentelemetry.io/collector/config/configopaque"

"github.com/open-telemetry/opentelemetry-collector-contrib/processor/redactionprocessor/internal/db"
"github.com/open-telemetry/opentelemetry-collector-contrib/processor/redactionprocessor/internal/url"
)
@@ -18,10 +20,12 @@ var _ encoding.TextUnmarshaler = (*HashFunction)(nil)
type HashFunction string

const (
None HashFunction = ""
SHA1 HashFunction = "sha1"
SHA3 HashFunction = "sha3"
MD5 HashFunction = "md5"
None HashFunction = ""
SHA1 HashFunction = "sha1"
SHA3 HashFunction = "sha3"
MD5 HashFunction = "md5"
HMACSHA256 HashFunction = "hmac-sha256"
HMACSHA512 HashFunction = "hmac-sha512"
)

type Config struct {
@@ -44,6 +48,11 @@ type Config struct {
// and masking with a fixed string is performed.
HashFunction HashFunction `mapstructure:"hash_function"`

// HMACKey is the secret key used for HMAC hashing when HashFunction is set to hmac-sha256 or hmac-sha512.
// This should be loaded from a secure source like environment variables.
// Minimum length: 32 bytes for HMAC-SHA256, 64 bytes for HMAC-SHA512.
HMACKey configopaque.String `mapstructure:"hmac_key"`

// IgnoredKeys is a list of span attribute keys that are not redacted.
// Span attributes in this list are allowed to pass through the filter
// without being changed or removed.
@@ -101,9 +110,38 @@ func (u *HashFunction) UnmarshalText(text []byte) error {
case strings.ToLower(SHA3.String()):
*u = SHA3
return nil
case strings.ToLower(HMACSHA256.String()):
*u = HMACSHA256
return nil
case strings.ToLower(HMACSHA512.String()):
*u = HMACSHA512
return nil
case strings.ToLower(None.String()):
*u = None
return nil
}
return fmt.Errorf("unknown HashFunction %s, allowed functions are %s, %s and %s", str, SHA1, SHA3, MD5)
return fmt.Errorf("unknown HashFunction %s, allowed functions are %s, %s, %s, %s and %s", str, SHA1, SHA3, MD5, HMACSHA256, HMACSHA512)
}

// Validate checks that hmac_key is set and long enough whenever an HMAC hash function is configured.
func (cfg *Config) Validate() error {
	// Only the HMAC hash functions require a key.
	if cfg.HashFunction == HMACSHA256 || cfg.HashFunction == HMACSHA512 {
		key := string(cfg.HMACKey)
		if key == "" {
			return fmt.Errorf("hmac_key must not be empty when hash_function is %s", cfg.HashFunction)
		}

		// Enforce a minimum key length; len(key) counts the bytes of the configured string.
		minLength := 32
		if cfg.HashFunction == HMACSHA512 {
			minLength = 64
		}

		if len(key) < minLength {
			return fmt.Errorf("hmac_key must be at least %d bytes long for %s, got %d bytes", minLength, cfg.HashFunction, len(key))
		}
	}

	return nil
}
101 changes: 101 additions & 0 deletions processor/redactionprocessor/config_test.go
@@ -107,3 +107,104 @@ func TestValidateConfig(t *testing.T) {
})
}
}

func TestValidateHMACKey(t *testing.T) {
tests := []struct {
name string
config *Config
expectError bool
errorContains string
}{
{
name: "valid HMAC-SHA256 with sufficient key length",
config: &Config{
HashFunction: HMACSHA256,
HMACKey: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", // 32 bytes
},
expectError: false,
},
{
name: "valid HMAC-SHA512 with sufficient key length",
config: &Config{
HashFunction: HMACSHA512,
HMACKey: "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", // 64 bytes
},
expectError: false,
},
{
name: "empty key with HMAC-SHA256",
config: &Config{
HashFunction: HMACSHA256,
HMACKey: "",
},
expectError: true,
errorContains: "hmac_key must not be empty",
},
{
name: "empty key with HMAC-SHA512",
config: &Config{
HashFunction: HMACSHA512,
HMACKey: "",
},
expectError: true,
errorContains: "hmac_key must not be empty",
},
{
name: "key too short for HMAC-SHA256",
config: &Config{
HashFunction: HMACSHA256,
HMACKey: "short-key",
},
expectError: true,
errorContains: "hmac_key must be at least 32 bytes long",
},
{
name: "key too short for HMAC-SHA512",
config: &Config{
HashFunction: HMACSHA512,
HMACKey: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", // 32 bytes, too short for SHA512
},
expectError: true,
errorContains: "hmac_key must be at least 64 bytes long",
},
{
name: "no validation for non-HMAC hash functions",
config: &Config{
HashFunction: MD5,
HMACKey: "",
},
expectError: false,
},
{
name: "no validation when hash function is None",
config: &Config{
HashFunction: None,
HMACKey: "",
},
expectError: false,
},
{
name: "key with special characters but only 26 bytes is rejected",
config: &Config{
HashFunction: HMACSHA256,
HMACKey: "!@#$%^&*()_+-=[]{}|;:,.<>?",
},
expectError: true,
errorContains: "hmac_key must be at least 32 bytes long",
},
}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
err := tt.config.Validate()
if tt.expectError {
assert.Error(t, err)
if tt.errorContains != "" {
assert.Contains(t, err.Error(), tt.errorContains)
}
} else {
assert.NoError(t, err)
}
})
}
}
1 change: 1 addition & 0 deletions processor/redactionprocessor/go.mod
@@ -8,6 +8,7 @@ require (
github.com/stretchr/testify v1.11.1
go.opentelemetry.io/collector/component v1.50.1-0.20260121161034-55399d4743af
go.opentelemetry.io/collector/component/componenttest v0.144.1-0.20260121161034-55399d4743af
go.opentelemetry.io/collector/config/configopaque v1.50.1-0.20260121161034-55399d4743af
go.opentelemetry.io/collector/confmap v1.50.1-0.20260121161034-55399d4743af
go.opentelemetry.io/collector/confmap/xconfmap v0.144.1-0.20260121161034-55399d4743af
go.opentelemetry.io/collector/consumer v1.50.1-0.20260121161034-55399d4743af
2 changes: 2 additions & 0 deletions processor/redactionprocessor/go.sum

14 changes: 14 additions & 0 deletions processor/redactionprocessor/processor.go
@@ -6,15 +6,19 @@ package redactionprocessor // import "github.com/open-telemetry/opentelemetry-co
//nolint:gosec
import (
"context"
"crypto/hmac"
"crypto/md5"
"crypto/sha1"
"crypto/sha256"
"crypto/sha512"
"encoding/hex"
"fmt"
"hash"
"regexp"
"sort"
"strings"

"go.opentelemetry.io/collector/config/configopaque"
"go.opentelemetry.io/collector/pdata/pcommon"
"go.opentelemetry.io/collector/pdata/plog"
"go.opentelemetry.io/collector/pdata/pmetric"
@@ -398,6 +402,10 @@ func (s *redaction) maskValue(val string, regex *regexp.Regexp) string {
return hashString(match, sha3.New256())
case MD5:
return hashString(match, md5.New())
case HMACSHA256:
return hashStringHMAC(match, s.config.HMACKey, sha256.New)
case HMACSHA512:
return hashStringHMAC(match, s.config.HMACKey, sha512.New)
default:
return "****"
}
@@ -410,6 +418,12 @@ func hashString(input string, hasher hash.Hash) string {
return hex.EncodeToString(hasher.Sum(nil))
}

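// hashStringHMAC returns the hex-encoded HMAC of input, keyed with key and
// computed with the hash constructor newHash (for example sha256.New).
func hashStringHMAC(input string, key configopaque.String, newHash func() hash.Hash) string {
	h := hmac.New(newHash, []byte(string(key)))
	h.Write([]byte(input))
	return hex.EncodeToString(h.Sum(nil))
}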

// addMetaAttrs adds diagnostic information about redacted or masked attribute keys
func (s *redaction) addMetaAttrs(redactedAttrs []string, attributes pcommon.Map, valuesAttr, countAttr string) {
redactedCount := int64(len(redactedAttrs))