layout	default
title	Outbound Sensitive Data Detection

Outbound Sensitive Data Detection

Status: Research / Future Roadmap
Priority: Medium
Depends on: Channel interception layer (optional)

Problem Statement

When agents send outbound messages (responses to users), they may inadvertently include:

API keys or access tokens
Private credentials or passwords
Internal URLs or infrastructure details
PII (names, emails, phone numbers, SSNs)
High-entropy secrets (JWTs, database connection strings)

The current implementation removed outbound scanning from the adversary-detector crate to simplify the initial channel integration. This document captures research directions for re-implementing outbound content filtering.

Detection Approaches

1. High Entropy Detection

Technique: Shannon entropy calculation on strings
Thresholds: >4.5 bits/char for base64-alphabet strings, >3.0 for hex strings (the theoretical ceilings are 6.0 and 4.0 bits/char respectively)
Pros: Catches unknown secret formats, fast
Cons: False positives on compressed data, random IDs, UUIDs
Mitigation: Combine with pattern matching, allowlist common formats
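
The entropy check described above can be sketched in a few lines. This is a minimal illustration, not the crate's implementation: the function names, the token character class, and `MIN_LENGTH` are assumptions made for the example. The hex threshold of 3.0 is also an assumption, chosen because a hex string can never exceed 4.0 bits/char.

```python
import math
import re
from collections import Counter

# Assumed values for illustration; tune against real agent output.
BASE64_THRESHOLD = 4.5
HEX_THRESHOLD = 3.0
MIN_LENGTH = 20

HEX_RE = re.compile(r"[0-9a-fA-F]+")
# Candidate tokens: runs of base64 / base64url alphabet characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/_=-]{%d,}" % MIN_LENGTH)


def shannon_entropy(s: str) -> float:
    """Empirical Shannon entropy of s, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    counts = Counter(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def high_entropy_tokens(text: str) -> list[tuple[str, float]]:
    """Return (token, entropy) for tokens above their charset threshold."""
    hits = []
    for match in TOKEN_RE.finditer(text):
        token = match.group()
        # Pure-hex tokens get the lower threshold.
        threshold = HEX_THRESHOLD if HEX_RE.fullmatch(token) else BASE64_THRESHOLD
        entropy = shannon_entropy(token)
        if entropy > threshold:
            hits.append((token, entropy))
    return hits
```

Because the estimate is computed from the token's own character counts, a token of n characters can score at most log2(n) bits/char. Nothing shorter than 23 characters can exceed 4.5, which is worth keeping in mind when choosing a minimum length.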

2. Regex Pattern Matching

API Keys: sk-[a-zA-Z0-9]{32,}, AKIA[0-9A-Z]{16}
Tokens: eyJ[a-zA-Z0-9_-]*\.[a-zA-Z0-9_-]*\.[a-zA-Z0-9_-]* (JWT)
Private Keys: -----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----
Connection Strings: mongodb(\+srv)?://, postgres(ql)?://, mysql://
Credit Cards: Luhn-validated 13-19 digit sequences
SSNs: \d{3}-\d{2}-\d{4} (with contextual keywords)
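
A rough sketch of how these patterns might be combined into one scanner follows. The regexes are adapted from the list above. The pattern names, the 40-character context window for SSNs, and the `scan_patterns` function are assumptions made for the example.

```python
import re

# Regexes adapted from the list above; the dictionary keys are illustrative.
PATTERNS = {
    "sk_api_key": re.compile(r"sk-[a-zA-Z0-9]{32,}"),
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "jwt": re.compile(r"eyJ[a-zA-Z0-9_-]*\.[a-zA-Z0-9_-]*\.[a-zA-Z0-9_-]*"),
    "private_key": re.compile(
        r"-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"
    ),
    "connection_string": re.compile(
        r"(?:mongodb(?:\+srv)?|postgres(?:ql)?|mysql)://\S+"
    ),
}

# 13-19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
SSN_CONTEXT_RE = re.compile(r"\b(ssn|social security)\b", re.IGNORECASE)


def luhn_valid(number: str) -> bool:
    """Luhn checksum over the digits in number (separators are ignored)."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Double every second digit, counting from the rightmost digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_patterns(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) pairs found in text."""
    findings = []
    for name, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            findings.append((name, m.group()))
    for m in CARD_RE.finditer(text):
        if luhn_valid(m.group()):
            findings.append(("credit_card", m.group()))
    for m in SSN_RE.finditer(text):
        # Only report an SSN-shaped number when a keyword precedes it.
        window = text[max(0, m.start() - 40):m.start()]
        if SSN_CONTEXT_RE.search(window):
            findings.append(("ssn", m.group()))
    return findings
```

The Luhn and keyword checks are what keep the two purely numeric patterns usable: without them, order numbers and timestamps would match constantly.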

3. Regret Matches

Concept: Contextual phrases that signal a value the sender would regret including ("I regret including...")
Examples:
- "my password is", "password:" + high-entropy string
- "api key:", "token:", "secret:" + following value
- "don't share this", "private:" + content
Pros: High precision when context is clear
Cons: Misses secrets without contextual hints
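
One way to express the keyword-plus-value idea is a single regex with named groups, followed by a cheap check on the captured value. In this sketch the keyword list, the separator rule (`is`, `:` or `=`), and the `looks_like_secret` heuristic are all assumptions. The heuristic stands in for the high-entropy check mentioned above.

```python
import re

# keyword, then a separator ("is", ":" or "="), then the value.
CONTEXT_RE = re.compile(
    r"\b(?P<keyword>password|api[ _-]?key|token|secret)\b"
    r"\s*(?:is\b|[:=])\s*"
    r"[\"']?(?P<value>[^\s\"']+)",
    re.IGNORECASE,
)


def looks_like_secret(value: str) -> bool:
    """Assumed heuristic: at least 6 characters, mixing letters and digits."""
    return (
        len(value) >= 6
        and any(ch.isdigit() for ch in value)
        and any(ch.isalpha() for ch in value)
    )


def context_matches(text: str) -> list[tuple[str, str]]:
    """Return (keyword, value) pairs where the value looks like a secret."""
    return [
        (m.group("keyword").lower(), m.group("value"))
        for m in CONTEXT_RE.finditer(text)
        if looks_like_secret(m.group("value"))
    ]
```

Requiring a separator and a plausible value is what keeps ordinary sentences such as "the password is required" from being flagged.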

4. Machine Learning Classifiers

Approach: Fine-tuned transformer for secret detection
Training Data: GitHub secret scanning public dataset
Pros: Generalizes to new secret types
Cons: Latency, compute cost, false positive tuning

5. Dictionary/Allowlist Approach

Blocklist: Known dangerous patterns (private IP ranges, localhost URLs)
Allowlist: Safe patterns (public docs, example.com)
Greylist: Flag for review (internal hostnames, VPN IPs)
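
The three lists could be applied to URLs found in a response roughly as follows. This is a sketch under stated assumptions: the allowlisted hosts, the greylisted hostname suffixes, and the use of 100.64.0.0/10 as a stand-in for "VPN IPs" are placeholders, and a fourth outcome, "unknown", is introduced for anything on no list.

```python
import ipaddress
from urllib.parse import urlsplit

# Placeholder lists; real deployments would load these from configuration.
ALLOWLIST_HOSTS = {"example.com", "docs.example.com"}
GREYLIST_SUFFIXES = (".internal", ".corp", ".lan")
VPN_NETWORKS = [ipaddress.ip_network("100.64.0.0/10")]


def classify_url(url: str) -> str:
    """Return "block", "allow", "grey", or "unknown" for a URL."""
    try:
        host = urlsplit(url).hostname
    except ValueError:
        return "unknown"
    if not host:
        return "unknown"
    if host == "localhost" or host.endswith(".localhost"):
        return "block"

    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # not an IP literal, treat as a hostname

    if ip is not None:
        if any(ip in net for net in VPN_NETWORKS):
            return "grey"
        if ip.is_loopback or ip.is_private:
            return "block"
        return "unknown"

    if host in ALLOWLIST_HOSTS:
        return "allow"
    if host.endswith(GREYLIST_SUFFIXES):
        return "grey"
    return "unknown"
```

For instance, `classify_url("http://10.0.0.5/status")` falls in the blocklist, while `classify_url("https://wiki.internal/page")` is greylisted for review.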

Implementation Design

Configuration

security:
  outbound_scanning:
    enabled: true
    mode: "flag"  # "block", "flag", "log_only"
    detectors:
      high_entropy:
        enabled: true
        min_entropy: 4.5
        min_length: 16
      patterns:
        enabled: true
        patterns_file: "secrets-patterns.json"
      context_keywords:
        enabled: true
        keywords: ["password", "secret", "token", "key", "credential"]
    redaction:
      enabled: true
      mask: "***REDACTED***"
    alerts:
      on_detection: true
      channel: "signal"
      to: "+1XXXXXXXXXX"
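
To make the interaction between `mode` and `redaction` concrete, here is a small sketch. The semantics are assumptions, since the configuration above does not define them: "block" withholds the message, "log_only" sends it unchanged, and "flag" sends it with matches masked when redaction is enabled. `Finding`, `redact`, and `apply_policy` are hypothetical names, and `config` is the `outbound_scanning` mapping as a plain dict.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    detector: str
    start: int  # offset of the first matched character
    end: int    # offset one past the last matched character


def merge_spans(findings: list[Finding]) -> list[tuple[int, int]]:
    """Merge overlapping or touching spans so each region is masked once."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted((f.start, f.end) for f in findings):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def redact(text: str, findings: list[Finding], mask: str = "***REDACTED***") -> str:
    """Replace every matched span with the mask."""
    out = text
    # Work right to left so earlier offsets stay valid.
    for start, end in reversed(merge_spans(findings)):
        out = out[:start] + mask + out[end:]
    return out


def apply_policy(text: str, findings: list[Finding], config: dict):
    """Return (text_to_send_or_None, action) under the assumed semantics."""
    if not config.get("enabled", False) or not findings:
        return text, "pass"
    mode = config.get("mode", "flag")
    if mode == "block":
        return None, "block"
    if mode == "log_only":
        return text, "log"
    redaction = config.get("redaction", {})
    if redaction.get("enabled", False):
        mask = redaction.get("mask", "***REDACTED***")
        return redact(text, findings, mask), "flag"
    return text, "flag"
```

Whether "flag" should redact at all is one of the open design choices; the sketch only shows that the schema above carries enough information to decide.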

Integration Points

Channel Layer (Calciforge)
- Scan agent responses before transmission
- Configurable per-channel (DMs vs groups)
- Respect user trust levels
Tool Result Layer (OpenClaw)
- Continue scanning tool outputs (existing)
- Extend to tool inputs (prevent exfiltration)
Policy Integration (clash)
- Policy rule: outbound_contains_secrets → block
- Audit logging for compliance

Open Questions

Performance: Can we scan without adding more than 100 ms of latency to responses?
Context Awareness: Should trusted identities (owner) bypass scanning?
Redaction vs Blocking: Redact and send, or block and alert?
Learning: Should the system learn from false positive reports?
Scope: Just agent responses, or also tool call arguments?
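
The performance question can be answered empirically once a prototype scanner exists. The harness below is a hypothetical sketch: `measure_latency_ms` and the shape of its report are assumptions. It times any scanner callable over a set of sample responses so the result can be compared against the latency budget.

```python
import math
import statistics
import time
from typing import Callable, Iterable


def measure_latency_ms(scan: Callable[[str], object], samples: Iterable[str]) -> dict:
    """Time scan(text) for each sample and summarize in milliseconds."""
    timings = []
    for text in samples:
        started = time.perf_counter()
        scan(text)
        timings.append((time.perf_counter() - started) * 1000.0)
    if not timings:
        raise ValueError("no samples provided")
    timings.sort()
    # Nearest-rank 95th percentile.
    p95_index = max(0, math.ceil(0.95 * len(timings)) - 1)
    return {
        "count": len(timings),
        "median_ms": statistics.median(timings),
        "p95_ms": timings[p95_index],
        "max_ms": timings[-1],
    }
```

Running this over recorded agent responses, with the combined detectors passed in as `scan`, gives a p95 figure to hold against the target.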

Related Work

GitHub Secret Scanning: 100+ partner patterns, public dataset
Gitleaks: Open-source secret scanner, Go-based, fast
TruffleHog: Entropy + regex, enterprise-grade
Amazon Macie: ML-based PII detection for S3

Next Steps

Research Phase:
- Evaluate Gitleaks/TruffleHog patterns for a Rust port
- Test entropy thresholds on real agent outputs
- Survey: what secrets have leaked in practice?
Prototype Phase:
- Implement entropy + regex scanner
- Test on 1000+ agent responses for false positive rate
- Build configuration schema
Integration Phase:
- Wire into channel transmission pipeline
- Add to clash policy engine
- Deploy behind feature flag

Risks

False positives: Blocking legitimate content frustrates users
Privacy: The scanner sees all outbound content, so audit logging must be minimal
Evasion: Attackers may obfuscate secrets (base64, rot13, character substitution)
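
A partial mitigation for the evasion risk is to scan decoded variants of the text as well as the text itself. The sketch below covers only two of the listed obfuscations, base64 and rot13. The function names and the 16-character minimum for base64 candidates are assumptions. It will miss base64 runs that are embedded in longer alphanumeric text, and it does nothing about character substitution.

```python
import base64
import binascii
import codecs
import re
from typing import Callable

# Runs that could plausibly be standard base64 (assumed minimum length).
B64_CANDIDATE_RE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decoded_variants(text: str) -> list[str]:
    """Return the text, its rot13 form, and any decodable base64 payloads."""
    variants = [text, codecs.decode(text, "rot13")]
    for match in B64_CANDIDATE_RE.finditer(text):
        try:
            raw = base64.b64decode(match.group(), validate=True)
            variants.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text once decoded
    return variants


def scan_with_decoding(text: str, scan: Callable[[str], list]) -> list:
    """Run a scanner over every variant and collect all findings."""
    findings = []
    for variant in decoded_variants(text):
        findings.extend(scan(variant))
    return findings
```

Each extra variant multiplies scanning cost, so this trades latency for coverage.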
