Interactive Sanitization

Interactive review is always enabled. After auto-sanitization, suspicious values that couldn't be automatically classified are presented for your review. This catches edge cases like WiFi SSIDs, device names, or credentials that don't match standard patterns.

Quick Start

# Sanitize and review flagged values
har-capture sanitize device.har

# Also save a detailed JSON report
har-capture sanitize device.har --report sanitization-report.json

How It Works

Sanitization happens in two passes:

Pass 1: Auto-Redact and Flag

  • Automatically redacts known patterns (MAC addresses, IPs, passwords, etc.)
  • Flags suspicious values that need manual review using heuristics
  • Generates a report with flagged values and context

Pass 2: Apply User Decisions

  • You review flagged values and decide which to redact
  • Selected values are redacted via global find-replace
  • Salt is preserved to ensure consistent hashing across both passes
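The "global find-replace" step in Pass 2 can be pictured as a recursive walk over the parsed HAR structure. The sketch below is illustrative only: `replace_everywhere` and the `SSID_0000aaaa` placeholder are hypothetical names, not part of har-capture, and the real implementation may work differently.

```python
from typing import Any, Mapping


def replace_everywhere(node: Any, replacements: Mapping[str, str]) -> Any:
    """Return a copy of `node` with each original string swapped for its placeholder."""
    if isinstance(node, str):
        for original, placeholder in replacements.items():
            node = node.replace(original, placeholder)
        return node
    if isinstance(node, list):
        return [replace_everywhere(item, replacements) for item in node]
    if isinstance(node, dict):
        # Only values are rewritten here; keys are left untouched in this sketch.
        return {key: replace_everywhere(value, replacements) for key, value in node.items()}
    return node  # numbers, booleans, and None pass through unchanged


# Hypothetical, heavily trimmed HAR-like structure for demonstration.
har = {"log": {"entries": [{"text": "ssid=HomeNetwork-5G;status=Good"}, {"size": 3}]}}
cleaned = replace_everywhere(har, {"HomeNetwork-5G": "SSID_0000aaaa"})
print(cleaned["log"]["entries"][0]["text"])
```

Because the replacement is applied to every string in the structure, a value chosen for redaction disappears everywhere it occurs, not only where it was first flagged.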

Flagging Heuristics

Values are flagged based on several detection methods:

| Heuristic | Example | Confidence | Description |
|-----------|---------|------------|-------------|
| SSID-like strings | HomeNetwork-5G, Office-Guest | HIGH/MEDIUM | 3-32 characters with patterns like Name-5G, WiFi-Guest |
| Device names | Johns-iPhone, Living-Room-TV | MEDIUM | Contains possessives, device types, or room names |
| High entropy | MyS3cur3Pass, P@ssw0rd! | MEDIUM | 8+ characters with mixed case, numbers, special chars |
| Adjacent to redacted | Value next to MAC_xxxxx or PASS_xxxxx | HIGH | Appears before/after an already-redacted value |
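To make two of these heuristics concrete, here is a rough sketch of how an SSID-like check and a high-entropy check might be written. The regular expression, the function names, and the "at least three character classes" threshold are all assumptions chosen for illustration; har-capture's actual rules are not shown in this document and may differ.

```python
import re

# Assumed suffix list; the real tool may recognize other endings.
BAND_SUFFIX = re.compile(r"[-_ ](?:2\.4G|2G|5G|6G|Guest|EXT)$", re.IGNORECASE)


def looks_like_ssid(value: str) -> bool:
    """3-32 characters ending in a band or guest-style suffix."""
    return 3 <= len(value) <= 32 and BAND_SUFFIX.search(value) is not None


def looks_high_entropy(value: str) -> bool:
    """8+ characters drawing on at least three character classes (assumed threshold)."""
    if len(value) < 8:
        return False
    classes = [
        any(c.islower() for c in value),
        any(c.isupper() for c in value),
        any(c.isdigit() for c in value),
        any(not c.isalnum() for c in value),
    ]
    return sum(classes) >= 3


for candidate in ["HomeNetwork-5G", "Good", "MyS3cur3Pass", "Enabled"]:
    print(candidate, looks_like_ssid(candidate), looks_high_entropy(candidate))
```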

Safe Values (Never Flagged)

Common technical values are automatically considered safe:

  • Status values: Good, Bad, OK, Error, Enabled, Disabled
  • Security types: WPA, WPA2, WPA3, Open
  • Band indicators: 2.4g, 5g, 6g
  • Numbers, versions, timestamps, percentages
  • Already redacted placeholders
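A safe-value check of this kind could be approximated with a small allowlist plus a couple of patterns. This is a hedged sketch: `is_safe`, the word set, and both regular expressions are hypothetical, and timestamps are deliberately left out to keep it short.

```python
import re

# Words taken from the list above; matching is case-insensitive in this sketch.
SAFE_WORDS = {
    "good", "bad", "ok", "error", "enabled", "disabled",
    "wpa", "wpa2", "wpa3", "open",
    "2.4g", "5g", "6g",
}
# Plain numbers, dotted versions, and percentages such as 87%.
NUMERIC = re.compile(r"^-?\d+(\.\d+)*%?$")
# Assumed placeholder shape, e.g. MAC_xxxxx or MAC_abc123ef.
PLACEHOLDER = re.compile(r"^[A-Z]+_[0-9a-fx]+$")


def is_safe(value: str) -> bool:
    """Return True when a value should never be flagged for review."""
    v = value.strip()
    return (
        v.lower() in SAFE_WORDS
        or NUMERIC.match(v) is not None
        or PLACEHOLDER.match(v) is not None
    )


print(is_safe("WPA2"), is_safe("1.2.3"), is_safe("HomeNetwork-5G"))
```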

Interactive Review UI

When flagged values are found, you'll see a table for review:

┌─ Flagged Values for Review ─────────────────────────────────────────┐
│  #  Match  Type            Value                  Context           │
├─────────────────────────────────────────────────────────────────────┤
│  1    ●    📶 WiFi SSID    HomeNetwork-5G        ...Good|>>>Home... │
│  2    ●    📶 WiFi SSID    HomeNetwork-2G        ...5g|>>>HomeNe... │
│  3    ●    🔑 Password     secret123             ...5G|>>>secret... │
│  4    ●    📱 Device       Johns-iPhone (2x)     ...>>>Johns-iP...  │
└─────────────────────────────────────────────────────────────────────┘

Quick Actions

Choose how to handle flagged values:

  • 📋 Select individually - Review each item with checkboxes
  • ✅ Redact ALL flagged values - Redact everything
  • 🔴 Redact HIGH confidence only - Only redact high-confidence items
  • 🔴🟡 Redact HIGH + MEDIUM - Skip low-confidence items
  • ⏭️ Skip review - Keep all values as-is

Individual Selection

Use checkboxes to select specific values to redact:

  • ↑↓ - Navigate
  • Space - Toggle selection
  • a - Select all
  • n - Deselect all
  • Enter - Confirm
  • Backspace/ESC - Go back

Passwords/credentials are pre-selected by default, but you can adjust any selection.

Examples

Basic Interactive Review

har-capture sanitize device.har

Output:

Sanitizing device.har...
  Auto-redacted: 47 values

Found 5 suspicious values that may need review:
[interactive table shows here]

How would you like to handle flagged values?
> 📋 Select individually (5 items)

[checkbox interface shows here]

✓ Marked 4 items for redaction, skipped 1.

Review complete!
  Auto-redacted: 47
  User redacted: 4
  User skipped: 1

Sanitized: device.sanitized.har

Generate Report for Later Review

har-capture sanitize device.har --report review.json

The report contains:

{
  "input_file": "device.har",
  "output_file": "device.sanitized.har",
  "salt": "a1b2c3d4e5f6",
  "summary": {
    "auto_redacted": 47,
    "user_redacted": 0,
    "user_skipped": 0
  },
  "auto_redacted_counts": {
    "mac_address": 12,
    "private_ip": 15,
    "password": 8
  },
  "flagged": [
    {
      "value": "HomeNetwork-5G",
      "category": "wifi_ssid",
      "confidence": "high",
      "reason": "SSID-like pattern: ends with band/extension suffix",
      "occurrences": 2,
      "status": "flagged"
    }
  ]
}
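Since the report is plain JSON, it can be inspected with a few lines of Python. The sketch below filters flagged values by confidence, loosely mirroring the "Redact HIGH confidence only" quick action. The helper name `flagged_by_confidence` is hypothetical, and the embedded sample is a trimmed stand-in for a real report (the second entry and its `device_name` category are made up for the example).

```python
import json

# Trimmed, partly invented sample; a real report would be read from review.json.
SAMPLE_REPORT = """
{
  "flagged": [
    {"value": "HomeNetwork-5G", "category": "wifi_ssid", "confidence": "high", "occurrences": 2},
    {"value": "Johns-iPhone", "category": "device_name", "confidence": "medium", "occurrences": 2}
  ]
}
"""


def flagged_by_confidence(report: dict, levels: set) -> list:
    """Return flagged values whose confidence is one of `levels`."""
    return [
        item["value"]
        for item in report.get("flagged", [])
        if item.get("confidence") in levels
    ]


report = json.loads(SAMPLE_REPORT)
print(flagged_by_confidence(report, {"high"}))
print(flagged_by_confidence(report, {"high", "medium"}))
```

To run this against a real report, replace `json.loads(SAMPLE_REPORT)` with `json.load(f)` on the opened report file.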

Non-TTY Environment (CI/CD)

When no terminal is available, flagged values are written to a report file instead of prompting:

har-capture sanitize device.har 2>&1
# Note: No terminal detected. Writing flagged values to report instead.
# Sanitized: device.sanitized.har
# Report: device.har.review.json
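Terminal detection of this sort is commonly done with `isatty()`. The following is a minimal sketch of the idea, not har-capture's actual check; `can_prompt` is a hypothetical helper.

```python
import sys


def can_prompt(stdin=None, stdout=None) -> bool:
    """Return True only when both streams are attached to a terminal."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    return stdin.isatty() and stdout.isatty()


if can_prompt():
    print("Terminal detected: interactive review is possible.")
else:
    print("No terminal detected: a report file would be written instead.")
```

In CI runners and when output is piped, at least one of the two streams is usually not a terminal, so the second branch is taken.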

Edge Cases Handled

No Suspicious Values

Sanitizing device.har...
  Auto-redacted: 47 values
  Flagged for review: 0

✓ No suspicious values found. All values were handled automatically.
Sanitized: device.sanitized.har

Common Word Warnings

If a flagged value appears many times, you'll see a warning:

  WARNING: "Guest" appears 47 times in the file.
  Redacting will replace ALL occurrences.

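Because redaction is a global find-replace, redacting a common word also rewrites every unrelated occurrence of it. Use the count to decide whether the value is worth redacting.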
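One simple way to estimate how many times a value appears before redacting it is to count matches in the serialized HAR. This is a rough sketch with a hypothetical helper name; it is case-sensitive and will miss values containing characters that JSON escapes (quotes, backslashes), so treat the number as an approximation.

```python
import json


def count_occurrences(har_data, value: str) -> int:
    """Count case-sensitive occurrences of `value` in the serialized HAR data."""
    return json.dumps(har_data, ensure_ascii=False).count(value)


# Hypothetical miniature structure standing in for a parsed HAR file.
sample = {"a": "Guest network", "b": ["Guest", {"c": "guest"}]}
print(count_occurrences(sample, "Guest"))
```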

Already Sanitized Files

Warning: This file appears to already be sanitized.
  Found redaction placeholders (MAC_xxxxx, PASS_xxxxx, etc.)
  Proceeding may double-hash already redacted values.

Continue? [y/N]

Python API

You can also use interactive sanitization programmatically:

from har_capture.sanitization import sanitize_har, apply_user_redactions
from har_capture.sanitization.report import HeuristicMode
from har_capture.cli.interactive import run_interactive_review
import json

# Pass 1: Auto-redact and flag suspicious values
with open("device.har") as f:
    har_data = json.load(f)

sanitized, report = sanitize_har(har_data, heuristics=HeuristicMode.FLAG)

# Interactively review (if there are flagged values)
if report.flagged:
    run_interactive_review(report)

# Pass 2: Apply user decisions
if report.total_user_redacted > 0:
    final = apply_user_redactions(sanitized, report)
else:
    final = sanitized

# Save result
with open("device.sanitized.har", "w") as f:
    json.dump(final, f, indent=2)

Salt Preservation

The salt used for hashing is stored in the report and reused in Pass 2. This ensures:

  • Consistent hashing across both passes
  • The same sensitive value always produces the same placeholder (e.g., MAC_abc123ef)
  • Correlation is preserved: every occurrence of a given MAC address maps to one placeholder
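The effect of reusing the salt can be illustrated with a salted hash. The sketch below assumes SHA-256 over `salt + value`, truncated to eight hex characters; the actual hash function, input layout, and placeholder length used by har-capture are not documented here, so treat every detail as an assumption.

```python
import hashlib


def placeholder(prefix: str, value: str, salt: str, length: int = 8) -> str:
    """Derive a deterministic placeholder such as MAC_xxxxxxxx from a salted hash."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return f"{prefix}_{digest[:length]}"


salt = "a1b2c3d4e5f6"  # example salt, as it might appear in a report
first = placeholder("MAC", "aa:bb:cc:dd:ee:ff", salt)
second = placeholder("MAC", "aa:bb:cc:dd:ee:ff", salt)
other_salt = placeholder("MAC", "aa:bb:cc:dd:ee:ff", "ffffffffffff")

print(first == second)      # same salt and value give the same placeholder
print(first == other_salt)  # a different salt gives a different placeholder
```

With the salt stored in the report, a value redacted in Pass 2 hashes the same way it would have in Pass 1.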

Tips

  1. High confidence items are usually safe to redact (e.g., values next to passwords)
  2. Medium confidence often includes SSIDs and device names - review the context
  3. Low confidence items need careful review to avoid false positives
  4. Use the occurrence count to assess impact - redacting "test" that appears 100 times might affect legitimate content
  5. ESC/Backspace lets you go back and change your mind
  6. Ctrl+C cancels the review (file is still sanitized, but flagged values remain unchanged)

Troubleshooting

Terminal encoding issues

If you see garbled characters, ensure your terminal supports UTF-8:

export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8

InquirerPy errors

If the interactive prompts crash, the tool falls back gracefully and writes a report file instead.

Performance with large HAR files

For very large HAR files (>50MB), consider using --report to review flagged values separately rather than doing it interactively.

Related Documentation