Commit 5719d3a

kwschulz and claude committed
feat: v0.2.0 - CLI overhaul and security improvements
BREAKING CHANGES:
- Rename CLI command: `device` → `get`
- Raw HAR files deleted by default after sanitization (use --keep-raw to retain)

Features:
- Add --keep-raw flag to preserve unsanitized HAR files
- Generalize CLI for any URL/hostname (not just devices)

Docs:
- Rewrite Quick Start with platform-specific tabs (Windows/macOS/Linux)
- Add DevTools to comparison table, acknowledge Chrome v130+ sanitization
- Remove Modules section (API docs belong in code)
- Update all examples to use `har-capture get <TARGET>`

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 6568dcc commit 5719d3a

7 files changed

Lines changed: 129 additions & 152 deletions

README.md

Lines changed: 69 additions & 112 deletions
@@ -6,15 +6,33 @@
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![AI Assisted](https://img.shields.io/badge/AI-Claude%20Assisted-5A67D8.svg)](https://claude.ai)
 
-Capture and sanitize [HAR (HTTP Archive)](http://www.softwareishard.com/blog/har-12-spec/) files for network traffic analysis. HAR files record browser network activity and are commonly used for debugging, diagnostics, and test fixtures.
+Capture and sanitize [HAR (HTTP Archive)](http://www.softwareishard.com/blog/har-12-spec/) files. HAR files record browser HTTP activity and are commonly used for debugging, diagnostics, and test fixtures.
 
 ## Quick Start
 
+<details open>
+<summary><b>Windows</b></summary>
+
+1. Install Python from the [Microsoft Store](https://apps.microsoft.com/detail/9NRWMJP3717K) or [python.org](https://www.python.org/downloads/)
+2. Open PowerShell and run:
+
 ```bash
 pip install har-capture[full]
-har-capture capture 192.168.100.1
+python -m har_capture get https://example.com
 ```
 
+</details>
+
+<details>
+<summary><b>macOS / Linux</b></summary>
+
+```bash
+pip install har-capture[full]
+har-capture get https://example.com
+```
+
+</details>
+
 <details>
 <summary><b>Already have a HAR file?</b></summary>
 
@@ -41,37 +59,42 @@ sanitized = sanitize_har(har_data)
 
 ## Why har-capture?
 
-Existing HAR sanitization tools require a **manual, multi-step workflow**:
+[Chrome DevTools v130+](https://developer.chrome.com/blog/new-in-devtools-130) now sanitizes cookies and auth headers by default when exporting HAR files. That's a good start, but HAR files contain much more sensitive data:
 
-1. Open browser DevTools
-2. Record network traffic
-3. Export HAR file
-4. Find a sanitizer tool
-5. Upload, process, download
+- IP addresses, MAC addresses, email addresses
+- Passwords and credentials in form bodies
+- Serial numbers, device names, session tokens
 
-**har-capture** provides an **integrated, CLI-first approach**:
+**har-capture** provides **deep sanitization** and **CLI automation**:
 
 ```bash
-har-capture capture <DEVICE_IP>  # Capture + sanitize in one step
+har-capture get <TARGET>  # Capture + sanitize in one step
 ```
 
 ### Comparison with Existing Tools
 
-| Feature | har-capture | [Google](https://github.com/google/har-sanitizer) | [Cloudflare](https://blog.cloudflare.com/introducing-har-sanitizer-secure-har-sharing/) | [Edgio](https://github.com/Edgio/har-tools) |
-|---------|-------------|--------|------------|-------|
-| Automated browser capture | **Yes** | No | No | No |
-| CLI-first design | **Yes** | No (Flask API) | No (Web UI) | No (Web UI) |
-| Integrated capture+sanitize | **Yes** | No | No | No |
-| Correlation-preserving redaction | **Yes** | No | No | No |
-| Device-specific PII patterns | **Yes** | Generic | JWT-focused | Generic |
-| Zero-dependency core | **Yes** | No | No | No |
-| Custom pattern support | **Yes** | No | No | No |
-| Cross-platform CLI | **Yes** | No | No | No |
+| Feature | har-capture | [DevTools](https://developer.chrome.com/docs/devtools/network/reference) | [Google](https://github.com/google/har-sanitizer) | [Cloudflare](https://blog.cloudflare.com/introducing-har-sanitizer-secure-har-sharing/) | [Edgio](https://github.com/Edgio/har-tools) |
+|---------|-------------|----------|--------|------------|-------|
+| **Sanitization** |
+| Cookies/auth headers | Yes | Yes | Yes | Yes | Yes |
+| IPs, MACs, emails | **Yes** | No | No | No | No |
+| Passwords in forms | **Yes** | No | Yes | No | Yes |
+| JWT smart redaction | No | No | No | **Yes** | No |
+| Correlation-preserving | **Yes** | No | No | No | No |
+| **Usability** |
+| No installation needed | No | **Yes** | No | **Yes** | **Yes** |
+| Data stays local | **Yes** | **Yes** | No | **Yes** | **Yes** |
+| CLI/scriptable | **Yes** | No | Yes | No | Yes |
+| Preview before redact | Yes | No | **Yes** | No | No |
+| **Extras** |
+| Integrated capture | **Yes** | **Yes** | No | No | No |
+| Custom patterns | **Yes** | No | Yes | No | No |
+| Validation | **Yes** | No | No | No | No |
 
 ### Target Use Cases
 
 - **Support diagnostics**: Users submit sanitized HAR files without exposing credentials
-- **Parser development**: Capture device web interfaces for building integrations
+- **Web development**: Capture and analyze HTTP traffic for debugging
 - **Test fixtures**: Generate reproducible traffic captures for testing
 - **Security review**: Validate HAR files for PII leaks before sharing
 
@@ -120,29 +143,29 @@ clean_html = sanitize_html(raw_html, salt=None)
 
 # Sanitize HAR file
 from har_capture.sanitization import sanitize_har_file
-sanitize_har_file("device.har")  # Creates device.sanitized.har
+sanitize_har_file("capture.har")  # Creates capture.sanitized.har
 ```
 
 ### CLI
 
 ```bash
-# Capture device traffic
-har-capture capture <DEVICE_IP>
+# Capture HTTP traffic
+har-capture get <TARGET>
 
 # Sanitize a HAR file (uses random salt by default)
-har-capture sanitize device.har
+har-capture sanitize capture.har
 
 # Sanitize with consistent salt
-har-capture sanitize device.har --salt my-key
+har-capture sanitize capture.har --salt my-key
 
 # Sanitize with static placeholders
-har-capture sanitize device.har --no-salt
+har-capture sanitize capture.har --no-salt
 
 # Use custom patterns
-har-capture sanitize device.har --patterns custom.json
+har-capture sanitize capture.har --patterns custom.json
 
 # Validate for PII leaks
-har-capture validate device.har
+har-capture validate capture.har
 ```
 
 ## Correlation-Preserving Redaction
@@ -195,8 +218,8 @@ src/har_capture/patterns/
 
 **Add custom patterns via CLI:**
 ```bash
-har-capture sanitize device.har --patterns my_patterns.json
-har-capture validate device.har --patterns my_patterns.json
+har-capture sanitize capture.har --patterns my_patterns.json
+har-capture validate capture.har --patterns my_patterns.json
 ```
 
 **Add custom patterns via Python:**
@@ -234,107 +257,41 @@ The sanitization removes the following types of PII:
 - **WiFi Credentials**: In JavaScript variables
 - **Device Names**: In network device lists
 
-## Modules
-
-### sanitization
-
-Core PII removal with zero external dependencies.
-
-```python
-from har_capture.sanitization import (
-    sanitize_html,  # Remove PII from HTML
-    sanitize_har,  # Remove PII from HAR data
-    sanitize_har_file,  # Sanitize HAR file on disk
-    check_for_pii,  # Detect potential PII
-)
-
-# All support salt and custom_patterns options
-clean = sanitize_html(html, salt="auto", custom_patterns=None)
-```
-
-### patterns
-
-Pattern loading and hashing utilities.
-
-```python
-from har_capture.patterns import (
-    Hasher,  # Salted hash generator
-    load_pii_patterns,  # Load PII regex patterns
-    load_sensitive_patterns,  # Load sensitive field names
-    load_allowlist,  # Load safe placeholders
-)
-
-# Create a hasher for manual use
-hasher = Hasher.create(salt="my-key")
-hashed_mac = hasher.hash_mac("AA:BB:CC:DD:EE:FF")  # "02:a1:b2:c3:d4:e5"
-```
-
-### capture
-
-Browser-based HAR capture using Playwright.
-
-```python
-from har_capture.capture import capture_device_har
-
-result = capture_device_har(
-    ip="router.local",  # or IP address like "10.0.0.1"
-    output="device.har",
-    sanitize=True,
-    compress=True,
-)
-print(result.har_path)
-print(result.sanitized_path)
-```
-
-### validation
-
-Check HAR files for PII leaks.
-
-```python
-from har_capture.validation import validate_har, Finding
-
-findings = validate_har("device.har", custom_patterns="my_patterns.json")
-for finding in findings:
-    print(f"{finding.severity}: {finding.reason}")
-    print(f" Location: {finding.location}")
-    print(f" Value: {finding.value}")
-```
-
 ## CLI Commands
 
-### capture
+### get
 
-Capture device traffic using a browser.
+Capture HTTP traffic using a browser.
 
 ```bash
-har-capture capture <DEVICE_IP>
-har-capture capture <DEVICE_IP> --output device.har
-har-capture capture <DEVICE_IP> --no-sanitize
+har-capture get <TARGET>
+har-capture get <TARGET> --output capture.har
+har-capture get <TARGET> --no-sanitize
 ```
 
 ### sanitize
 
 Remove PII from HAR files.
 
 ```bash
-har-capture sanitize device.har
-har-capture sanitize device.har --output clean.har --compress
-har-capture sanitize device.har --salt my-key  # Consistent hash
-har-capture sanitize device.har --no-salt  # Static placeholders
-har-capture sanitize device.har --patterns custom.json
-har-capture sanitize device.har --max-size 500  # Allow up to 500MB
-har-capture sanitize device.har --compression-level 6  # Faster compression
+har-capture sanitize capture.har
+har-capture sanitize capture.har --output clean.har --compress
+har-capture sanitize capture.har --salt my-key  # Consistent hash
+har-capture sanitize capture.har --no-salt  # Static placeholders
+har-capture sanitize capture.har --patterns custom.json
+har-capture sanitize capture.har --max-size 500  # Allow up to 500MB
+har-capture sanitize capture.har --compression-level 6  # Faster compression
 ```
 
 ### validate
 
 Check for PII leaks.
 
 ```bash
-har-capture validate device.har
+har-capture validate capture.har
 har-capture validate --dir ./captures --recursive
-har-capture validate device.har --strict
-har-capture validate device.har --patterns custom.json
+har-capture validate capture.har --strict
+har-capture validate capture.har --patterns custom.json
 ```
 
 ## Platform Support

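Review note: for scripts that previously shelled out to the old command, the renamed `get` invocation can be assembled programmatically. The sketch below is a minimal illustration and not part of har-capture; `build_get_command` is a hypothetical helper, and it assumes `--keep-raw` (named in the commit message) is accepted alongside the `--output` and `--no-sanitize` flags shown in the README.

```python
from __future__ import annotations

import shlex


def build_get_command(
    target: str,
    output: str | None = None,
    keep_raw: bool = False,
    sanitize: bool = True,
) -> list[str]:
    """Assemble argv for `har-capture get` (hypothetical helper, not har-capture API)."""
    cmd = ["har-capture", "get", target]
    if output is not None:
        cmd += ["--output", output]
    if keep_raw:
        # Assumption: --keep-raw is a flag of the `get` subcommand.
        cmd.append("--keep-raw")
    if not sanitize:
        cmd.append("--no-sanitize")
    return cmd


if __name__ == "__main__":
    # Print a shell-safe rendering instead of executing anything.
    print(shlex.join(build_get_command("https://example.com", output="capture.har", keep_raw=True)))
```

Passing the resulting list to `subprocess.run` avoids shell quoting problems when the target URL contains query strings.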
SECURITY.md

Lines changed: 0 additions & 6 deletions
@@ -1,11 +1,5 @@
 # Security Policy
 
-<!--
-MAINTAINER TODO: Enable private vulnerability reporting
-Settings → Code security and analysis → Private vulnerability reporting → Enable
-Delete this comment once enabled.
--->
-
 ## Reporting a Vulnerability
 
 If you discover a security vulnerability in har-capture, please report it privately using GitHub's security advisory feature:

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "har-capture"
-version = "0.1.2"
+version = "0.2.0"
 description = "HAR capture and PII sanitization library for network traffic analysis"
 readme = "README.md"
 license = "MIT"

src/har_capture/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@
 
 from __future__ import annotations
 
-__version__ = "0.1.2"
+__version__ = "0.2.0"
 
 # Re-export public API for convenience
 from har_capture.sanitization import (

src/har_capture/capture/browser.py

Lines changed: 15 additions & 5 deletions
@@ -58,15 +58,15 @@ class CaptureResult:
     """Result of a HAR capture operation.
 
     Attributes:
-        har_path: Path to the captured HAR file
+        har_path: Path to the raw HAR file (None if deleted after sanitization)
         compressed_path: Path to compressed .har.gz file if created
         sanitized_path: Path to sanitized HAR file if created
         stats: Dict with capture statistics (entry counts, sizes)
         success: True if capture succeeded
         error: Error message if capture failed
     """
 
-    har_path: Path
+    har_path: Path | None = None
     compressed_path: Path | None = None
     sanitized_path: Path | None = None
     stats: dict[str, Any] | None = None
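Review note: because `har_path` may now be `None`, callers that printed or opened it unconditionally need a guard. The following is a minimal sketch under stated assumptions: `CaptureResultSketch` is a hypothetical stand-in carrying only the three path fields from the hunk above, and `primary_path` is an illustrative helper, not something the library provides.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class CaptureResultSketch:
    """Hypothetical stand-in for CaptureResult, reduced to its path fields."""

    har_path: Path | None = None
    compressed_path: Path | None = None
    sanitized_path: Path | None = None


def primary_path(result: CaptureResultSketch) -> Path | None:
    """Return the sanitized file if present, otherwise the raw file, otherwise None."""
    if result.sanitized_path is not None:
        return result.sanitized_path
    return result.har_path


if __name__ == "__main__":
    # After a default capture the raw file is gone, so only sanitized_path is set.
    result = CaptureResultSketch(sanitized_path=Path("capture.sanitized.har"))
    print(primary_path(result))
```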
@@ -200,23 +200,25 @@ def capture_device_har(
     http_credentials: dict[str, str] | None = None,
     sanitize: bool = True,
     compress: bool = True,
+    keep_raw: bool = False,
     include_fonts: bool = False,
     include_images: bool = False,
     include_media: bool = False,
 ) -> CaptureResult:
-    """Capture device traffic using Playwright browser.
+    """Capture HTTP traffic using Playwright browser.
 
     This function launches a browser window and records all network traffic
-    while the user interacts with their device. The user logs in manually -
+    while the user interacts with the target. The user logs in manually -
     the browser handles authentication regardless of the method used.
 
     Args:
-        ip: Device IP address or hostname (e.g., "router.local", "10.0.0.1")
+        ip: Target URL, hostname, or IP address (e.g., "example.com", "10.0.0.1")
         output: Output HAR filename (default: capture_<timestamp>.har)
         browser: Browser to use ("chromium", "firefox", "webkit")
         http_credentials: Optional dict with "username" and "password" for HTTP Basic Auth
         sanitize: Whether to sanitize the HAR after capture
         compress: Whether to compress the HAR after capture
+        keep_raw: If True, keep the raw (unsanitized) HAR file
         include_fonts: If True, don't filter font files (.woff, .ttf, etc.)
         include_images: If True, don't filter image files (.png, .jpg, etc.)
         include_media: If True, don't filter media files (.mp3, .mp4, etc.)
@@ -369,6 +371,14 @@ def _is_missing_deps_error(error_msg: str) -> bool:
 
             sanitized_path = sanitize_har_file(str(output_path))
             result.sanitized_path = Path(sanitized_path)
+
+            # Delete raw file unless keep_raw is set
+            if not keep_raw and result.sanitized_path and result.sanitized_path.exists():
+                try:
+                    output_path.unlink()
+                    result.har_path = None  # type: ignore[assignment]
+                except Exception as e:
+                    _LOGGER.warning("Failed to delete raw HAR: %s", e)
         except Exception as e:
             _LOGGER.warning("Sanitization failed: %s", e)
 

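Review note: the delete-after-sanitize rule in the last hunk can be exercised in isolation. The sketch below is a stand-alone approximation, not the committed code: `drop_raw_har` is a hypothetical helper, and it catches `OSError` where the commit catches `Exception`. It keeps the same guard, so the raw file is removed only when a sanitized copy actually exists on disk.

```python
from __future__ import annotations

import logging
import tempfile
from pathlib import Path

_LOGGER = logging.getLogger(__name__)


def drop_raw_har(raw_path: Path, sanitized_path: Path | None, keep_raw: bool = False) -> Path | None:
    """Delete the raw HAR if a sanitized copy exists; return the raw path if it was kept.

    Hypothetical helper mirroring the hunk above; not part of har-capture.
    """
    if keep_raw or sanitized_path is None or not sanitized_path.exists():
        return raw_path  # never delete the only copy of the capture
    try:
        raw_path.unlink()
    except OSError as e:
        _LOGGER.warning("Failed to delete raw HAR: %s", e)
        return raw_path
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "capture.har"
        clean = Path(tmp) / "capture.sanitized.har"
        raw.write_text("{}")
        clean.write_text("{}")
        # With the default keep_raw=False the raw file is removed.
        print(drop_raw_har(raw, clean), raw.exists())
```

Returning the surviving path (or `None`) matches how the commit resets `result.har_path` only after a successful `unlink()`.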