chore(release): v0.9.0 — mandatory --patterns + heuristic detectors + Chromium fix + lint cleanup by kwschulz · Pull Request #45 · solentlabs/har-capture

kwschulz · 2026-05-04T13:12:07Z

Summary

Breaking: --patterns is now required on get, sanitize, and validate. Closes the structural cause behind contributors silently shipping device PII to cable_modem_monitor issue threads when they didn't know to load the network-device domain.

Seven commits, end-to-end test coverage for every issue addressed (TDD red→green where applicable, regression guards where the engine already caught it):

| Commit | Scope |
| --- | --- |
| `fd3c81b` docs | Confidence-boundary contract surfaced as CLAUDE.md Architecture principle #7 (sourced from existing spec language); release flow extracted to `docs/RELEASE.md` |
| `e545cca` feat(patterns) | Heuristic detectors for default-device PII: closes #49 (Netgear serial), partial #47 (SSID + default password) |
| `5f7edb1` test | Move inline HAR blobs from CLI tests to per-module fixtures per CLAUDE.md #14 |
| `0887118` chore(release) | Bump version 0.8.3 → 0.9.0 |
| `9bf112c` feat(cli)! | Mandatory `--patterns` (BREAKING) + WPS-PIN labeled-regex (completes #47) + load-time warn on JSON `\b` trap (#51) |
| `0c11040` fix(capture) | Detect Playwright browser by install dir, not Linux-only binary path (#50) |
| `f437709` chore | Markdownlint cleanup (MD024/029/033/036/040 → 0); add #47 SN regression fixture + #50 Windows-layout regression test |

Issues addressed

- #47 (SN, WPS, Default Password, Default SSID Sanitization): heuristic detectors for SSID + password + serial backstop; html.py Pass 2d for WPS PIN; 9 cited fixtures in test_pii_regressions.py.
- #49 (Auto-redaction misses Netgear modem serial format ([0-9][A-Z]{2}[0-9]{4}[A-Z0-9]{6})): serial_number heuristic detector with the exact [0-9][A-Z]{2}[0-9]{4}[A-Z0-9]{6} regex from the issue; cited fixture row.
- #50 (har-capture get re-prompts to install Chromium on every invocation despite Playwright Chromium being present): platform-agnostic install-directory check; cited Windows-layout regression test that would have failed against the old Linux-only path lookup.
- #51 (--patterns JSON loader silently no-ops on \b regex escapes; JSON parses to backspace, not regex word-boundary): load-time warning identifies the offending key path and the \\b correction; cited test reproduces the exact failure mode.
- #46 ([Bug]: Interactive Dialog Handling for Modem Captures): deferred. PR #48 (feat(capture): surface browser dialogs for user-driven capture, external contributor) left open for review post-0.9.0.
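The `\b` trap from #51 is easy to reproduce with the standard library alone. The sketch below is illustrative and is not the project's loader: `find_escape_traps` is a hypothetical helper, and the `serial` key is an invented example. It shows that JSON decodes `\b` to a backspace character, so the regex never matches, and how a loader might report the offending key path.

```python
import json
import re

# In JSON, \b is the backspace escape; a regex word boundary must be written \\b.
raw_wrong = r'{"serial": "\bABC123\b"}'
raw_right = r'{"serial": "\\bABC123\\b"}'

wrong = json.loads(raw_wrong)["serial"]  # contains two backspace characters
right = json.loads(raw_right)["serial"]  # contains the regex token \b twice

text = "serial ABC123 here"
print(re.search(wrong, text))  # no match: the pattern looks for literal backspaces
print(re.search(right, text))  # matches ABC123 on word boundaries


def find_escape_traps(node, path=""):
    """Return (key_path, suggested_escape) for strings holding backspace or form-feed.

    Hypothetical helper for illustration; the real check lives in the
    project's pattern loader and may differ.
    """
    traps = {"\x08": r"\\b", "\x0c": r"\\f"}
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            found += find_escape_traps(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found += find_escape_traps(value, f"{path}[{index}]")
    elif isinstance(node, str):
        for char, fix in traps.items():
            if char in node:
                found.append((path, fix))
    return found
```

Warning rather than rejecting keeps existing pattern files loadable while making the silent no-op visible.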

BREAKING CHANGE

--patterns required on every sanitization-running subcommand. base is a reserved sentinel for core-universal-PII-only. Missing --patterns prints a domain listing to stderr and exits 2. validate's --patterns shape changed from single Path to repeatable list to match get and sanitize. Library API (sanitize_har_file(), sanitize_har(), validate_har()) unchanged.
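As a rough illustration of the contract above (repeatable flag, choice listing on stderr, exit code 2), here is a minimal sketch using the standard library's `argparse`. It is an assumption-laden stand-in, not har-capture's CLI code: `require_patterns` and `KNOWN_CHOICES` are hypothetical names, and the real choices come from `har-capture patterns`.

```python
import argparse
import sys

# Illustrative names only; `har-capture patterns` prints the real list.
KNOWN_CHOICES = ("base", "network-device")


def require_patterns(argv):
    """Return the --patterns values, or list the choices on stderr and exit 2."""
    parser = argparse.ArgumentParser(add_help=False)
    # action="append" makes the flag repeatable: --patterns a --patterns b
    parser.add_argument("--patterns", action="append", default=None)
    args, _unparsed = parser.parse_known_args(argv)
    if not args.patterns:
        print("error: --patterns is required; choose one of:", file=sys.stderr)
        for name in KNOWN_CHOICES:
            print(f"  {name}", file=sys.stderr)
        print("  <path to a custom patterns JSON file>", file=sys.stderr)
        raise SystemExit(2)
    return args.patterns
```

Failing closed with a listing means a contributor who omits the flag sees the available domains before any capture or sanitization runs.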

Migration

# Before (0.8.x)
har-capture get https://router.local
har-capture sanitize device.har
har-capture validate device.har

# After (0.9.0)
har-capture get https://router.local --patterns network-device
har-capture sanitize device.har --patterns network-device
har-capture validate device.har --patterns network-device

# Or for non-device (web/API) captures:
har-capture sanitize webapp.har --patterns base

Run har-capture patterns for the full list of choices.

Test plan

Full suite passes: 1997 passed, 18 deselected
Ruff: all checks passed
Mypy: no issues found
Pre-commit (all hooks) passes on every commit
Four issues (#47, #49, #50, #51) have directly cited regression fixtures/tests in the suite
Markdownlint MD024 / MD029 / MD033 / MD036 / MD040 all at 0 violations

🤖 Generated with Claude Code

Three changes close session-debt from v0.8.1 / v0.8.2.

CI install single source of truth
---------------------------------
scripts/install-ci-deps.sh defines the install line once. ci-local.sh and both ci.yml jobs invoke it, removing the duplication that caused v0.8.1's push regression where ci-local.sh's install profile drifted from the matrix.

release.py polls for in-flight CI
---------------------------------
check_ci_passed_on_head polls every 20s for up to 10min with progress feedback instead of failing on the first read.

Release-discipline audit (A + B + E)
------------------------------------
Three reinforcing checks for the AI knowing-not-applying flaw observed across v0.8.1 -> v0.8.3 (three releases for what should have been one because Claude made decisions at "should I push?" that violated rules just written down):

A. scan_for_anti_patterns greps recent git log for anti-pattern signatures. BLOCKER findings abort unless --acknowledged "<reason>" is supplied.

B. print_audit_checklist prints a diff-grounded checklist on every invocation. Visibility-to-Ken is the actual gate; questions are rubber-stampable in isolation but harder to ignore when bundled with commit/file context.

E. require_signoff blocks tag-push on the developer typing "RELEASE OK X.Y.Z" exactly. No --yes flag: the bypass would defeat the purpose. The unfakeable component.

Pure helpers (scan_log_for_anti_patterns, check_signoff_phrase, expected_signoff_phrase) extracted for unit-test isolation; 28 tests in tests/test_scripts/test_release_audit.py cover regex behaviour and exact-match logic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
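The polling and sign-off behaviour described in this commit can be sketched as pure functions. This is a guess at the shape, not the code in release.py: `expected_signoff_phrase` and `check_signoff_phrase` are named in the commit message but their signatures here are assumed, and `poll_ci_status` is a hypothetical stand-in for the polling inside `check_ci_passed_on_head`.

```python
import time
from typing import Callable, Optional


def poll_ci_status(
    read_status: Callable[[], Optional[bool]],
    interval_s: float = 20.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    report: Callable[[str], None] = print,
) -> bool:
    """Poll until CI reports pass (True) or fail (False); None means still running."""
    waited = 0.0
    while True:
        status = read_status()
        if status is not None:
            return status
        if waited >= timeout_s:
            return False  # still in flight after the full budget
        report(f"CI still running ({waited:.0f}s of {timeout_s:.0f}s)")
        sleep(interval_s)
        waited += interval_s


def expected_signoff_phrase(version: str) -> str:
    """Phrase the developer must type for this version."""
    return f"RELEASE OK {version}"


def check_signoff_phrase(typed: str, version: str) -> bool:
    """Exact match only: no trimming and no case folding."""
    return typed == expected_signoff_phrase(version)
```

Injecting `sleep` and `report` keeps the loop testable without waiting on a real clock.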

codecov · 2026-05-04T13:17:48Z

Codecov Report

❌ Patch coverage is 95.28302% with 5 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/har_capture/capture/browser.py | 78.26% | 2 Missing and 3 partials ⚠️ |


CLAUDE.md gains Architecture principle #7 (confidence boundary between deterministic and heuristic redaction layers), sourced from existing spec language. The principle was previously stated in SANITIZATION_SPEC invariant #11 and ARCHITECTURE's "Confidence boundary" paragraph but not surfaced at the entry-point file: a contributor reading CLAUDE.md as the primary briefing would miss the load-bearing contract that governs which layer redacts.

Release Flow section moved to new docs/RELEASE.md so CLAUDE.md (218 -> 162 lines) is dominated by principles rather than reference content. Principles go from ~52% to ~72% of the file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ial #47)

Three changes to network_device.json:

- New serial_number detector with Netgear C7000v2 format ([0-9][A-Z]{2}[0-9]{4}[A-Z0-9]{6}, surfaced by #49) plus a broader uppercase-alphanumeric backstop for future vendor variants.
- wifi_ssid detector extended with a default-SSID prefix whitelist (SPSETUP, MOTO, ATTwifi, XFINITY, HOMEHUB).
- Detector order changed so keyword-based device_name runs before shape-based wifi_ssid, preventing NETGEAR-C7000 from being miscategorized.

New regression test file tests/test_sanitization/test_pii_regressions.py + fixture keys cases on issue numbers so future user reports add a fixture row rather than a new test file.

Closes #49. Partial #47: SN, default SSID, and default password all flag correctly through heuristics + UI now. WPS PIN coverage is tracked separately as a regex-layer concern: pure-digit values hit the universal safe pattern by design, and disambiguating a WPS PIN from a packet counter requires the adjacent "WPS PIN" / "PIN Code" label, which is the regex layer's job per CLAUDE.md principle #7.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
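To make the two serial shapes concrete, here is a small sketch. The Netgear pattern is the one quoted from #49; everything else is assumed for illustration: the `GENERIC_SERIAL` backstop (10 to 16 uppercase alphanumerics with at least one letter and one digit), the `classify_serial` helper, and the sample value `5AB1234XY67Z9` are invented and may not match what network_device.json actually ships.

```python
import re
from typing import Optional

# Exact pattern quoted in issue #49.
NETGEAR_SERIAL = re.compile(r"[0-9][A-Z]{2}[0-9]{4}[A-Z0-9]{6}")

# Assumed shape for the broader backstop: mixed uppercase letters and digits.
GENERIC_SERIAL = re.compile(
    r"(?=[A-Z0-9]*[A-Z])(?=[A-Z0-9]*[0-9])[A-Z0-9]{10,16}"
)


def classify_serial(value: str) -> Optional[str]:
    """Return which detector shape the whole value matches, if any."""
    if NETGEAR_SERIAL.fullmatch(value):
        return "netgear"
    if GENERIC_SERIAL.fullmatch(value):
        return "generic"
    return None


print(classify_serial("5AB1234XY67Z9"))  # made-up value in the Netgear shape
print(classify_serial("7TH4582JK9QP"))   # 12 characters, so only the backstop applies
print(classify_serial("1234567890123"))  # pure digits are left to other layers
```

Requiring both a letter and a digit in the backstop keeps pure-digit values such as packet counters out of the heuristic, consistent with the WPS PIN note above.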

The work originally queued as v0.8.3 (CI install SSOT, release.py polling, release-discipline audit gates) folds into v0.9.0 along with the doc surfacing and heuristic detector additions on this branch. Per the branch refocus, the CHANGELOG section retitles 0.8.3 -> 0.9.0 rather than adding a parallel 0.9.0 section above.

Version bumped in pyproject.toml and src/har_capture/__init__.py. Comparison link updated to compare against v0.8.2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CLAUDE.md rule 14: large test data lives in tests/fixtures/*.json. Inline `har_data = {...}` blobs were scattered across CLI test files, some 30+ lines each.

Moved out of test files into per-module fixtures:

- tests/test_cli/test_sanitize.py: valid_har, large_har (structural template; 500 KB padding stays in Python because the size is the behavioural point), already_redacted_har (template; 15-placeholder string stays in Python), har_with_flagged_fields.
- tests/test_cli/test_validate.py: clean_har, har_with_secrets, har_with_warnings, directory_clean_har, directory_dirty_har, and the custom_secret pattern previously inlined in test_validate_with_custom_patterns.
- tests/test_cli/test_patterns.py: the four --show test pattern files (show_external_full_domain, show_minimal_only_description, show_no_description, show_safe_pattern_without_comment).
- tests/test_cli/test_interactive.py: the sanitized_har_file fixture used by the three apply_reviewed_redactions tests.
- tests/test_validation/test_secrets.py: validate_har_gzipped's HAR (appended to the existing tests/fixtures/test_secrets.json).

Kept inline per CLAUDE.md rule 14's behavioural-tests carve-out:

- Intentionally-malformed strings used to exercise error paths.
- One-line behavioural dicts in test_apply_redactions.py / test_appears_sanitized.py / test_salt_preservation.py where the data IS the test scenario (specific patterns, specific structures feeding specific assertions).
- The parametrized URL test and dynamic-base64 test in test_secrets.py where the dynamic content is the behavioural point.

Full suite: 1983 passed, 18 deselected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…JSON \b trap

Three v0.9.0 changes, all driven by the cable_modem_monitor privacy promise:

1. Mandatory --patterns (BREAKING)

   `get`, `sanitize`, `validate` now require --patterns. `base` is a reserved sentinel for universal-PII-only; named domains (`network-device`) or custom JSON paths are the alternatives. Missing --patterns prints a domain listing to stderr and exits 2. `validate`'s --patterns shape changed from single Path to repeatable list. `validation/secrets.py` widened to accept str|dict|None so multi-pattern merges work end-to-end.

   Closes the structural cause behind the #47 / #49 leaks: contributors running bare `har-capture get` without loading network-device silently shipped device PII to CMM issue threads.

2. WPS-PIN labeled-regex coverage (completes #47)

   pii.json gains a `wps_pin` pattern; html.py Pass 2d redacts 8-digit values whose label is `WPS PIN`, `PIN Code`, `Pairing PIN`, or `Default PIN`. Pure-digit values can't be flagged heuristically; the adjacent label is what makes 100%-confidence deterministic redaction achievable per CLAUDE.md principle #7.

3. JSON-escape-trap warning at pattern load (#51)

   `_load_custom_patterns` now scans regex strings for ASCII backspace and form-feed, logging a warning that identifies the offending key path and the corrected JSON escape. Doesn't reject the pattern - just makes the silent-no-op case loud.

Mechanical:

- tests/test_cli/* invocations gained `--patterns base`
- test_patterns_resolver.py + fixture covers the new CLI helper
- README/CLI_REFERENCE/USE_CASES/CUSTOM_PATTERNS examples updated
- CHANGELOG: BREAKING entry + three Added bullets under [0.9.0]

Full suite: 1993 passed, 18 deselected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
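The label-anchored WPS PIN rule from item 2 might look roughly like the sketch below. It is an approximation written for this discussion, not the `wps_pin` pattern in pii.json or the html.py Pass 2d code: the regex, the `redact_wps_pins` name, the tolerated separators, and the `[REDACTED]` placeholder are all assumptions. Only the four labels and the 8-digit width come from the commit message.

```python
import re

# Labels come from the commit message; the separator handling is assumed.
WPS_PIN = re.compile(
    r"(?P<label>\b(?:WPS PIN|PIN Code|Pairing PIN|Default PIN)\b)"
    r"(?P<gap>(?:\s|:|=|<[^>]*>)*)"   # whitespace, punctuation, or HTML tags
    r"(?P<pin>\d{8})(?!\d)",          # exactly eight digits, not a longer number
    re.IGNORECASE,
)


def redact_wps_pins(html: str, placeholder: str = "[REDACTED]") -> str:
    """Replace 8-digit values that directly follow a known PIN label."""
    return WPS_PIN.sub(
        lambda m: m.group("label") + m.group("gap") + placeholder, html
    )


print(redact_wps_pins("<td>WPS PIN</td><td>12345678</td>"))
print(redact_wps_pins("<td>Packets</td><td>12345678</td>"))  # counter stays untouched
```

The same eight digits are redacted or kept depending only on the neighbouring label, which is the distinction a shape-based heuristic cannot make.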

…y binary path (#50)

`check_browser_installed` previously resolved Chromium's binary at a hardcoded Linux-only relative path (`chrome-linux64/chrome`), which never matched on Windows or macOS. The function returned False without consulting the dry-run fallback - re-prompting users to "install" a browser already on disk.

Refactor to the platform-agnostic install marker: resolve and check the `<browser>-<revision>/` directory itself. Per-platform binary-layout drift between Playwright versions can no longer break detection.

- Removed `_BROWSER_EXECUTABLES` per-platform mapping
- Renamed `_get_browser_executable` -> `_get_browser_install_dir`
- `check_browser_installed`: `is_dir()` + `any(iterdir())`, dry-run fallback unchanged
- Tests updated; added empty-dir-falls-through-to-dry-run coverage

Full suite: 1995 passed, 18 deselected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
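The directory-based check can be sketched in a few lines of `pathlib`. This is a simplified illustration under assumptions, not the code in browser.py: `install_dir_present` is a hypothetical name, the caller is assumed to already know the browsers root and revision, and the dry-run fallback is omitted.

```python
from pathlib import Path


def install_dir_present(browsers_root: Path, browser: str, revision: str) -> bool:
    """True when <browser>-<revision>/ exists and contains at least one entry.

    The per-platform binary path inside the directory is deliberately ignored,
    so Linux, macOS, and Windows layouts are treated the same way.
    """
    install_dir = browsers_root / f"{browser}-{revision}"
    return install_dir.is_dir() and any(install_dir.iterdir())
```

An empty directory returns False here, which is the case the commit routes to the dry-run fallback.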

…regression tests

Markdown lint cleanup across 17 .md files (no shortcuts - fixed the content, did not silence via .markdownlint.json):

- MD024 (67 -> 0): version-qualified ### subsections in CHANGELOG (`### Added in 0.9.0`, etc.); command-qualified in CLI_REFERENCE (`### Examples (get)`, etc.); two duplicate `Problem` headings in CAPTURE_SPEC scoped.
- MD029 (4 -> 0): CLAUDE.md principles rewritten as bullet list with bold-prefixed numbers - global 1-19 numbering preserved, per-section visual grouping preserved, `principle #N` references still work.
- MD033 (9 -> 0): README.md three <details>/<summary> blocks converted to ### sections (Quick Start: Windows / macOS-Linux / Existing HAR).
- MD036 (8 -> 0): USE_CASES.md **Capture**/**Sanitization**/etc. promoted to ### headings; three single-line **Note**/**Example** paragraphs promoted or de-emphasized to plain prose.
- MD040 (43 -> 0): every bare opening fence content-classified and tagged (bash/python/json/text) by a state-machine pass that preserves open/close pairing.

Plus two explicit regression tests surfaced during pre-release verification:

- #47 SN portion: added `generic_uppercase_alnum_serial` (`7TH4582JK9QP`) to tests/fixtures/test_pii_regressions.json so the serial_number heuristic backstop has a directly-cited fixture row.
- #50 Windows-layout: new test_deps.py case constructs an install dir containing only chrome-win64/chrome.exe (no Linux binary) and asserts check_browser_installed returns True. Would have failed against the pre-0c11040 Linux-only path lookup.

Tests + ruff: 1997 passed, all ruff checks passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ting (#46)

Related to #46

- stop relying on Playwright's default auto-dismiss behavior for interactive headed runs
- watch for browser dialogs, surface them to the user, and record the resolved outcome in _solentlabs
- add opened_at timestamps so repeated dialogs are captured as distinct events for HAR analysis
- add test coverage for dialog capture and run the full test suite to check for regressions
- document dialog behavior and backfill the missing popup coverage in the capture docs

…nups

Three follow-ups on top of ccpk1's dialog work (b12881a) to make it defensible per CLAUDE.md principles:

1. Polling loop -> page.expose_function (principle #10: no shortcuts). The original implementation maintained a window-scoped outcome queue and polled it from a Python dialog handler with `while True: ...; time.sleep(0.1)`. That's a workaround for not using Playwright's first-class JS->Python bridge. Replaced with a two-event model: `page.on("dialog")` creates the open record; the exposed `__harCaptureDialogResolved` binding (called by the JS init script after the user clicks) updates it with the action. No polling, no deadlock surface. Match-by-(type, message) so nested or concurrent dialogs can't mis-correlate.

2. sys.stderr.write -> _LOGGER.info (principle #11: quality gates). Both call sites converted to match the module's 26 existing _LOGGER calls. `sys` import removed.

3. Revert ~250 lines of unrelated fixture reformatting. The original PR reformatted multiple unrelated test_browser.json sections from compact one-line JSON to multi-line. Restored the project's compact convention; kept only the 7 substantive has_dialogs field additions + the new with_dialogs case. Cumulative diff vs. main now 311+/18- (was 544+/77-).

Full suite: 2002 passed, 18 deselected.

Co-Authored-By: ccpk1 <64691424+ccpk1@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Promote the dialog support entry into the 0.9.0 ### Added section now that PR #52 ships in 0.9.0 (not deferred to a later release). Expanded the entry to cite #46 and to note the page.expose_function architectural choice for future readers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(capture): surface browser dialogs for user-driven capture (closes #46)

chore(release): v0.9.0 — mandatory --patterns + heuristic detectors + Chromium fix + lint cleanup

kwschulz and others added 7 commits May 11, 2026 09:35

kwschulz changed the title ~~chore(release): v0.8.3 — release-discipline gates + CI tooling SSOT~~ chore(release): v0.9.0 — mandatory --patterns + heuristic detectors + Chromium fix + lint cleanup May 11, 2026

ccpk1 and others added 2 commits May 11, 2026 13:25

kwschulz mentioned this pull request May 11, 2026

feat(capture): surface browser dialogs for user-driven capture (closes #46) #52

Merged

5 tasks

kwschulz and others added 2 commits May 11, 2026 15:41

Merge pull request #52 from solentlabs/feat/issue-46-dialogs

891dfb4

feat(capture): surface browser dialogs for user-driven capture (closes #46)

kwschulz merged commit cf987c2 into main May 11, 2026
6 checks passed

kwschulz mentioned this pull request May 11, 2026

SN, WPS, Default Password, Default SSID Sanitization #47

Closed

kwschulz deleted the chore/release-0.8.3 branch May 11, 2026 20:38

ccpk1 pushed a commit to ccpk1/har-capture that referenced this pull request May 16, 2026

Merge pull request solentlabs#45 from solentlabs/chore/release-0.8.3

053c7dc

chore(release): v0.9.0 — mandatory --patterns + heuristic detectors + Chromium fix + lint cleanup

kwschulz merged 12 commits into main from chore/release-0.8.3
