
chore(release): v0.9.0 — mandatory --patterns + heuristic detectors + Chromium fix + lint cleanup#45

Merged
kwschulz merged 12 commits into main from chore/release-0.8.3 on May 11, 2026

Contributor

@kwschulz kwschulz commented May 4, 2026

Summary

Breaking: --patterns is now required on get, sanitize, and validate. This removes the structural cause of contributors silently shipping device PII to cable_modem_monitor issue threads when they did not know to load the network-device domain.

Seven commits, end-to-end test coverage for every issue addressed (TDD red→green where applicable, regression guards where the engine already caught it):

| Commit | Scope |
| --- | --- |
| fd3c81b docs | Confidence-boundary contract surfaced as CLAUDE.md Architecture principle #7 (sourced from existing spec language); release flow extracted to docs/RELEASE.md |
| e545cca feat(patterns) | Heuristic detectors for default-device PII: closes #49 (Netgear serial), partial #47 (SSID + default password) |
| 5f7edb1 test | Move inline HAR blobs from CLI tests to per-module fixtures per CLAUDE.md #14 |
| 0887118 chore(release) | Bump version 0.8.3 → 0.9.0 |
| 9bf112c feat(cli)! | Mandatory --patterns (BREAKING) + WPS-PIN labeled-regex (completes #47) + load-time warn on JSON \b trap (#51) |
| 0c11040 fix(capture) | Detect Playwright browser by install dir, not Linux-only binary path (#50) |
| f437709 chore | Markdownlint cleanup (MD024/029/033/036/040 → 0); add #47 SN regression fixture + #50 Windows-layout regression test |

Issues addressed

BREAKING CHANGE

--patterns required on every sanitization-running subcommand. base is a reserved sentinel for core-universal-PII-only. Missing --patterns prints a domain listing to stderr and exits 2. validate's --patterns shape changed from single Path to repeatable list to match get and sanitize. Library API (sanitize_har_file(), sanitize_har(), validate_har()) unchanged.

Migration

# Before (0.8.x)
har-capture get https://router.local
har-capture sanitize device.har
har-capture validate device.har

# After (0.9.0)
har-capture get https://router.local --patterns network-device
har-capture sanitize device.har --patterns network-device
har-capture validate device.har --patterns network-device

# Or for non-device (web/API) captures:
har-capture sanitize webapp.har --patterns base

Run har-capture patterns for the full list of choices.
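
The new requirement can be pictured with a small argparse sketch. This is an illustration only: the PR does not say which CLI framework har-capture uses, and `parse_patterns`, `KNOWN_DOMAINS`, and the program name are hypothetical. It shows a repeatable `--patterns` flag whose absence prints a domain listing to stderr and exits with status 2, as described above.

```python
import argparse
import sys

# Hypothetical domain table for illustration; the real list comes from
# `har-capture patterns`.
KNOWN_DOMAINS = {
    "base": "core universal PII only (reserved sentinel)",
    "network-device": "modem and router identifiers",
}


def parse_patterns(argv):
    """Return the requested pattern names, or exit 2 with a listing on stderr."""
    parser = argparse.ArgumentParser(prog="sanitize-sketch")
    parser.add_argument("har")
    # action="append" makes the flag repeatable: --patterns a --patterns b
    parser.add_argument("--patterns", action="append", default=None)
    args = parser.parse_args(argv)

    if not args.patterns:
        print("error: --patterns is required. Available domains:", file=sys.stderr)
        for name, description in KNOWN_DOMAINS.items():
            print(f"  {name}: {description}", file=sys.stderr)
        raise SystemExit(2)
    return args.patterns
```

Making the check explicit, rather than relying on a silent default, is what forces the caller to choose between `base` and a device domain.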

Test plan

🤖 Generated with Claude Code

Three changes close session-debt from v0.8.1 / v0.8.2.

CI install single source of truth
---------------------------------
scripts/install-ci-deps.sh defines the install line once. ci-local.sh
and both ci.yml jobs invoke it, removing the duplication that caused
v0.8.1's push regression where ci-local.sh's install profile drifted
from the matrix.

release.py polls for in-flight CI
---------------------------------
check_ci_passed_on_head polls every 20s for up to 10min with progress
feedback instead of failing on the first read.
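
A minimal sketch of that polling behaviour, assuming a status reader that returns True (passed), False (failed), or None (still running). `poll_ci_status` and its parameters are hypothetical names, not the actual signature of check_ci_passed_on_head; only the 20s interval and 10min budget come from the text above.

```python
import time
from typing import Callable, Optional


def poll_ci_status(
    read_status: Callable[[], Optional[bool]],
    interval_s: float = 20.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until CI reports pass or fail, or the time budget is spent."""
    waited = 0.0
    while True:
        status = read_status()
        if status is not None:
            return status  # CI finished: True = passed, False = failed
        if waited >= timeout_s:
            return False  # assumption: a timeout counts as "not passed"
        print(f"CI still running, waited {waited:.0f}s of {timeout_s:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Injecting `sleep` keeps the loop testable without real waiting.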

Release-discipline audit (A + B + E)
------------------------------------
Three reinforcing checks for the AI knowing-not-applying flaw observed
across v0.8.1 -> v0.8.3 (three releases for what should have been one
because Claude made decisions at "should I push?" that violated rules
just written down):

A. scan_for_anti_patterns greps recent git log for anti-pattern
   signatures. BLOCKER findings abort unless --acknowledged "<reason>"
   is supplied.

B. print_audit_checklist prints a diff-grounded checklist on every
   invocation. Visibility-to-Ken is the actual gate; questions are
   rubber-stampable in isolation but harder to ignore when bundled
   with commit/file context.

E. require_signoff blocks tag-push until the developer types
   "RELEASE OK X.Y.Z" exactly. No --yes flag: the bypass would
   defeat the purpose. The unfakeable component.

Pure helpers (scan_log_for_anti_patterns, check_signoff_phrase,
expected_signoff_phrase) extracted for unit-test isolation; 28 tests
in tests/test_scripts/test_release_audit.py cover regex behaviour
and exact-match logic.
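
The exact-match sign-off can be sketched as below. The helper names appear in this commit message, but their signatures and bodies here are assumptions made for illustration.

```python
def expected_signoff_phrase(version: str) -> str:
    """Build the phrase the developer must type for this version."""
    return f"RELEASE OK {version}"


def check_signoff_phrase(typed: str, version: str) -> bool:
    """Exact match only: no case folding and no whitespace trimming."""
    return typed == expected_signoff_phrase(version)
```

Keeping both helpers pure (no prompt, no I/O) is what lets the unit tests exercise the comparison in isolation.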

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 95.28302% with 5 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/har_capture/capture/browser.py | 78.26% | 2 Missing and 3 partials ⚠️ |


kwschulz and others added 7 commits May 11, 2026 09:35
CLAUDE.md gains Architecture principle #7 (confidence boundary between
deterministic and heuristic redaction layers), sourced from existing
spec language. The principle was previously stated in SANITIZATION_SPEC
invariant #11 and ARCHITECTURE's "Confidence boundary" paragraph but not
surfaced at the entry-point file — a contributor reading CLAUDE.md as
the primary briefing would miss the load-bearing contract that governs
which layer redacts.

Release Flow section moved to new docs/RELEASE.md so CLAUDE.md
(218 -> 162 lines) is dominated by principles rather than reference
content. Principles go from ~52% to ~72% of the file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ial #47)

Three changes to network_device.json:

- New serial_number detector with Netgear C7000v2 format
  ([0-9][A-Z]{2}[0-9]{4}[A-Z0-9]{6}, surfaced by #49) plus a broader
  uppercase-alphanumeric backstop for future vendor variants.
- wifi_ssid detector extended with a default-SSID prefix whitelist
  (SPSETUP, MOTO, ATTwifi, XFINITY, HOMEHUB).
- Detector order changed so keyword-based device_name runs before
  shape-based wifi_ssid, preventing NETGEAR-C7000 from being
  miscategorized.
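
For reference, the Netgear format quoted above behaves as follows in Python. This is a sketch: the regex is taken from the commit message, while `looks_like_netgear_serial`, the whole-value match, and the sample serial are assumptions for illustration (the detector itself lives in network_device.json).

```python
import re

# Format from the commit message: 1 digit, 2 letters, 4 digits, 6 alphanumerics.
NETGEAR_SERIAL = re.compile(r"[0-9][A-Z]{2}[0-9]{4}[A-Z0-9]{6}")


def looks_like_netgear_serial(value: str) -> bool:
    """True when the whole value has the 13-character Netgear shape."""
    return NETGEAR_SERIAL.fullmatch(value) is not None


# Made-up sample values, not real device serials.
print(looks_like_netgear_serial("4AB1234XY56Z7"))   # 13 chars in the expected shape
print(looks_like_netgear_serial("NETGEAR-C7000"))   # a device name, not a serial
```

A 12-character value cannot satisfy the 13-character Netgear shape, which is why a broader uppercase-alphanumeric backstop is still useful for other vendors.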

A new regression test file, tests/test_sanitization/test_pii_regressions.py,
and its fixture key cases by issue number, so future user reports add a
fixture row rather than a new test file.

Closes #49. Partial #47 — SN, default SSID, and default password all
flag correctly through heuristics + UI now. WPS PIN coverage is
tracked separately as a regex-layer concern: pure-digit values hit
the universal safe pattern by design, and disambiguating a WPS PIN
from a packet counter requires the adjacent "WPS PIN" / "PIN Code"
label, which is the regex layer's job per CLAUDE.md principle #7.
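
Why the label matters can be shown with a short sketch. It assumes plain text with a short non-word separator between label and value; `WPS_PIN`, `redact_wps_pins`, and the placeholder are hypothetical and do not reproduce the project's actual pattern.

```python
import re

# Only the two labels named above; an 8-digit value is redacted solely when
# one of them sits directly in front of it.
WPS_PIN = re.compile(
    r"(?P<label>WPS PIN|PIN Code)"
    r"(?P<sep>\W{1,10})"       # assumption: short separator such as ": "
    r"(?P<pin>\d{8})(?!\d)"    # exactly 8 digits, not part of a longer run
)


def redact_wps_pins(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace labeled 8-digit PINs and leave unlabeled digits untouched."""
    return WPS_PIN.sub(lambda m: f"{m['label']}{m['sep']}{placeholder}", text)
```

An unlabeled 8-digit packet counter passes through unchanged, which is the disambiguation the paragraph above assigns to the regex layer.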

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The work originally queued as v0.8.3 (CI install SSOT, release.py
polling, release-discipline audit gates) folds into v0.9.0 along
with the doc surfacing and heuristic detector additions on this
branch. Per the branch refocus, the CHANGELOG section retitles
0.8.3 -> 0.9.0 rather than adding a parallel 0.9.0 section above.

Version bumped in pyproject.toml and src/har_capture/__init__.py.
Comparison link updated to compare against v0.8.2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLAUDE.md rule 14: large test data lives in tests/fixtures/*.json.
Inline `har_data = {...}` blobs were scattered across CLI test files,
some 30+ lines each.

Moved out of test files into per-module fixtures:

- tests/test_cli/test_sanitize.py: valid_har, large_har (structural
  template; 500 KB padding stays in Python because the size is the
  behavioural point), already_redacted_har (template; 15-placeholder
  string stays in Python), har_with_flagged_fields.
- tests/test_cli/test_validate.py: clean_har, har_with_secrets,
  har_with_warnings, directory_clean_har, directory_dirty_har, and
  the custom_secret pattern previously inlined in
  test_validate_with_custom_patterns.
- tests/test_cli/test_patterns.py: the four --show test pattern
  files (show_external_full_domain, show_minimal_only_description,
  show_no_description, show_safe_pattern_without_comment).
- tests/test_cli/test_interactive.py: the sanitized_har_file fixture
  used by the three apply_reviewed_redactions tests.
- tests/test_validation/test_secrets.py: validate_har_gzipped's HAR
  (appended to the existing tests/fixtures/test_secrets.json).

Kept inline per CLAUDE.md rule 14's behavioural-tests carve-out:

- Intentionally-malformed strings used to exercise error paths.
- One-line behavioural dicts in test_apply_redactions.py /
  test_appears_sanitized.py / test_salt_preservation.py where the
  data IS the test scenario (specific patterns, specific structures
  feeding specific assertions).
- The parametrized URL test and dynamic-base64 test in test_secrets.py
  where the dynamic content is the behavioural point.

Full suite: 1983 passed, 18 deselected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…JSON \b trap

Three v0.9.0 changes, all driven by the cable_modem_monitor privacy
promise:

1. Mandatory --patterns (BREAKING)
   `get`, `sanitize`, `validate` now require --patterns. `base` is a
   reserved sentinel for universal-PII-only; named domains
   (`network-device`) or custom JSON paths are the alternatives. Missing
   --patterns prints a domain listing to stderr and exits 2. `validate`'s
   --patterns shape changed from single Path to repeatable list.
   `validation/secrets.py` widened to accept str|dict|None so multi-
   pattern merges work end-to-end. Closes the structural cause behind
   the #47 / #49 leaks: contributors running bare `har-capture get`
   without loading network-device silently shipped device PII to CMM
   issue threads.

2. WPS-PIN labeled-regex coverage (completes #47)
   pii.json gains a `wps_pin` pattern; html.py Pass 2d redacts 8-digit
   values whose label is `WPS PIN`, `PIN Code`, `Pairing PIN`, or
   `Default PIN`. Pure-digit values can't be flagged heuristically; the
   adjacent label is what makes 100%-confidence deterministic redaction
   achievable per CLAUDE.md principle #7.

3. JSON-escape-trap warning at pattern load (#51)
   `_load_custom_patterns` now scans regex strings for ASCII backspace
   and form-feed, logging a warning that identifies the offending key
   path and the corrected JSON escape. Doesn't reject the pattern - just
   makes the silent-no-op case loud.
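
The trap in item 3 is easy to reproduce: JSON treats `\b` and `\f` as escapes for backspace (U+0008) and form feed (U+000C), so a regex written with a single backslash loses its word boundary. The sketch below shows one way such a scan might look. `find_escape_traps` and the dotted key-path format are hypothetical and are not the code in `_load_custom_patterns`.

```python
import json

# Control character found after decoding -> the JSON escape the author meant.
TRAPS = {"\x08": r"\\b", "\x0c": r"\\f"}


def find_escape_traps(node, path=""):
    """Walk decoded JSON and return (key_path, suggested_escape) pairs."""
    findings = []
    if isinstance(node, dict):
        for key, value in node.items():
            findings += find_escape_traps(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            findings += find_escape_traps(value, f"{path}[{index}]")
    elif isinstance(node, str):
        for char, fix in TRAPS.items():
            if char in node:
                findings.append((path, fix))
    return findings


# Single backslash in the JSON source: decodes to U+0008, not a word boundary.
broken = json.loads(r'{"patterns": {"serial": "\bSN[0-9]+\b"}}')
# Doubled backslash in the JSON source: decodes to the regex \b as intended.
fixed = json.loads(r'{"patterns": {"serial": "\\bSN[0-9]+\\b"}}')
```

Warning rather than rejecting keeps existing pattern files loading while still surfacing the mistake.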

Mechanical:
- tests/test_cli/* invocations gained `--patterns base`
- test_patterns_resolver.py + fixture covers the new CLI helper
- README/CLI_REFERENCE/USE_CASES/CUSTOM_PATTERNS examples updated
- CHANGELOG: BREAKING entry + three Added bullets under [0.9.0]

Full suite: 1993 passed, 18 deselected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…y binary path (#50)

`check_browser_installed` previously resolved Chromium's binary at a
hardcoded Linux-only relative path (`chrome-linux64/chrome`), which
never matched on Windows or macOS. The function returned False without
consulting the dry-run fallback - re-prompting users to "install" a
browser already on disk.

Refactor to the platform-agnostic install marker: resolve and check the
`<browser>-<revision>/` directory itself. Per-platform binary-layout
drift between Playwright versions can no longer break detection.

- Removed `_BROWSER_EXECUTABLES` per-platform mapping
- Renamed `_get_browser_executable` -> `_get_browser_install_dir`
- `check_browser_installed`: `is_dir()` + `any(iterdir())`, dry-run
  fallback unchanged
- Tests updated; added empty-dir-falls-through-to-dry-run coverage
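
The directory-based check can be sketched with pathlib. `install_dir_present` is a hypothetical stand-in for the first half of `check_browser_installed` (the dry-run fallback is omitted), and the revision numbers in the demo are made up.

```python
import tempfile
from pathlib import Path


def install_dir_present(install_dir: Path) -> bool:
    """True when the <browser>-<revision>/ directory exists and is non-empty."""
    return install_dir.is_dir() and any(install_dir.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Windows-style layout: no Linux binary anywhere under the install dir.
    windows_layout = root / "chromium-1234"
    (windows_layout / "chrome-win64").mkdir(parents=True)
    (windows_layout / "chrome-win64" / "chrome.exe").touch()

    empty_dir = root / "chromium-5678"
    empty_dir.mkdir()

    results = {
        "windows_layout": install_dir_present(windows_layout),
        "empty_dir": install_dir_present(empty_dir),
        "missing_dir": install_dir_present(root / "chromium-9999"),
    }
```

Because the check never names a binary, a change in per-platform layout inside the directory does not affect the result.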

Full suite: 1995 passed, 18 deselected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…regression tests

Markdown lint cleanup across 17 .md files (no shortcuts - fixed the
content, did not silence via .markdownlint.json):

- MD024 (67 -> 0): version-qualified ### subsections in CHANGELOG
  (`### Added in 0.9.0`, etc.); command-qualified in CLI_REFERENCE
  (`### Examples (get)`, etc.); two duplicate `Problem` headings in
  CAPTURE_SPEC scoped.
- MD029 (4 -> 0): CLAUDE.md principles rewritten as bullet list with
  bold-prefixed numbers - global 1-19 numbering preserved, per-section
  visual grouping preserved, `principle #N` references still work.
- MD033 (9 -> 0): README.md three <details>/<summary> blocks converted
  to ### sections (Quick Start: Windows / macOS-Linux / Existing HAR).
- MD036 (8 -> 0): USE_CASES.md **Capture**/**Sanitization**/etc.
  promoted to ### headings; three single-line **Note**/**Example**
  paragraphs promoted or de-emphasized to plain prose.
- MD040 (43 -> 0): every bare opening fence content-classified and
  tagged (bash/python/json/text) by a state-machine pass that preserves
  open/close pairing.
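
The MD040 pass can be pictured as a two-state machine over the lines of a file. The sketch below is an assumption-laden illustration, not the script used for this cleanup: `tag_bare_fences` and `guess_language` are hypothetical, the classifier is deliberately crude, and lines are assumed to carry no trailing newline.

```python
FENCE = "`" * 3  # built programmatically so the sketch can live inside a fenced block


def guess_language(body):
    """Very rough content classification from the first non-blank line."""
    first = next((line.strip() for line in body if line.strip()), "")
    if first.startswith(("{", "[")):
        return "json"
    if first.startswith(("$ ", "har-capture ")):
        return "bash"
    if first.startswith(("import ", "from ", "def ")):
        return "python"
    return "text"


def tag_bare_fences(lines, classify=guess_language):
    """Tag bare opening fences; closing fences are never modified."""
    tagged = []
    in_fence = False
    for index, line in enumerate(lines):
        if line.strip().startswith(FENCE):
            if in_fence:
                in_fence = False  # closing fence: leave untouched
            else:
                in_fence = True
                if line.strip() == FENCE:  # bare opener: classify the body
                    body = []
                    for later in lines[index + 1:]:
                        if later.strip().startswith(FENCE):
                            break
                        body.append(later)
                    line = line.rstrip() + classify(body)
        tagged.append(line)
    return tagged
```

Tracking the open/closed state is what distinguishes a bare opener from a closer, since both look identical on the page.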

Plus two explicit regression tests surfaced during pre-release
verification:

- #47 SN portion: added `generic_uppercase_alnum_serial`
  (`7TH4582JK9QP`) to tests/fixtures/test_pii_regressions.json so the
  serial_number heuristic backstop has a directly-cited fixture row.

- #50 Windows-layout: new test_deps.py case constructs an install dir
  containing only chrome-win64/chrome.exe (no Linux binary) and asserts
  check_browser_installed returns True. Would have failed against the
  pre-0c11040 Linux-only path lookup.

Tests + ruff: 1997 passed, all ruff checks passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kwschulz kwschulz changed the title from "chore(release): v0.8.3 — release-discipline gates + CI tooling SSOT" to "chore(release): v0.9.0 — mandatory --patterns + heuristic detectors + Chromium fix + lint cleanup" on May 11, 2026
ccpk1 and others added 2 commits May 11, 2026 13:25
…ting (#46)

Related to #46

- stop relying on Playwright's default auto-dismiss behavior for interactive headed runs
- watch for browser dialogs, surface them to the user, and record the resolved outcome in _solentlabs
- add opened_at timestamps so repeated dialogs are captured as distinct events for HAR analysis
- add test coverage for dialog capture and run the full test suite to check for regressions
- document dialog behavior and backfill the missing popup coverage in the capture docs
…nups

Three follow-ups on top of ccpk1's dialog work (b12881a) to make it
defensible per CLAUDE.md principles:

1. Polling loop -> page.expose_function (principle #10: no shortcuts).
   The original implementation maintained a window-scoped outcome queue
   and polled it from a Python dialog handler with `while True: ...;
   time.sleep(0.1)`. That's a workaround for not using Playwright's
   first-class JS->Python bridge. Replaced with a two-event model:
   `page.on("dialog")` creates the open record; the exposed
   `__harCaptureDialogResolved` binding (called by the JS init script
   after the user clicks) updates it with the action. No polling, no
   deadlock surface. Match-by-(type, message) so nested or concurrent
   dialogs can't mis-correlate.

2. sys.stderr.write -> _LOGGER.info (principle #11: quality gates).
   Both call sites converted to match the module's 26 existing _LOGGER
   calls. `sys` import removed.

3. Revert ~250 lines of unrelated fixture reformatting.
   The original PR reformatted multiple unrelated test_browser.json
   sections from compact one-line JSON to multi-line. Restored the
   project's compact convention; kept only the 7 substantive
   has_dialogs field additions + the new with_dialogs case.
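
The two-event model from item 1 can be sketched without Playwright. Everything here is hypothetical (`DialogRecord`, `DialogLog`, the field names other than opened_at): `on_open` stands in for the `page.on("dialog")` handler and `on_resolved` for the callback exposed through `page.expose_function`. The commit does not say how repeated identical dialogs are ordered, so this sketch assumes the oldest unresolved match wins.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DialogRecord:
    kind: str                      # e.g. "alert", "confirm", "prompt"
    message: str
    opened_at: float               # timestamp keeps repeated dialogs distinct
    action: Optional[str] = None   # filled in once the user resolves the dialog


class DialogLog:
    def __init__(self) -> None:
        self.records: List[DialogRecord] = []

    def on_open(self, kind: str, message: str, opened_at: float) -> None:
        """First event: create the open record."""
        self.records.append(DialogRecord(kind, message, opened_at))

    def on_resolved(self, kind: str, message: str, action: str) -> bool:
        """Second event: update the oldest unresolved record with the same key."""
        for record in self.records:
            if record.action is None and (record.kind, record.message) == (kind, message):
                record.action = action
                return True
        return False  # no open record matched this (kind, message) pair
```

Since each event is a plain callback, nothing has to poll for the outcome.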

Cumulative diff vs. main now 311+/18- (was 544+/77-).
Full suite: 2002 passed, 18 deselected.

Co-Authored-By: ccpk1 <64691424+ccpk1@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kwschulz and others added 2 commits May 11, 2026 15:41
Promote the dialog support entry into the 0.9.0 ### Added section now
that PR #52 ships in 0.9.0 (not deferred to a later release).
Expanded the entry to cite #46 and to note the page.expose_function
architectural choice for future readers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(capture): surface browser dialogs for user-driven capture (closes #46)
@kwschulz kwschulz merged commit cf987c2 into main May 11, 2026
6 checks passed
@kwschulz kwschulz deleted the chore/release-0.8.3 branch May 11, 2026 20:38
ccpk1 pushed a commit to ccpk1/har-capture that referenced this pull request May 16, 2026
chore(release): v0.9.0 — mandatory --patterns + heuristic detectors + Chromium fix + lint cleanup

Development

Successfully merging this pull request may close these issues.

Auto-redaction misses Netgear modem serial format ([0-9][A-Z]{2}[0-9]{4}[A-Z0-9]{6})

2 participants