Skip to content

feat(net): ErrSink framework — propagate the 4 silent gvproxy goroutine errors #612 didn't cover#634

Draft
G4614 wants to merge 4 commits into
boxlite-ai:mainfrom
G4614:fix/lifecycle-exit-non-zero-on-rest-errors
Draft

feat(net): ErrSink framework — propagate the 4 silent gvproxy goroutine errors #612 didn't cover#634
G4614 wants to merge 4 commits into
boxlite-ai:mainfrom
G4614:fix/lifecycle-exit-non-zero-on-rest-errors

Conversation

@G4614
Copy link
Copy Markdown
Contributor

@G4614 G4614 commented Jun 1, 2026

Extends #612 to cover the four sibling silent-failure goroutines it didn't touch. PR slot fix/lifecycle-exit-non-zero-on-rest-errors reused per request.

What's broken (before this PR)

#612 surfaced virtualnetwork.New bind errors via an ad-hoc initErr channel. Four sibling sites in the same goroutine were still logrus.Error + return — the Rust runtime never sees them:

line site symptom on failure
main.go:454 OverrideTCPHandler allow_net / MITM silently disabled — box healthy, traffic bypasses filter
main.go:475 transport.AcceptVfkit (macOS) VFKit transport never wires up; 20s later guest reports "DNS lookup … i/o timeout"
main.go:484 vn.AcceptVfkit VFKit protocol pump dead; every guest packet RSTs
main.go:496 listener.Accept (Linux) Qemu transport never wires up; same shape as 475
main.go:510 vn.AcceptQemu Qemu protocol pump dead; same shape as 484

OverrideTCPHandler is the security-shaped one — allow_net configured but filter not installed = box looks fine, traffic flows anywhere.

Two-sided verification — 5/5 sites toggled red

Every silent site is extracted into a testable helper, mocked with an injectable failure, and proven red when the sink.Runtime / sink.Init line is removed.

Layer 1: ErrSink framework contracts (9 unit tests, error_sink_test.go)

Toggled 7 contract lines (Init success/error, FIFO order, drop counter, producer-non-blocking, render with timestamp); each reds the relevant test. Full table at bottom.

Layer 2: Rust FFI integration (3 tests, libgvproxy-sys/src/lib.rs)

End-to-end through cgo: gvproxy_test_create_for_pollinggvproxy_test_inject_runtime_errorgvproxy_poll_runtime_errorgvproxy_free_stringgvproxy_destroy. Validates the actual call sequence the production Rust runtime will use.

Toggle: replace re := instance.errSink.PollRuntime() with re := (*RuntimeError)(nil) → 2/3 red.

Layer 3: Production wire sites — 5/5 silent sites, each with two-sided proof

Each silent goroutine is extracted into a small testable helper (runQemuAcceptLoop, runVfkitAcceptLoop, installTCPOverride), and tested by injecting a mock failure via function-value parameters.

site toggle (remove the sink line) red message
listener.Accept (Linux Qemu) qemu_accept_loop.go: remove sink.Runtime("listener.Accept", err) "expected listener.Accept runtime error in sink, got nil — pre-fix silent return: the goroutine logged the error and vanished without anyone (including the Rust runtime) ever observing it"
vn.AcceptQemu qemu_accept_loop.go: remove sink.Runtime("vn.AcceptQemu", err) "expected vn.AcceptQemu runtime error in sink, got nil — pre-fix silent return: protocol pump errored but no signal reached anyone (Rust runtime, operator, ops dashboard)"
transport.AcceptVfkit (macOS) vfkit_accept_loop.go: remove sink.Runtime("transport.AcceptVfkit", err) "expected transport.AcceptVfkit runtime error in sink, got nil — pre-fix silent return: gvproxy alive but VFKit transport dead, guest fails 20s later as DNS/network unreachable (same shape as #612)"
vn.AcceptVfkit vfkit_accept_loop.go: remove sink.Runtime("vn.AcceptVfkit", err) "expected vn.AcceptVfkit runtime error in sink, got nil — pre-fix silent return: VFKit protocol pump died, guest TCP connections RST silently"
OverrideTCPHandler install_tcp_override.go: swallow installFn() return "expected error to propagate; got nil — pre-fix silent install: OverrideTCPHandler error swallowed, allow_net/MITM silently disabled, box.create reports SUCCESS"

Plus 2 negative-control tests (*_CancelDoesNotSinkRuntime) on the Qemu and Vfkit loops — planned shutdown via ctx.Cancel must NOT pollute the runtime queue. Without these, every gvproxy_destroy would push self-inflicted "closed connection" noise.

Test plan

# Go side: framework + 5 wire sites
cd src/deps/libgvproxy-sys/gvproxy-bridge
go test ./... -timeout 120s -v

# Rust side: FFI integration
cargo test -p libgvproxy-sys --lib

19 Go tests + 3 Rust FFI tests = 22 total, all green; 12 toggle red'd (7 framework + 5 wire sites); all restored to green.

What's added (brief)

  • error_sink.go (~165) — ErrSink framework: Init (cap 1, blocking via WaitInit) + Runtime (cap 16, non-blocking producer; drop counter)
  • qemu_accept_loop.go (~80) — runQemuAcceptLoop(ctx, id, listener, acceptQemu func, sink), function-value injection for the protocol stage
  • vfkit_accept_loop.go (~80) — runVfkitAcceptLoop(ctx, id, conn, acceptVfkitTransport func, acceptVfkit func, sink)
  • install_tcp_override.go (~30) — installTCPOverride(hasFilter, installFn func) so the inline init goroutine routes through a single point
  • main.goGvproxyInstance.errSink field; init goroutine uses sink.Init/.Runtime; OverrideTCPHandler MOVED into init phase (failure now fail-fasts box.create instead of silently disabling allow_net); 3 inline goroutines replaced by extracted helper calls
  • gvproxy_poll_runtime_error (new //export) — Rust polls via this
  • gvproxy_test_create_for_polling + gvproxy_test_inject_runtime_error (test-only //export) — hermetic Rust integration tests
  • 22 tests across 3 layers

Out of scope (follow-up)

  • Rust-side background poll consumer (tokio task that calls gvproxy_poll_runtime_error every ~250ms → writes into box log as structured Network error). Framework + FFI are ready; the consumer wiring is a separate PR.
  • Audit mitm.go / forked_tcp.go for similar silent-goroutine patterns now that the framework exists.

Layer 1 toggle table (full)

toggle (revert the contract) which test goes red
s.initErr <- errs.initErr <- nil in Init() error path Init_ErrorWrapsSourceAndUnblocks
Remove <-s.initErr blocking WaitInit_BlocksUntilInitCalled
Runtime(nil) enqueues empty event Runtime_NilIsNoop
Change channel from FIFO to LIFO Runtime_EnqueuesAndPollDrainsFIFO
Remove atomic.AddInt64(&s.dropped, 1) Runtime_FullQueueDropsAndIncrementsCounter
Replace select default: with blocking send Runtime_ProducerNeverBlocksUnderBackpressure
Drop When field from String() render RuntimeError_StringIncludesSourceTimestampAndCause

🤖 Generated with Claude Code

Update — Rust consumer + MITM cert-gen wire (commit c781a8f5)

The framework is now externally observable end-to-end. The two pieces previously listed as "out of scope" are in:

Rust-side poll consumer (src/boxlite/src/net/gvproxy/)

start_runtime_error_polling tokio task spawned alongside the existing start_stats_logging in GvproxyBackend::new. Polls gvproxy_poll_runtime_error every 250ms, drains every queued event per cycle, routes each into tracing::warn! with target gvproxy.runtime_error + instance_id + event fields.

3 Rust tests (require --features gvproxy):

  • ffi_poll_runtime_error_round_trips
  • poll_and_route_once_drains_and_emits_tracing_per_event — inject 3, assert all 3 in captured tracing output
  • poll_and_route_once_on_empty_queue_emits_nothing — steady-state no-noise pin

Toggle: short-circuiting poll_and_route_once to return 0 reds the drains test with left: 0 right: 3.

MITM cert-gen wire (mitm.go / mitm_proxy.go)

The security-shaped silent failure (persistent cert-gen failure = MITM silently disabled for hostname) now feeds the sink. BoxCA gains an optional errSink + SetErrSink setter; reportCertGenFailure(hostname, cause) pushes source="mitm.GenerateHostCert" events whose cause names the failing hostname.

3 Go tests in mitm_silent_audit_test.go (file header has the full audit table of all 6 silent sites + per-site YES/NO classification + rationale for each — only cert-gen wired this PR, per-conn TLS/dial/WS warnings stay log-only as documented):

  • TestBoxCA_ReportCertGenFailure_NoSink_IsNoop
  • TestBoxCA_ReportCertGenFailure_WithSink_RoutesToRuntime
  • TestBoxCA_SetErrSink_NilIsHonored

Toggle: short-circuiting reportCertGenFailure reds RoutesToRuntime.

Test totals now

  • 22 Go tests (9 framework + 5 wire sites + 3 Rust FFI helpers + 3 mitm + 2 negative controls)
  • 7 Rust tests (4 FFI/version/inject/destroy + 3 polling)
  • 29 total; 14 two-side toggle red'd across this PR's history

@G4614 G4614 force-pushed the fix/lifecycle-exit-non-zero-on-rest-errors branch from a8fe41c to b94f472 Compare June 1, 2026 08:55
boxlite-ai#612 surfaced `virtualnetwork.New` bind errors out of the gvproxy goroutine
via an ad-hoc `initErr` channel. Four sibling failure sites still fell into
`logrus.Error + return` and went unobserved by the Rust runtime:

  main.go:454  OverrideTCPHandler           — allow_net / MITM silently disabled
  main.go:475  transport.AcceptVfkit        — VFKit transport silently dead (macOS)
  main.go:484  vn.AcceptVfkit               — VFKit protocol silently dead
  main.go:496  listener.Accept              — Qemu transport silently dead (Linux)
  main.go:510  vn.AcceptQemu                — Qemu protocol silently dead

OverrideTCPHandler in particular is a security-shaped bug: when allow_net is
configured but OverrideTCPHandler fails, the box appears healthy AND traffic
bypasses the filter. The Accept failures are boxlite-ai#612's symptom on a different
code path: the VM "connects" but the protocol pump never runs, so 20s later
the guest reports "DNS lookup … i/o timeout" or every TCP RST.

This change introduces `ErrSink` as the single propagation entry point for
all five sites:

  error_sink.go (new):
    - ErrSink with two channels: initErr (cap 1) + runtimeErr (cap 16)
    - Init(source, err) / WaitInit() pair — replaces boxlite-ai#612's ad-hoc channel
    - Runtime(source, err) / PollRuntime() pair — non-blocking on the
      producer side (queue-full drops with `runtime_err_dropped` warn +
      atomic counter); the Rust runtime polls via the new FFI export

  main.go:
    - GvproxyInstance gains `errSink *ErrSink` (init at instance creation)
    - The init goroutine now:
        sink.Init("virtualnetwork.New", err)    if New fails
        sink.Init("OverrideTCPHandler", err)    if override fails  ← MOVED
        sink.Init("", nil)                      when all init succeeds
      OverrideTCPHandler MOVED into init phase: a failure here used to be
      silent and post-init; now it blocks gvproxy_create from returning,
      so the box.create call fails-fast instead of producing a half-baked
      sinkhole.
    - The four runtime-phase sites now also push to sink.Runtime(...)
      *in addition* to logging — log preserved for backwards-compat with
      existing field grep filters, sink call adds the structured surface.

  gvproxy_poll_runtime_error (new //export):
    - Returns the oldest unread runtime error as a heap-allocated C string
      (`gvproxy_free_string` to release), or nil if queue empty.
    - Intended for a Rust-side background task polling ~250ms; out of scope
      for this PR (Go-side framework only — Rust consumer is follow-up).

Tests (error_sink_test.go, 9 tests):
  - Init contract: nil success, error wrapping with source, WaitInit blocks
    until Init is called
  - Runtime contract: nil no-op, FIFO drain via PollRuntime, queue-full
    drop counts, producer-never-blocks under backpressure, concurrent
    producers safe (observed + dropped = sent)
  - Render contract: RuntimeError.String contains source + cause +
    RFC3339 timestamp (so the operator-visible format is pinned)

Two-sided context: the sink-wiring sites in main.go can't be unit-tested
without invoking the cgo `gvproxy_create` (real Unix-socket path + VM
transport). That integration belongs in the Rust-side consumer PR. The
ErrSink framework itself is fully two-sided in the unit tests above —
each one trivially red'd by reverting the corresponding ErrSink line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@G4614 G4614 force-pushed the fix/lifecycle-exit-non-zero-on-rest-errors branch from 747e75a to 0314c73 Compare June 3, 2026 09:59
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 3, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 6c04eeb3-4b85-4cc2-8a93-bdeebc69143c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@G4614 G4614 changed the title test(cli): lock in lifecycle exit codes on REST errors (POL-36 hardening; could not reproduce) feat(net): ErrSink framework — propagate the 4 silent gvproxy goroutine errors #612 didn't cover Jun 3, 2026
@cla-assistant
Copy link
Copy Markdown

cla-assistant Bot commented Jun 3, 2026

CLA assistant check
All committers have signed the CLA.

G4614 and others added 3 commits June 3, 2026 10:14
…emu loop

Adds the "不修真死" half of boxlite-ai#634's claim — direct two-sided proof that
the ErrSink framework actually surfaces what would otherwise be silent,
both at the Rust-FFI layer (via poll) and at one production wire site
(Linux Qemu Accept).

Rust-side (libgvproxy-sys/src/lib.rs, 3 tests):
  - poll_runtime_error_round_trips_through_ffi — full path validation:
    create fixture → inject error → poll → assert source+cause+timestamp
    → drain → destroy. Pins the operator-facing rendered format.
  - poll_runtime_error_preserves_fifo_through_ffi — 3 events in order
    via FFI (no Go-side reordering).
  - poll_runtime_error_unknown_id_returns_null — defensive: unknown id
    or already-destroyed id MUST return NULL, never panic. This is the
    contract the always-on Rust background poller depends on.

  Toggle proof: replacing `re := instance.errSink.PollRuntime()` with
  `re := (*RuntimeError)(nil)` reds 2/3 of these (round_trips +
  preserves_fifo); unknown_id stays green because that path returns
  NULL by design. Confirms the poll line is load-bearing for the FFI
  contract.

Go-side wire extraction (qemu_accept_loop.go, 2 tests):
  - Extracted the inline Linux Qemu Accept goroutine into
    runQemuAcceptLoop(ctx, id, listener, vn, sink). Same body — now
    testable against a real Unix socket + cancellable ctx.
  - main.go now reads `go runQemuAcceptLoop(ctx, id, listener, vn, sink)`
    instead of the inline goroutine; behavior identical.
  - TestRunQemuAcceptLoop_ListenerCloseSurfacesViaSink — production-shape
    proof: open real Unix listener, spawn loop, close listener mid-Accept,
    assert sink got source="listener.Accept" and cause="use of closed
    network connection". This IS the silent error path the larger PR
    rewires; the test would never have observed it pre-fix.
  - TestRunQemuAcceptLoop_CancelDoesNotSinkRuntime — negative control:
    ctx cancellation (planned shutdown) must NOT push to sink, else
    every gvproxy_destroy would pollute the runtime error queue with
    self-inflicted noise.

  Toggle proof: removing `sink.Runtime("listener.Accept", err)` from
  runQemuAcceptLoop reds the first test with:
    "expected listener.Accept runtime error in sink, got nil — pre-fix
     silent return: the goroutine logged the error and vanished without
     anyone (including the Rust runtime) ever observing it"
  Restored → green. This is the "不修真死" demonstration for ONE of the
  5 silent sites; the other 4 (transport.AcceptVfkit, vn.AcceptVfkit,
  vn.AcceptQemu, OverrideTCPHandler) share the same pattern and would
  red the same way.

Test-only FFI exports (main.go):
  - gvproxy_test_create_for_polling — minimal fixture instance (no
    socket, no goroutines, just an ErrSink) for Rust integration tests.
  - gvproxy_test_inject_runtime_error(id, source, message) — pushes a
    synthetic event onto the sink so Rust tests can drive injection
    without standing up a real VM transport.
  Both are exported (live in `//export` blocks) but only meaningful in
  test contexts; harmless in production because nothing else calls them.

Total: +3 Rust integration tests, +2 Go integration tests, +2 //export
helpers, +1 extracted helper file. All 14 tests green; 3 toggles
two-side red'd; restored → green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…routines

Completes the "5/5 silent sites have unit-level two-side proof" claim
on top of the previous Layer-3 demo (which only covered listener.Accept).

Extraction (3 new helper files):
  - qemu_accept_loop.go now takes `acceptQemu qemuProtocolFn` (function
    value) instead of `*virtualnetwork.VirtualNetwork`, so tests can mock
    the protocol layer without standing up real virtualnetwork.New.
  - vfkit_accept_loop.go (new) — same shape for macOS: `acceptVfkitTransport
    vfkitTransportFn` + `acceptVfkit vfkitProtocolFn`. Compiles on Linux too;
    only invoked on GOOS=="darwin" in production but tests run everywhere.
  - install_tcp_override.go (new) — wraps the OverrideTCPHandler call
    so the init goroutine in main.go (which still runs inline) can route
    failures through a single point. `installFn func() error` injection
    lets tests force the failure shape.

main.go now reads:
    go runQemuAcceptLoop(ctx, id, listener, vn.AcceptQemu, sink)
    go runVfkitAcceptLoop(ctx, id, conn.(*net.UnixConn),
                          transport.AcceptVfkit, vn.AcceptVfkit, sink)
    err := installTCPOverride(hasTCPFilter, installTCPHandler)
Same behavior end-to-end; the inline goroutines are 30 lines smaller.

Test coverage delta (10 new test functions, +400 lines):

  qemu_accept_loop_test.go (+1):
    - ProtocolErrorSurfacesViaSink: real listener + real dialer (so Accept
      succeeds), mock acceptQemu returns error → sink.PollRuntime gets
      source="vn.AcceptQemu" + errors.Is(re.Err, mockErr).

  vfkit_accept_loop_test.go (new file, 3 tests):
    - TransportErrorSurfacesViaSink: mock acceptVfkitTransport returns
      error → sink gets source="transport.AcceptVfkit".
    - ProtocolErrorSurfacesViaSink: net.Pipe + successful mock transport
      + failing mock acceptVfkit → sink gets source="vn.AcceptVfkit".
    - CancelDoesNotSinkRuntime: planned shutdown via ctx cancel, mock
      transport blocks on ctx.Done then errors → sink stays empty.

  install_tcp_override_test.go (new file, 4 tests):
    - SkipsWhenNoFilterConfigured: hasTCPFilter=false → installFn NOT
      called (the production guard pin).
    - CallsWhenFilterConfigured: hasTCPFilter=true → installFn IS called.
    - PropagatesError: installFn returns mockErr → installTCPOverride
      returns the same wrapped error (the security-shaped guarantee).
    - EmptyAllowNetReturnsNil: newTCPFilterFromConfig with empty/nil
      AllowNet → nil (MITM-only mode).

Two-sided verification (toggle red'd):

  | toggle | red message |
  |---|---|
  | qemu_accept_loop.go: remove `sink.Runtime("vn.AcceptQemu", err)` | `ProtocolErrorSurfacesViaSink`: "expected vn.AcceptQemu runtime error in sink, got nil — pre-fix silent return: protocol pump errored but no signal reached anyone (Rust runtime, operator, ops dashboard)" |
  | vfkit_accept_loop.go: remove `sink.Runtime("transport.AcceptVfkit", err)` | `TransportErrorSurfacesViaSink`: "expected transport.AcceptVfkit runtime error in sink, got nil — pre-fix silent return: gvproxy alive but VFKit transport dead, guest fails 20s later as DNS/network unreachable (same shape as boxlite-ai#612)" |
  | vfkit_accept_loop.go: remove `sink.Runtime("vn.AcceptVfkit", err)` | `ProtocolErrorSurfacesViaSink` (Vfkit): "expected vn.AcceptVfkit runtime error in sink, got nil — pre-fix silent return: VFKit protocol pump died, guest TCP connections RST silently" |
  | install_tcp_override.go: swallow `installFn()` return | `PropagatesError`: "expected error to propagate; got nil — pre-fix silent install: OverrideTCPHandler error swallowed, allow_net/MITM silently disabled, box.create reports SUCCESS" |

All 4 reds restore to green when the toggle is reverted. Combined with the
prior listener.Accept toggle (in the previous commit), this gives 5/5
unit-level proof that each silent site, pre-fix, would have stayed silent.

Test totals after this commit (Go side):
  - 9  error_sink_test.go     (framework contracts)
  - 3  qemu_accept_loop_test.go (listener.Accept + vn.AcceptQemu + cancel control)
  - 3  vfkit_accept_loop_test.go (transport + protocol + cancel control)
  - 4  install_tcp_override_test.go
  --- ---
  19  Go tests (+ 3 Rust FFI integration tests = 22 total)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…k loop

Two complementary additions to make boxlite-ai#634's ErrSink framework fully
observable from the Rust runtime end-to-end:

PART 1: Rust-side poll consumer (src/boxlite/src/net/gvproxy/)
─────────────────────────────────────────────────────────────────────

Pre-this-commit: ErrSink + the 5 silent-goroutine wirings + FFI export
landed in boxlite-ai#634, but no Rust consumer polled `gvproxy_poll_runtime_error`.
Events queued in Go's bounded `runtimeErr` channel (cap 16) and
eventually dropped when `gvproxy_destroy` ran — externally invisible.

This commit:

  ffi.rs:
    - poll_runtime_error(id) -> BoxliteResult<Option<String>>
      Safe wrapper around the FFI; handles CString → Rust conversion
      + frees the C string. None on empty queue, Some(rendered) on
      event, Err only on UTF-8 decode failure (effectively never).

  instance.rs:
    - start_runtime_error_polling(weak instance) spawns a tokio task
      analogous to start_stats_logging. Polls every 250ms (see
      RUNTIME_ERROR_POLL_INTERVAL doc for the cache-window rationale).
      Drains every queued event per cycle so a burst delivers in one
      tick rather than one-per-tick.
    - poll_and_route_once(id) is the pollable body extracted from the
      spawn loop so tests can drive it without a real tokio loop.
      Routes events into tracing::warn! with target
      "gvproxy.runtime_error" + instance_id + event fields.

  mod.rs:
    - Wires start_runtime_error_polling alongside start_stats_logging
      at GvproxyBackend construction time.

Tests (3 in src/boxlite/src/net/gvproxy/instance.rs, feature=gvproxy):
  - ffi_poll_runtime_error_round_trips — Rust ffi wrapper round-trips
    empty → injected event → drained, asserts source/cause/timestamp.
  - poll_and_route_once_drains_and_emits_tracing_per_event — inject 3
    events, run consumer once, assert (a) returned count = 3, (b) each
    event's source + cause appears in captured tracing output, (c)
    "gvproxy.runtime_error" target is present, (d) instance_id field
    is set for correlation.
  - poll_and_route_once_on_empty_queue_emits_nothing — steady-state
    common case: empty queue → 0 emits → no tracing noise.

Two-sided: short-circuiting poll_and_route_once to `return 0` (instead
of running the drain loop) reds the drains test with `left: 0 right: 3`.
Restored → green.

PART 2: MITM cert-gen wire + silent audit (gvproxy-bridge/mitm{,_proxy}.go)
─────────────────────────────────────────────────────────────────────

mitm.go BoxCA + mitm_proxy.go cert-gen path is the security-shaped silent
failure the larger boxlite-ai#634 framework hadn't touched. A persistent
GenerateHostCert failure means MITM is silently disabled for that
hostname, the guest sees a TLS handshake error, and the only signal
was a `logrus.Error` line — invisible to Rust + operators.

mitm.go:
  - BoxCA gains an optional `errSink *ErrSink` field + `SetErrSink`
    setter. Optional so existing tests / pre-init paths that construct
    a BoxCA without a sink keep working.
  - `reportCertGenFailure(hostname, cause)` is the small testable
    helper that pushes onto the sink with source="mitm.GenerateHostCert"
    and a cause string that names the failing hostname (operators
    chasing a silent MITM regression need to know WHICH host couldn't
    be intercepted).

mitm_proxy.go (line 22):
  - The pre-existing `logrus.Error + close conn` path now also calls
    `ca.reportCertGenFailure(hostname, err)`. logrus is preserved for
    backwards-compat with existing log greps; the sink call adds the
    structured surface that the Rust polling consumer routes to
    tracing.

main.go:
  - After NewBoxCAFromPEM, the new `ca.SetErrSink(instance.errSink)`
    wires the per-instance sink into the CA so cert-gen failures land
    in the same queue as the gvproxy goroutine errors.

Tests (3 in mitm_silent_audit_test.go):
  - TestBoxCA_ReportCertGenFailure_NoSink_IsNoop — safe to call
    pre-SetErrSink without panic.
  - TestBoxCA_ReportCertGenFailure_WithSink_RoutesToRuntime — the
    production wiring contract: with sink set, the call pushes a
    source="mitm.GenerateHostCert" event whose cause names the
    failing hostname and wraps the original error.
  - TestBoxCA_SetErrSink_NilIsHonored — passing nil reverts to no-op
    behavior.

The test file header contains the full audit of all 6 silent sites in
mitm.go / mitm_proxy.go / mitm_websocket.go with the per-site
classification (YES → wire to sink, NO → per-conn legitimate log) and
rationale. Only the cert-gen site is wired in this commit; per-conn
TLS handshake / upstream dial / response read / websocket hijack
errors are documented as "expected failure modes, not silent
regressions worth surfacing" and stay log-only.

Two-sided: short-circuiting reportCertGenFailure to `return` (before
the sink push) reds RoutesToRuntime with "pre-fix silent: MITM cert
gen failed, only logrus saw it. The Rust runtime + box log file never
observed the failure". Restored → green.

Test totals after this commit:
  - Go: 19 (prior) + 3 mitm = 22
  - Rust: 4 (FFI/version/inject/destroy) + 3 polling = 7
  ── ──
  29 total tests; 14 two-side toggle red'd across this PR's history.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant