Skip to content

🐛 Recover and report panics in provider subprocesses#6939

Merged
vjeffrey merged 3 commits intomainfrom
worktree-providerpanics
Mar 17, 2026
Merged

🐛 Recover and report panics in provider subprocesses#6939
vjeffrey merged 3 commits intomainfrom
worktree-providerpanics

Conversation

@vjeffrey
Copy link
Copy Markdown
Contributor

Summary

  • Adds panic recovery (defer recover) to all provider-side gRPC server methods (GetData, Connect, StoreData, etc.) in the plugin SDK
  • Recovered panics are logged with full stack traces on the provider side and returned as gRPC Internal errors with the panic message + stack trace
  • The main process (handlePluginError) now handles gRPC code 13 (Internal) to surface provider panic details in query results instead of a generic "provider crashed" message
  • The provider subprocess stays alive after a recovered panic, preserving connections and state

Problem

When a panic occurred in provider code (e.g., nil pointer dereference in AWS/Azure/GCP resource creation), the provider subprocess would crash silently. The main mql process only detected this via gRPC status code 14 (Unavailable), losing all panic details and stack traces. This made debugging provider issues very difficult.

How it works

  1. recoverPanic() helper in providers-sdk/v1/plugin/grpc.go catches panics, logs them, and converts them to gRPC Internal errors
  2. handlePluginError() in providers/runtime.go detects the new Internal error code and treats it as a recoverable error (returns handled=true), surfacing the panic details in query results
  3. Unlike a crash (code 14), the provider remains running after a recovered panic

Test plan

  • Unit tests for recoverPanic (string panics, nil pointer panics, no-panic case)
  • Manual verification: introduce a deliberate panic in a provider resource and confirm the error surfaces with stack trace instead of crashing

🤖 Generated with Claude Code

Copy link
Copy Markdown

@mondoo-code-review mondoo-code-review bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provider subprocess panics are now caught and reported as errors instead of crashing the process.

@mondoo-code-review mondoo-code-review bot dismissed their stale review March 16, 2026 16:39

Superseded by new review

Copy link
Copy Markdown

@mondoo-code-review mondoo-code-review bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provider subprocess panics are now recovered and reported as errors instead of crashing the process.

Additional findings (file/line not in diff):

  • 🟡 providers-sdk/v1/plugin/grpc.go:111Heartbeat is the only GRPCServer method missing the recoverPanic guard. A panic in Heartbeat would still crash the provider subprocess. Add defer recoverPanic("Heartbeat", &err) for consistency.

Copy link
Copy Markdown

@mondoo-code-review mondoo-code-review bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provider panic recovery now keeps stack traces local and guards the Internal code check, addressing prior review feedback cleanly.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 16, 2026

Test Results

5 481 tests  +4   5 477 ✅ +4   2m 7s ⏱️ +3s
  411 suites ±0       4 💤 ±0 
   31 files   ±0       0 ❌ ±0 

Results for commit fe426d0. ± Comparison against base commit b298954.

♻️ This comment has been updated with latest results.

vjeffrey and others added 3 commits March 16, 2026 16:07
Provider panics (e.g., nil pointer dereferences in resource creation)
previously crashed the subprocess silently — the main process only saw
a generic gRPC "Unavailable" error with no panic details or stack trace.

This adds panic recovery to all provider-side gRPC server methods.
Panics are now caught, logged with full stack traces, and returned as
gRPC Internal errors so the main process can surface actionable info
instead of just "provider crashed".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace magic numbers 13/14 with codes.Internal/codes.Unavailable
- Keep full stack trace in log only; user-facing error shows just the
  panic value (first line)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Only send short panic message over gRPC wire; full stack trace
  stays in provider-side logs
- Guard codes.Internal case with "panic in provider " prefix check
  so non-panic Internal errors fall through to default handling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vjeffrey vjeffrey force-pushed the worktree-providerpanics branch from cd03b06 to fe426d0 Compare March 16, 2026 22:07
@vjeffrey vjeffrey merged commit aa921d4 into main Mar 17, 2026
22 checks passed
@vjeffrey vjeffrey deleted the worktree-providerpanics branch March 17, 2026 13:00
@github-actions github-actions bot locked and limited conversation to collaborators Mar 17, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant