feat(grpc): add healthz support#5218

Draft
josedonizetti wants to merge 1 commit into aquasecurity:main from josedonizetti:add-grpc-healthz

Conversation

Contributor

@josedonizetti josedonizetti commented Jan 28, 2026

Fix #5217

Add gRPC Health Checking Service

This PR adds support for the standard gRPC health checking protocol (grpc.health.v1) to Tracee's gRPC server, enabling Kubernetes gRPC probes for health checks.

Changes

  • gRPC Health Service: Implemented HealthService that wraps the standard gRPC health server and integrates with Tracee's existing heartbeat mechanism
  • Conditional Enablement: gRPC health service is only enabled when --server healthz flag is passed (same behavior as HTTP healthz endpoint)
  • Heartbeat Integration: Health status is determined by polling the heartbeat status every 500ms, transitioning between SERVING and NOT_SERVING based on heartbeat liveness
  • Helm Chart Updates: Updated Helm chart to automatically use gRPC probe when gRPC server is configured (priority over HTTP probe), falling back to HTTP probe when only HTTP server is configured
  • Tests: Added comprehensive tests for the gRPC health service including status transitions, shutdown handling, and Watch streaming

Technical Details

  • Health service starts as NOT_SERVING and transitions to SERVING once heartbeat confirms health
  • Uses empty service name ("") for overall server health, which is sufficient for Kubernetes gRPC probes
  • Health monitor runs in a separate goroutine and gracefully handles context cancellation
  • Moved InvokeHeartbeat function to shared pkg/server/heartbeat.go to be accessible by both HTTP and gRPC servers

Testing

  • Unit tests added for health service functionality
  • Tested in minikube cluster with gRPC probe configuration
  • Verified pod readiness transitions correctly based on heartbeat status

codecov bot commented Jan 29, 2026

Codecov Report

❌ Patch coverage is 77.77778% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 35.38%. Comparing base (235daa0) to head (7e7e24c).
⚠️ Report is 169 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/cmd/tracee.go | 0.00% | 11 Missing ⚠️ |
| pkg/server/heartbeat.go | 0.00% | 2 Missing ⚠️ |
| pkg/ebpf/probes/probe_group.go | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5218      +/-   ##
==========================================
+ Coverage   33.51%   35.38%   +1.87%     
==========================================
  Files         250      241       -9     
  Lines       28908    31681    +2773     
==========================================
+ Hits         9688    11211    +1523     
- Misses      18609    19757    +1148     
- Partials      611      713     +102     

| Flag | Coverage Δ | |
|---|---|---|
| unit | 35.38% <77.77%> (+1.87%) | ⬆️ |

| Files with missing lines | Coverage Δ | |
|---|---|---|
| pkg/cmd/flags/server.go | 87.65% <100.00%> (+0.23%) | ⬆️ |
| pkg/server/grpc/health.go | 100.00% <100.00%> (ø) | |
| pkg/server/grpc/server.go | 80.00% <100.00%> (+3.72%) | ⬆️ |
| pkg/server/http/server.go | 45.34% <ø> (+5.57%) | ⬆️ |
| pkg/ebpf/probes/probe_group.go | 0.00% <0.00%> (ø) | |
| pkg/server/heartbeat.go | 0.00% <0.00%> (ø) | |
| pkg/cmd/tracee.go | 0.00% <0.00%> (ø) | |

... and 111 files with indirect coverage changes

@josedonizetti josedonizetti changed the title from "feat(grpc): add healthz supporflagst" to "feat(grpc): add healthz support" Jan 29, 2026
@josedonizetti josedonizetti requested a review from geyslan January 29, 2026 14:25
geyslan previously approved these changes Jan 29, 2026
Member

@geyslan geyslan left a comment

LGTM. A single doubt.

// Register health service only if enabled
if s.healthService != nil {
	healthpb.RegisterHealthServer(grpcServer, s.healthService.Server())
	go s.healthService.StartMonitor(ctx)
Member

Just a doubt, this go routine spawning could race with the line 84 somehow (inside other spawning)?

Contributor Author

> Just a doubt, this go routine spawning could race with the line 84 somehow (inside other spawning)?

No race here. The RegisterHealthServer call on line 78 completes synchronously before either goroutine is spawned. After that, the two goroutines operate on independent concerns:

  • StartMonitor only calls health.Server.SetServingStatus(), which is internally synchronized with a mutex in the standard gRPC health server implementation.
  • grpcServer.Serve() starts accepting connections and dispatching RPCs, reading the health status through the same mutex-protected Check/Watch handlers.

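So even if Serve starts accepting connections before StartMonitor sets the initial NOT_SERVING status, a health check arriving in that window for a service with no recorded status would still fail: per grpc.health.v1, Check returns a NOT_FOUND error and Watch reports SERVICE_UNKNOWN, and Kubernetes treats a failed check as unhealthy, which has the same practical effect as NOT_SERVING. One caveat: grpc-go's health.NewServer() pre-registers the empty service name ("") as SERVING, so this holds only if the status for "" is already set to NOT_SERVING when the health service is constructed, before Serve starts.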
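
The ordering argument can be sketched with the standard library alone. This is a simplified model, not the PR's code: `statusBox` and `run` are hypothetical names, the writer goroutine stands in for StartMonitor, and the reader goroutines stand in for RPC handlers. The initial status is set synchronously before any goroutine starts, and every later access goes through the same mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// statusBox is a hypothetical mutex-protected status holder.
type statusBox struct {
	mu     sync.Mutex
	status string
}

func (b *statusBox) Set(s string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.status = s
}

func (b *statusBox) Get() string {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.status
}

// run sets the initial status synchronously, then starts one writer
// (the monitor stand-in) and several readers (the handler stand-ins).
// It returns the status observed after all goroutines finish.
func run(updates []string, readers int) string {
	// Synchronous setup: completes before any goroutine is spawned.
	box := &statusBox{status: "NOT_SERVING"}

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for _, u := range updates {
			box.Set(u)
		}
	}()
	for i := 0; i < readers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = box.Get() // each read sees some consistent status value
		}()
	}
	wg.Wait()
	return box.Get()
}

func main() {
	final := run([]string{"SERVING", "NOT_SERVING", "SERVING"}, 8)
	fmt.Println("final status:", final)
}
```

Running the sketch with `go run -race` is one way to check that the shared status is only ever accessed under the mutex.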

@josedonizetti josedonizetti marked this pull request as draft February 28, 2026 18:13

Development

Successfully merging this pull request may close these issues.

Add gRPC Health Checking Protocol support

3 participants