-
Notifications
You must be signed in to change notification settings - Fork 33
feat: add processingStrategy field in health events structure #608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: add processingStrategy field in health events structure #608
Conversation
Signed-off-by: Tanisha goyal <[email protected]>
π WalkthroughWalkthroughIntroduces a new Changes
Sequence Diagram(s)sequenceDiagram
participant HM as Health Monitor
participant PCS as Platform Connector Server
participant KC as K8s Connector
participant Store as Data Store
HM->>HM: Create HealthEvent<br/>(processingStrategy set)
HM->>PCS: HealthEventOccurredV1(event)
PCS->>PCS: Normalize UNSPECIFIED<br/>β EXECUTE_REMEDIATION
PCS->>KC: Process via Pipeline
PCS->>Store: Enqueue to Ring Buffer
alt processingStrategy = EXECUTE_REMEDIATION
KC->>KC: filterProcessableEvents()
KC->>KC: Create K8s node conditions
KC->>KC: Create K8s events
KC->>Store: Write conditions & events
else processingStrategy = STORE_ONLY
KC->>KC: Skip (observability only)
Note over KC: Event stored but<br/>no remediation
end
Store->>Store: Insert/Update with strategy
Estimated code review effortπ― 4 (Complex) | β±οΈ ~60 minutes The PR introduces a multi-service feature spanning protocol buffers, event transformation, filtering logic, and storage layer with heterogeneous changes across ~20 files. While individual changes are moderately complex, the breadth of affected components and diverse patterns (filters, builders, normalizations) require careful verification of strategy propagation and backward compatibility guarantees. Poem
π₯ Pre-merge checks | β 2 | β 1β Failed checks (1 warning)
β Passed checks (2 passed)
βοΈ Tip: You can configure your own custom pre-merge checks in the settings. β¨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
π§Ή Nitpick comments (1)
platform-connectors/pkg/connectors/kubernetes/process_node_events.go (1)
325-343: Consider log level and function signature simplification.Two optional refinements:
Log level: If STORE_ONLY events are frequent, INFO-level logging (line 331) may be verbose. Consider DEBUG level for routine filtering operations.
Function signature: The function takes
*protos.HealthEventsbut only uses theEventsfield. Consider accepting[]*protos.HealthEventdirectly for a clearer contract.π Proposed signature simplification
-func filterProcessableEvents(healthEvents *protos.HealthEvents) []*protos.HealthEvent { +func filterProcessableEvents(healthEvents []*protos.HealthEvent) []*protos.HealthEvent { var processableEvents []*protos.HealthEvent - for _, healthEvent := range healthEvents.Events { + for _, healthEvent := range healthEvents { if healthEvent.ProcessingStrategy == protos.ProcessingStrategy_STORE_ONLY { slog.Info("Skipping STORE_ONLY health event (no node conditions / node events)", "node", healthEvent.NodeName, "checkName", healthEvent.CheckName, "agent", healthEvent.Agent) continue } processableEvents = append(processableEvents, healthEvent) } return processableEvents }Then update the call site:
func (r *K8sConnector) processHealthEvents(ctx context.Context, healthEvents *protos.HealthEvents) error { - processableEvents := filterProcessableEvents(healthEvents) + processableEvents := filterProcessableEvents(healthEvents.Events)
π Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
β Files ignored due to path filters (1)
data-models/pkg/protos/health_event.pb.gois excluded by!**/*.pb.go
π Files selected for processing (11)
data-models/protobufs/health_event.protoevent-exporter/pkg/transformer/cloudevents.goevent-exporter/pkg/transformer/cloudevents_test.gohealth-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyhealth-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyiplatform-connectors/pkg/connectors/kubernetes/k8s_platform_connector_test.goplatform-connectors/pkg/connectors/kubernetes/process_node_events.gostore-client/pkg/client/mongodb_pipeline_builder.gostore-client/pkg/client/pipeline_builder.gostore-client/pkg/client/pipeline_builder_test.gostore-client/pkg/client/postgresql_pipeline_builder.go
π§° Additional context used
π Path-based instructions (4)
**/*.go
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.go: Follow standard Go conventions with gofmt and golint
Use structured logging vialog/slogin Go code
Wrap errors with context usingfmt.Errorf("context: %w", err)in Go code
Withinretry.RetryOnConflictblocks, return errors without wrapping to preserve retry behavior
Use meaningful variable names such assyncedoverokfor cache sync checks
Useclient-gofor Kubernetes API interactions in Go code
Prefer informers over direct API calls for watching Kubernetes resources
Implement proper shutdown handling with context cancellation in Go code
Package-level godoc required for all Go packages
Function comments required for all exported Go functions
Use inline comments for complex logic only in Go code
TODO comments should reference issues in Go code
Extract informer event handler setup into helper methods
Use separate informers for different Kubernetes resource types
Files:
store-client/pkg/client/pipeline_builder.goevent-exporter/pkg/transformer/cloudevents_test.gostore-client/pkg/client/postgresql_pipeline_builder.goevent-exporter/pkg/transformer/cloudevents.goplatform-connectors/pkg/connectors/kubernetes/k8s_platform_connector_test.goplatform-connectors/pkg/connectors/kubernetes/process_node_events.gostore-client/pkg/client/pipeline_builder_test.gostore-client/pkg/client/mongodb_pipeline_builder.go
**/*_test.go
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*_test.go: Useenvtestfor testing Kubernetes controllers instead of fake clients
Usetestify/assertandtestify/requirefor assertions in Go tests
Write table-driven tests when testing multiple scenarios in Go
Name Go tests descriptively using format:TestFunctionName_Scenario_ExpectedBehavior
Files:
event-exporter/pkg/transformer/cloudevents_test.goplatform-connectors/pkg/connectors/kubernetes/k8s_platform_connector_test.gostore-client/pkg/client/pipeline_builder_test.go
data-models/protobufs/**/*.proto
π CodeRabbit inference engine (.github/copilot-instructions.md)
data-models/protobufs/**/*.proto: Define Protocol Buffer messages indata-models/protobufs/directory
Use semantic versioning for breaking changes in Protocol Buffer messages
Include comprehensive comments for all fields in Protocol Buffer messages
Files:
data-models/protobufs/health_event.proto
**/*.py
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.py: Use Poetry for dependency management in Python code
Follow PEP 8 style guide for Python code
Use Black for formatting Python code
Type hints required for all functions in Python code
Use dataclasses for structured data in Python code
Files:
health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.py
π§ Learnings (2)
π Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*_test.go : Write table-driven tests when testing multiple scenarios in Go
Applied to files:
platform-connectors/pkg/connectors/kubernetes/k8s_platform_connector_test.go
π Learning: 2025-11-10T10:25:19.443Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 248
File: distros/kubernetes/nvsentinel/charts/health-events-analyzer/values.yaml:87-124
Timestamp: 2025-11-10T10:25:19.443Z
Learning: In the NVSentinel health-events-analyzer, the agent filter to exclude analyzer-published events (`"healthevent.agent": {"$ne": "health-events-analyzer"}`) is automatically added as the first stage in getPipelineStages() function in health-events-analyzer/pkg/reconciler/reconciler.go, not in individual rule configurations in values.yaml.
Applied to files:
store-client/pkg/client/mongodb_pipeline_builder.go
𧬠Code graph analysis (7)
event-exporter/pkg/transformer/cloudevents_test.go (2)
health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyi (1)
ProcessingStrategy(14-17)data-models/pkg/protos/health_event.pb.go (5)
ProcessingStrategy(43-43)ProcessingStrategy(72-74)ProcessingStrategy(76-78)ProcessingStrategy(85-87)ProcessingStrategy_STORE_ONLY(47-47)
store-client/pkg/client/postgresql_pipeline_builder.go (2)
store-client/pkg/datastore/types.go (4)
ToPipeline(161-163)D(131-133)E(126-128)A(136-138)data-models/pkg/protos/health_event.pb.go (1)
ProcessingStrategy_EXECUTE_REMEDIATION(46-46)
event-exporter/pkg/transformer/cloudevents.go (2)
health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyi (1)
ProcessingStrategy(14-17)data-models/pkg/protos/health_event.pb.go (4)
ProcessingStrategy(43-43)ProcessingStrategy(72-74)ProcessingStrategy(76-78)ProcessingStrategy(85-87)
platform-connectors/pkg/connectors/kubernetes/process_node_events.go (1)
data-models/pkg/protos/health_event.pb.go (11)
HealthEvents(156-162)HealthEvents(175-175)HealthEvents(190-192)HealthEvent(260-280)HealthEvent(293-293)HealthEvent(308-310)ProcessingStrategy(43-43)ProcessingStrategy(72-74)ProcessingStrategy(76-78)ProcessingStrategy(85-87)ProcessingStrategy_STORE_ONLY(47-47)
store-client/pkg/client/pipeline_builder_test.go (3)
store-client/pkg/client/pipeline_builder.go (1)
PipelineBuilder(26-47)store-client/pkg/client/mongodb_pipeline_builder.go (1)
NewMongoDBPipelineBuilder(29-31)store-client/pkg/client/postgresql_pipeline_builder.go (1)
NewPostgreSQLPipelineBuilder(29-31)
store-client/pkg/client/mongodb_pipeline_builder.go (2)
store-client/pkg/datastore/types.go (4)
ToPipeline(161-163)D(131-133)E(126-128)A(136-138)data-models/pkg/protos/health_event.pb.go (1)
ProcessingStrategy_EXECUTE_REMEDIATION(46-46)
health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyi (1)
data-models/pkg/protos/health_event.pb.go (4)
ProcessingStrategy(43-43)ProcessingStrategy(72-74)ProcessingStrategy(76-78)ProcessingStrategy(85-87)
π Additional comments (13)
event-exporter/pkg/transformer/cloudevents_test.go (1)
69-69: LGTM - Good test coverage for the new field.The test correctly validates that
ProcessingStrategy_STORE_ONLYis serialized as"STORE_ONLY"in the CloudEvent payload. Consider adding an explicit test case for the default value (EXECUTE_REMEDIATION) to ensure both enum values are covered.Also applies to: 106-108
store-client/pkg/client/pipeline_builder_test.go (1)
69-86: LGTM - Test follows established patterns.The test is well-structured, table-driven, and consistent with the existing pipeline builder tests. It correctly validates both MongoDB and PostgreSQL implementations.
event-exporter/pkg/transformer/cloudevents.go (1)
66-66: LGTM - Consistent enum serialization.The
processingStrategyfield is added correctly, following the same pattern asrecommendedActionusing the.String()method for enum-to-string conversion.store-client/pkg/client/pipeline_builder.go (1)
35-38: LGTM - Well-documented interface method.The new
BuildProcessableHealthEventInsertsPipelinemethod is properly documented with clear comments explaining its purpose and use case. The placement in the interface is logical.store-client/pkg/client/postgresql_pipeline_builder.go (1)
119-132: LGTM - Consistent with MongoDB implementation.The PostgreSQL implementation correctly mirrors the MongoDB version. Since both use client-side pipeline filtering, the identical implementation is appropriate.
data-models/protobufs/health_event.proto (2)
32-38: LGTM - Well-designed enum with sensible defaults.The
ProcessingStrategyenum is properly defined with:
EXECUTE_REMEDIATION = 0as the default (preserves existing behavior for events without this field)STORE_ONLY = 1for observability-only events- Comprehensive comments explaining each value's semantics
As per coding guidelines, the comments adequately describe the field behavior.
77-77: LGTM - Additive field addition.Field number 16 is correctly assigned following the sequence. This is a backward-compatible change.
health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.py (1)
1-51: Auto-generated protobuf code - no manual review needed.This file is generated by the protocol buffer compiler. The updated descriptor and offsets correctly reflect the new
ProcessingStrategyenum andHealthEvent.processingStrategyfield from the proto definition.store-client/pkg/client/mongodb_pipeline_builder.go (1)
87-100: LGTM - Correct implementation of processable events filter.The pipeline correctly filters for insert operations where
processingStrategyequalsEXECUTE_REMEDIATION. The use ofint32(protos.ProcessingStrategy_EXECUTE_REMEDIATION)is appropriate since protobuf enums are stored as integers. The field pathfullDocument.healthevent.processingstrategyis correctβthe codebase stores all MongoDB document fields in lowercase, as enforced by the unmarshaller which converts JSON tags to lowercase BSON tags.platform-connectors/pkg/connectors/kubernetes/k8s_platform_connector_test.go (1)
1391-1589: Well-structured test for processing strategy feature.The test comprehensively validates the STORE_ONLY vs EXECUTE_REMEDIATION strategy behavior using table-driven tests as recommended in the coding guidelines. Good test isolation with dedicated clientSet and ring buffer per test case, and proper use of testify assertions.
platform-connectors/pkg/connectors/kubernetes/process_node_events.go (2)
345-370: Good refactoring to extract event creation logic.Extracting the Kubernetes event creation into a dedicated function improves code organization and reusability. The implementation correctly uses the receiver methods for message and reason formatting.
372-416: Clean integration of processing strategy filtering.The function correctly filters events by processing strategy before applying node condition and Kubernetes event logic. The flow is clear: filter processable events, update node conditions for processable events, and write node events for processable non-fatal events. This ensures STORE_ONLY events are consistently excluded from remediation actions.
health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyi (1)
14-17: Generated type stubs correctly reflect protobuf schema changes.The ProcessingStrategy enum and processingStrategy field additions follow the established pattern for protobuf-generated Python type stubs. The type annotations and scaffolding are consistent with the existing code structure.
Also applies to: 31-32, 78-78, 104-104, 120-120, 138-138
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
π§Ή Nitpick comments (1)
tests/platform-connector_test.go (1)
53-96: Consider renaming the feature to match test intent.The test function is named
TestPlatformConnectorWithProcessingStrategybut the feature on line 35 is named"TestPlatformConnector". For consistency and clarity, consider naming the feature to include "WithProcessingStrategy" to better reflect what's being tested.π Suggested change
- feature := features.New("TestPlatformConnector"). + feature := features.New("TestPlatformConnectorWithProcessingStrategy"). WithLabel("suite", "platform-connector")
π Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
π Files selected for processing (6)
distros/kubernetes/nvsentinel/values-tilt-postgresql.yamldocs/postgresql-schema.sqltests/fault_quarantine_test.gotests/helpers/healthevent.gotests/helpers/kube.gotests/platform-connector_test.go
π§° Additional context used
π Path-based instructions (2)
**/*.go
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.go: Follow standard Go conventions with gofmt and golint
Use structured logging vialog/slogin Go code
Wrap errors with context usingfmt.Errorf("context: %w", err)in Go code
Withinretry.RetryOnConflictblocks, return errors without wrapping to preserve retry behavior
Use meaningful variable names such assyncedoverokfor cache sync checks
Useclient-gofor Kubernetes API interactions in Go code
Prefer informers over direct API calls for watching Kubernetes resources
Implement proper shutdown handling with context cancellation in Go code
Package-level godoc required for all Go packages
Function comments required for all exported Go functions
Use inline comments for complex logic only in Go code
TODO comments should reference issues in Go code
Extract informer event handler setup into helper methods
Use separate informers for different Kubernetes resource types
Files:
tests/helpers/kube.gotests/platform-connector_test.gotests/helpers/healthevent.gotests/fault_quarantine_test.go
**/*_test.go
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*_test.go: Useenvtestfor testing Kubernetes controllers instead of fake clients
Usetestify/assertandtestify/requirefor assertions in Go tests
Write table-driven tests when testing multiple scenarios in Go
Name Go tests descriptively using format:TestFunctionName_Scenario_ExpectedBehavior
Files:
tests/platform-connector_test.gotests/fault_quarantine_test.go
π§ Learnings (7)
π Learning: 2025-12-23T10:34:08.727Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 610
File: platform-connectors/pkg/connectors/kubernetes/process_node_events.go:346-370
Timestamp: 2025-12-23T10:34:08.727Z
Learning: In platform-connectors/pkg/connectors/kubernetes/process_node_events.go, the Event.Type field is intentionally set to healthEvent.CheckName rather than the standard Kubernetes "Normal" or "Warning" values. This is a deliberate design choice for NVSentinel to apply node events with the checkname as the type.
Applied to files:
tests/helpers/kube.gotests/platform-connector_test.go
π Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*_test.go : Name Go tests descriptively using format: `TestFunctionName_Scenario_ExpectedBehavior`
Applied to files:
tests/platform-connector_test.go
π Learning: 2025-11-04T06:31:02.147Z
Learnt from: Gyan172004
Repo: NVIDIA/NVSentinel PR: 223
File: platform-connectors/pkg/nodemetadata/processor.go:0-0
Timestamp: 2025-11-04T06:31:02.147Z
Learning: In platform-connectors/pkg/nodemetadata/processor.go, the NewProcessor function does not perform a nil check on the config parameter because the caller is expected to guarantee a non-nil config is provided.
Applied to files:
tests/platform-connector_test.go
π Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*_test.go : Use `envtest` for testing Kubernetes controllers instead of fake clients
Applied to files:
tests/platform-connector_test.gotests/fault_quarantine_test.go
π Learning: 2025-12-22T16:16:24.320Z
Learnt from: pteranodan
Repo: NVIDIA/NVSentinel PR: 607
File: client-go/nvgrpc/config_test.go:61-80
Timestamp: 2025-12-22T16:16:24.320Z
Learning: In Go tests across the repository, avoid introducing the testify dependency for simple equality/inequality checks. Use the standard testing package assertions (t.Error, t.Errorf, t.Fatal, etc.) for straightforward checks. Reserve third-party assertion libraries for complex scenarios that require richer diagnostics or expressive matchers.
Applied to files:
tests/platform-connector_test.gotests/fault_quarantine_test.go
π Learning: 2025-11-06T16:18:09.952Z
Learnt from: KaivalyaMDabhadkar
Repo: NVIDIA/NVSentinel PR: 280
File: tests/fault_management_test.go:334-345
Timestamp: 2025-11-06T16:18:09.952Z
Learning: In the fault-quarantine component, the "quarantinedNodeUncordonedManually" annotation is set to the string literal "True" (with uppercase T), defined as the constant QuarantinedNodeUncordonedManuallyAnnotationValue in fault-quarantine/pkg/common/common.go. Tests should compare against "True", not "true".
Applied to files:
tests/fault_quarantine_test.go
π Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*_test.go : Write table-driven tests when testing multiple scenarios in Go
Applied to files:
tests/fault_quarantine_test.go
𧬠Code graph analysis (2)
tests/helpers/healthevent.go (2)
health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyi (1)
ProcessingStrategy(14-17)data-models/pkg/protos/health_event.pb.go (4)
ProcessingStrategy(43-43)ProcessingStrategy(72-74)ProcessingStrategy(76-78)ProcessingStrategy(85-87)
tests/fault_quarantine_test.go (4)
tests/helpers/fault_quarantine.go (4)
QuarantineTestContext(51-54)SetupQuarantineTest(107-112)AssertQuarantineState(315-382)QuarantineAssertion(56-60)tests/helpers/kube.go (1)
SetNodeManagedByNVSentinel(1389-1408)tests/helpers/healthevent.go (3)
NewHealthEvent(60-76)SendHealthEvent(263-275)SendHealthyEvent(277-287)data-models/pkg/protos/health_event.pb.go (2)
ProcessingStrategy_STORE_ONLY(47-47)ProcessingStrategy_EXECUTE_REMEDIATION(46-46)
π Additional comments (4)
tests/helpers/kube.go (1)
387-409: LGTM! Well-designed negative assertion helper.The function correctly uses
require.Neverto assert the absence of events and complements the existingWaitForNodeEventfunction. The implementation follows established patterns in this file.tests/fault_quarantine_test.go (1)
233-295: LGTM! Test correctly validates processing strategy behavior.The test properly exercises both STORE_ONLY and EXECUTE_REMEDIATION strategies with appropriate assertions for quarantine state. The structure follows e2e-framework conventions and test naming guidelines.
tests/helpers/healthevent.go (2)
48-48: LGTM! Field addition follows struct conventions.The ProcessingStrategy field is properly tagged with
omitemptyand uses int type to align with protobuf enum values.
153-156: LGTM! Fluent setter follows established patterns.The WithProcessingStrategy method correctly implements the fluent builder pattern consistent with other setters in this struct.
9445cfd to
272b090
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
π§Ή Nitpick comments (1)
tests/helpers/healthevent.go (1)
48-48: Consider using the proper enum type for type safety.The
ProcessingStrategyfield is declared asint, but the protobuf-generated code definesProcessingStrategyas a proper enum type indata-models/pkg/protos/health_event.pb.go. Using the enum type would provide compile-time validation of valid values and make the code more self-documenting.π Proposed refactor to use the enum type
First, import the protobuf package at the top of the file:
import ( "context" "encoding/json" "errors" "fmt" "io" "log" "net/http" "os" "strings" "sync" "testing" "time" + protos "github.com/NVIDIA/NVSentinel/data-models/pkg/protos" "github.com/stretchr/testify/require" )Then update the field declaration:
- ProcessingStrategy int `json:"processingStrategy,omitempty"` + ProcessingStrategy protos.ProcessingStrategy `json:"processingStrategy,omitempty"`
π Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
π Files selected for processing (6)
distros/kubernetes/nvsentinel/values-tilt-postgresql.yamldocs/postgresql-schema.sqltests/fault_quarantine_test.gotests/helpers/healthevent.gotests/helpers/kube.gotests/platform-connector_test.go
π§ Files skipped from review as they are similar to previous changes (5)
- docs/postgresql-schema.sql
- tests/platform-connector_test.go
- tests/helpers/kube.go
- distros/kubernetes/nvsentinel/values-tilt-postgresql.yaml
- tests/fault_quarantine_test.go
π§° Additional context used
π Path-based instructions (1)
**/*.go
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.go: Follow standard Go conventions with gofmt and golint
Use structured logging vialog/slogin Go code
Wrap errors with context usingfmt.Errorf("context: %w", err)in Go code
Withinretry.RetryOnConflictblocks, return errors without wrapping to preserve retry behavior
Use meaningful variable names such assyncedoverokfor cache sync checks
Useclient-gofor Kubernetes API interactions in Go code
Prefer informers over direct API calls for watching Kubernetes resources
Implement proper shutdown handling with context cancellation in Go code
Package-level godoc required for all Go packages
Function comments required for all exported Go functions
Use inline comments for complex logic only in Go code
TODO comments should reference issues in Go code
Extract informer event handler setup into helper methods
Use separate informers for different Kubernetes resource types
Files:
tests/helpers/healthevent.go
𧬠Code graph analysis (1)
tests/helpers/healthevent.go (2)
data-models/pkg/protos/health_event.pb.go (4)
ProcessingStrategy(43-43)ProcessingStrategy(72-74)ProcessingStrategy(76-78)ProcessingStrategy(85-87)health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyi (1)
ProcessingStrategy(14-17)
π Additional comments (1)
tests/helpers/healthevent.go (1)
153-156: Implementation follows the fluent builder pattern correctly.The method correctly implements the fluent builder pattern consistent with other methods in this file. If the field type is updated to use the proper enum type (as suggested for line 48), remember to update the parameter type accordingly.
Signed-off-by: Tanisha goyal <[email protected]>
272b090 to
69a01e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
π§Ή Nitpick comments (2)
fault-quarantine/pkg/evaluator/rule_evaluator_test.go (1)
218-271: LGTM! Test correctly validates the new processingStrategy field.The test properly expects
processingStrategywith valuefloat64(0)(EXECUTE_REMEDIATION) when not explicitly set on the HealthEvent. This aligns with the PR's addition of the ProcessingStrategy enum and validates the default serialization behavior.Consider adding a test case that explicitly sets
processingStrategytoprotos.ProcessingStrategy_STORE_ONLYto verify the RoundTrip function handles non-default values correctly. However, if other test files (e.g., cloudevents_test.go) already cover different processing strategies, this may be unnecessary.Optional: Example test case for STORE_ONLY strategy
func TestRoundTrip_WithStoreOnlyStrategy(t *testing.T) { eventTime := timestamppb.New(time.Now()) event := &protos.HealthEvent{ Version: 1, Agent: "test-agent", CheckName: "test-check", ProcessingStrategy: protos.ProcessingStrategy_STORE_ONLY, GeneratedTimestamp: eventTime, NodeName: "test-node", } result, err := RoundTrip(event) if err != nil { t.Fatalf("Failed to roundtrip event: %v", err) } if ps, ok := result["processingStrategy"].(float64); !ok || ps != float64(1) { t.Errorf("Expected processingStrategy to be 1 (STORE_ONLY), got %v", result["processingStrategy"]) } }tests/helpers/kube.go (1)
387-389: Consider adding godoc comment for exported function.The coding guidelines specify that function comments are required for all exported Go functions. While this is consistent with other test helpers in the file, adding a brief godoc comment would improve clarity and align with the guidelines.
Example godoc
// EnsureNodeEventNotPresent asserts that a specific node event with the given type and reason // does not exist within NeverWaitTimeout. This is the negative counterpart to WaitForNodeEvent. func EnsureNodeEventNotPresent(ctx context.Context, t *testing.T, c klient.Client, nodeName string, eventType, eventReason string) {Based on coding guidelines, function comments are required for exported Go functions.
π Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
π Files selected for processing (7)
distros/kubernetes/nvsentinel/values-tilt-postgresql.yamldocs/postgresql-schema.sqlfault-quarantine/pkg/evaluator/rule_evaluator_test.gotests/fault_quarantine_test.gotests/helpers/healthevent.gotests/helpers/kube.gotests/platform-connector_test.go
π§ Files skipped from review as they are similar to previous changes (5)
- docs/postgresql-schema.sql
- tests/helpers/healthevent.go
- tests/platform-connector_test.go
- tests/fault_quarantine_test.go
- distros/kubernetes/nvsentinel/values-tilt-postgresql.yaml
π§° Additional context used
π Path-based instructions (2)
**/*.go
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.go: Follow standard Go conventions with gofmt and golint
Use structured logging vialog/slogin Go code
Wrap errors with context usingfmt.Errorf("context: %w", err)in Go code
Withinretry.RetryOnConflictblocks, return errors without wrapping to preserve retry behavior
Use meaningful variable names such assyncedoverokfor cache sync checks
Useclient-gofor Kubernetes API interactions in Go code
Prefer informers over direct API calls for watching Kubernetes resources
Implement proper shutdown handling with context cancellation in Go code
Package-level godoc required for all Go packages
Function comments required for all exported Go functions
Use inline comments for complex logic only in Go code
TODO comments should reference issues in Go code
Extract informer event handler setup into helper methods
Use separate informers for different Kubernetes resource types
Files:
tests/helpers/kube.gofault-quarantine/pkg/evaluator/rule_evaluator_test.go
**/*_test.go
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*_test.go: Useenvtestfor testing Kubernetes controllers instead of fake clients
Usetestify/assertandtestify/requirefor assertions in Go tests
Write table-driven tests when testing multiple scenarios in Go
Name Go tests descriptively using format:TestFunctionName_Scenario_ExpectedBehavior
Files:
fault-quarantine/pkg/evaluator/rule_evaluator_test.go
π§ Learnings (3)
π Learning: 2025-12-23T10:34:08.727Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 610
File: platform-connectors/pkg/connectors/kubernetes/process_node_events.go:346-370
Timestamp: 2025-12-23T10:34:08.727Z
Learning: In platform-connectors/pkg/connectors/kubernetes/process_node_events.go, the Event.Type field is intentionally set to healthEvent.CheckName rather than the standard Kubernetes "Normal" or "Warning" values. This is a deliberate design choice for NVSentinel to apply node events with the checkname as the type.
Applied to files:
tests/helpers/kube.gofault-quarantine/pkg/evaluator/rule_evaluator_test.go
π Learning: 2025-12-22T16:16:31.660Z
Learnt from: pteranodan
Repo: NVIDIA/NVSentinel PR: 607
File: client-go/nvgrpc/config_test.go:61-80
Timestamp: 2025-12-22T16:16:31.660Z
Learning: In the NVIDIA/NVSentinel repository, prefer not to introduce a dependency on `stretchr/testify` for simple comparison assertions in Go tests. Use standard `testing` package assertions (t.Error, t.Errorf, etc.) for straightforward checks.
Applied to files:
tests/helpers/kube.go
π Learning: 2025-12-22T16:16:24.320Z
Learnt from: pteranodan
Repo: NVIDIA/NVSentinel PR: 607
File: client-go/nvgrpc/config_test.go:61-80
Timestamp: 2025-12-22T16:16:24.320Z
Learning: In Go tests across the repository, avoid introducing the testify dependency for simple equality/inequality checks. Use the standard testing package assertions (t.Error, t.Errorf, t.Fatal, etc.) for straightforward checks. Reserve third-party assertion libraries for complex scenarios that require richer diagnostics or expressive matchers.
Applied to files:
fault-quarantine/pkg/evaluator/rule_evaluator_test.go
platform-connectors/pkg/connectors/kubernetes/k8s_platform_connector_test.go
Outdated
Show resolved
Hide resolved
platform-connectors/pkg/connectors/kubernetes/k8s_platform_connector_test.go
Show resolved
Hide resolved
Signed-off-by: Tanisha goyal <[email protected]>
404bf55 to
b5af994
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Fix all issues with AI Agents π€
In @tests/fault_quarantine_test.go:
- Around line 298-299: The test currently calls helpers.CheckNodeConditionExists
and helpers.CheckNodeEventExists but ignores their boolean return values; change
both calls to assert their results by wrapping them in require.Eventually (or
another assertion) so the test fails if the condition/event never appearsβe.g.,
call CheckNodeConditionExists(ctx, client, testCtx.NodeName, "SysLogsXIDError",
"SysLogsXIDErrorIsNotHealthy") inside a require.Eventually loop with a timeout
and poll interval and do the same for CheckNodeEventExists(ctx, client,
testCtx.NodeName, "SysLogsXIDError", "XID error found"); ensure you reference
the returned bool from each helper and assert it is true rather than ignoring
it.
π§Ή Nitpick comments (1)
tests/fault_quarantine_test.go (1)
324-328: Redundant healthy event send in teardown.The
SendHealthyEventcall at line 325 is redundant becauseTeardownQuarantineTest(line 327) already sends a healthy event internally as part of its cleanup routine.π Proposed simplification
feature.Teardown(func(ctx context.Context, t *testing.T, c *envconf.Config) context.Context { - helpers.SendHealthyEvent(ctx, t, testCtx.NodeName) - return helpers.TeardownQuarantineTest(ctx, t, c) })
π Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
π Files selected for processing (4)
fault-quarantine/pkg/initializer/init.goplatform-connectors/pkg/connectors/kubernetes/k8s_platform_connector_test.gotests/fault_quarantine_test.gotests/helpers/kube.go
π§ Files skipped from review as they are similar to previous changes (2)
- platform-connectors/pkg/connectors/kubernetes/k8s_platform_connector_test.go
- tests/helpers/kube.go
π§° Additional context used
π Path-based instructions (2)
**/*.go
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.go: Follow standard Go conventions with gofmt and golint
Use structured logging vialog/slogin Go code
Wrap errors with context usingfmt.Errorf("context: %w", err)in Go code
Withinretry.RetryOnConflictblocks, return errors without wrapping to preserve retry behavior
Use meaningful variable names such assyncedoverokfor cache sync checks
Useclient-gofor Kubernetes API interactions in Go code
Prefer informers over direct API calls for watching Kubernetes resources
Implement proper shutdown handling with context cancellation in Go code
Package-level godoc required for all Go packages
Function comments required for all exported Go functions
Use inline comments for complex logic only in Go code
TODO comments should reference issues in Go code
Extract informer event handler setup into helper methods
Use separate informers for different Kubernetes resource types
Files:
tests/fault_quarantine_test.gofault-quarantine/pkg/initializer/init.go
**/*_test.go
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*_test.go: Useenvtestfor testing Kubernetes controllers instead of fake clients
Usetestify/assertandtestify/requirefor assertions in Go tests
Write table-driven tests when testing multiple scenarios in Go
Name Go tests descriptively using format:TestFunctionName_Scenario_ExpectedBehavior
Files:
tests/fault_quarantine_test.go
π§ Learnings (5)
π Learning: 2025-11-06T16:18:09.952Z
Learnt from: KaivalyaMDabhadkar
Repo: NVIDIA/NVSentinel PR: 280
File: tests/fault_management_test.go:334-345
Timestamp: 2025-11-06T16:18:09.952Z
Learning: In the fault-quarantine component, the "quarantinedNodeUncordonedManually" annotation is set to the string literal "True" (with uppercase T), defined as the constant QuarantinedNodeUncordonedManuallyAnnotationValue in fault-quarantine/pkg/common/common.go. Tests should compare against "True", not "true".
Applied to files:
tests/fault_quarantine_test.go
π Learning: 2025-12-22T16:16:24.320Z
Learnt from: pteranodan
Repo: NVIDIA/NVSentinel PR: 607
File: client-go/nvgrpc/config_test.go:61-80
Timestamp: 2025-12-22T16:16:24.320Z
Learning: In Go tests across the repository, avoid introducing the testify dependency for simple equality/inequality checks. Use the standard testing package assertions (t.Error, t.Errorf, t.Fatal, etc.) for straightforward checks. Reserve third-party assertion libraries for complex scenarios that require richer diagnostics or expressive matchers.
Applied to files:
tests/fault_quarantine_test.go
π Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*_test.go : Use `envtest` for testing Kubernetes controllers instead of fake clients
Applied to files:
tests/fault_quarantine_test.go
π Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*_test.go : Write table-driven tests when testing multiple scenarios in Go
Applied to files:
tests/fault_quarantine_test.go
π Learning: 2025-11-10T10:25:19.443Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 248
File: distros/kubernetes/nvsentinel/charts/health-events-analyzer/values.yaml:87-124
Timestamp: 2025-11-10T10:25:19.443Z
Learning: In the NVSentinel health-events-analyzer, the agent filter to exclude analyzer-published events (`"healthevent.agent": {"$ne": "health-events-analyzer"}`) is automatically added as the first stage in getPipelineStages() function in health-events-analyzer/pkg/reconciler/reconciler.go, not in individual rule configurations in values.yaml.
Applied to files:
fault-quarantine/pkg/initializer/init.go
𧬠Code graph analysis (1)
tests/fault_quarantine_test.go (4)
tests/helpers/fault_quarantine.go (5)
QuarantineTestContext(51-54)SetupQuarantineTest(107-112)AssertQuarantineState(315-382)QuarantineAssertion(56-60)TeardownQuarantineTest(181-242)tests/helpers/kube.go (5)
SetNodeManagedByNVSentinel(1389-1408)EnsureNodeConditionNotPresent(1797-1818)EnsureNodeEventNotPresent(387-409)CheckNodeConditionExists(1283-1300)CheckNodeEventExists(1325-1353)tests/helpers/healthevent.go (3)
NewHealthEvent(60-76)SendHealthEvent(263-275)SendHealthyEvent(277-287)data-models/pkg/protos/health_event.pb.go (2)
ProcessingStrategy_STORE_ONLY(47-47)ProcessingStrategy_EXECUTE_REMEDIATION(46-46)
β° Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
- GitHub Check: ko-build-test (janitor, .)
- GitHub Check: container-build-test (syslog-health-monitor, make -C health-monitors/syslog-health-monitor docker...
- GitHub Check: container-build-test (gpu-health-monitor-dcgm4, make -C health-monitors/gpu-health-monitor docker...
- GitHub Check: container-build-test (file-server-cleanup, make -C log-collector docker-build-file-server-cleanup)
- GitHub Check: container-build-test (gpu-health-monitor-dcgm3, make -C health-monitors/gpu-health-monitor docker...
- GitHub Check: container-build-test (log-collector, make -C log-collector docker-build-log-collector)
- GitHub Check: modules-lint-test (health-events-analyzer)
- GitHub Check: modules-lint-test (fault-remediation)
- GitHub Check: modules-lint-test (fault-quarantine)
- GitHub Check: modules-lint-test (labeler)
- GitHub Check: modules-lint-test (platform-connectors)
- GitHub Check: modules-lint-test (store-client)
- GitHub Check: modules-lint-test (event-exporter)
- GitHub Check: modules-lint-test (node-drainer)
- GitHub Check: modules-lint-test (janitor)
- GitHub Check: health-monitors-lint-test (csp-health-monitor)
- GitHub Check: health-monitors-lint-test (kubernetes-object-monitor)
- GitHub Check: health-monitors-lint-test (syslog-health-monitor)
- GitHub Check: CodeQL PR Analysis
- GitHub Check: E2E Tests (ARM64 + MongoDB)
- GitHub Check: E2E Tests (ARM64 + PostgreSQL)
- GitHub Check: E2E Tests (AMD64 + MongoDB)
- GitHub Check: E2E Tests (AMD64 + PostgreSQL)
π Additional comments (2)
fault-quarantine/pkg/initializer/init.go (1)
66-66: Change is correct; default processingStrategy ensures backward compatibility.The switch to
BuildProcessableHealthEventInsertsPipeline()correctly filters to only process events withprocessingStrategy == EXECUTE_REMEDIATION. This is backward compatible becauseEXECUTE_REMEDIATIONis the default value (0) in the proto definition, so existing events without an explicit processingStrategy field will be processed normally. Events explicitly markedSTORE_ONLYare intentionally ignored by fault-quarantine, as shown in the testTestFaultQuarantineWithProcessingStrategy. The change is scoped to fault-quarantine; event-exporter still usesBuildAllHealthEventInsertsPipeline()to handle all events.tests/fault_quarantine_test.go (1)
26-26: LGTM!The protos import is necessary for accessing the
ProcessingStrategyconstants used in the new test.
Merging this branch will increase overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. Changed unit test files
|
Signed-off-by: Tanisha goyal <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Fix all issues with AI Agents π€
In @tests/fault_quarantine_test.go:
- Around line 334-343: The teardown is missing removal of the
ManagedByNVSentinel label set in Setup; inside the Teardown func (the anonymous
function with signature func(ctx context.Context, t *testing.T, c
*envconf.Config)), obtain a client via c.NewClient(), assert no error with
require.NoError(t, err), call helpers.RemoveNodeManagedByNVSentinelLabel(ctx,
client, testCtx.NodeName) and assert its error with require.NoError(t, err)
before returning helpers.TeardownQuarantineTest(ctx, t, c) so the label is
cleaned up prior to final teardown.
π Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
π Files selected for processing (2)
tests/data/managed-by-nvsentinel-configmap.yamltests/fault_quarantine_test.go
π§° Additional context used
π Path-based instructions (2)
**/*.go
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.go: Follow standard Go conventions with gofmt and golint
Use structured logging vialog/slogin Go code
Wrap errors with context usingfmt.Errorf("context: %w", err)in Go code
Withinretry.RetryOnConflictblocks, return errors without wrapping to preserve retry behavior
Use meaningful variable names such assyncedoverokfor cache sync checks
Useclient-gofor Kubernetes API interactions in Go code
Prefer informers over direct API calls for watching Kubernetes resources
Implement proper shutdown handling with context cancellation in Go code
Package-level godoc required for all Go packages
Function comments required for all exported Go functions
Use inline comments for complex logic only in Go code
TODO comments should reference issues in Go code
Extract informer event handler setup into helper methods
Use separate informers for different Kubernetes resource types
Files:
tests/fault_quarantine_test.go
**/*_test.go
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*_test.go: Useenvtestfor testing Kubernetes controllers instead of fake clients
Usetestify/assertandtestify/requirefor assertions in Go tests
Write table-driven tests when testing multiple scenarios in Go
Name Go tests descriptively using format:TestFunctionName_Scenario_ExpectedBehavior
Files:
tests/fault_quarantine_test.go
π§ Learnings (9)
π Learning: 2025-10-29T15:37:49.210Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 160
File: health-events-analyzer/pkg/reconciler/metrics.go:52-58
Timestamp: 2025-10-29T15:37:49.210Z
Learning: In health-events-analyzer/pkg/reconciler/metrics.go, the ruleMatchedTotal metric should include both "rule_name" and "node_name" labels to identify which rule triggered on which node and how many times, as per the project's observability requirements.
Applied to files:
tests/data/managed-by-nvsentinel-configmap.yaml
π Learning: 2025-12-12T07:41:27.339Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 545
File: tests/data/health-events-analyzer-config.yaml:2190-2251
Timestamp: 2025-12-12T07:41:27.339Z
Learning: In tests/data/health-events-analyzer-config.yaml, the XID74Reg2Bit13Set rule intentionally omits the time window filter; tests should verify only the register bit pattern (bit 13 in REG2) on the incoming XID 74 event and should not rely on historical events or counts of repeats. If adding similar rules elsewhere, apply the same pattern and document that the time window filter is unnecessary for single-event bit checks.
Applied to files:
tests/data/managed-by-nvsentinel-configmap.yaml
π Learning: 2025-11-27T15:42:48.142Z
Learnt from: lalitadithya
Repo: NVIDIA/NVSentinel PR: 459
File: docs/cancelling-breakfix.md:58-90
Timestamp: 2025-11-27T15:42:48.142Z
Learning: The label `k8saas.nvidia.com/ManagedByNVSentinel` exists in NVSentinel and is used to opt nodes out of automated break-fix workflows. The prefix `k8saas.nvidia.com/` is configurable via the `labelPrefix` setting in the fault-quarantine Helm values, and the documentation correctly uses the default value.
Applied to files:
tests/data/managed-by-nvsentinel-configmap.yaml
π Learning: 2025-12-12T07:36:26.109Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 545
File: tests/data/health-events-analyzer-config.yaml:1730-1901
Timestamp: 2025-12-12T07:36:26.109Z
Learning: In NVSentinel health-events analyzer config files (e.g., tests/data/health-events-analyzer-config.yaml and similar), XID 74 errors are guaranteed to include a GPU_UUID entry within healthevent.entitiesimpacted. Therefore, when writing XID 74-specific rules that filter for GPU_UUID entities, you can skip null checks (no $ifNull) for GPU_UUID. Ensure tests relying on this assumption are updated accordingly and document the guarantee in the rule descriptions.
Applied to files:
tests/data/managed-by-nvsentinel-configmap.yaml
π Learning: 2025-12-12T07:38:37.023Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 545
File: tests/data/health-events-analyzer-config.yaml:2025-2187
Timestamp: 2025-12-12T07:38:37.023Z
Learning: In NVSentinel, XID 74 errors always include an NVLINK entry in healthevent.entitiesimpacted, so null-checking with $ifNull is unnecessary when filtering for NVLINK entities in XID 74-specific rules. Apply this rule to YAML test fixtures under tests/ data (e.g., tests/data/health-events-analyzer-config.yaml) and any similar health-event configuration tests. If applying in code, ensure downstream filters rely on the presence of NVLINK in entitiesimpacted for XID 74 only, but continue to guard other fields and XIDs with appropriate null checks.
Applied to files:
tests/data/managed-by-nvsentinel-configmap.yaml
π Learning: 2025-11-06T16:18:09.952Z
Learnt from: KaivalyaMDabhadkar
Repo: NVIDIA/NVSentinel PR: 280
File: tests/fault_management_test.go:334-345
Timestamp: 2025-11-06T16:18:09.952Z
Learning: In the fault-quarantine component, the "quarantinedNodeUncordonedManually" annotation is set to the string literal "True" (with uppercase T), defined as the constant QuarantinedNodeUncordonedManuallyAnnotationValue in fault-quarantine/pkg/common/common.go. Tests should compare against "True", not "true".
Applied to files:
tests/fault_quarantine_test.go
π Learning: 2025-12-22T16:16:24.320Z
Learnt from: pteranodan
Repo: NVIDIA/NVSentinel PR: 607
File: client-go/nvgrpc/config_test.go:61-80
Timestamp: 2025-12-22T16:16:24.320Z
Learning: In Go tests across the repository, avoid introducing the testify dependency for simple equality/inequality checks. Use the standard testing package assertions (t.Error, t.Errorf, t.Fatal, etc.) for straightforward checks. Reserve third-party assertion libraries for complex scenarios that require richer diagnostics or expressive matchers.
Applied to files:
tests/fault_quarantine_test.go
π Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*_test.go : Use `envtest` for testing Kubernetes controllers instead of fake clients
Applied to files:
tests/fault_quarantine_test.go
π Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*_test.go : Write table-driven tests when testing multiple scenarios in Go
Applied to files:
tests/fault_quarantine_test.go
π Additional comments (5)
tests/data/managed-by-nvsentinel-configmap.yaml (1)
43-58: Verified: Thesyslog-health-monitoragent andSysLogsXIDErrorcheckName are valid identifiers in the system and are actively used across test fixtures, e2e tests, and health monitor implementations. The new rule-set correctly applies the same remediation pattern as the existingManagedByNVSentinel-Rulefor handling syslog-based XID errors.tests/fault_quarantine_test.go (4)
26-32: LGTM!The import for the protos package is necessary for accessing the
ProcessingStrategyenum values used in the test assertions.
240-251: LGTM!Setup correctly initializes the test context and enables NVSentinel management for the node under test.
253-289: LGTM!The test correctly validates that health events with
STORE_ONLYprocessing strategy do not trigger node conditions, events, or quarantine actions. The use ofEnsure*helpers with built-in assertions is appropriate.
291-332: LGTM!The test correctly validates that health events with
EXECUTE_REMEDIATIONprocessing strategy trigger the expected node conditions, events, and quarantine actions. This provides good contrast with the STORE_ONLY phase above.
Merging this branch will increase overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. Changed unit test files
|
89ca1dd to
2056205
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
π€ Fix all issues with AI agents
In @data-models/protobufs/health_event.proto:
- Line 78: Add a field-level comment for the processingStrategy field in the
HealthEvent message: describe what processingStrategy (of enum type
ProcessingStrategy) controls for a HealthEvent (e.g., how the event should be
processed, prioritization/ordering implications, default behavior when unset,
and any effects on downstream consumers). Update the comment immediately above
the line declaring "ProcessingStrategy processingStrategy = 16;" so it clearly
documents the field's purpose, expected values/semantics, and any default or
compatibility considerations.
π§Ή Nitpick comments (1)
health-monitors/csp-health-monitor/pkg/triggerengine/trigger.go (1)
362-364: Hardcoded processing strategy is appropriate; consider issue reference in TODO.The hardcoded
EXECUTE_REMEDIATIONvalue is reasonable for this trigger engine component, as its purpose is to forward maintenance events that should initiate remediation workflows. The TODO comment clearly documents the intent to make this configurable.However, per coding guidelines, TODO comments should reference issues rather than PRs. Consider creating or referencing an issue for the configurability work.
π Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
β Files ignored due to path filters (1)
data-models/pkg/protos/health_event.pb.gois excluded by!**/*.pb.go
π Files selected for processing (9)
data-models/protobufs/health_event.protohealth-monitors/csp-health-monitor/pkg/triggerengine/trigger.gohealth-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyhealth-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyistore-client/pkg/datastore/providers/postgresql/sql_filter_builder.gotests/data/fatal-health-event.jsontests/data/healthy-event.jsontests/data/unsupported-health-event.jsontests/helpers/healthevent.go
π§ Files skipped from review as they are similar to previous changes (1)
- tests/helpers/healthevent.go
π§° Additional context used
π Path-based instructions (3)
data-models/protobufs/**/*.proto
π CodeRabbit inference engine (.github/copilot-instructions.md)
data-models/protobufs/**/*.proto: Define Protocol Buffer messages indata-models/protobufs/directory
Use semantic versioning for breaking changes in Protocol Buffer messages
Include comprehensive comments for all fields in Protocol Buffer messages
Files:
data-models/protobufs/health_event.proto
**/*.go
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.go: Follow standard Go conventions with gofmt and golint
Use structured logging vialog/slogin Go code
Wrap errors with context usingfmt.Errorf("context: %w", err)in Go code
Withinretry.RetryOnConflictblocks, return errors without wrapping to preserve retry behavior
Use meaningful variable names such assyncedoverokfor cache sync checks
Useclient-gofor Kubernetes API interactions in Go code
Prefer informers over direct API calls for watching Kubernetes resources
Implement proper shutdown handling with context cancellation in Go code
Package-level godoc required for all Go packages
Function comments required for all exported Go functions
Use inline comments for complex logic only in Go code
TODO comments should reference issues in Go code
Extract informer event handler setup into helper methods
Use separate informers for different Kubernetes resource types
Files:
health-monitors/csp-health-monitor/pkg/triggerengine/trigger.gostore-client/pkg/datastore/providers/postgresql/sql_filter_builder.go
**/*.py
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.py: Use Poetry for dependency management in Python code
Follow PEP 8 style guide for Python code
Use Black for formatting Python code
Type hints required for all functions in Python code
Use dataclasses for structured data in Python code
Files:
health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.py
π§ Learnings (2)
π Learning: 2025-12-23T10:34:13.121Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 610
File: platform-connectors/pkg/connectors/kubernetes/process_node_events.go:346-370
Timestamp: 2025-12-23T10:34:13.121Z
Learning: In platform-connectors/pkg/connectors/kubernetes/process_node_events.go, the Event.Type field is intentionally set to healthEvent.CheckName rather than the standard Kubernetes "Normal" or "Warning" values. This is a deliberate design choice for NVSentinel to apply node events with the checkname as the type.
Applied to files:
health-monitors/csp-health-monitor/pkg/triggerengine/trigger.go
π Learning: 2025-12-23T05:02:22.108Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: store-client/pkg/client/postgresql_pipeline_builder.go:119-132
Timestamp: 2025-12-23T05:02:22.108Z
Learning: In the NVSentinel codebase, protobuf fields stored in MongoDB should use lowercase field names (e.g., processingstrategy, componentclass, checkname). Ensure pipeline filters and queries that access protobuf fields in the database consistently use lowercase field names in the store-client package, avoiding camelCase mappings for MongoDB reads/writes.
Applied to files:
store-client/pkg/datastore/providers/postgresql/sql_filter_builder.go
𧬠Code graph analysis (2)
health-monitors/csp-health-monitor/pkg/triggerengine/trigger.go (2)
health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyi (1)
ProcessingStrategy(14-18)data-models/pkg/protos/health_event.pb.go (5)
ProcessingStrategy(43-43)ProcessingStrategy(75-77)ProcessingStrategy(79-81)ProcessingStrategy(88-90)ProcessingStrategy_EXECUTE_REMEDIATION(47-47)
health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyi (1)
data-models/pkg/protos/health_event.pb.go (4)
ProcessingStrategy(43-43)ProcessingStrategy(75-77)ProcessingStrategy(79-81)ProcessingStrategy(88-90)
π Additional comments (7)
data-models/protobufs/health_event.proto (1)
32-39: Well-designed enum with clear documentation.The
ProcessingStrategyenum follows protobuf best practices withUNSPECIFIED = 0as the default value. The comments clearly explain the behavioral differences between strategies, which will help downstream consumers understand when to use each value.tests/data/unsupported-health-event.json (1)
19-20: LGTM!The JSON is valid and the
processingStrategyvalue of 1 (EXECUTE_REMEDIATION) is appropriate for test data representing an unsupported health event that should trigger remediation.tests/data/fatal-health-event.json (1)
19-20: LGTM!The addition of
processingStrategy: 1(EXECUTE_REMEDIATION) is appropriate for a fatal health event that requires remediation action.store-client/pkg/datastore/providers/postgresql/sql_filter_builder.go (1)
404-404: LGTM!The field name mapping correctly follows the established pattern: lowercase for MongoDB (per the retrieved learnings) and camelCase for PostgreSQL JSONB queries. This ensures that filters on
processingStrategywill work correctly across both database providers.Based on learnings.
tests/data/healthy-event.json (1)
19-20: Verify that EXECUTE_REMEDIATION is the intended strategy for healthy events.While the JSON is valid, using
processingStrategy: 1(EXECUTE_REMEDIATION) for a healthy event warrants verification. Depending on the system design:
- If healthy events should clear quarantines or update cluster state,
EXECUTE_REMEDIATIONis appropriate.- If healthy events are purely observability signals,
STORE_ONLYmight be more suitable.Please confirm this aligns with the intended behavior for healthy events in your system.
health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.py (1)
1-51: Generated protobuf file β verify check-in policy.This is auto-generated protobuf code. Line 3 states "NO CHECKED-IN PROTOBUF GENCODE", yet this file is being committed to version control. Please verify whether this aligns with the project's policy for handling generated protobuf files. Some projects regenerate these files during the build process rather than committing them.
If the policy is to commit generated files, the changes look correct and reflect the new
ProcessingStrategyenum andprocessingStrategyfield additions.health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyi (1)
14-18: Generated type stubs look consistent.This auto-generated type stub file correctly reflects the new
ProcessingStrategyenum andprocessingStrategyfield additions. The type annotations follow standard protobuf Python patterns:
- Enum class with proper metaclass and
__slots__- Module-level constants exported
HealthEventextended with field, constant, and parameter- Type hints correctly use
_Optional[_Union[ProcessingStrategy, str]]No issues identified with the generated code.
Also applies to: 32-34, 80-80, 106-106, 122-122, 140-140
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
π§Ή Nitpick comments (1)
tests/helpers/healthevent.go (1)
155-158: LGTM! Consider adding godoc (optional).The
WithProcessingStrategymethod correctly implements the fluent builder pattern used throughout this file.As per coding guidelines, exported functions should have godoc comments. However, this is consistent with all other
With*methods in the file (lines 80-153). Consider adding documentation to all exported methods in a future refactor.
π Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
β Files ignored due to path filters (1)
data-models/pkg/protos/health_event.pb.gois excluded by!**/*.pb.go
π Files selected for processing (9)
data-models/protobufs/health_event.protohealth-monitors/csp-health-monitor/pkg/triggerengine/trigger.gohealth-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyhealth-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyistore-client/pkg/datastore/providers/postgresql/sql_filter_builder.gotests/data/fatal-health-event.jsontests/data/healthy-event.jsontests/data/unsupported-health-event.jsontests/helpers/healthevent.go
π§ Files skipped from review as they are similar to previous changes (6)
- health-monitors/csp-health-monitor/pkg/triggerengine/trigger.go
- tests/data/unsupported-health-event.json
- tests/data/healthy-event.json
- store-client/pkg/datastore/providers/postgresql/sql_filter_builder.go
- health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.py
- data-models/protobufs/health_event.proto
π§° Additional context used
π Path-based instructions (1)
**/*.go
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.go: Follow standard Go conventions with gofmt and golint
Use structured logging vialog/slogin Go code
Wrap errors with context usingfmt.Errorf("context: %w", err)in Go code
Withinretry.RetryOnConflictblocks, return errors without wrapping to preserve retry behavior
Use meaningful variable names such assyncedoverokfor cache sync checks
Useclient-gofor Kubernetes API interactions in Go code
Prefer informers over direct API calls for watching Kubernetes resources
Implement proper shutdown handling with context cancellation in Go code
Package-level godoc required for all Go packages
Function comments required for all exported Go functions
Use inline comments for complex logic only in Go code
TODO comments should reference issues in Go code
Extract informer event handler setup into helper methods
Use separate informers for different Kubernetes resource types
Files:
tests/helpers/healthevent.go
π§ Learnings (2)
π Learning: 2025-12-22T16:16:31.660Z
Learnt from: pteranodan
Repo: NVIDIA/NVSentinel PR: 607
File: client-go/nvgrpc/config_test.go:61-80
Timestamp: 2025-12-22T16:16:31.660Z
Learning: In the NVIDIA/NVSentinel repository, prefer not to introduce a dependency on `stretchr/testify` for simple comparison assertions in Go tests. Use standard `testing` package assertions (t.Error, t.Errorf, etc.) for straightforward checks.
Applied to files:
tests/helpers/healthevent.go
π Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*_test.go : Write table-driven tests when testing multiple scenarios in Go
Applied to files:
tests/helpers/healthevent.go
𧬠Code graph analysis (2)
tests/helpers/healthevent.go (2)
health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyi (1)
ProcessingStrategy(14-18)data-models/pkg/protos/health_event.pb.go (5)
ProcessingStrategy(43-43)ProcessingStrategy(75-77)ProcessingStrategy(79-81)ProcessingStrategy(88-90)ProcessingStrategy_EXECUTE_REMEDIATION(47-47)
health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyi (1)
data-models/pkg/protos/health_event.pb.go (4)
ProcessingStrategy(43-43)ProcessingStrategy(75-77)ProcessingStrategy(79-81)ProcessingStrategy(88-90)
π Additional comments (7)
tests/helpers/healthevent.go (3)
31-31: LGTM!The import of the
protospackage is necessary to access theProcessingStrategyenum for the default initialization.
49-49: LGTM!The
ProcessingStrategyfield follows the established pattern in this struct, usinginttype consistent withRecommendedAction(line 43). Theomitemptytag appropriately handles zero-value serialization.
76-76: LGTM!Initializing
ProcessingStrategytoEXECUTE_REMEDIATIONis a sensible default that maintains backward compatibilityβexisting tests that don't explicitly set this field will continue to expect remediation to execute.tests/data/fatal-health-event.json (1)
19-20: LGTM. The addition ofprocessingStrategywith value 1 correctly representsEXECUTE_REMEDIATIONper the proto definition, and is appropriate for this fatal health event test fixture. JSON syntax is valid.health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyi (3)
32-34: LGTM!Module-level enum constants follow the standard protobuf pattern and are consistent with the existing
RecommendedActionenum exports.
80-80: LGTM!The
processingStrategyfield is correctly integrated intoHealthEvent:
- Added to
__slots__(line 80)- Field number constant declared (line 106)
- Field typed appropriately (line 122)
- Parameter signature matches the pattern used for
recommendedAction(line 140)Also applies to: 106-106, 122-122, 140-140
14-18: No action required. The type stub is correctly auto-generated from the protobuf definition. TheprocessingStrategyfield andProcessingStrategyenum are properly defined in both the proto source and the generated type stub. The proto documentation clearly explains the intended behavior:EXECUTE_REMEDIATIONfor normal operations andSTORE_ONLYfor observability-only events, withUNSPECIFIEDas the default. Since no filtering logic onprocessingStrategycurrently exists in the Python codebase, backward compatibility is not an issue at this time. Once processing logic is implemented, developers will have the proto documentation to guide proper handling of the enum values.
d432594 to
a4f8335
Compare
Signed-off-by: Tanisha goyal <[email protected]>
a4f8335 to
786239c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
π Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
β Files ignored due to path filters (1)
data-models/pkg/protos/health_event.pb.gois excluded by!**/*.pb.go
π Files selected for processing (10)
data-models/protobufs/health_event.protohealth-monitors/csp-health-monitor/pkg/triggerengine/trigger.gohealth-monitors/csp-health-monitor/pkg/triggerengine/trigger_test.gohealth-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyhealth-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyistore-client/pkg/datastore/providers/postgresql/sql_filter_builder.gotests/data/fatal-health-event.jsontests/data/healthy-event.jsontests/data/unsupported-health-event.jsontests/helpers/healthevent.go
π§ Files skipped from review as they are similar to previous changes (7)
- store-client/pkg/datastore/providers/postgresql/sql_filter_builder.go
- tests/data/fatal-health-event.json
- tests/helpers/healthevent.go
- health-monitors/csp-health-monitor/pkg/triggerengine/trigger.go
- health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyi
- health-monitors/csp-health-monitor/pkg/triggerengine/trigger_test.go
- tests/data/unsupported-health-event.json
π§° Additional context used
π Path-based instructions (2)
**/*.py
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.py: Use Poetry for dependency management in Python code
Follow PEP 8 style guide for Python code
Use Black for formatting Python code
Type hints required for all functions in Python code
Use dataclasses for structured data in Python code
Files:
health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.py
data-models/protobufs/**/*.proto
π CodeRabbit inference engine (.github/copilot-instructions.md)
data-models/protobufs/**/*.proto: Define Protocol Buffer messages indata-models/protobufs/directory
Use semantic versioning for breaking changes in Protocol Buffer messages
Include comprehensive comments for all fields in Protocol Buffer messages
Files:
data-models/protobufs/health_event.proto
π§ Learnings (1)
π Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to data-models/protobufs/**/*.proto : Include comprehensive comments for all fields in Protocol Buffer messages
Applied to files:
data-models/protobufs/health_event.proto
π Additional comments (3)
data-models/protobufs/health_event.proto (1)
32-39: Well-documented enum definition with clear semantics.The
ProcessingStrategyenum is properly defined with:
- Comprehensive comments explaining each member's behavior
UNSPECIFIED = 0following proto3 conventions for default values- Clear distinction between
EXECUTE_REMEDIATION(normal) andSTORE_ONLY(observability-only) modestests/data/healthy-event.json (1)
19-20: Test fixture correctly updated with new field.The addition of
processingStrategy: 1(EXECUTE_REMEDIATION) properly reflects the new protobuf field in the test data. The JSON syntax is valid and the value correctly corresponds to the enum definition.health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.py (1)
24-50: Auto-generated protobuf codeβno manual review required.This file is auto-generated by the protocol buffer compiler (line 2). The changes to serialized descriptors and offset positions are mechanical reflections of the proto schema updates. No manual modifications or detailed review needed.
lalitadithya
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some details on how are we going to handle backwards compatibility? What about monitors that do not send the field, how will they be processed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and canβt be posted inline due to platform limitations.
β οΈ Outside diff range comments (3)
docs/designs/025-processing-strategy-for-health-checks.md (3)
415-435: Backward compatibility inconsistency: pipeline builders don't handle missing processingStrategy field.The MongoDB and PostgreSQL pipeline builders (lines 408-435) use strict equality checks for
processingStrategy == "EXECUTE_REMEDIATION", which will exclude historical events that lack this field entirely. However, the reconciler (lines 555β557) correctly handles backward compatibility using an$orcondition that treats missing fields asEXECUTE_REMEDIATION.This inconsistency means old events without
processingStrategywill be silently filtered out byBuildProcessableHealthEventInsertsPipeline()during fault-quarantine initialization, breaking processing for historical data.Both pipeline builders should match the reconciler's backward-compatible approach:
β»οΈ Proposed fix for pipeline builders
// For MongoDB: datastore.E("$or", datastore.A( datastore.D( datastore.E("fullDocument.healthevent.processingstrategy", "EXECUTE_REMEDIATION"), ), datastore.D( datastore.E("fullDocument.healthevent.processingstrategy", datastore.D( datastore.E("$exists", false), )), ), )) // Similar for PostgreSQLβuse $or with existence check for missing field
398-398: Fix duplicate step numbering: Step 5 appears twice.Step 5 is labeled at both line 398 ("Store Client") and line 477 ("Fault Quarantine Module"). The fault-quarantine section should be labeled "Step 6" and subsequent sections renumbered accordingly.
Also applies to: 477-477
323-327: Clarify GPU Health Monitor implementation: document all 4 creation sites.Line 327 notes that GPU Health Monitor creates HealthEvent objects in 4 different places but only shows implementation for one. The design should either:
- Show all 4 locations with the required
processingStrategy=self._processing_strategyaddition, or- Reference a specific file path and line range where developers can find and update all instances.
This reduces ambiguity during implementation.
π Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
π Files selected for processing (1)
docs/designs/025-processing-strategy-for-health-checks.md
π§° Additional context used
π§ Learnings (3)
π Common learnings
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: tests/data/healthy-event.json:19-20
Timestamp: 2026-01-07T09:54:49.335Z
Learning: In NVSentinel, healthy events (isHealthy: true) can legitimately use processingStrategy: 1 (EXECUTE_REMEDIATION) when the Fault Quarantine Manager needs to act on them to clear previous fault states or update cluster resources to reflect the healthy status.
π Learning: 2026-01-07T09:54:49.335Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: tests/data/healthy-event.json:19-20
Timestamp: 2026-01-07T09:54:49.335Z
Learning: In NVSentinel, healthy events (isHealthy: true) can legitimately use processingStrategy: 1 (EXECUTE_REMEDIATION) when the Fault Quarantine Manager needs to act on them to clear previous fault states or update cluster resources to reflect the healthy status.
Applied to files:
docs/designs/025-processing-strategy-for-health-checks.md
π Learning: 2025-11-10T10:25:19.443Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 248
File: distros/kubernetes/nvsentinel/charts/health-events-analyzer/values.yaml:87-124
Timestamp: 2025-11-10T10:25:19.443Z
Learning: In the NVSentinel health-events-analyzer, the agent filter to exclude analyzer-published events (`"healthevent.agent": {"$ne": "health-events-analyzer"}`) is automatically added as the first stage in getPipelineStages() function in health-events-analyzer/pkg/reconciler/reconciler.go, not in individual rule configurations in values.yaml.
Applied to files:
docs/designs/025-processing-strategy-for-health-checks.md
πͺ markdownlint-cli2 (0.18.1)
docs/designs/025-processing-strategy-for-health-checks.md
547-547: Hard tabs
Column: 1
(MD010, no-hard-tabs)
548-548: Hard tabs
Column: 1
(MD010, no-hard-tabs)
549-549: Hard tabs
Column: 1
(MD010, no-hard-tabs)
550-550: Hard tabs
Column: 1
(MD010, no-hard-tabs)
551-551: Hard tabs
Column: 1
(MD010, no-hard-tabs)
552-552: Hard tabs
Column: 1
(MD010, no-hard-tabs)
553-553: Hard tabs
Column: 1
(MD010, no-hard-tabs)
554-554: Hard tabs
Column: 1
(MD010, no-hard-tabs)
555-555: Hard tabs
Column: 1
(MD010, no-hard-tabs)
556-556: Hard tabs
Column: 1
(MD010, no-hard-tabs)
557-557: Hard tabs
Column: 1
(MD010, no-hard-tabs)
558-558: Hard tabs
Column: 1
(MD010, no-hard-tabs)
π Additional comments (2)
docs/designs/025-processing-strategy-for-health-checks.md (2)
101-104: Clarify enum definition: AI summary mentions UNSPECIFIED but design omits it.The protobuf enum in Step 1 defines only
EXECUTE_REMEDIATION = 0andSTORE_ONLY = 1, but the AI summary references a three-value enum:(UNSPECIFIED, EXECUTE_REMEDIATION, STORE_ONLY). Protobuf conventions typically reserve value 0 for an UNSPECIFIED/default state.Confirm whether the enum should include an explicit
UNSPECIFIED = 0value, withEXECUTE_REMEDIATIONshifted to 1, or whether the current design intentionally defaults toEXECUTE_REMEDIATION.
547-558: Replace hard tabs with spaces in code block.Lines 547β558 contain hard tabs instead of spaces, flagged by markdownlint (MD010). Convert to spaces for consistency with markdown standards.
π Proposed fix
- // Backward Compatibility: Use $or to include events where processingstrategy is either - // EXECUTE_REMEDIATION or missing (old events created before this feature was added). - // Old events without this field should be treated as EXECUTE_REMEDIATION. + // Backward Compatibility: Use $or to include events where processingstrategy is either + // EXECUTE_REMEDIATION or missing (old events created before this feature was added). + // Old events without this field should be treated as EXECUTE_REMEDIATION.β Skipped due to learnings
Learnt from: jtschelling Repo: NVIDIA/NVSentinel PR: 455 File: docs/designs/013-remediation-plugins.md:143-155 Timestamp: 2025-12-01T17:52:43.853Z Learning: Linting violations (such as hard tabs, MD010) in code stubs within design documents (docs/designs/) in the NVSentinel repository can be ignored, as these are illustrative examples rather than production code.Learnt from: tanishagoyal2 Repo: NVIDIA/NVSentinel PR: 609 File: tests/data/healthy-event.json:19-20 Timestamp: 2026-01-07T09:54:49.335Z Learning: In NVSentinel, healthy events (isHealthy: true) can legitimately use processingStrategy: 1 (EXECUTE_REMEDIATION) when the Fault Quarantine Manager needs to act on them to clear previous fault states or update cluster resources to reflect the healthy status.Learnt from: tanishagoyal2 Repo: NVIDIA/NVSentinel PR: 248 File: distros/kubernetes/nvsentinel/charts/health-events-analyzer/values.yaml:87-124 Timestamp: 2025-11-10T10:25:19.443Z Learning: In the NVSentinel health-events-analyzer, the agent filter to exclude analyzer-published events (`"healthevent.agent": {"$ne": "health-events-analyzer"}`) is automatically added as the first stage in getPipelineStages() function in health-events-analyzer/pkg/reconciler/reconciler.go, not in individual rule configurations in values.yaml.
6376148 to
ab8c6d4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
π€ Fix all issues with AI agents
In @data-models/protobufs/health_event.proto:
- Line 79: Add a field-level comment for the processingStrategy field on the
HealthEvent message that explains its purpose for routing/filtering and
remediation decisions; state that ProcessingStrategy processingStrategy = 16
indicates how downstream systems should handle the event (e.g., whether to
EXECUTE_REMEDIATION, LOG_ONLY, or IGNORE), document the default behavior when
the field is unset (treated as EXECUTE_REMEDIATION by downstream code), and note
how consumers should interpret it together with other HealthEvent fields such as
isHealthy and isFatal (e.g., certain strategies may be ignored or escalated if
isFatal is true or if isHealthy toggles), so implementers can make consistent
routing/remediation choices.
π Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
β Files ignored due to path filters (1)
data-models/pkg/protos/health_event.pb.gois excluded by!**/*.pb.go
π Files selected for processing (2)
data-models/protobufs/health_event.protodocs/designs/025-processing-strategy-for-health-checks.md
π§ Files skipped from review as they are similar to previous changes (1)
- docs/designs/025-processing-strategy-for-health-checks.md
π§° Additional context used
π Path-based instructions (1)
data-models/protobufs/**/*.proto
π CodeRabbit inference engine (.github/copilot-instructions.md)
data-models/protobufs/**/*.proto: Define Protocol Buffer messages indata-models/protobufs/directory
Use semantic versioning for breaking changes in Protocol Buffer messages
Include comprehensive comments for all fields in Protocol Buffer messages
Files:
data-models/protobufs/health_event.proto
π§ Learnings (3)
π Common learnings
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: tests/data/healthy-event.json:19-20
Timestamp: 2026-01-07T09:54:49.335Z
Learning: In NVSentinel, healthy events (isHealthy: true) can legitimately use processingStrategy: 1 (EXECUTE_REMEDIATION) when the Fault Quarantine Manager needs to act on them to clear previous fault states or update cluster resources to reflect the healthy status.
π Learning: 2026-01-07T09:54:49.335Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: tests/data/healthy-event.json:19-20
Timestamp: 2026-01-07T09:54:49.335Z
Learning: In NVSentinel, healthy events (isHealthy: true) can legitimately use processingStrategy: 1 (EXECUTE_REMEDIATION) when the Fault Quarantine Manager needs to act on them to clear previous fault states or update cluster resources to reflect the healthy status.
Applied to files:
data-models/protobufs/health_event.proto
π Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to data-models/protobufs/**/*.proto : Include comprehensive comments for all fields in Protocol Buffer messages
Applied to files:
data-models/protobufs/health_event.proto
π Additional comments (1)
data-models/protobufs/health_event.proto (1)
32-40: Well-documented enum definition.The
ProcessingStrategyenum is well-structured with comprehensive documentation that clearly explains:
- The overall purpose of the enum
- The role of each value (UNSPECIFIED, EXECUTE_REMEDIATION, STORE_ONLY)
- The distinction between observability-only and cluster-modifying behaviors
The UNSPECIFIED documentation addresses previous concerns by explicitly stating it's reserved as the zero value and not used in the code.
ab8c6d4 to
765ec61
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
π€ Fix all issues with AI agents
In @data-models/protobufs/health_event.proto:
- Line 79: Add a descriptive comment above the processingStrategy field in the
HealthEvent protobuf message: explain that ProcessingStrategy (the enum)
determines how the event is handled (e.g., synchronous vs asynchronous
processing, priority or routing implications), when to set it, and any default
behavior or constraints; update the comment immediately above the line
"ProcessingStrategy processingStrategy = 16;" so the field-level documentation
mirrors the enum documentation and clarifies intended usage.
π§Ή Nitpick comments (1)
docs/designs/025-processing-strategy-for-health-checks.md (1)
349-362: Fix hard tab indentation in code blocks.The markdown linter flagged hard tabs (U+0009) in several code blocks. Replace them with spaces for consistency with standard markdown formatting conventions.
Example fix for lines 349-362
Replace hard tabs with spaces (or adjust your editor to use spaces):
func (p *PlatformConnectorServer) HealthEventOccurredV1(ctx context.Context, - he *pb.HealthEvents) (*empty.Empty, error) { - slog.Info("Health events received", "events", he) + he *pb.HealthEvents) (*empty.Empty, error) { + slog.Info("Health events received", "events", he)Apply similar changes to all flagged code blocks.
Also applies to: 371-382, 435-448, 456-469, 483-496, 504-517, 593-607
π Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
β Files ignored due to path filters (1)
data-models/pkg/protos/health_event.pb.gois excluded by!**/*.pb.go
π Files selected for processing (6)
data-models/protobufs/health_event.protodocs/designs/025-processing-strategy-for-health-checks.mdplatform-connectors/pkg/server/platform_connector_server.gostore-client/pkg/client/mongodb_pipeline_builder.gostore-client/pkg/client/postgresql_pipeline_builder.gotests/helpers/healthevent.go
π§ Files skipped from review as they are similar to previous changes (1)
- tests/helpers/healthevent.go
π§° Additional context used
π Path-based instructions (2)
data-models/protobufs/**/*.proto
π CodeRabbit inference engine (.github/copilot-instructions.md)
data-models/protobufs/**/*.proto: Define Protocol Buffer messages indata-models/protobufs/directory
Use semantic versioning for breaking changes in Protocol Buffer messages
Include comprehensive comments for all fields in Protocol Buffer messages
Files:
data-models/protobufs/health_event.proto
**/*.go
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.go: Follow standard Go conventions with gofmt and golint
Use structured logging vialog/slogin Go code
Wrap errors with context usingfmt.Errorf("context: %w", err)in Go code
Withinretry.RetryOnConflictblocks, return errors without wrapping to preserve retry behavior
Use meaningful variable names such assyncedoverokfor cache sync checks
Useclient-gofor Kubernetes API interactions in Go code
Prefer informers over direct API calls for watching Kubernetes resources
Implement proper shutdown handling with context cancellation in Go code
Package-level godoc required for all Go packages
Function comments required for all exported Go functions
Use inline comments for complex logic only in Go code
TODO comments should reference issues in Go code
Extract informer event handler setup into helper methods
Use separate informers for different Kubernetes resource types
Files:
store-client/pkg/client/mongodb_pipeline_builder.gostore-client/pkg/client/postgresql_pipeline_builder.goplatform-connectors/pkg/server/platform_connector_server.go
π§ Learnings (7)
π Common learnings
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: tests/data/healthy-event.json:19-20
Timestamp: 2026-01-07T09:54:49.335Z
Learning: In NVSentinel, healthy events (isHealthy: true) can legitimately use processingStrategy: 1 (EXECUTE_REMEDIATION) when the Fault Quarantine Manager needs to act on them to clear previous fault states or update cluster resources to reflect the healthy status.
π Learning: 2026-01-07T09:54:49.335Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: tests/data/healthy-event.json:19-20
Timestamp: 2026-01-07T09:54:49.335Z
Learning: In NVSentinel, healthy events (isHealthy: true) can legitimately use processingStrategy: 1 (EXECUTE_REMEDIATION) when the Fault Quarantine Manager needs to act on them to clear previous fault states or update cluster resources to reflect the healthy status.
Applied to files:
data-models/protobufs/health_event.protoplatform-connectors/pkg/server/platform_connector_server.godocs/designs/025-processing-strategy-for-health-checks.md
π Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to data-models/protobufs/**/*.proto : Include comprehensive comments for all fields in Protocol Buffer messages
Applied to files:
data-models/protobufs/health_event.proto
π Learning: 2025-12-23T05:02:22.108Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: store-client/pkg/client/postgresql_pipeline_builder.go:119-132
Timestamp: 2025-12-23T05:02:22.108Z
Learning: In the NVSentinel codebase, protobuf fields stored in MongoDB should use lowercase field names (e.g., processingstrategy, componentclass, checkname). Ensure pipeline filters and queries that access protobuf fields in the database consistently use lowercase field names in the store-client package, avoiding camelCase mappings for MongoDB reads/writes.
Applied to files:
store-client/pkg/client/mongodb_pipeline_builder.gostore-client/pkg/client/postgresql_pipeline_builder.go
π Learning: 2025-11-10T10:25:19.443Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 248
File: distros/kubernetes/nvsentinel/charts/health-events-analyzer/values.yaml:87-124
Timestamp: 2025-11-10T10:25:19.443Z
Learning: In the NVSentinel health-events-analyzer, the agent filter to exclude analyzer-published events (`"healthevent.agent": {"$ne": "health-events-analyzer"}`) is automatically added as the first stage in getPipelineStages() function in health-events-analyzer/pkg/reconciler/reconciler.go, not in individual rule configurations in values.yaml.
Applied to files:
platform-connectors/pkg/server/platform_connector_server.godocs/designs/025-processing-strategy-for-health-checks.md
π Learning: 2025-12-23T10:34:13.121Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 610
File: platform-connectors/pkg/connectors/kubernetes/process_node_events.go:346-370
Timestamp: 2025-12-23T10:34:13.121Z
Learning: In platform-connectors/pkg/connectors/kubernetes/process_node_events.go, the Event.Type field is intentionally set to healthEvent.CheckName rather than the standard Kubernetes "Normal" or "Warning" values. This is a deliberate design choice for NVSentinel to apply node events with the checkname as the type.
Applied to files:
docs/designs/025-processing-strategy-for-health-checks.md
π Learning: 2025-11-04T06:31:02.147Z
Learnt from: Gyan172004
Repo: NVIDIA/NVSentinel PR: 223
File: platform-connectors/pkg/nodemetadata/processor.go:0-0
Timestamp: 2025-11-04T06:31:02.147Z
Learning: In platform-connectors/pkg/nodemetadata/processor.go, the NewProcessor function does not perform a nil check on the config parameter because the caller is expected to guarantee a non-nil config is provided.
Applied to files:
docs/designs/025-processing-strategy-for-health-checks.md
𧬠Code graph analysis (1)
store-client/pkg/client/postgresql_pipeline_builder.go (2)
store-client/pkg/datastore/types.go (4)
ToPipeline(161-163)D(131-133)E(126-128)A(136-138)data-models/pkg/protos/health_event.pb.go (1)
ProcessingStrategy_EXECUTE_REMEDIATION(48-48)
πͺ markdownlint-cli2 (0.18.1)
docs/designs/025-processing-strategy-for-health-checks.md
349-349: Hard tabs
Column: 1
(MD010, no-hard-tabs)
350-350: Hard tabs
Column: 1
(MD010, no-hard-tabs)
352-352: Hard tabs
Column: 1
(MD010, no-hard-tabs)
354-354: Hard tabs
Column: 1
(MD010, no-hard-tabs)
355-355: Hard tabs
Column: 1
(MD010, no-hard-tabs)
356-356: Hard tabs
Column: 1
(MD010, no-hard-tabs)
357-357: Hard tabs
Column: 1
(MD010, no-hard-tabs)
358-358: Hard tabs
Column: 1
(MD010, no-hard-tabs)
359-359: Hard tabs
Column: 1
(MD010, no-hard-tabs)
361-361: Hard tabs
Column: 1
(MD010, no-hard-tabs)
371-371: Hard tabs
Column: 1
(MD010, no-hard-tabs)
372-372: Hard tabs
Column: 1
(MD010, no-hard-tabs)
373-373: Hard tabs
Column: 1
(MD010, no-hard-tabs)
382-382: Hard tabs
Column: 1
(MD010, no-hard-tabs)
435-435: Hard tabs
Column: 1
(MD010, no-hard-tabs)
436-436: Hard tabs
Column: 1
(MD010, no-hard-tabs)
437-437: Hard tabs
Column: 1
(MD010, no-hard-tabs)
438-438: Hard tabs
Column: 1
(MD010, no-hard-tabs)
439-439: Hard tabs
Column: 1
(MD010, no-hard-tabs)
440-440: Hard tabs
Column: 1
(MD010, no-hard-tabs)
441-441: Hard tabs
Column: 1
(MD010, no-hard-tabs)
442-442: Hard tabs
Column: 1
(MD010, no-hard-tabs)
443-443: Hard tabs
Column: 1
(MD010, no-hard-tabs)
444-444: Hard tabs
Column: 1
(MD010, no-hard-tabs)
445-445: Hard tabs
Column: 1
(MD010, no-hard-tabs)
446-446: Hard tabs
Column: 1
(MD010, no-hard-tabs)
447-447: Hard tabs
Column: 1
(MD010, no-hard-tabs)
448-448: Hard tabs
Column: 1
(MD010, no-hard-tabs)
456-456: Hard tabs
Column: 1
(MD010, no-hard-tabs)
457-457: Hard tabs
Column: 1
(MD010, no-hard-tabs)
458-458: Hard tabs
Column: 1
(MD010, no-hard-tabs)
459-459: Hard tabs
Column: 1
(MD010, no-hard-tabs)
460-460: Hard tabs
Column: 1
(MD010, no-hard-tabs)
461-461: Hard tabs
Column: 1
(MD010, no-hard-tabs)
462-462: Hard tabs
Column: 1
(MD010, no-hard-tabs)
463-463: Hard tabs
Column: 1
(MD010, no-hard-tabs)
464-464: Hard tabs
Column: 1
(MD010, no-hard-tabs)
465-465: Hard tabs
Column: 1
(MD010, no-hard-tabs)
466-466: Hard tabs
Column: 1
(MD010, no-hard-tabs)
483-483: Hard tabs
Column: 1
(MD010, no-hard-tabs)
484-484: Hard tabs
Column: 1
(MD010, no-hard-tabs)
485-485: Hard tabs
Column: 1
(MD010, no-hard-tabs)
486-486: Hard tabs
Column: 1
(MD010, no-hard-tabs)
487-487: Hard tabs
Column: 1
(MD010, no-hard-tabs)
488-488: Hard tabs
Column: 1
(MD010, no-hard-tabs)
489-489: Hard tabs
Column: 1
(MD010, no-hard-tabs)
490-490: Hard tabs
Column: 1
(MD010, no-hard-tabs)
491-491: Hard tabs
Column: 1
(MD010, no-hard-tabs)
492-492: Hard tabs
Column: 1
(MD010, no-hard-tabs)
493-493: Hard tabs
Column: 1
(MD010, no-hard-tabs)
494-494: Hard tabs
Column: 1
(MD010, no-hard-tabs)
495-495: Hard tabs
Column: 1
(MD010, no-hard-tabs)
496-496: Hard tabs
Column: 1
(MD010, no-hard-tabs)
504-504: Hard tabs
Column: 1
(MD010, no-hard-tabs)
505-505: Hard tabs
Column: 1
(MD010, no-hard-tabs)
506-506: Hard tabs
Column: 1
(MD010, no-hard-tabs)
507-507: Hard tabs
Column: 1
(MD010, no-hard-tabs)
508-508: Hard tabs
Column: 1
(MD010, no-hard-tabs)
509-509: Hard tabs
Column: 1
(MD010, no-hard-tabs)
510-510: Hard tabs
Column: 1
(MD010, no-hard-tabs)
511-511: Hard tabs
Column: 1
(MD010, no-hard-tabs)
512-512: Hard tabs
Column: 1
(MD010, no-hard-tabs)
513-513: Hard tabs
Column: 1
(MD010, no-hard-tabs)
514-514: Hard tabs
Column: 1
(MD010, no-hard-tabs)
593-593: Hard tabs
Column: 1
(MD010, no-hard-tabs)
594-594: Hard tabs
Column: 1
(MD010, no-hard-tabs)
595-595: Hard tabs
Column: 1
(MD010, no-hard-tabs)
596-596: Hard tabs
Column: 1
(MD010, no-hard-tabs)
597-597: Hard tabs
Column: 1
(MD010, no-hard-tabs)
598-598: Hard tabs
Column: 1
(MD010, no-hard-tabs)
599-599: Hard tabs
Column: 1
(MD010, no-hard-tabs)
600-600: Hard tabs
Column: 1
(MD010, no-hard-tabs)
601-601: Hard tabs
Column: 1
(MD010, no-hard-tabs)
602-602: Hard tabs
Column: 1
(MD010, no-hard-tabs)
603-603: Hard tabs
Column: 1
(MD010, no-hard-tabs)
604-604: Hard tabs
Column: 1
(MD010, no-hard-tabs)
π Additional comments (5)
platform-connectors/pkg/server/platform_connector_server.go (1)
57-62: LGTM! Normalization logic correctly defaults UNSPECIFIED to EXECUTE_REMEDIATION.The pre-processing mutation ensures all downstream components receive events with an explicit processingStrategy, providing consistent behavior across modules. The placement before pipeline processing and ring buffer enqueuing is correct.
store-client/pkg/client/postgresql_pipeline_builder.go (2)
119-140: LGTM! Pipeline correctly filters EXECUTE_REMEDIATION events with backward compatibility.The implementation properly uses the $or condition to match both explicitly-set EXECUTE_REMEDIATION and missing processingStrategy fields, ensuring old events (created before this feature) are still processed. Field name "processingstrategy" correctly uses lowercase for database access.
157-180: LGTM! Non-fatal unhealthy pipeline correctly handles processingStrategy filtering.The backward compatibility approach with $or matching is appropriate, and the handling of both INSERT and UPDATE operations is correct for PostgreSQL's trigger-based change detection. The implementation aligns with the MongoDB counterpart.
store-client/pkg/client/mongodb_pipeline_builder.go (2)
87-107: LGTM! Pipeline correctly filters EXECUTE_REMEDIATION events with backward compatibility.The $or condition appropriately matches both EXECUTE_REMEDIATION and missing processingStrategy fields, ensuring historical events are still processed. The use of lowercase "processingstrategy" is correct for database field access, and the INSERT-only operation type is appropriate for MongoDB change streams.
Based on learnings, protobuf fields in MongoDB use lowercase field names.
123-144: LGTM! Non-fatal unhealthy pipeline correctly handles processingStrategy filtering.The implementation properly excludes STORE_ONLY events while maintaining backward compatibility for events without the processingStrategy field. The logic is consistent with the PostgreSQL implementation and appropriate for the health-events-analyzer use case.
Signed-off-by: Tanisha goyal <[email protected]>
765ec61 to
9ef3351
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
π§Ή Nitpick comments (1)
docs/designs/025-processing-strategy-for-health-checks.md (1)
349-604: Consider replacing hard tabs with spaces.The markdown linter detects hard tabs throughout the code blocks. While this doesn't affect functionality, replacing tabs with spaces would improve consistency with markdown best practices.
Optional: Convert tabs to spaces
You can convert hard tabs to spaces using:
#!/bin/bash # Convert hard tabs to spaces in the markdown file fd -e md -x sed -i 's/\t/ /g' {}
π Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
β Files ignored due to path filters (1)
data-models/pkg/protos/health_event.pb.gois excluded by!**/*.pb.go
π Files selected for processing (6)
data-models/protobufs/health_event.protodocs/designs/025-processing-strategy-for-health-checks.mdplatform-connectors/pkg/server/platform_connector_server.gostore-client/pkg/client/mongodb_pipeline_builder.gostore-client/pkg/client/postgresql_pipeline_builder.gotests/helpers/healthevent.go
π§ Files skipped from review as they are similar to previous changes (2)
- data-models/protobufs/health_event.proto
- platform-connectors/pkg/server/platform_connector_server.go
π§° Additional context used
π Path-based instructions (1)
**/*.go
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.go: Follow standard Go conventions with gofmt and golint
Use structured logging vialog/slogin Go code
Wrap errors with context usingfmt.Errorf("context: %w", err)in Go code
Withinretry.RetryOnConflictblocks, return errors without wrapping to preserve retry behavior
Use meaningful variable names such assyncedoverokfor cache sync checks
Useclient-gofor Kubernetes API interactions in Go code
Prefer informers over direct API calls for watching Kubernetes resources
Implement proper shutdown handling with context cancellation in Go code
Package-level godoc required for all Go packages
Function comments required for all exported Go functions
Use inline comments for complex logic only in Go code
TODO comments should reference issues in Go code
Extract informer event handler setup into helper methods
Use separate informers for different Kubernetes resource types
Files:
tests/helpers/healthevent.gostore-client/pkg/client/postgresql_pipeline_builder.gostore-client/pkg/client/mongodb_pipeline_builder.go
π§ Learnings (6)
π Common learnings
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: tests/data/healthy-event.json:19-20
Timestamp: 2026-01-07T09:54:49.335Z
Learning: In NVSentinel, healthy events (isHealthy: true) can legitimately use processingStrategy: 1 (EXECUTE_REMEDIATION) when the Fault Quarantine Manager needs to act on them to clear previous fault states or update cluster resources to reflect the healthy status.
π Learning: 2025-12-23T05:02:22.108Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: store-client/pkg/client/postgresql_pipeline_builder.go:119-132
Timestamp: 2025-12-23T05:02:22.108Z
Learning: In the NVSentinel codebase, protobuf fields stored in MongoDB should use lowercase field names (e.g., processingstrategy, componentclass, checkname). Ensure pipeline filters and queries that access protobuf fields in the database consistently use lowercase field names in the store-client package, avoiding camelCase mappings for MongoDB reads/writes.
Applied to files:
store-client/pkg/client/postgresql_pipeline_builder.gostore-client/pkg/client/mongodb_pipeline_builder.go
π Learning: 2026-01-07T09:54:49.335Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: tests/data/healthy-event.json:19-20
Timestamp: 2026-01-07T09:54:49.335Z
Learning: In NVSentinel, healthy events (isHealthy: true) can legitimately use processingStrategy: 1 (EXECUTE_REMEDIATION) when the Fault Quarantine Manager needs to act on them to clear previous fault states or update cluster resources to reflect the healthy status.
Applied to files:
docs/designs/025-processing-strategy-for-health-checks.md
π Learning: 2025-12-23T10:34:13.121Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 610
File: platform-connectors/pkg/connectors/kubernetes/process_node_events.go:346-370
Timestamp: 2025-12-23T10:34:13.121Z
Learning: In platform-connectors/pkg/connectors/kubernetes/process_node_events.go, the Event.Type field is intentionally set to healthEvent.CheckName rather than the standard Kubernetes "Normal" or "Warning" values. This is a deliberate design choice for NVSentinel to apply node events with the checkname as the type.
Applied to files:
docs/designs/025-processing-strategy-for-health-checks.md
π Learning: 2025-11-04T06:31:02.147Z
Learnt from: Gyan172004
Repo: NVIDIA/NVSentinel PR: 223
File: platform-connectors/pkg/nodemetadata/processor.go:0-0
Timestamp: 2025-11-04T06:31:02.147Z
Learning: In platform-connectors/pkg/nodemetadata/processor.go, the NewProcessor function does not perform a nil check on the config parameter because the caller is expected to guarantee a non-nil config is provided.
Applied to files:
docs/designs/025-processing-strategy-for-health-checks.md
π Learning: 2025-11-10T10:25:19.443Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 248
File: distros/kubernetes/nvsentinel/charts/health-events-analyzer/values.yaml:87-124
Timestamp: 2025-11-10T10:25:19.443Z
Learning: In the NVSentinel health-events-analyzer, the agent filter to exclude analyzer-published events (`"healthevent.agent": {"$ne": "health-events-analyzer"}`) is automatically added as the first stage in getPipelineStages() function in health-events-analyzer/pkg/reconciler/reconciler.go, not in individual rule configurations in values.yaml.
Applied to files:
docs/designs/025-processing-strategy-for-health-checks.md
𧬠Code graph analysis (3)
tests/helpers/healthevent.go (2)
data-models/pkg/protos/health_event.pb.go (4)
ProcessingStrategy(44-44)ProcessingStrategy(76-78)ProcessingStrategy(80-82)ProcessingStrategy(89-91)health-monitors/gpu-health-monitor/gpu_health_monitor/protos/health_event_pb2.pyi (1)
ProcessingStrategy(14-18)
store-client/pkg/client/postgresql_pipeline_builder.go (2)
store-client/pkg/datastore/types.go (4)
ToPipeline(161-163)D(131-133)E(126-128)A(136-138)data-models/pkg/protos/health_event.pb.go (1)
ProcessingStrategy_EXECUTE_REMEDIATION(48-48)
store-client/pkg/client/mongodb_pipeline_builder.go (2)
store-client/pkg/datastore/types.go (4)
ToPipeline(161-163)D(131-133)E(126-128)A(136-138)data-models/pkg/protos/health_event.pb.go (1)
ProcessingStrategy_EXECUTE_REMEDIATION(48-48)
πͺ markdownlint-cli2 (0.18.1)
docs/designs/025-processing-strategy-for-health-checks.md
349-349: Hard tabs
Column: 1
(MD010, no-hard-tabs)
350-350: Hard tabs
Column: 1
(MD010, no-hard-tabs)
352-352: Hard tabs
Column: 1
(MD010, no-hard-tabs)
354-354: Hard tabs
Column: 1
(MD010, no-hard-tabs)
355-355: Hard tabs
Column: 1
(MD010, no-hard-tabs)
356-356: Hard tabs
Column: 1
(MD010, no-hard-tabs)
357-357: Hard tabs
Column: 1
(MD010, no-hard-tabs)
358-358: Hard tabs
Column: 1
(MD010, no-hard-tabs)
359-359: Hard tabs
Column: 1
(MD010, no-hard-tabs)
361-361: Hard tabs
Column: 1
(MD010, no-hard-tabs)
371-371: Hard tabs
Column: 1
(MD010, no-hard-tabs)
372-372: Hard tabs
Column: 1
(MD010, no-hard-tabs)
373-373: Hard tabs
Column: 1
(MD010, no-hard-tabs)
382-382: Hard tabs
Column: 1
(MD010, no-hard-tabs)
435-435: Hard tabs
Column: 1
(MD010, no-hard-tabs)
436-436: Hard tabs
Column: 1
(MD010, no-hard-tabs)
437-437: Hard tabs
Column: 1
(MD010, no-hard-tabs)
438-438: Hard tabs
Column: 1
(MD010, no-hard-tabs)
439-439: Hard tabs
Column: 1
(MD010, no-hard-tabs)
440-440: Hard tabs
Column: 1
(MD010, no-hard-tabs)
441-441: Hard tabs
Column: 1
(MD010, no-hard-tabs)
442-442: Hard tabs
Column: 1
(MD010, no-hard-tabs)
443-443: Hard tabs
Column: 1
(MD010, no-hard-tabs)
444-444: Hard tabs
Column: 1
(MD010, no-hard-tabs)
445-445: Hard tabs
Column: 1
(MD010, no-hard-tabs)
446-446: Hard tabs
Column: 1
(MD010, no-hard-tabs)
447-447: Hard tabs
Column: 1
(MD010, no-hard-tabs)
448-448: Hard tabs
Column: 1
(MD010, no-hard-tabs)
456-456: Hard tabs
Column: 1
(MD010, no-hard-tabs)
457-457: Hard tabs
Column: 1
(MD010, no-hard-tabs)
458-458: Hard tabs
Column: 1
(MD010, no-hard-tabs)
459-459: Hard tabs
Column: 1
(MD010, no-hard-tabs)
460-460: Hard tabs
Column: 1
(MD010, no-hard-tabs)
461-461: Hard tabs
Column: 1
(MD010, no-hard-tabs)
462-462: Hard tabs
Column: 1
(MD010, no-hard-tabs)
463-463: Hard tabs
Column: 1
(MD010, no-hard-tabs)
464-464: Hard tabs
Column: 1
(MD010, no-hard-tabs)
465-465: Hard tabs
Column: 1
(MD010, no-hard-tabs)
466-466: Hard tabs
Column: 1
(MD010, no-hard-tabs)
483-483: Hard tabs
Column: 1
(MD010, no-hard-tabs)
484-484: Hard tabs
Column: 1
(MD010, no-hard-tabs)
485-485: Hard tabs
Column: 1
(MD010, no-hard-tabs)
486-486: Hard tabs
Column: 1
(MD010, no-hard-tabs)
487-487: Hard tabs
Column: 1
(MD010, no-hard-tabs)
488-488: Hard tabs
Column: 1
(MD010, no-hard-tabs)
489-489: Hard tabs
Column: 1
(MD010, no-hard-tabs)
490-490: Hard tabs
Column: 1
(MD010, no-hard-tabs)
491-491: Hard tabs
Column: 1
(MD010, no-hard-tabs)
492-492: Hard tabs
Column: 1
(MD010, no-hard-tabs)
493-493: Hard tabs
Column: 1
(MD010, no-hard-tabs)
494-494: Hard tabs
Column: 1
(MD010, no-hard-tabs)
495-495: Hard tabs
Column: 1
(MD010, no-hard-tabs)
496-496: Hard tabs
Column: 1
(MD010, no-hard-tabs)
504-504: Hard tabs
Column: 1
(MD010, no-hard-tabs)
505-505: Hard tabs
Column: 1
(MD010, no-hard-tabs)
506-506: Hard tabs
Column: 1
(MD010, no-hard-tabs)
507-507: Hard tabs
Column: 1
(MD010, no-hard-tabs)
508-508: Hard tabs
Column: 1
(MD010, no-hard-tabs)
509-509: Hard tabs
Column: 1
(MD010, no-hard-tabs)
510-510: Hard tabs
Column: 1
(MD010, no-hard-tabs)
511-511: Hard tabs
Column: 1
(MD010, no-hard-tabs)
512-512: Hard tabs
Column: 1
(MD010, no-hard-tabs)
513-513: Hard tabs
Column: 1
(MD010, no-hard-tabs)
514-514: Hard tabs
Column: 1
(MD010, no-hard-tabs)
593-593: Hard tabs
Column: 1
(MD010, no-hard-tabs)
594-594: Hard tabs
Column: 1
(MD010, no-hard-tabs)
595-595: Hard tabs
Column: 1
(MD010, no-hard-tabs)
596-596: Hard tabs
Column: 1
(MD010, no-hard-tabs)
597-597: Hard tabs
Column: 1
(MD010, no-hard-tabs)
598-598: Hard tabs
Column: 1
(MD010, no-hard-tabs)
599-599: Hard tabs
Column: 1
(MD010, no-hard-tabs)
600-600: Hard tabs
Column: 1
(MD010, no-hard-tabs)
601-601: Hard tabs
Column: 1
(MD010, no-hard-tabs)
602-602: Hard tabs
Column: 1
(MD010, no-hard-tabs)
603-603: Hard tabs
Column: 1
(MD010, no-hard-tabs)
604-604: Hard tabs
Column: 1
(MD010, no-hard-tabs)
π Additional comments (7)
tests/helpers/healthevent.go (1)
48-48: LGTM! ProcessingStrategy field and setter follow established patterns.The new field and fluent setter method integrate cleanly with the existing
HealthEventTemplatestructure. TheomitemptyJSON tag ensures backward compatibility with existing tests.Also applies to: 153-156
store-client/pkg/client/postgresql_pipeline_builder.go (2)
119-146: LGTM! Backward-compatible pipeline correctly filters by processing strategy.The pipeline properly uses lowercase
processingstrategyfield name (consistent with protobuf storage conventions) and implements backward compatibility by matching bothEXECUTE_REMEDIATIONand missing fields.
163-192: LGTM! Correctly handles PostgreSQL-specific INSERT/UPDATE semantics.The pipeline appropriately watches both
insertandupdateoperations to handle PostgreSQL's changelog trigger behavior, consistent with the existingBuildNonFatalUnhealthyInsertsPipelinemethod. The backward compatibility logic forprocessingStrategyis correctly implemented.store-client/pkg/client/mongodb_pipeline_builder.go (2)
87-113: LGTM! MongoDB-specific pipeline correctly filters processable events.The pipeline properly handles MongoDB change stream semantics (INSERT operations only) and implements backward compatibility by matching both
EXECUTE_REMEDIATIONand missingprocessingStrategyfields. Field naming follows protobuf storage conventions.
129-156: LGTM! Consistent with MongoDB change stream behavior.The pipeline correctly watches only
insertoperations (appropriate for MongoDB, unlike PostgreSQL which needs insert+update). The backward compatibility logic for legacy events withoutprocessingStrategyis properly implemented.docs/designs/025-processing-strategy-for-health-checks.md (2)
341-363: Excellent documentation of normalization strategy.The explicit documentation of how
UNSPECIFIEDprocessingStrategy defaults toEXECUTE_REMEDIATIONat the gRPC server level is crucial for understanding the system's behavior. This ensures consistent handling across all modules and provides clear backward compatibility guarantees.
581-609: Clear backward compatibility explanation.The documentation clearly explains why the
$orpattern is needed for historical events and how the analyzer handles events withoutprocessingStrategy. This will help future maintainers understand the design rationale.
Merging this branch will increase overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. Changed unit test files
|
Signed-off-by: Tanisha goyal <[email protected]>
Merging this branch will increase overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. Changed unit test files
|
Signed-off-by: Tanisha goyal <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
π§Ή Nitpick comments (2)
tests/helpers/fault_quarantine.go (1)
141-145: LGTM! Conditional ConfigMap application improves test flexibility.The change correctly makes ConfigMap application optional when
configMapPathis empty, allowing tests to use default configurations or modify settings only viaopts. The backup is still properly taken for restore operations, and error handling remains appropriate.π Optional: Document the empty string behavior
Consider updating the function comment (lines 124-126) to document that
configMapPathcan be an empty string to skip ConfigMap application:// SetupQuarantineTestWithOptions sets up a quarantine test with additional configuration options. // This allows combining multiple deployment modifications into a single rollout. +// Pass an empty configMapPath to skip ConfigMap application and use the existing configuration. // Returns (context, testContext, originalDeployment) - originalDeployment is nil if no deployment changes were made.tests/fault_quarantine_test.go (1)
327-336: Consider sending healthy events for all checks used in the test.The Teardown only sends a healthy event for
SysLogsXIDErrorbut not forGpuPowerWatch. WhileTeardownQuarantineTestmay handle full cleanup, explicitly sending healthy events for both checks would ensure complete state cleanup and make the teardown more robust.β»οΈ Proposed enhancement for complete cleanup
feature.Teardown(func(ctx context.Context, t *testing.T, c *envconf.Config) context.Context { + // Clear SysLogsXIDError event := helpers.NewHealthEvent(testCtx.NodeName). WithErrorCode("79"). WithHealthy(true). WithAgent(helpers.SYSLOG_HEALTH_MONITOR_AGENT). WithCheckName("SysLogsXIDError") helpers.SendHealthEvent(ctx, t, event) + + // Clear GpuPowerWatch + event = helpers.NewHealthEvent(testCtx.NodeName). + WithErrorCode("DCGM_FR_CLOCK_THROTTLE_POWER"). + WithHealthy(true). + WithCheckName("GpuPowerWatch") + helpers.SendHealthEvent(ctx, t, event) return helpers.TeardownQuarantineTest(ctx, t, c) })
π Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
π Files selected for processing (3)
store-client/pkg/client/pipeline_builder.gotests/fault_quarantine_test.gotests/helpers/fault_quarantine.go
π§ Files skipped from review as they are similar to previous changes (1)
- store-client/pkg/client/pipeline_builder.go
π§° Additional context used
π Path-based instructions (2)
**/*.go
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.go: Follow standard Go conventions with gofmt and golint
Use structured logging vialog/slogin Go code
Wrap errors with context usingfmt.Errorf("context: %w", err)in Go code
Withinretry.RetryOnConflictblocks, return errors without wrapping to preserve retry behavior
Use meaningful variable names such assyncedoverokfor cache sync checks
Useclient-gofor Kubernetes API interactions in Go code
Prefer informers over direct API calls for watching Kubernetes resources
Implement proper shutdown handling with context cancellation in Go code
Package-level godoc required for all Go packages
Function comments required for all exported Go functions
Use inline comments for complex logic only in Go code
TODO comments should reference issues in Go code
Extract informer event handler setup into helper methods
Use separate informers for different Kubernetes resource types
Files:
tests/helpers/fault_quarantine.gotests/fault_quarantine_test.go
**/*_test.go
π CodeRabbit inference engine (.github/copilot-instructions.md)
**/*_test.go: Useenvtestfor testing Kubernetes controllers instead of fake clients
Usetestify/assertandtestify/requirefor assertions in Go tests
Write table-driven tests when testing multiple scenarios in Go
Name Go tests descriptively using format:TestFunctionName_Scenario_ExpectedBehavior
Files:
tests/fault_quarantine_test.go
π§ Learnings (8)
π Common learnings
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: tests/data/healthy-event.json:19-20
Timestamp: 2026-01-07T09:54:49.335Z
Learning: In NVSentinel, healthy events (isHealthy: true) can legitimately use processingStrategy: 1 (EXECUTE_REMEDIATION) when the Fault Quarantine Manager needs to act on them to clear previous fault states or update cluster resources to reflect the healthy status.
π Learning: 2025-11-06T16:18:09.952Z
Learnt from: KaivalyaMDabhadkar
Repo: NVIDIA/NVSentinel PR: 280
File: tests/fault_management_test.go:334-345
Timestamp: 2025-11-06T16:18:09.952Z
Learning: In the fault-quarantine component, the "quarantinedNodeUncordonedManually" annotation is set to the string literal "True" (with uppercase T), defined as the constant QuarantinedNodeUncordonedManuallyAnnotationValue in fault-quarantine/pkg/common/common.go. Tests should compare against "True", not "true".
Applied to files:
tests/helpers/fault_quarantine.gotests/fault_quarantine_test.go
π Learning: 2025-12-22T16:16:24.320Z
Learnt from: pteranodan
Repo: NVIDIA/NVSentinel PR: 607
File: client-go/nvgrpc/config_test.go:61-80
Timestamp: 2025-12-22T16:16:24.320Z
Learning: In Go tests across the repository, avoid introducing the testify dependency for simple equality/inequality checks. Use the standard testing package assertions (t.Error, t.Errorf, t.Fatal, etc.) for straightforward checks. Reserve third-party assertion libraries for complex scenarios that require richer diagnostics or expressive matchers.
Applied to files:
tests/fault_quarantine_test.go
π Learning: 2025-11-27T15:42:48.142Z
Learnt from: lalitadithya
Repo: NVIDIA/NVSentinel PR: 459
File: docs/cancelling-breakfix.md:58-90
Timestamp: 2025-11-27T15:42:48.142Z
Learning: The label `k8saas.nvidia.com/ManagedByNVSentinel` exists in NVSentinel and is used to opt nodes out of automated break-fix workflows. The prefix `k8saas.nvidia.com/` is configurable via the `labelPrefix` setting in the fault-quarantine Helm values, and the documentation correctly uses the default value.
Applied to files:
tests/fault_quarantine_test.go
π Learning: 2026-01-07T09:54:49.335Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: tests/data/healthy-event.json:19-20
Timestamp: 2026-01-07T09:54:49.335Z
Learning: In NVSentinel, healthy events (isHealthy: true) can legitimately use processingStrategy: 1 (EXECUTE_REMEDIATION) when the Fault Quarantine Manager needs to act on them to clear previous fault states or update cluster resources to reflect the healthy status.
Applied to files:
tests/fault_quarantine_test.go
π Learning: 2025-12-23T10:34:13.121Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 610
File: platform-connectors/pkg/connectors/kubernetes/process_node_events.go:346-370
Timestamp: 2025-12-23T10:34:13.121Z
Learning: In platform-connectors/pkg/connectors/kubernetes/process_node_events.go, the Event.Type field is intentionally set to healthEvent.CheckName rather than the standard Kubernetes "Normal" or "Warning" values. This is a deliberate design choice for NVSentinel to apply node events with the checkname as the type.
Applied to files:
tests/fault_quarantine_test.go
π Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*_test.go : Use `envtest` for testing Kubernetes controllers instead of fake clients
Applied to files:
tests/fault_quarantine_test.go
π Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*_test.go : Write table-driven tests when testing multiple scenarios in Go
Applied to files:
tests/fault_quarantine_test.go
𧬠Code graph analysis (1)
tests/helpers/fault_quarantine.go (1)
tests/helpers/kube.go (1)
NVSentinelNamespace(64-64)
π Additional comments (3)
tests/fault_quarantine_test.go (3)
26-26: LGTM!The protos import is necessary for accessing the
ProcessingStrategyenum values used in the test.
246-282: LGTM!The test correctly verifies that
STORE_ONLYevents do not trigger remediation actions (no node conditions, events, cordoning, or annotations). Testing both fatal and non-fatal event types provides good coverage.
284-325: LGTM!The test correctly verifies that
EXECUTE_REMEDIATIONevents trigger remediation actions (node conditions, events, cordoning, and annotations). The parallel structure with the first Assess block provides clear contrast between the two processing strategies.
Merging this branch will increase overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. Changed unit test files
|
Summary
Type of Change
Component(s) Affected
Testing
Checklist
Issue: #390
Testing
Platform connector logs
Summary by CodeRabbit
Release Notes
New Features
Tests
Documentation
βοΈ Tip: You can customize this high-level summary in your review settings.