-
Notifications
You must be signed in to change notification settings - Fork 596
agentgateway: report nacks via events #12770
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
jmcguire98
wants to merge
47
commits into
kgateway-dev:main
Choose a base branch
from
jmcguire98:report-agw-nacks-on-gateways
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 17 commits
Commits
Show all changes
47 commits
Select commit
Hold shift + click to select a range
46f79a1
wip
jmcguire98 da48459
draft impl
jmcguire98 4f49dff
start wiring up status reporting
jmcguire98 b4185f4
finish rough implementation
jmcguire98 4aa4cbe
update role to allow watching events
jmcguire98 f2867c0
register events with kclient and create them in the system ns so we c…
jmcguire98 e873205
fix nack publisher config order
jmcguire98 f569102
fix event creation
jmcguire98 62e7768
sort events on processing so acks recoveries work right
jmcguire98 375fc6b
various cleanups
jmcguire98 9f53605
replace status more correctly
jmcguire98 8662168
fix key in ack/nack map
jmcguire98 11fdf86
cleanup event registration, undo setup changes no longer needed
jmcguire98 5f1d214
cleanup no longer needed changes to startup
jmcguire98 1e25af9
comment and unused var cleanups
jmcguire98 2ce09bd
gen
jmcguire98 b566384
lints
jmcguire98 252fa09
remove use of systemnamespace since we no longer need it
jmcguire98 d93746e
delete gateways from nackStateStore on removeNack when they have no r…
jmcguire98 f76fe63
use eventRecorder to publish events in nackPublisher
jmcguire98 72c5f95
resolve context todo
jmcguire98 47ec6c9
address shouldRespondDelta todo
jmcguire98 bf06891
Merge branch 'main' into report-agw-nacks-on-gateways
jmcguire98 52f1bd6
regen
jmcguire98 0a54e70
remove unnecessary context use from publisher
jmcguire98 fa361f3
rename events.go -> eventmeta.go
jmcguire98 da1e355
unexport newPublisher and let NewNackHandler be responsible for creat…
jmcguire98 b713680
Merge branch 'main' into report-agw-nacks-on-gateways
jmcguire98 5bacfb7
add unit testing for nack handler
jmcguire98 12d35b0
add publisher unit tests
jmcguire98 cecd241
Merge branch 'main' into report-agw-nacks-on-gateways
jmcguire98 1e1dcdb
fix fmt
jmcguire98 e24a5de
lint
jmcguire98 ebc43f0
cleanup godoc comments, unexport some constants
jmcguire98 a98be59
Merge branch 'main' into report-agw-nacks-on-gateways
jmcguire98 631e0a6
clean up things we are no longer planning to do (ack handling, status…
jmcguire98 e96019d
publish messages with the deployment as the involved object as well
jmcguire98 f77d524
cleanup
jmcguire98 d0e1506
pare back event permissions since we don't need an event collection a…
jmcguire98 2e00fba
Merge branch 'main' into report-agw-nacks-on-gateways
jmcguire98 c3eb680
don't try to append resource names to event message for now
jmcguire98 261a216
Merge branch 'main' into report-agw-nacks-on-gateways
jmcguire98 538268a
fix merge conflict formatting
jmcguire98 16fcf34
plumb through context and get gtw and dep so we can set UID on the ev…
jmcguire98 b2f92e0
Merge branch 'main' into report-agw-nacks-on-gateways
jmcguire98 0dd472e
log k8s get errors and return early
jmcguire98 b3dacc9
fix unit tests by adding resources to teh fake client
jmcguire98 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -34,7 +34,10 @@ rules: | |
| - events | ||
| verbs: | ||
| - create | ||
| - get | ||
| - list | ||
| - patch | ||
| - watch | ||
| - apiGroups: | ||
| - "" | ||
| resources: | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,41 @@ | ||
| package nack | ||
|
|
||
| import ( | ||
| "crypto/sha256" | ||
| "encoding/hex" | ||
| "strings" | ||
| ) | ||
|
|
||
| // Event reasons for Kubernetes Events created by agentgateway NACK/ACK detection | ||
| const ( | ||
| ReasonNack = "AgentgatewayNACK" | ||
| ReasonAck = "AgentgatewayACK" | ||
| ) | ||
|
|
||
| // Annotation keys used for NACK events | ||
| const ( | ||
| // AnnotationNackID is a stable hash identifying a specific NACK event | ||
| AnnotationNackID = "kgateway.dev/nack-id" | ||
|
|
||
| // AnnotationTypeURL is the xDS TypeURL that was rejected | ||
| AnnotationTypeURL = "kgateway.dev/type-url" | ||
|
|
||
| // AnnotationObservedAt is the RFC3339 timestamp when the NACK was last observed | ||
| AnnotationObservedAt = "kgateway.dev/observed-at" | ||
|
|
||
| // AnnotationRecoveryOf points to the NACK ID that this ACK event resolves | ||
| AnnotationRecoveryOf = "kgateway.dev/recovery-of" | ||
| ) | ||
|
|
||
| // ComputeNackID creates a stable, identifier for a NACK event. | ||
| // The same combination of inputs will always produce the same NACK ID. | ||
| func ComputeNackID(gatewayNamespacedName, typeURL string) string { | ||
| h := sha256.New() | ||
|
|
||
| input := strings.Join([]string{gatewayNamespacedName, typeURL}, "|") | ||
|
|
||
| h.Write([]byte(input)) | ||
|
|
||
| // Return first 16 hex characters (64 bits) for readability | ||
| return hex.EncodeToString(h.Sum(nil))[:16] | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,136 @@ | ||
| package nack | ||
|
|
||
| import ( | ||
| "fmt" | ||
| "sync" | ||
| "time" | ||
|
|
||
| corev1 "k8s.io/api/core/v1" | ||
| metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
| "k8s.io/apimachinery/pkg/types" | ||
| gwv1 "sigs.k8s.io/gateway-api/apis/v1" | ||
|
|
||
| "github.com/kgateway-dev/kgateway/v2/pkg/logging" | ||
| ) | ||
|
|
||
| var nackHandlerLog = logging.New("nack/handler") | ||
|
|
||
| type NackHandler struct { | ||
| nackStateStore map[types.NamespacedName]map[string]string | ||
| nackPublisher *Publisher | ||
| mu sync.RWMutex | ||
| } | ||
|
|
||
| // NackEvent represents a NACK received from an agentgateway gateway | ||
| type NackEvent struct { | ||
| Gateway types.NamespacedName | ||
| TypeUrl string | ||
| ErrorMsg string | ||
| Timestamp time.Time | ||
| } | ||
|
|
||
| // AckEvent represents a successful ACK received from an agentgateway gateway | ||
| type AckEvent struct { | ||
| Gateway types.NamespacedName | ||
| TypeUrl string | ||
| Timestamp time.Time | ||
| } | ||
|
|
||
| func NewNackHandler(nackPublisher *Publisher) *NackHandler { | ||
| return &NackHandler{ | ||
| nackStateStore: make(map[types.NamespacedName]map[string]string), | ||
| nackPublisher: nackPublisher, | ||
| mu: sync.RWMutex{}, | ||
| } | ||
| } | ||
|
|
||
| // HandleNack publishes a NACK event to the Kubernetes Event API. | ||
| func (h *NackHandler) HandleNack(nackEvent *NackEvent) { | ||
| h.nackPublisher.onNack(*nackEvent) | ||
| } | ||
|
|
||
| // HandleAck publishes an ACK event to the Kubernetes Event API. | ||
| func (h *NackHandler) HandleAck(ackEvent *AckEvent) { | ||
| h.nackPublisher.onAck(*ackEvent) | ||
| } | ||
|
|
||
| // FilterEventsAndConvertToNackStatusUpdate filters Kubernetes Events to only process NACK/ACK events for Gateways. | ||
| // Returns nil for events that should be ignored, or a NackStatusUpdate for events relevant events. | ||
| func (h *NackHandler) FilterEventsAndUpdateState(event *corev1.Event) error { | ||
| nackHandlerLog.Debug("processing event", "reason", event.Reason, "kind", event.InvolvedObject.Kind, "name", event.InvolvedObject.Name, "namespace", event.InvolvedObject.Namespace) | ||
|
|
||
| if event.Reason != ReasonNack && event.Reason != ReasonAck { | ||
| nackHandlerLog.Debug("ignoring event - not a NACK/ACK event", "reason", event.Reason) | ||
| return nil | ||
| } | ||
|
|
||
| nackID, exists := event.Annotations[AnnotationNackID] | ||
| if !exists || nackID == "" { | ||
| return fmt.Errorf("event missing NACK ID annotation: %v", event.Annotations) | ||
| } | ||
|
|
||
| // Handle recovery events (ACKs that reference a previous NACK) | ||
| isRecovery := event.Reason == ReasonAck | ||
| if isRecovery { | ||
| recoveryOf, hasRecovery := event.Annotations[AnnotationRecoveryOf] | ||
| if !hasRecovery || recoveryOf == "" { | ||
| return fmt.Errorf("ACK event missing recovery annotation: %v", event.Annotations) | ||
| } | ||
| h.removeNack(types.NamespacedName{Name: event.InvolvedObject.Name, Namespace: event.InvolvedObject.Namespace}, recoveryOf) | ||
| return nil | ||
| } | ||
| h.addNack(types.NamespacedName{Name: event.InvolvedObject.Name, Namespace: event.InvolvedObject.Namespace}, nackID, event.Message) | ||
| return nil | ||
| } | ||
|
|
||
| // ComputeStatus computes the Gateway status condition based on the current set of active NACKs for a gateway. | ||
| // - No active NACKs: No status returned | ||
| // - One active NACK: Programmed=False with specific error message | ||
| // - Multiple active NACKs: Programmed=False with aggregated error count | ||
| func (h *NackHandler) ComputeStatus(gateway types.NamespacedName) *metav1.Condition { | ||
| h.mu.RLock() | ||
| defer h.mu.RUnlock() | ||
| activeNacks := h.nackStateStore[gateway] | ||
| if len(activeNacks) == 0 { | ||
| // if there are no active NACKs, return nil (let the caller decide what the status should be) | ||
| return nil | ||
| } else { | ||
| var message string | ||
| if len(activeNacks) == 1 { | ||
| for _, msg := range activeNacks { | ||
| message = fmt.Sprintf("Configuration rejected: %s", msg) | ||
| break | ||
| } | ||
| } else { | ||
| message = fmt.Sprintf("Configuration rejected: %d errors found", len(activeNacks)) | ||
| } | ||
|
|
||
| return &metav1.Condition{ | ||
| Type: string(gwv1.GatewayConditionProgrammed), | ||
| Status: metav1.ConditionFalse, | ||
| Reason: string(gwv1.GatewayReasonInvalid), | ||
| Message: message, | ||
| LastTransitionTime: metav1.Now(), | ||
| } | ||
| } | ||
| } | ||
|
|
||
| // addNack adds a NACK to the Gateway's active NACK set when a NACK event is received via the Kubernetes Event API. | ||
| func (h *NackHandler) addNack(gateway types.NamespacedName, nackID, message string) { | ||
| h.mu.Lock() | ||
| defer h.mu.Unlock() | ||
| if h.nackStateStore[gateway] == nil { | ||
| h.nackStateStore[gateway] = make(map[string]string) | ||
| } | ||
| h.nackStateStore[gateway][nackID] = message | ||
| } | ||
|
|
||
| // removeNack removes a NACK from the Gateway's active set when an ACK event is received via the Kubernetes Event API. | ||
| func (h *NackHandler) removeNack(gateway types.NamespacedName, nackID string) { | ||
| h.mu.Lock() | ||
| defer h.mu.Unlock() | ||
| if h.nackStateStore[gateway] == nil { | ||
| return | ||
| } | ||
| delete(h.nackStateStore[gateway], nackID) | ||
| } | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,111 @@ | ||
| package nack | ||
|
|
||
| import ( | ||
| "context" | ||
| "time" | ||
|
|
||
| "istio.io/istio/pkg/kube" | ||
| corev1 "k8s.io/api/core/v1" | ||
| "k8s.io/apimachinery/pkg/api/errors" | ||
| metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
|
|
||
| "github.com/kgateway-dev/kgateway/v2/internal/kgateway/wellknown" | ||
| "github.com/kgateway-dev/kgateway/v2/pkg/logging" | ||
| ) | ||
|
|
||
| var log = logging.New("nack/publisher") | ||
|
|
||
| // Publisher converts NACK events from the agentgateway xDS server into Kubernetes Events. | ||
| type Publisher struct { | ||
| ctx context.Context | ||
| client kube.Client | ||
| systemNamespace string | ||
| } | ||
|
|
||
| // NewPublisher creates a new NACK event publisher that will publish k8s events | ||
| func NewPublisher(ctx context.Context, client kube.Client, systemNamespace string) *Publisher { | ||
| return &Publisher{ | ||
| client: client, | ||
| ctx: ctx, | ||
| systemNamespace: systemNamespace, | ||
jmcguire98 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| } | ||
| } | ||
|
|
||
| // onNack publishes a NACK event as a k8s event. | ||
| func (p *Publisher) onNack(event NackEvent) { | ||
jmcguire98 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| nackID := ComputeNackID(event.Gateway.Namespace+"/"+event.Gateway.Name, event.TypeUrl) | ||
|
|
||
| k8sEvent := &corev1.Event{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| GenerateName: "agentgateway-nack-", | ||
| Namespace: event.Gateway.Namespace, | ||
| Annotations: map[string]string{ | ||
| AnnotationNackID: nackID, | ||
| AnnotationTypeURL: event.TypeUrl, | ||
| AnnotationObservedAt: event.Timestamp.Format(time.RFC3339), | ||
| }, | ||
| }, | ||
| InvolvedObject: corev1.ObjectReference{ | ||
| Kind: wellknown.GatewayKind, | ||
| APIVersion: wellknown.GatewayGVK.GroupVersion().String(), | ||
| Name: event.Gateway.Name, | ||
| Namespace: event.Gateway.Namespace, | ||
| }, | ||
| Reason: ReasonNack, | ||
| Message: event.ErrorMsg, | ||
| Type: corev1.EventTypeWarning, | ||
| LastTimestamp: metav1.NewTime(event.Timestamp), | ||
| Count: 1, | ||
| ReportingController: wellknown.DefaultAgwControllerName, | ||
| } | ||
|
|
||
| _, err := p.client.Kube().CoreV1().Events(event.Gateway.Namespace).Create( | ||
| p.ctx, k8sEvent, metav1.CreateOptions{}, | ||
| ) | ||
| if err != nil && !errors.IsAlreadyExists(err) { | ||
| log.Error("failed to publish NACK event for Gateway", "gateway", event.Gateway, "error", err) | ||
| return | ||
| } | ||
|
|
||
| log.Debug("published NACK event for Gateway", "gateway", event.Gateway, "nackID", nackID, "typeURL", event.TypeUrl) | ||
| } | ||
|
|
||
| // onAck publishes an ACK event as a k8s event. | ||
| func (p *Publisher) onAck(event AckEvent) { | ||
| recoveredNackID := ComputeNackID(event.Gateway.Namespace+"/"+event.Gateway.Name, event.TypeUrl) | ||
jmcguire98 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| k8sEvent := &corev1.Event{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| GenerateName: "agentgateway-ack-", | ||
| Namespace: event.Gateway.Namespace, | ||
| Annotations: map[string]string{ | ||
| AnnotationNackID: ComputeNackID(event.Gateway.Namespace+"/"+event.Gateway.Name, event.TypeUrl), | ||
| AnnotationTypeURL: event.TypeUrl, | ||
| AnnotationRecoveryOf: recoveredNackID, | ||
| AnnotationObservedAt: event.Timestamp.Format(time.RFC3339), | ||
| }, | ||
| }, | ||
| InvolvedObject: corev1.ObjectReference{ | ||
| Kind: wellknown.GatewayKind, | ||
| APIVersion: wellknown.GatewayGVK.GroupVersion().String(), | ||
| Name: event.Gateway.Name, | ||
| Namespace: event.Gateway.Namespace, | ||
| }, | ||
| Reason: ReasonAck, | ||
| Message: "Configuration accepted successfully", | ||
| Type: corev1.EventTypeNormal, | ||
| LastTimestamp: metav1.NewTime(event.Timestamp), | ||
| Count: 1, | ||
| ReportingController: wellknown.DefaultAgwControllerName, | ||
| } | ||
|
|
||
| _, err := p.client.Kube().CoreV1().Events(event.Gateway.Namespace).Create( | ||
jmcguire98 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| p.ctx, k8sEvent, metav1.CreateOptions{}, | ||
| ) | ||
| if err != nil && !errors.IsAlreadyExists(err) { | ||
| log.Error("failed to publish ACK event for Gateway", "gateway", event.Gateway, "error", err) | ||
| return | ||
| } | ||
|
|
||
| log.Debug("published ACK event for Gateway", "gateway", event.Gateway, "typeURL", event.TypeUrl) | ||
| } | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.