Description
Observed Behavior:
When multiple ResourceGraphDefinitions are Active and reconciling, applying a new, misconfigured ResourceGraphDefinition that causes the kro controller to fail blocks all new changes to every ResourceGraphDefinition. Only the "old state" of the existing ResourceGraphDefinitions keeps being applied.
Note: As a collateral issue, deleting the problematic ResourceGraphDefinition is not enough on its own; only deleting it and restarting the kro controller pod returns us to a healthy setup.
Expected Behavior:
One unhealthy/failing ResourceGraphDefinition should not prevent other healthy ResourceGraphDefinitions from being updated and reconciled. I would also expect the failing state to be propagated to the ResourceGraphDefinition's status.
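For illustration, a propagated failure could surface in the status along these lines (a purely hypothetical sketch; the condition type, reason, and message here are made up to show the idea, not something kro emits today):

```yaml
status:
  state: Inactive
  conditions:
    - type: ReconcilerReady
      status: "False"
      reason: WatchSetupFailed
      message: >-
        failed to list unhealthyrgds.unhealthy.rgd.com:
        forbidden for service account kro-system:kro
```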
Reproduction Steps (the ResourceGraphDefinition and instance manifests used can be found in the Appendix section at the bottom):
- Apply the HealthyRgd `ResourceGraphDefinition` and `ClusterRole` to a cluster and verify it is `Active`:

  ```
  kubectl apply -f clusterrole.yaml
  kubectl apply -f healthy-rgd.yaml
  kubectl get rgd
  NAME               APIVERSION   KIND         STATE    AGE
  healthy-rgd-test   v1           HealthyRgd   Active   30s
  ```

- Apply the `HealthyRgd` resource, verify it has reconciled and also that the underlying resource has been created (a `Namespace` in this example):

  ```
  kubectl apply -f healthy-rgd-resource.yaml
  kubectl get healthyrgd -n test-namespace
  NAME                   STATE    READY   AGE
  healthy-rgd-resource   ACTIVE   True    18s
  kubectl get namespace healthy-rgd-resource-generated
  NAME                             STATUS   AGE
  healthy-rgd-resource-generated   Active   30s
  ```

- Apply the UnhealthyRgd `ResourceGraphDefinition`; `UnhealthyRgd` has no state (which I would think is expected):

  ```
  kubectl apply -f unhealthy-rgd.yaml
  resourcegraphdefinition.kro.run/unhealthy-rgd-test created
  kubectl get rgd
  NAME                 APIVERSION   KIND           STATE    AGE
  healthy-rgd-test     v1           HealthyRgd     Active   69m
  unhealthy-rgd-test   v1           UnhealthyRgd            40s
  ```

  The `kro-system` pod logs will show the issue (expected, since a `ClusterRole` was not set up for this `ResourceGraphDefinition`):

  ```
  2025-12-05T09:18:39Z ERROR dynamic-controller watch error for lazy informer {"gvr": "unhealthy.rgd.com/v1, Resource=unhealthyrgds", "gvr": "unhealthy.rgd.com/v1, Resource=unhealthyrgds", "error": "failed to list *v1.PartialObjectMetadata: unhealthyrgds.unhealthy.rgd.com is forbidden: User \"system:serviceaccount:kro-system:kro\" cannot list resource \"unhealthyrgds\" in API group \"unhealthy.rgd.com\" at the cluster scope"}
  github.com/kubernetes-sigs/kro/pkg/dynamiccontroller/internal.(*LazyInformer).ensureInformer.func1
  	github.com/kubernetes-sigs/kro/pkg/dynamiccontroller/internal/gvr_watch.go:86
  k8s.io/client-go/tools/cache.(*sharedIndexInformer).SetWatchErrorHandler.func1
  	k8s.io/[email protected]/tools/cache/shared_informer.go:497
  k8s.io/client-go/tools/cache.(*Reflector).RunWithContext.func1
  	k8s.io/[email protected]/tools/cache/reflector.go:361
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
  	k8s.io/[email protected]/pkg/util/wait/backoff.go:233
  k8s.io/apimachinery/pkg/util/wait.BackoffUntilWithContext.func1
  	k8s.io/[email protected]/pkg/util/wait/backoff.go:255
  k8s.io/apimachinery/pkg/util/wait.BackoffUntilWithContext
  	k8s.io/[email protected]/pkg/util/wait/backoff.go:256
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil
  	k8s.io/[email protected]/pkg/util/wait/backoff.go:233
  k8s.io/client-go/tools/cache.(*Reflector).RunWithContext
  	k8s.io/[email protected]/tools/cache/reflector.go:359
  k8s.io/client-go/tools/cache.(*controller).RunWithContext.(*Group).StartWithContext.func3
  	k8s.io/[email protected]/pkg/util/wait/wait.go:63
  k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1
  	k8s.io/[email protected]/pkg/util/wait/wait.go:72
  ```

- Verify creating a new `HealthyRgd` resource (e.g. named `healthy-rgd-test-2`) works:

  ```
  kubectl apply -f healthy-rgd-resource-2.yaml
  healthyrgd.healthy.rgd.com/healthy-rgd-resource-2 created
  kubectl get healthyrgd -n test-namespace
  NAME                     STATE    READY   AGE
  healthy-rgd-resource-2   ACTIVE   True    77s
  kubectl get namespace healthy-rgd-resource-2-test
  NAME                          STATUS   AGE
  healthy-rgd-resource-2-test   Active   2m5s
  ```

- Modify the HealthyRgd `ResourceGraphDefinition` to add a new field (e.g. `newShinyField`) and verify it is updated:

  ```
  ...
  spec:
    schema:
      apiVersion: v1
      group: healthy.rgd.com
      kind: HealthyRgd
      spec:
        ...
        newShinyField: string | required=true description="A new shiny field added to test RGD updates"
  ...
  ```

  ```
  kubectl apply -f healthy-rgd.yaml
  resourcegraphdefinition.kro.run/healthy-rgd-test configured
  kubectl get rgd healthy-rgd-test -oyaml
  ...
  (newShinyField is present in the spec)
  ...
  ```

- Update the `HealthyRgd` resource with the new field (`newShinyField`); this fails:

  ```
  kubectl apply -f healthy-rgd-resource.yaml
  The request is invalid: patch: Invalid value: "...": strict decoding error: unknown field "spec.newShinyField"
  ```

- Delete the UnhealthyRgd `ResourceGraphDefinition`:

  ```
  kubectl delete rgd unhealthy-rgd-test
  resourcegraphdefinition.kro.run "unhealthy-rgd-test" deleted
  ```

  Note: you can attempt to re-apply the HealthyRgd resource, but it will fail with the same decoding error as before.

- Delete the `kro-system` pod (the only way I have found around this broken state):

  ```
  kubectl delete pod <pod-name> -n kro-system
  ```

- Re-apply the `HealthyRgd` resource with the new field (`newShinyField`):

  ```
  kubectl apply -f healthy-rgd-resource.yaml
  healthyrgd.healthy.rgd.com/healthy-rgd-resource configured
  ```

  Everything is back to normal, i.e. reconciliation is successful again.
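For reference, the underlying RBAC gap can be closed with a ClusterRole analogous to the healthy one in the Appendix, with the group and resource names swapped for the unhealthy graph (a sketch derived from that manifest; the ClusterRole name is arbitrary, and I have not verified whether the wedged controller recovers once it is applied, which is part of what this issue is about):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    rbac.kro.run/aggregate-to-controller: 'true'
  name: kro:controller:unhealthy-rgd-test
rules:
  - apiGroups:
      - unhealthy.rgd.com
    resources:
      - unhealthyrgds
      - unhealthyrgds/status
    verbs:
      - '*'
  - apiGroups:
      - ''
    resources:
      - namespaces
    verbs:
      - '*'
```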
Versions:
- kro version: 0.7.0
- Kubernetes Version (`kubectl version`): v1.33.5
Involved Controllers:
- Controller URLs and Versions (if applicable): kro `dynamic-controller`
Error Logs (if applicable): Shared in the above reproduction steps
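As a quick way to confirm the permission gap from the logs independently of kro, the API server can be asked directly on behalf of the controller's service account (the account name is taken from the log message; requires access to the affected cluster):

```
# Expect "no" until a ClusterRole covering the unhealthy group is aggregated
kubectl auth can-i list unhealthyrgds.unhealthy.rgd.com \
  --as=system:serviceaccount:kro-system:kro
```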
Appendix
ClusterRole:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    rbac.kro.run/aggregate-to-controller: 'true'
  name: kro:controller:healthy-rgd-test
rules:
  - apiGroups:
      - healthy.rgd.com
    resources:
      - healthyrgds
      - healthyrgds/status
    verbs:
      - '*'
  - apiGroups:
      - ''
    resources:
      - namespaces
    verbs:
      - '*'
```

Healthy ResourceGraphDefinition:

```yaml
apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: healthy-rgd-test
spec:
  schema:
    apiVersion: v1
    group: healthy.rgd.com
    kind: HealthyRgd
    spec:
      expectedState: string | required=true description="The expected state of the resource"
  resources:
    - id: ns
      template:
        apiVersion: v1
        kind: Namespace
        metadata:
          name: ${schema.metadata.name}-test
```

Unhealthy ResourceGraphDefinition:

```yaml
apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: unhealthy-rgd-test
spec:
  schema:
    apiVersion: v1
    group: unhealthy.rgd.com
    kind: UnhealthyRgd
    spec:
      expectedState: string | required=true description="The expected state of the resource"
  resources:
    - id: ns
      template:
        apiVersion: v1
        kind: Namespace
        metadata:
          name: ${schema.metadata.name}-generated
```

HealthyRgd resource:

```yaml
apiVersion: healthy.rgd.com/v1
kind: HealthyRgd
metadata:
  name: healthy-rgd-resource
  namespace: test-namespace
spec:
  expectedState: "healthy"
```

- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Which option best describes your issue?
ResourceGraphDefinition (Create, Update, Deletion)