Unhealthy ResourceGraphDefinition causes stale state for all other ResourceGraphDefinitions #886

@toweroy

Description

Observed Behavior:
When multiple ResourceGraphDefinitions are Active and reconciling, applying a new, misconfigured ResourceGraphDefinition that causes the kro controller to fail blocks all new changes to every other ResourceGraphDefinition. Only the "old state" of the ResourceGraphDefinitions continues to be applied.

Note: As a collateral issue, deleting the problematic ResourceGraphDefinition alone is not enough; only deleting it and then restarting the kro controller pod returns us to a healthy setup.

Expected Behavior:
One unhealthy/failing ResourceGraphDefinition should not prevent other healthy ResourceGraphDefinitions from being updated and reconciled. I would also expect the failing state to be propagated to the ResourceGraphDefinition's status.

Reproduction Steps (the ResourceGraphDefinition and Instance files used can be found in the Appendix section at the bottom):

  1. Apply the HealthyRgd ResourceGraphDefinition and ClusterRole to a cluster and verify it's Active:
kubectl apply -f clusterrole.yaml
kubectl apply -f healthy-rgd.yaml

kubectl get rgd
NAME                                         APIVERSION   KIND                     STATE    AGE
healthy-rgd-test                             v1           HealthyRgd               Active   30s
  2. Apply the HealthyRgd resource, verify it has reconciled and also that the underlying resource has been created (a Namespace in this example):
kubectl apply -f healthy-rgd-resource.yaml

kubectl get healthyrgd -n test-namespace
NAME                   STATE    READY   AGE
healthy-rgd-resource   ACTIVE   True    18s

kubectl get namespace healthy-rgd-resource-test
NAME                        STATUS   AGE
healthy-rgd-resource-test   Active   30s
  3. Apply the UnhealthyRgd ResourceGraphDefinition; UnhealthyRgd has no state, which I would think is expected (see the status check after the output):
kubectl apply -f unhealthy-rgd.yaml 
resourcegraphdefinition.kro.run/unhealthy-rgd-test created

kubectl get rgd
NAME                                         APIVERSION   KIND                     STATE    AGE
healthy-rgd-test                             v1           HealthyRgd               Active   69m
unhealthy-rgd-test                           v1           UnhealthyRgd                      40s
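
Per the expected behavior above, I would want the failure surfaced on the ResourceGraphDefinition's status. One way to dump its conditions and check (the jsonpath below is just one way to do it):

kubectl get rgd unhealthy-rgd-test -o jsonpath='{.status.conditions}'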
  4. The kro-system pod logs show the issue (expected, since a ClusterRole was not set up for this ResourceGraphDefinition; a sketch of the missing ClusterRole follows the log):
2025-12-05T09:18:39Z    ERROR   dynamic-controller      watch error for lazy informer   {"gvr": "unhealthy.rgd.com/v1, Resource=unhealthyrgds", "gvr": "unhealthy.rgd.com/v1, Resource=unhealthyrgds", "error": "failed to list *v1.PartialObjectMetadata: unhealthyrgds.unhealthy.rgd.com is forbidden: User \"system:serviceaccount:kro-system:kro\" cannot list resource \"unhealthyrgds\" in API group \"unhealthy.rgd.com\" at the cluster scope"}
github.com/kubernetes-sigs/kro/pkg/dynamiccontroller/internal.(*LazyInformer).ensureInformer.func1
        github.com/kubernetes-sigs/kro/pkg/dynamiccontroller/internal/gvr_watch.go:86
k8s.io/client-go/tools/cache.(*sharedIndexInformer).SetWatchErrorHandler.func1
        k8s.io/[email protected]/tools/cache/shared_informer.go:497
k8s.io/client-go/tools/cache.(*Reflector).RunWithContext.func1
        k8s.io/[email protected]/tools/cache/reflector.go:361
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
        k8s.io/[email protected]/pkg/util/wait/backoff.go:233
k8s.io/apimachinery/pkg/util/wait.BackoffUntilWithContext.func1
        k8s.io/[email protected]/pkg/util/wait/backoff.go:255
k8s.io/apimachinery/pkg/util/wait.BackoffUntilWithContext
        k8s.io/[email protected]/pkg/util/wait/backoff.go:256
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
        k8s.io/[email protected]/pkg/util/wait/backoff.go:233
k8s.io/client-go/tools/cache.(*Reflector).RunWithContext
        k8s.io/[email protected]/tools/cache/reflector.go:359
k8s.io/client-go/tools/cache.(*controller).RunWithContext.(*Group).StartWithContext.func3
        k8s.io/[email protected]/pkg/util/wait/wait.go:63
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1
        k8s.io/[email protected]/pkg/util/wait/wait.go:72
  5. Verify creating a new HealthyRgd resource (named healthy-rgd-resource-2; the file is sketched after the output) - Works
kubectl apply -f healthy-rgd-resource-2.yaml
healthyrgd.healthy.rgd.com/healthy-rgd-resource-2 created

kubectl get healthyrgd -n test-namespace
NAME                     STATE    READY   AGE
healthy-rgd-resource-2   ACTIVE   True    77s

kubectl get namespace healthy-rgd-resource-2-test
NAME                          STATUS   AGE
healthy-rgd-resource-2-test   Active   2m5s
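
healthy-rgd-resource-2.yaml is not included in the Appendix; presumably it is just the Appendix HealthyRgd resource with a different name, along the lines of:

apiVersion: healthy.rgd.com/v1
kind: HealthyRgd
metadata:
  name: healthy-rgd-resource-2
  namespace: test-namespace
spec:
  expectedState: "healthy"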
  6. Modify the HealthyRgd ResourceGraphDefinition to add a new field (e.g. newShinyField) and verify it is updated:
...
spec:
  schema:
    apiVersion: v1
    group: healthy.rgd.com
    kind: HealthyRgd
    spec:
      ...
      newShinyField: string | required=true description="A new shiny field added to test RGD updates"
...
kubectl apply -f healthy-rgd.yaml 
resourcegraphdefinition.kro.run/healthy-rgd-test configured

kubectl get rgd healthy-rgd-test -oyaml
...
(newShinyField is present in the spec)
...
  7. Update the HealthyRgd resource with the new field (newShinyField) - Fails (see the note after the error output)
kubectl apply -f healthy-rgd-resource.yaml
The request is invalid: patch: Invalid value: "...": strict decoding error: unknown field "spec.newShinyField"
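
The strict decoding error suggests the generated CRD was never updated with newShinyField, even though the ResourceGraphDefinition spec shows it (step 6). Assuming kro names the generated CRD healthyrgds.healthy.rgd.com, this could be confirmed by inspecting its schema:

kubectl get crd healthyrgds.healthy.rgd.com -o jsonpath='{.spec.versions[0].schema.openAPIV3Schema.properties.spec.properties}'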
  8. Delete the UnhealthyRgd ResourceGraphDefinition:
kubectl delete rgd unhealthy-rgd-test
resourcegraphdefinition.kro.run "unhealthy-rgd-test" deleted

Note: You can attempt to re-apply the HealthyRgd resource, but it will fail with the same decoding error as before.

  9. Delete the kro-system pod (the only way I have found around this broken state; an equivalent rollout restart is sketched below):
kubectl delete pod <pod-name> -n kro-system
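
Equivalently, assuming kro runs as a Deployment named kro, a rollout restart should have the same effect:

kubectl rollout restart deployment/kro -n kro-system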
  10. Re-apply the HealthyRgd resource with the new field (newShinyField; the updated file is sketched in the Appendix):
kubectl apply -f healthy-rgd-resource.yaml
healthyrgd.healthy.rgd.com/healthy-rgd-resource configured

Everything is back to normal, i.e. reconciliation succeeds again.
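
A quick check that the new field actually landed on the instance (field name per step 6):

kubectl get healthyrgd healthy-rgd-resource -n test-namespace -o jsonpath='{.spec.newShinyField}'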

Versions:

  • kro version: 0.7.0
  • Kubernetes Version (kubectl version): v1.33.5

Involved Controllers:

  • Controller URLs and Versions (if applicable): kro dynamic-controller

Error Logs (if applicable): Shared in the above reproduction steps

Appendix

ClusterRole

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    rbac.kro.run/aggregate-to-controller: 'true'
  name: kro:controller:healthy-rgd-test
rules:
  - apiGroups:
      - healthy.rgd.com
    resources:
      - healthyrgds
      - healthyrgds/status
    verbs:
      - '*'
  - apiGroups:
      - ''
    resources:
      - namespaces
    verbs:
      - '*'

Healthy ResourceGraphDefinition

apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: healthy-rgd-test
spec:
  schema:
    apiVersion: v1
    group: healthy.rgd.com
    kind: HealthyRgd
    spec:
      expectedState: string | required=true description="The expected state of the resource"

  resources:
  - id: ns
    template:
      apiVersion: v1
      kind: Namespace
      metadata:
        name: ${schema.metadata.name}-test

Unhealthy ResourceGraphDefinition

apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: unhealthy-rgd-test
spec:
  schema:
    apiVersion: v1
    group: unhealthy.rgd.com
    kind: UnhealthyRgd
    spec:
      expectedState: string | required=true description="The expected state of the resource"

  resources:
  - id: ns
    template:
      apiVersion: v1
      kind: Namespace
      metadata:
        name: ${schema.metadata.name}-generated

HealthyRgd resource

apiVersion: healthy.rgd.com/v1
kind: HealthyRgd
metadata:
  name: healthy-rgd-resource
  namespace: test-namespace
spec:
  expectedState: "healthy"
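
Updated HealthyRgd resource

The updated healthy-rgd-resource.yaml used in steps 7 and 10 was not included with the original files; it is presumably the same resource with the new field added (the value here is a hypothetical example):

apiVersion: healthy.rgd.com/v1
kind: HealthyRgd
metadata:
  name: healthy-rgd-resource
  namespace: test-namespace
spec:
  expectedState: "healthy"
  newShinyField: "shiny"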
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Which option best describes your issue?

ResourceGraphDefinition (Create, Update, Deletion)


Labels

  • kind/bug - Categorizes issue or PR as related to a bug.
  • priority/critical-urgent - Highest priority. Must be actively worked on as someone's top priority right now.
  • triage/accepted - Indicates an issue or PR is ready to be actively worked on.
