Critical Safety Issues in API Gateway Controller #4894

@vorbidan

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

  1. Gateway Controller Processes Non-Consul Gateways & Deletes Resources Without Provenance Validation.
    The controller reconciles every Gateway resource in the cluster, regardless of gateway class.

return c, cleaner, ctrl.NewControllerManagedBy(mgr).
	For(&gwv1beta1.Gateway{}).

  2. Intentional Processing of Non-Consul Gateways
    The code deliberately processes gateways that Consul does not control; a comment in the source gives garbage collection as the reason.

// add our current gateway even if it's not controlled by us so we
// can garbage collect any resources for it.
resources.ReferenceCountGateway(gateway)

  3. Dangerous Deletion Logic Flow
    When a non-Consul gateway is processed, isGatewayDeleted returns true for it because its GatewayClass does not match the controller name:

// isGatewayDeleted returns whether we should treat the given gateway as a deleted object.
// This is true if the gateway has a deleted timestamp, if its GatewayClass does not match
// our controller name, or if the GatewayClass it references doesn't exist.
func (b *Binder) isGatewayDeleted() bool {
	gatewayClassMismatch := b.config.GatewayClass == nil || b.config.ControllerName != string(b.config.GatewayClass.Spec.ControllerName)
	isGatewayDeleted := isDeleted(&b.config.Gateway) || gatewayClassMismatch || b.config.GatewayClassConfig == nil
	return isGatewayDeleted
}
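
A minimal sketch of how "not ours" could be kept separate from "deleted", so that a class mismatch leads to no action instead of cleanup. It is a stdlib-only model, not the consul-k8s implementation: the `Gateway`, `GatewayClass`, and `Disposition` types and the `classify` function are hypothetical stand-ins, and the second controller name is an illustrative value.

```go
package main

import "fmt"

// GatewayClass and Gateway are simplified stand-ins for the Gateway API
// types; only the fields needed for this decision are modelled.
type GatewayClass struct {
	ControllerName string
}

type Gateway struct {
	Name, Namespace  string
	GatewayClassName string
	Deleting         bool // stand-in for a non-nil deletionTimestamp
}

// Disposition is a hypothetical three-way result that keeps
// "not ours" separate from "ours and being deleted".
type Disposition int

const (
	Ignore    Disposition = iota // foreign or unknown class: take no action
	Reconcile                    // ours and live
	CleanUp                      // ours and marked for deletion
)

func (d Disposition) String() string {
	return [...]string{"Ignore", "Reconcile", "CleanUp"}[d]
}

// classify looks up the Gateway's class and compares the class's controller
// name with ours. A missing or foreign class yields Ignore, never CleanUp.
func classify(gw Gateway, classes map[string]GatewayClass, controllerName string) Disposition {
	class, ok := classes[gw.GatewayClassName]
	if !ok || class.ControllerName != controllerName {
		return Ignore
	}
	if gw.Deleting {
		return CleanUp
	}
	return Reconcile
}

func main() {
	const consul = "consul.hashicorp.com/gateway-controller"
	classes := map[string]GatewayClass{
		"consul":   {ControllerName: consul},
		"kgateway": {ControllerName: "example.com/other-controller"}, // illustrative value
	}
	gateways := []Gateway{
		{Name: "http", Namespace: "kgateway-system", GatewayClassName: "kgateway"},
		{Name: "api", Namespace: "consul", GatewayClassName: "consul"},
		{Name: "old", Namespace: "consul", GatewayClassName: "consul", Deleting: true},
		{Name: "orphan", Namespace: "default", GatewayClassName: "missing"},
	}
	for _, gw := range gateways {
		fmt.Printf("%s/%s -> %s\n", gw.Namespace, gw.Name, classify(gw, classes, consul))
	}
}
```

A Gateway that was once Consul-managed and later re-pointed at another class would need its own signal for cleanup, such as a Consul finalizer or a marker on the child resources; this sketch does not cover that case.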

The gateway is therefore treated as deleted, and the Kubernetes resources that share its name and namespace are removed.
Gatekeeper identifies the resources to delete by name and namespace only, with no provenance validation. For example, this is how the Deployment is deleted:

func (g *Gatekeeper) deleteDeployment(ctx context.Context, gwName types.NamespacedName) error {
	err := g.Client.Delete(ctx, &appsv1.Deployment{ObjectMeta: metav1.ObjectMeta{Name: gwName.Name, Namespace: gwName.Namespace}})
	if k8serrors.IsNotFound(err) {
		return nil
	}

Other Kubernetes resources that do not belong to this controller are deleted in the same manner.
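
A guarded delete could read the object first and refuse to remove anything that lacks a provenance marker. The following is a rough sketch against an in-memory stand-in for the API server, not the Gatekeeper code: `fakeCluster`, `Deployment`, and `deleteDeploymentIfManaged` are hypothetical names, and the label is the one proposed in this issue, which consul-k8s is not confirmed to set today.

```go
package main

import "fmt"

// Deployment is a simplified stand-in for appsv1.Deployment.
type Deployment struct {
	Name, Namespace string
	Labels          map[string]string
}

type key struct{ Namespace, Name string }

// fakeCluster is an in-memory stand-in for the Kubernetes API server.
type fakeCluster struct {
	deployments map[key]Deployment
}

// managedLabel is the label proposed in this issue; it is an assumption,
// not something consul-k8s is known to set today.
const managedLabel = "gateway.consul.hashicorp.com/managed"

// deleteDeploymentIfManaged reads the object before deleting it and refuses
// to delete anything that lacks the provenance marker. It reports whether a
// deletion actually happened, plus a human-readable reason.
func (c *fakeCluster) deleteDeploymentIfManaged(namespace, name string) (deleted bool, reason string) {
	k := key{Namespace: namespace, Name: name}
	d, found := c.deployments[k]
	if !found {
		return false, "not found, nothing to do"
	}
	if d.Labels[managedLabel] != "true" {
		return false, "Skipping deletion - resource not managed by consul"
	}
	delete(c.deployments, k)
	return true, "deleted"
}

func main() {
	cluster := &fakeCluster{deployments: map[key]Deployment{
		{"kgateway-system", "http"}: {Name: "http", Namespace: "kgateway-system"},
		{"consul", "api"}: {
			Name: "api", Namespace: "consul",
			Labels: map[string]string{managedLabel: "true"},
		},
	}}
	for _, k := range []key{{"kgateway-system", "http"}, {"consul", "api"}, {"consul", "gone"}} {
		deleted, reason := cluster.deleteDeploymentIfManaged(k.Namespace, k.Name)
		fmt.Printf("%s/%s: deleted=%v (%s)\n", k.Namespace, k.Name, deleted, reason)
	}
}
```

The point of the design is that a matching name and namespace is never sufficient on its own: the unlabeled Deployment in kgateway-system survives even though a delete was requested for its exact name.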

Reproduction Steps

  1. Deploy any non-Consul gateway controller (for example, kgateway).
  2. Create a Gateway that references that controller's GatewayClass.
  3. Deploy consul-k8s with the API Gateway controller enabled.
  4. Observe Consul deleting resources it did not create.

Logs

2025-10-13T22:05:35.661Z    DEBUG    Reconciling Gateway    {"gateway": {"name":"http","namespace":"kgateway-system"}}
2025-10-13T22:05:35.965Z    DEBUG    controllers.GatewayClass    Reconciling GatewayClass    {"gatewayClass": "kgateway"}
2025-10-13T22:05:36.062Z    DEBUG    controllers.GatewayClass    Reconciling GatewayClass    {"gatewayClass": "kgateway"}
2025-10-13T22:05:36.161Z    DEBUG    controllers.GatewayClass    Reconciling GatewayClass    {"gatewayClass": "kgateway"}
2025-10-13T22:05:36.162Z    DEBUG    deleting from Consul    {"gateway": {"name":"http","namespace":"kgateway-system"}, "kind": "api-gateway", "namespace": "", "name": "http"}

Expected behavior

  1. Gateway Class Filtering
    • The controller should reconcile ONLY Gateways whose gatewayClassName references a GatewayClass controlled by consul.hashicorp.com/gateway-controller.
    • Non-Consul gateways should be ignored completely: no reconciliation, no processing, no log messages.
  2. Safe Resource Deletion
    • Before deleting any Kubernetes resource, the controller must validate ownership:
      • Check for Consul-specific labels: gateway.consul.hashicorp.com/managed: "true"
      • Verify Consul annotations: consul.hashicorp.com/gateway-kind: "api-gateway"
      • Validate owner references pointing to the Consul-managed Gateway
    • Skip deletion if the resource wasn't created by Consul, with clear logging: "Skipping deletion - resource not managed by consul"
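
The ownership checks listed above could be combined into a single validation step that reports which checks failed. This is only a sketch under stated assumptions: `Resource`, `OwnerRef`, `ManagedGateway`, and `validateOwnership` are hypothetical names, and the label and annotation keys are the ones proposed in this issue. All three checks are required together because an owner reference alone may not prove provenance: another controller's Deployment can be owned by the very same Gateway object.

```go
package main

import (
	"fmt"
	"strings"
)

// OwnerRef and Resource are simplified stand-ins for Kubernetes metadata.
type OwnerRef struct{ Kind, Name, UID string }

type Resource struct {
	Labels, Annotations map[string]string
	Owners              []OwnerRef
}

// ManagedGateway identifies a Gateway already known to be Consul-managed.
type ManagedGateway struct{ Name, UID string }

// Keys proposed in the issue text; assumptions, not a confirmed consul-k8s API.
const (
	managedLabel   = "gateway.consul.hashicorp.com/managed"
	kindAnnotation = "consul.hashicorp.com/gateway-kind"
)

// validateOwnership returns one entry per failed check.
// An empty result means every check passed and deletion may proceed.
func validateOwnership(r Resource, gw ManagedGateway) []string {
	var failures []string
	if r.Labels[managedLabel] != "true" {
		failures = append(failures, "missing label "+managedLabel)
	}
	if r.Annotations[kindAnnotation] != "api-gateway" {
		failures = append(failures, "missing annotation "+kindAnnotation)
	}
	owned := false
	for _, o := range r.Owners {
		if o.Kind == "Gateway" && o.UID == gw.UID {
			owned = true
			break
		}
	}
	if !owned {
		failures = append(failures, "no owner reference to gateway "+gw.Name)
	}
	return failures
}

func main() {
	gw := ManagedGateway{Name: "api", UID: "uid-1"}
	// Owned by the same Gateway object, but created by another controller.
	foreign := Resource{Owners: []OwnerRef{{Kind: "Gateway", Name: "api", UID: "uid-1"}}}
	if failures := validateOwnership(foreign, gw); len(failures) > 0 {
		fmt.Println("Skipping deletion - resource not managed by consul:", strings.Join(failures, "; "))
	}
}
```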

Environment details

  • consul-k8s version: 1.8.3
  • values.yaml used to deploy the helm chart:
# Values for Consul Helm chart for the primary federated datacenter "wc"
global:
  name: consul
  datacenter: wc

  # Configure ACLs for the Consul cluster.
  # See: https://developer.hashicorp.com/consul/docs/reference/k8s/helm#v-global-acls
  acls:
    manageSystemACLs: true
    # If ACLs are enabled, we must create a token for secondary
    # datacenters to replicate ACLs.
    createReplicationToken: true

  apiGateway:
    manageExternalCRDs: false

  # Enables WAN federation for this datacenter.
  # See: https://developer.hashicorp.com/consul/docs/reference/k8s/helm#v-global-federation
  federation:
    enabled: true
    #! primaryDatacenter: wc
    # This will cause a Kubernetes secret to be created that
    # can be imported by secondary datacenters to configure them
    # for federation.
    # See: https://developer.hashicorp.com/consul/docs/reference/k8s/helm#v-global-federation-createfederationsecret
    createFederationSecret: true
  
  # Configures gossip encryption for the Consul cluster.
  # See: https://developer.hashicorp.com/consul/docs/reference/k8s/helm#v-global-gossipencryption
  gossipEncryption:
    # Automatically generate a gossip encryption key and save it to a Kubernetes or Vault secret.
    autoGenerate: true

  # Enables TLS across the cluster to verify authenticity of the Consul servers and clients.
  # This is not the same CA as service mesh CA for service-to-service communication, which is
  # enabled with the `connectInject` option below. 
  # See: https://developer.hashicorp.com/consul/docs/reference/k8s/helm#v-global-tls
  tls:
    enabled: true
    # The Consul CA root certificate from the ca-consul-server secret.
    caCert:
      secretName: tls-ca
      secretKey: tls.crt
    caKey:
      secretName: tls-ca
      secretKey: tls.key

# Mesh gateways are gateways between datacenters. They must be enabled
# for federation in Kubernetes since the communication between datacenters
# goes through the mesh gateways.
# See: https://developer.hashicorp.com/consul/docs/reference/k8s/helm#v-meshgateway
meshGateway:
  enabled: true

# Configuration for Consul servers.
# See: https://developer.hashicorp.com/consul/docs/reference/k8s/helm#v-server
server:
  replicas: 1
  bootstrapExpect: 1
  connect: true
  
  # Server certificate and key from the server-cert secret.
  # This certificate is issued by the tls_ca CA out of band
  serverCert:
    secretName: "server-cert"
    secretKey: "tls.crt"
  serverKey:
    secretName: "server-cert"
    secretKey: "tls.key"

  # This should mount the connect-ca-config secret created by ExternalSecret
  # as a volume in the Consul server pods under. This provides the Consul servers
  # with the CA cert and key to sign service mesh certificates.
  extraVolumes:
    # Mounts /consul/userconfig/connect-ca-config/connect_config.json
    - name: connect-ca-config
      type: secret
      load: true
  
# Configures the automatic Connect sidecar injector.
# See: https://developer.hashicorp.com/consul/docs/reference/k8s/helm#h-connectinject
connectInject:
  enabled: true
  default: false
  # Enable central config to allow auth method creation
  centralConfig:
    enabled: true
  # Enable webhook to ensure proper initialization
  webhook:
    failurePolicy: "Ignore"
  apiGateway:
    # Disable the Gateway API CRDs since we are managing them externally via gwapi.
    manageExternalCRDs: false
  k8sDenyNamespaces: ['kgateway-system']
  namespaceSelector: |
    matchExpressions:
      - key: "kubernetes.io/metadata.name"
        operator: "NotIn"
        values: ["kube-system","local-path-storage","openebs","gmp-system","gke-managed-cim", "argocd","kgateway-system"]
  • Kubernetes version: v1.32.x
  • Cloud Provider: VMware
  • Networking CNI plugin in use: Cilium

Additional Context

This appears to be an architectural design flaw rather than an oversight. Intentionally processing non-Consul gateways for "garbage collection" breaks a basic safety expectation of the Kubernetes controller pattern: a controller should modify or delete only the resources it owns.

Labels: type/bug (Something isn't working)
