@kalavt commented Dec 27, 2025

Description

The tigera-operator currently has a hardcoded dnsPolicy: ClusterFirstWithHostNet in the deployment template, which causes a DNS circular dependency on AWS EKS during initial cluster setup. This prevents successful Calico deployment on fresh EKS clusters.

This PR makes the dnsPolicy configurable via values.yaml while maintaining backward compatibility with the current default behavior.

Summary

Make dnsPolicy configurable to fix EKS deployment issues

Problem

  • tigera-operator has hardcoded dnsPolicy: ClusterFirstWithHostNet in the deployment template
  • On EKS, this causes DNS circular dependency during initial deployment:
    • Operator queries kube-dns (ClusterIP: 10.100.0.10)
    • The kube-dns ClusterIP is only useful once CoreDNS pods are running behind it
    • CoreDNS pods require the CNI network
    • CNI network requires Calico
    • Calico requires the operator to deploy it → DEADLOCK
  • Operator cannot resolve EKS API server hostname before Calico is deployed
  • This blocks initial Calico deployment and cluster scaling on EKS

Solution

  • Make dnsPolicy configurable via values.yaml
  • Keep ClusterFirstWithHostNet as default for backward compatibility
  • Add conditional logic to allow users to override the default
  • EKS users can set dnsPolicy: Default to use node DNS (VPC DNS)
  • This breaks the circular dependency: the operator resolves names through node DNS, and cluster DNS (kube-dns) becomes available to other workloads once Calico is deployed

Breaking Changes

None - default behavior unchanged

Usage

# values.yaml - For EKS clusters
dnsPolicy: Default  # Use node DNS to avoid circular dependency

# values.yaml - For other environments (default)
# dnsPolicy not set - uses ClusterFirstWithHostNet

# values.yaml - Advanced usage with dnsConfig
dnsPolicy: Default
dnsConfig:
  options:
    - name: timeout
      value: "2"

Testing

Tested the following scenarios:

  1. Default behavior (backward compatibility):

    • No dnsPolicy set in values.yaml
    • Deployment uses ClusterFirstWithHostNet
    • Existing deployments unaffected
  2. EKS with dnsPolicy: Default:

    • Set dnsPolicy: Default in values.yaml
    • Operator uses node DNS (VPC DNS)
    • Successfully deploys Calico on fresh EKS cluster
    • No DNS circular dependency
  3. Combined with dnsConfig:

    • Both dnsPolicy and dnsConfig can be set
    • Values are properly rendered in deployment

Related issues/PRs

fixes tigera/operator#4325
relates to #10683

Todos

  • Tests
  • Documentation
  • Release note

Release Note

Add support for configurable dnsPolicy in tigera-operator deployment to fix DNS circular dependency issues on AWS EKS. Users can now override the default ClusterFirstWithHostNet behavior by setting dnsPolicy: Default in values.yaml. This change maintains backward compatibility with existing deployments.

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Suggested labels:

  • docs-completed: Documentation update included in PR description
  • release-note-required: This PR has user-facing changes (new configuration option)
  • Priority: priority/important-longterm (affects EKS users)

Code Changes

# templates/tigera-operator/02-tigera-operator.yaml
terminationGracePeriodSeconds: 60
hostNetwork: true
{{- if .Values.dnsPolicy }}
dnsPolicy: {{ .Values.dnsPolicy }}
{{- else }}
# Default: ClusterFirstWithHostNet for backward compatibility
# Note: For EKS clusters, consider setting dnsPolicy: Default in values.yaml
dnsPolicy: ClusterFirstWithHostNet
{{- end }}
{{- if .Values.dnsConfig }}
dnsConfig:
  {{- toYaml .Values.dnsConfig | nindent 8 }}
{{- end }}
containers:
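
As a rough illustration of the template above, the two documents below sketch what the pod spec would be expected to look like for the default path and for the EKS override. This is a hand-written sketch, not captured helm output. It assumes the block sits under spec.template.spec at six spaces of indentation (the snippet above is shown unindented, so that placement is an assumption, chosen because it matches the nindent 8 used for dnsConfig). The YAML comments inside the else branch would also be emitted in the default case; they are left out here for brevity.

```yaml
# Case 1 (sketch): dnsPolicy and dnsConfig not set in values.yaml.
# The else branch applies, so the previous hardcoded value is kept.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
---
# Case 2 (sketch): values.yaml sets dnsPolicy: Default plus the
# dnsConfig from the "Advanced usage" example.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60
      hostNetwork: true
      dnsPolicy: Default
      dnsConfig:
        # toYaml output is re-indented by nindent 8
        options:
        - name: timeout
          value: "2"
```

In both cases only the DNS-related fields differ; the rest of the Deployment is untouched, which is what keeps the change backward compatible.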

Why This Fix is Needed

EKS-Specific Architecture:

  • CoreDNS on EKS is deployed as a regular Deployment (not hostNetwork)
  • CoreDNS requires CNI network to function
  • During initial cluster setup, no CNI is available
  • Node DNS (the VPC resolver, at the VPC network base address plus two) is available, but ClusterFirstWithHostNet does not use it
  • This creates a deadlock that cannot clear on its own

Why dnsPolicy: Default Works:

  • Operator uses node's /etc/resolv.conf (configured by EKS to use VPC DNS)
  • Operator can resolve EKS API server before deploying Calico
  • Calico deploys successfully
  • After Calico is running, kube-dns becomes available
  • Cluster services work normally

Impact:

  • Unblocks Calico deployment on fresh EKS clusters
  • Enables cluster scaling (new nodes can obtain pod networking)
  • Provides simple, portable configuration (no VPC-specific hardcoding needed)
  • Maintains compatibility with all other Kubernetes distributions
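
For contrast with the portability point above, a hypothetical alternative would be to pin the resolver explicitly with dnsPolicy: None, which Kubernetes requires to be paired with a dnsConfig that lists nameservers. The sketch below assumes the template in this PR passes both values through unchanged. The address 10.0.0.2 is purely illustrative (it assumes a VPC CIDR of 10.0.0.0/16) and would have to be changed for every VPC.

```yaml
# values.yaml - hypothetical alternative, NOT what this PR recommends.
# Shown only to illustrate the VPC-specific hardcoding that
# dnsPolicy: Default avoids.
dnsPolicy: None
dnsConfig:
  nameservers:
    - 10.0.0.2   # assumed VPC resolver address; differs per VPC
```

With dnsPolicy: Default the same resolver is picked up from the node's /etc/resolv.conf, so no address needs to appear in values.yaml.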

@kalavt requested a review from a team as a code owner December 27, 2025 04:20
@marvin-tigera added this to the Calico v3.32.0 milestone Dec 27, 2025
@marvin-tigera added the release-note-required (change has user-facing impact, no matter how small) and docs-pr-required (change is not yet documented) labels Dec 27, 2025
CLAassistant commented Dec 27, 2025

All committers have signed the CLA.

Development

Successfully merging this pull request may close these issues.

EKS: Hardcoded dnsPolicy: ClusterFirstWithHostNet causes DNS resolution deadlock during initial deployment

3 participants