
Hard-coded cluster DNS IP #201

@raphaelpoumarede

Description


While the documentation says that kubernetes_cluster_dns is "Automatically calculated from the service subnet", it appears to be hard-coded to 10.43.0.10.

cloud-user@pdcesx41598:~$ kubectl run dns-test2 --image=busybox:1.36 --restart=Never -it --rm -- sh
If you don't see a command prompt, try pressing enter.
/ #
/ #
/ # cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.43.0.10
options ndots:5

It should have set the nameserver IP to 10.42.0.10 (since I had kubernetes_service_subnet : "10.42.0.0/16"). As a workaround I could swap the kubernetes_service_subnet and kubernetes_pod_subnet values, but it would be better to derive kubernetes_cluster_dns from kubernetes_service_subnet.
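The derivation is simple; as a sketch (the function name is mine, and the "10th address of the service CIDR" convention is the kubeadm default rather than anything this project documents):

```python
# Sketch: derive the cluster DNS IP from the service subnet, following
# the kubeadm convention of reserving the 10th address of the service
# CIDR for the DNS Service. Uses only the Python standard library.
import ipaddress


def cluster_dns_ip(service_subnet: str) -> str:
    """Return the conventional cluster DNS IP for a service CIDR."""
    net = ipaddress.ip_network(service_subnet)
    return str(net.network_address + 10)


print(cluster_dns_ip("10.42.0.0/16"))  # -> 10.42.0.10, not 10.43.0.10
```

In the Ansible vars file itself, the same derivation could presumably be expressed with the ansible.utils.ipaddr filter, e.g. something like "{{ kubernetes_service_subnet | ansible.utils.ipaddr(10) | ansible.utils.ipaddr('address') }}" (untested here).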

The mismatch then causes a failure when the deployment operator tries to reach "https://ses.sas.download/ses/entitlements.json" through the K8s CoreDNS (queries to 10.43.0.10:53 time out because the kube-dns Service is actually allocated from the 10.42.0.0/16 service subnet):

  Messages:
    Error loading entitlements file: "https://ses.sas.download/ses/entitlements.json"
    Failed to get 'https://ses.sas.download/ses/entitlements.json'
    Get "https://ses.sas.download/ses/entitlements.json": dial tcp: lookup ses.sas.download on 10.43.0.10:53: read udp 10.43.17.72:38944->10.43.0.10:53: i/o timeout
  State:          FAILED
  Support End:
  Support Level:

Terraform Variable File Details

No response

Ansible Variable File Details

# Ansible items
ansible_user     : "cloud-user"
#ansible_password : "lnxsas"

# VM items
vm_os   : "ubuntu" # Choices : [ubuntu|rhel] - Ubuntu 20.04 LTS / RHEL ???
vm_arch : "amd64"  # Choices : [amd64] - 64-bit OS / ???

# System items
enable_cgroup_v2    : true     # TODO - If needed hookup or remove flag
system_ssh_keys_dir : "~/.ssh" # Directory holding public keys to be used on each system

# Generic items
prefix : "GEL-k8s"
deployment_type: "bare_metal" # Values are: [bare_metal|vsphere]

# Kubernetes - Common
#
# TODO: kubernetes_upgrade_allowed needs to be implemented to either
#       add or remove locks on the kubeadm, kubelet, kubectl packages
#
kubernetes_cluster_name    : "{{ prefix }}-oss" # NOTE: only change the prefix value above
#kubernetes_version          : "1.30.3" https://kubernetes.io/releases/
kubernetes_version          : "1.33.5"

kubernetes_upgrade_allowed : true
kubernetes_arch            : "{{ vm_arch }}"
kubernetes_cni             : "calico"        # Choices : [calico]
kubernetes_cni_version      : "3.30.3"
kubernetes_cri             : "containerd"    # Choices : [containerd|docker|cri-o] NOTE: cri-o is not currently functional
kubernetes_service_subnet  : "10.42.0.0/16" # default values 
kubernetes_pod_subnet      : "10.43.0.0/16" # default values

# Kubernetes - VIP : https://kube-vip.io
# 
# Useful links:
#
#   VIP IP : https://kube-vip.chipzoller.dev/docs/installation/static/
#   VIP Cloud Provider IP Range : https://kube-vip.chipzoller.dev/docs/usage/cloud-provider/#the-kube-vip-cloud-provider-configmap
#
kubernetes_loadbalancer             : "kube_vip"
kubernetes_vip_version              : "0.5.7"
# We need to create static VIPs (eth0): run some commands to create/find the VIP IP on the network and register it in DNS.
# Mandatory even for a single control plane node.
kubernetes_vip_interface            : "eth0"
kubernetes_vip_ip                   : "10.101.63.9" # for RACE EXNET pick a value in the "10.101.63.0+" unused range 
kubernetes_vip_fqdn                 : "osk-api-stud2.gelenable.sas.com" # DNS alias associated to the K8s CP VIP (names)
kubernetes_loadbalancer_addresses :
  - "range-global: 10.101.63.10-10.101.63.12" # IP range for service types that require LB IP access, range-<namespace>

# Kubernetes - Control Plane
control_plane_ssh_key_name : "cp_ssh"

# Labels/Taints: we associate labels and taints with the K8s nodes
# Note: the "hostname" command is used behind the scenes. It does not necessarily correspond to the names used in the inventory.

## Labels
node_labels:
  sasnode02:
    - kubernetes.azure.com/mode=system
  sasnode03:
    - kubernetes.azure.com/mode=system
  sasnode04:
    - kubernetes.azure.com/mode=system
  sasnode05:
    - workload.sas.com/class=cas
  sasnode06:
    - workload.sas.com/class=stateful
  sasnode07:
    - workload.sas.com/class=stateless
  sasnode08:
    - workload.sas.com/class=compute

## Taints
node_taints:
  sasnode05:
    - workload.sas.com/class=cas:NoSchedule

# Jump Server
jump_ip : pdcesx41598.race.sas.com

# NFS Server
nfs_ip  : pdcesx41540.race.sas.com

Steps to Reproduce

Set these values

kubernetes_service_subnet  : "10.42.0.0/16" # default values 
kubernetes_pod_subnet      : "10.43.0.0/16" # default values

for the IAC, then run the DAC and check whether the SASDeployment CRD reconciliation reaches the "SUCCESS" state.

Expected Behavior

The SASDeployment CRD reconciliation reaches the "SUCCESS" state.

Actual Behavior

The SASDeployment CRD reconciliation reaches the "FAILED" state, and the description of the instantiated CR shows this message:

  Messages:
    Error loading entitlements file: "https://ses.sas.download/ses/entitlements.json"
    Failed to get 'https://ses.sas.download/ses/entitlements.json'
    Get "https://ses.sas.download/ses/entitlements.json": dial tcp: lookup ses.sas.download on 10.43.0.10:53: read udp 10.43.17.72:38944->10.43.0.10:53: i/o timeout
  State:          FAILED
  Support End:
  Support Level:

Additional Context

No response

References

No response


Metadata

Assignees: no one assigned
Labels: bug (Something isn't working), new
Type: no type
Projects: no projects
Milestone: no milestone