Description
Issue:
When deploying a Canonical K8s cluster on MAAS using CAPI with the default CNI option set to false, the management cluster waits forever for the bootstrap cluster to be created and reach the running state, because the CNI has to be installed manually on the MAAS instance.
Steps to reproduce:
The default CNI can be disabled with the following option:
spec:
  initConfig:
    enableDefaultNetwork: false
Reference: https://documentation.ubuntu.com/canonical-kubernetes/latest/capi/reference/configs/#initconfig
Create a cluster using this configuration in the CK8sControlPlane object and the issue will be reproducible.
Expected Behaviour:
The bootstrap cluster inside the MAAS instance should come up and reach the running state; setting the default CNI to false should not block the bootstrap cluster. Once the bootstrap cluster is up (possibly using the default CNI), the user or CAPI can install their own CNI on the created cluster.
Actual Behaviour:
The config used by the script disables the CNI installation (the Cilium CNI). This method works correctly when deploying with the k8s-snap package, but it fails when deploying via CAPI.
The bootstrap cluster is created without CNI and is stuck with only these pods:
NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE
kube-system   coredns-fc9c778db-bhqwc           0/1     Pending   0          11s
kube-system   metrics-server-8694c96fb7-6b79j   0/1     Pending   0          11s
At this stage, k8s status never reports ready, and the cluster stays in this state.
On the CAPMAAS side, the controller only proceeds once it can create a client for the target cluster (using the kubeconfig secret) and list the nodes; only then does it consider the API server online.
It waits with logs like this:
I0717 05:05:24.427993 1 maasmachine_controller.go:353] "API Server is not online; requeue" logger="controllers.MaasMachine" maasmachine="vishu-maas-1-cp-61386-8vvln" cluster="vishu-maas-1" state="Deploying" m-id="aqhn73"
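The gate described above can be modelled roughly as follows. This is a minimal Python sketch for illustration only, not the CAPMAAS implementation (which is written in Go): `WorkloadClient`, `reconcile_once`, `Result`, and the two stub clients are hypothetical names, and the assumption is simply that any failure to build the client or list nodes results in a requeue.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


class WorkloadClient(Protocol):
    """Hypothetical stand-in for a client built from the kubeconfig secret."""

    def list_nodes(self) -> List[str]: ...


@dataclass
class Result:
    requeue: bool
    reason: str


def reconcile_once(make_client: Callable[[], WorkloadClient]) -> Result:
    """Proceed only if a client can be created AND the nodes can be listed."""
    try:
        client = make_client()   # assumed step: build client from kubeconfig
        client.list_nodes()      # assumed step: probe the API server
    except Exception as err:
        # Either step failing is treated as "API server not online".
        return Result(requeue=True, reason=f"API Server is not online; requeue ({err})")
    return Result(requeue=False, reason="API server is online")


class UnreachableClient:
    """Stub: simulates a target cluster whose API cannot be reached."""

    def list_nodes(self) -> List[str]:
        raise ConnectionError("connection refused")


class ReachableClient:
    """Stub: simulates a target cluster that answers the node listing."""

    def list_nodes(self) -> List[str]:
        return ["cp-0"]


if __name__ == "__main__":
    print(reconcile_once(UnreachableClient))
    print(reconcile_once(ReachableClient))
```

In this model nothing on the management side can change the outcome of the probe, so as long as the probe keeps failing the controller keeps requeueing, which matches the repeated log line above.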
Since the bootstrap cluster inside the MAAS instance has the CNI disabled, nothing can be done from the management side to bring this cluster up. According to the documentation, a k8s cluster deployed with the snap also has to disable the CNI first and then apply an alternative CNI manually. With CAPI, however, the process cannot proceed, because the user first has to SSH into the instance, access the k8s cluster, and install a CNI manually.