Description
High Availability (HA), as I understand it, is the idea of running software in a way that makes it resistant to downtime caused by individual server failures. It can be further enhanced by features like a PodDisruptionBudget (PDB), which can help avoid simultaneous downtime of multiple servers.
I raised an action point in the December JupyterHub team meeting that we document this better. The topic was discussed in the context of changing the defaults so that PDBs are not enabled by default for our non-HA deployments.
We expose a replicas configuration for various pods, but it's mostly a remnant of the helm create command rather than a reflection of our actual ability to support it.
- This Helm chart does not support HA for its hub pod, proxy pod, or autohttps pod. It supports HA for the user-scheduler pod.
- We have PDBs disabled by default for our non-HA deployments, and enabled by default for our HA deployments.
Action points
- Write about the HA status and what holds us back.
- Add warning comments to the replicas configuration of the hub, proxy, and autohttps pods in values.yaml.
Current status as far as I know it
About HA for JupyterHub itself
jupyterhub/jupyterhub#1932 (comment)
About HA for autohttps
We run Traefik, which supports HA, but not for automatic TLS certificate acquisition. That is supported in their enterprise version, but we can't use that.
About HA for the proxy pod
The proxy pod runs jupyterhub/configurable-http-proxy (CHP), a NodeJS server that is configured dynamically by the Python proxy class of the same name, set through JupyterHub's proxy_class. The problem is that JupyterHub sends a single REST API request, which configures only one CHP server chosen at random behind the k8s Service exposing it, not all of them. So, if we have multiple replicas, only one of them will learn how to route traffic.
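To make the failure mode concrete, here is a minimal simulation of it. This is only a sketch: `FakeProxyReplica`, `FakeService`, and `simulate` are hypothetical stand-ins invented for illustration, not CHP or JupyterHub code, and the random replica choice is an assumption that mirrors the description above.

```python
import random


class FakeProxyReplica:
    """Hypothetical stand-in for one CHP replica with an in-memory routing table."""

    def __init__(self, name):
        self.name = name
        self.routes = {}

    def add_route(self, path, target):
        self.routes[path] = target


class FakeService:
    """Hypothetical stand-in for a k8s Service: each request reaches one replica."""

    def __init__(self, replicas, rng):
        self.replicas = replicas
        self.rng = rng

    def pick(self):
        # Assumption: the replica receiving a request is chosen at random.
        return self.rng.choice(self.replicas)


def simulate(replica_count=3, seed=0):
    rng = random.Random(seed)
    replicas = [FakeProxyReplica(f"proxy-{i}") for i in range(replica_count)]
    service = FakeService(replicas, rng)

    # The hub sends ONE "add route" request through the Service,
    # so only the replica that happens to receive it is updated.
    service.pick().add_route("/user/alice", "http://10.0.0.5:8888")

    configured = [r.name for r in replicas if "/user/alice" in r.routes]
    return configured, replicas


configured, replicas = simulate()
print(f"{len(configured)} of {len(replicas)} replicas know the route")
```

In this model, any user request that lands on one of the other replicas finds no matching route, which is why scaling up the proxy pod's replicas does not give HA.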
#1673 is open to support using KubeIngressProxy, a standalone Python class that is defined in the jupyterhub/kubespawner project. It creates k8s Ingress resources that describe how to route traffic to pods, which an external ingress controller can in turn use to route traffic. That way, the limitation of the CHP-based setup is resolved.