💡 See the ci README for the specifics of deploying Kubernetes changes via GitOps. Only workloads (i.e. applications) are deployed via CI/CD and pull requests; changing the Kubernetes cluster itself (e.g. adding a node pool) is a manual operation.
📓 Both the Google Kubernetes Engine UI and the Lens Kubernetes IDE are useful GUI tools for interacting with a Kubernetes cluster, though you can get by with `kubectl` on the command line.
We deploy our applications and services to a Google Kubernetes Engine cluster. If you are unfamiliar with Kubernetes, we recommend reading through the official tutorial to understand the main components (you do not have to actually perform all the steps).
A glossary exists at the end of this document.
The workflows described above also define their triggers. In general, developer workflows should follow these steps.
- Check out a feature branch
- Put up a PR for that feature branch, targeting `main`
  - `preview-kubernetes` will run and add a comment showing the diff of changes that will affect the production Kubernetes cluster (a sketch for reproducing this diff locally follows this list)
  - BE AWARE: This diff may NOT reveal manual changes to the cluster that will be undone by the deploy. The `helm diff` plugin used under the hood compares the new manifests against the saved snapshot of the last ones Helm deployed, rather than the current state of the cluster. It has to work that way because that most accurately reflects how Helm will apply the changes. This is why it is important to avoid making manual changes to the cluster.
- Merge the PR
  - `deploy-kubernetes` will run and deploy to `prod` this time
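For reference, the diff posted in the PR comment can be approximated locally with the `helm diff` plugin. The release name, namespace, and chart path below are illustrative placeholders; the actual values are defined in the CI workflows (see the ci README):

```bash
# One-time install of the helm-diff plugin
helm plugin install https://github.com/databus23/helm-diff

# Compare the chart in the working tree against the last deployed release
# (placeholder release, namespace, and chart path)
helm diff upgrade jupyterhub ./kubernetes/apps/charts/jupyterhub \
  --namespace jupyterhub
```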
We do not currently use Terraform to manage our cluster, node pools, etc., and major changes to the cluster are unlikely to be necessary, but we do have some bash scripts that can help with tasks such as creating new node pools or creating a test cluster.
First, verify you are logged in and `gcloud` is pointed at `cal-itp-data-infra` and the `us-west1` region.

```bash
gcloud auth login --login-config=iac/login.json
gcloud config set project cal-itp-data-infra
gcloud config get-value project
gcloud config get-value compute/region
```

Then, you may get credentials to the cluster (assuming one already exists).
```bash
# Optionally specify --region us-west1
gcloud container clusters get-credentials data-infra-apps

# Print possible auth contexts; GKE creates them with a defined name format
kubectl config get-contexts
kubectl config use-context gke_cal-itp-data-infra_us-west1_data-infra-apps
```

🔴 You should only run this script if you intend to actually deploy a new cluster, though it will stop if the cluster already exists. This is likely to be a rare operation but may be necessary for migrating regions, creating a totally isolated test cluster, etc.
The cluster-level configuration parameters are stored in `config-cluster.sh`. Creating the cluster also requires configuring parameters for a node pool named "default-pool" (an unconfigurable name defined by GKE) in `kubernetes/gke/config-nodepool.sh`. Any additional node pools configured in this file are also stood up at cluster creation time.

Once the cluster is created, it can be managed by pointing the `KUBECONFIG` environment variable to `kubernetes/gke/kube/admin.yaml`, or you can follow the above authentication steps.
```bash
./kubernetes/gke/cluster-create.sh
export KUBECONFIG=$PWD/kubernetes/gke/kube/admin.yaml
kubectl cluster-info
```

The cluster can be deleted by running `kubernetes/gke/cluster-delete.sh`.
It's much more likely that a user will want to add or change node pools than make changes to the cluster itself. Certain features of node pools are immutable (e.g. machine type); changing such parameters requires creating a new node pool with the desired values, migrating workloads off of the old node pool, and then deleting the old node pool. The node pool lifecycle scripts help simplify this process.
Configure a new node pool by adding its name to the `GKE_NODEPOOL_NAMES` array in `kubernetes/gke/config-nodepool.sh`. For each node pool property (`GKE_NODEPOOL_NODE_COUNT`, `GKE_NODEPOOL_NODE_LOCATIONS`, etc.), you must add an entry to the corresponding array, keyed by the node pool's name (a hypothetical excerpt follows). This config file is also where you will set Kubernetes taints and labels on the nodes.
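For illustration, an entry in `config-nodepool.sh` might look like the following sketch, assuming the per-property values are bash associative arrays keyed by pool name (the `apps-v2` pool and its values are hypothetical):

```bash
# Pools to manage; "default-pool" is required by GKE at cluster creation
GKE_NODEPOOL_NAMES=("default-pool" "apps-v2")

# Hypothetical per-pool properties, keyed by pool name
declare -A GKE_NODEPOOL_NODE_COUNT=(
  ["default-pool"]=1
  ["apps-v2"]=3
)
declare -A GKE_NODEPOOL_NODE_LOCATIONS=(
  ["default-pool"]="us-west1-a"
  ["apps-v2"]="us-west1-a"
)
```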
Once the new node pool is configured, it can be stood up by running `kubernetes/gke/nodepool-up.sh <nodepool-name>`, or by simply running `kubernetes/gke/nodepool-up.sh`, which will stand up all configured node pools that do not yet exist.

Once a new node pool has been created to replace an active node pool, the old node pool must be removed from the `GKE_NODEPOOL_NAMES` array.

Once the old node pool is removed from the array, it can be drained and deleted by running `kubernetes/gke/nodepool-down.sh <nodepool-name>`.
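Putting the lifecycle scripts together, replacing a node pool might look like the following (the pool names are placeholders):

```bash
# 1. Add "apps-v2" and its per-property entries to config-nodepool.sh, then stand it up
./kubernetes/gke/nodepool-up.sh apps-v2

# 2. Remove "apps-v1" from GKE_NODEPOOL_NAMES in config-nodepool.sh, then drain
#    and delete it; its workloads reschedule onto the new pool
./kubernetes/gke/nodepool-down.sh apps-v1
```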
Cluster workloads are divided into two classes:

- Apps are the workloads that users actually care about; this includes deployed "applications" such as the GTFS-RT archiver.
- System workloads are used to support running applications. The cluster currently relies on GKE-native ingress (Google Cloud Load Balancer + Google-managed certificates), so no in-cluster ingress controller or certificate manager is required.
Changes to workloads should be deployed by opening a pull request according to the GitOps section above.
JupyterHub is a good example of an application using a Helm chart that is ultimately exposed to the outside internet for user access. In general, any non-secret changes to the chart can be accomplished by modifying the chart's `values.yaml`.

Because we use GitHub OAuth for user authentication in JupyterHub, we have to provide a client ID and client secret to the JupyterHub Helm chart. Here is what the full configuration for GitHub OAuth in our JupyterHub Helm chart's `values.yaml` might look like:
```yaml
hub:
  config:
    GitHubOAuthenticator:
      client_id: <your-client-id-here>
      client_secret: <your-client-secret-here>
      oauth_callback_url: https://your-jupyterhub-domain/hub/oauth_callback
      allowed_organizations:
        - cal-itp:warehouse-users
      scope:
        - read:org
    JupyterHub:
      authenticator_class: github
    Authenticator:
      admin_users:
        - machow
        - themightchris
        - lottspot
```

We want to avoid committing these secrets to GitHub, but we also want to version control as much of the `values.yaml` as possible. Fortunately, the JupyterHub chart affords us the ability to use the `hub.existingSecret` parameter to reference an existing secret containing additional `values.yaml` entries. For GitHub OAuth specifically, the `jupyterhub-github-config` secret must contain a `values.yaml` key containing a base64-encoded representation of the following yaml:
```yaml
hub:
  config:
    GitHubOAuthenticator:
      client_id: <your-client-id-here>
      client_secret: <your-client-secret-here>
```

This encoding could be accomplished by calling `cat <the secret yaml file> | base64` or using similar CLI tools; do not use an online base64 converter for secrets!
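Alternatively, `kubectl` can create the secret directly from the file and handles the base64 encoding itself; a sketch, assuming the chart is deployed into a `jupyterhub` namespace (adjust as needed):

```bash
# kubectl base64-encodes the file contents automatically when building the secret;
# the "jupyterhub" namespace is a placeholder for wherever the chart is deployed
kubectl --namespace jupyterhub create secret generic jupyterhub-github-config \
  --from-file=values.yaml=<the secret yaml file> \
  --dry-run=client -o yaml | kubectl apply -f -
```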
At the time of this writing, a JupyterHub deployment is available at https://jupyterhub.dds.dot.ca.gov. If this domain name needs to change, the following configurations must also change so OAuth and ingress continue to function.
- Within the GitHub OAuth application in GitHub, the homepage and callback URLs would need to be changed. Cal-ITP owns the GitHub OAuth application, and this Cal-ITP GitHub issue can be referenced to find individual contributors who may be able to help adjust the application's homepage and callback URLs.
- After the changes have been made to the GitHub OAuth application, the following files in `kubernetes/apps/charts/jupyterhub/` must be updated:
  - `values.yaml`: `jupyterhub.hub.config.GitHubOAuthenticator.oauth_callback_url`
  - `templates/gke-managed-certificate.yaml`: `spec.domains` (the list of hostnames the Google-managed cert will be issued for)
  - `templates/gke-ingress.yaml`: `spec.rules[].host` (the hostnames the GKE Ingress accepts)
After DNS for the new hostname points at the GCLB IP, allow ~15-30 minutes for the Google-managed certificate to provision before traffic on the new hostname will succeed over HTTPS.
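Provisioning progress can be watched via the GKE `ManagedCertificate` resource, whose status should move from `Provisioning` to `Active`; the namespace below is an assumption:

```bash
# Check the Google-managed certificate's status (GKE ManagedCertificate CRD)
kubectl --namespace jupyterhub get managedcertificate
kubectl --namespace jupyterhub describe managedcertificate <certificate-name>
```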
Mostly cribbed from the official Kubernetes documentation
- Kubernetes - a platform for orchestrating (i.e. deploying) containerized software applications onto a collection of virtual machines
- Cluster - a collection of virtual machines (i.e. nodes) on which Kubernetes is installed, and onto which Kubernetes in turn deploys pods
- Pod - one (or more) containers deployed to run within a Kubernetes cluster
- For deployed services/applications, Pods exist because of a Deployment
- For ephemeral workloads (think Airflow tasks or database backups), Pods may be managed directly or via a Job
- Deployment - a Kubernetes object that manages a set of Pods, such as multiple replicas of the same web application
- StatefulSet - similar to Deployments but provides guarantees (e.g. deterministic network identifiers) necessary for stateful applications such as databases
- Service - an abstraction around Pods that provides a network interface within the cluster
- For example, a Redis instance needs a Service to be usable by other Pods
- Ingress - exposes Services to the outside world
- For example, a Metabase Service needs an Ingress to be accessible from the internet
- Volume - an abstraction of storage that is typically mounted into the file system of Pods
- Secrets/ConfigMaps - an abstraction of configuration information, typically mounted into Pods as environment variables or files