💡 See the ci README for the specifics of deploying Kubernetes changes via GitOps. Only workloads (i.e. applications) are deployed via CI/CD and pull requests; changing the Kubernetes cluster itself (e.g. adding a node pool) is a manual operation.
📓 Both the Google Kubernetes Engine UI and the Lens Kubernetes IDE are useful GUI tools for interacting with a Kubernetes cluster, though you can get by with `kubectl` on the command line.
We deploy our applications and services to a Google Kubernetes Engine cluster. If you are unfamiliar with Kubernetes, we recommend reading through the official tutorial to understand the main components (you do not have to actually perform all the steps).
A glossary exists at the end of this document.
The workflows described above also define their triggers. In general, developer workflows should follow these steps.
- Check out a feature branch
- Put up a PR for that feature branch, targeting `main`
  - `preview-kubernetes` will run and add a comment showing the diff of changes that will affect the production Kubernetes cluster (a sketch for reproducing this diff locally follows this list)
  - BE AWARE: This diff may NOT reveal manual changes to the cluster that will be undone by the deploy. The `helm diff` plugin used under the hood compares the new manifests against the saved snapshot of the last ones Helm deployed, rather than the current state of the cluster. It has to work that way because that most accurately reflects how Helm will apply the changes. This is why it is important to avoid making manual changes to the cluster.
- Merge the PR
  - `deploy-kubernetes` will run and deploy to `prod` this time
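For reference, the diff posted in the PR comment can be approximated locally with the `helm diff` plugin. The release name, namespace, and chart path below are illustrative placeholders; the actual values are defined in the CI workflows (see the ci README):

```bash
# One-time install of the helm-diff plugin
helm plugin install https://github.com/databus23/helm-diff

# Compare the chart in the working tree against the last deployed release
# (placeholder release, namespace, and chart path)
helm diff upgrade jupyterhub ./kubernetes/apps/charts/jupyterhub \
  --namespace jupyterhub
```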
We do not currently use Terraform to manage our cluster, node pools, etc., and major changes to the cluster are unlikely to be necessary, but we do have some bash scripts that can help with tasks such as creating new node pools or creating a test cluster.
First, verify you are logged in and `gcloud` is pointed at `cal-itp-data-infra` and the `us-west1` region.

```bash
gcloud auth login --login-config=iac/login.json
gcloud config set project cal-itp-data-infra
gcloud config get-value project
gcloud config get-value compute/region
```

Then, you may get credentials to the cluster (assuming one already exists).
```bash
# Optionally specify --region us-west1
gcloud container clusters get-credentials data-infra-apps

# Print possible auth contexts; GKE creates them with a defined name format
kubectl config get-contexts
kubectl config use-context gke_cal-itp-data-infra_us-west1_data-infra-apps
```

🔴 You should only run this script if you intend to actually deploy a new cluster, though it will stop if the cluster already exists. This is likely to be a rare operation but may be necessary for migrating regions, creating a totally isolated test cluster, etc.
The cluster-level configuration parameters are stored in `config-cluster.sh`. Creating the cluster also requires configuring parameters for a node pool named "default-pool" (an unconfigurable name defined by GKE) in `kubernetes/gke/config-nodepool.sh`. Any additional node pools configured in this file are also stood up at cluster creation time.

Once the cluster is created, it can be managed by pointing the `KUBECONFIG` environment variable to `kubernetes/gke/kube/admin.yaml`, or you can follow the above authentication steps.
```bash
./kubernetes/gke/cluster-create.sh
export KUBECONFIG=$PWD/kubernetes/gke/kube/admin.yaml
kubectl cluster-info
```

The cluster can be deleted by running `kubernetes/gke/cluster-delete.sh`.
It's much more likely that a user will want to add or change node pools than make changes to the cluster itself. Certain features of node pools are immutable (e.g. machine type); changing such parameters requires creating a new node pool with the desired values, migrating workloads off of the old node pool, and then deleting the old node pool. The node pool lifecycle scripts help simplify this process.
Configure a new node pool by adding its name to the `GKE_NODEPOOL_NAMES` array in `kubernetes/gke/config-nodepool.sh`. For each node pool property (`GKE_NODEPOOL_NODE_COUNT`, `GKE_NODEPOOL_NODE_LOCATIONS`, etc.), you must add an entry to the corresponding array, keyed by the node pool's name (a hypothetical excerpt follows). This config file is also where you will set Kubernetes taints and labels on the nodes.
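For illustration, an entry in `config-nodepool.sh` might look like the following sketch, assuming the per-property values are bash associative arrays keyed by pool name (the `apps-v2` pool and its values are hypothetical):

```bash
# Pools to manage; "default-pool" is required by GKE at cluster creation
GKE_NODEPOOL_NAMES=("default-pool" "apps-v2")

# Hypothetical per-pool properties, keyed by pool name
declare -A GKE_NODEPOOL_NODE_COUNT=(
  ["default-pool"]=1
  ["apps-v2"]=3
)
declare -A GKE_NODEPOOL_NODE_LOCATIONS=(
  ["default-pool"]="us-west1-a"
  ["apps-v2"]="us-west1-a"
)
```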
Once the new node pool is configured, it can be stood up by running `kubernetes/gke/nodepool-up.sh <nodepool-name>`, or by simply running `kubernetes/gke/nodepool-up.sh`, which will stand up all configured node pools that do not yet exist.

Once a new node pool has been created to replace an active node pool, the old node pool must be removed from the `GKE_NODEPOOL_NAMES` array.

Once the old node pool is removed from the array, it can be drained and deleted by running `kubernetes/gke/nodepool-down.sh <nodepool-name>`.
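Putting the lifecycle scripts together, replacing a node pool might look like the following (the pool names are placeholders):

```bash
# 1. Add "apps-v2" and its per-property entries to config-nodepool.sh, then stand it up
./kubernetes/gke/nodepool-up.sh apps-v2

# 2. Remove "apps-v1" from GKE_NODEPOOL_NAMES in config-nodepool.sh, then drain
#    and delete it; its workloads reschedule onto the new pool
./kubernetes/gke/nodepool-down.sh apps-v1
```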
Cluster workloads are divided into two classes:

- Apps are the workloads that users actually care about; this includes deployed "applications" such as the GTFS-RT archiver.
- System workloads are used to support running applications. The cluster currently relies on GKE-native ingress (Google Cloud Load Balancer + Google-managed certificates), so no in-cluster ingress controller or certificate manager is required.
Changes to workloads should be deployed by opening a pull request according to the GitOps section above.
JupyterHub is a good example of an application using a Helm chart that is ultimately exposed to the outside internet for user access. In general, any non-secret changes to the chart can be accomplished by modifying the chart's `values.yaml`.

Because we use GitHub OAuth for user authentication in JupyterHub, we have to provide a client ID and client secret to the JupyterHub Helm chart. Here is what the full configuration for GitHub OAuth in our JupyterHub Helm chart's `values.yaml` might look like:
```yaml
hub:
  config:
    GitHubOAuthenticator:
      client_id: <your-client-id-here>
      client_secret: <your-client-secret-here>
      oauth_callback_url: https://your-jupyterhub-domain/hub/oauth_callback
      allowed_organizations:
        - cal-itp:warehouse-users
      scope:
        - read:org
    JupyterHub:
      authenticator_class: github
    Authenticator:
      admin_users:
        - machow
        - themightchris
        - lottspot
```

We want to avoid committing these secrets to GitHub, but we also want to version control as much of the `values.yaml` as possible. Fortunately, the JupyterHub chart affords us the ability to use the `hub.existingSecret` parameter to reference an existing secret containing additional `values.yaml` entries. For GitHub OAuth specifically, the `jupyterhub-github-config` secret must contain a `values.yaml` key containing a base64-encoded representation of the following yaml:
```yaml
hub:
  config:
    GitHubOAuthenticator:
      client_id: <your-client-id-here>
      client_secret: <your-client-secret-here>
```

This encoding could be accomplished by calling `cat <the secret yaml file> | base64` or using similar CLI tools; do not use an online base64 converter for secrets!
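Alternatively, `kubectl` can create the secret directly from the file and handles the base64 encoding itself; a sketch, assuming the chart is deployed into a `jupyterhub` namespace (adjust as needed):

```bash
# kubectl base64-encodes the file contents automatically when building the secret;
# the "jupyterhub" namespace is a placeholder for wherever the chart is deployed
kubectl --namespace jupyterhub create secret generic jupyterhub-github-config \
  --from-file=values.yaml=<the secret yaml file> \
  --dry-run=client -o yaml | kubectl apply -f -
```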
At the time of this writing, a JupyterHub deployment is available at https://jupyterhub.dds.dot.ca.gov. If this domain name needs to change, the following configurations must also change so OAuth and ingress continue to function.
- Within the GitHub OAuth application in GitHub, the homepage and callback URLs would need to be changed. Cal-ITP owns the GitHub OAuth application, and this Cal-ITP GitHub issue can be referenced to find individual contributors who may be able to help adjust the application's homepage and callback URLs.
- After the changes have been made to the GitHub OAuth application, the following files in `kubernetes/apps/charts/jupyterhub/` must be updated:
  - `values.yaml`: `jupyterhub.hub.config.GitHubOAuthenticator.oauth_callback_url`
  - `templates/gke-managed-certificate.yaml`: `spec.domains` (the list of hostnames the Google-managed cert will be issued for)
  - `templates/gke-ingress.yaml`: `spec.rules[].host` (the hostnames the GKE Ingress accepts)
After DNS for the new hostname points at the GCLB IP, allow ~15-30 minutes for the Google-managed certificate to provision before traffic on the new hostname will succeed over HTTPS.
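Provisioning progress can be watched via the GKE `ManagedCertificate` resource, whose status should move from `Provisioning` to `Active`; the namespace below is an assumption:

```bash
# Check the Google-managed certificate's status (GKE ManagedCertificate CRD)
kubectl --namespace jupyterhub get managedcertificate
kubectl --namespace jupyterhub describe managedcertificate <certificate-name>
```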
Mostly cribbed from the official Kubernetes documentation
- Kubernetes - a platform for orchestrating (i.e. deploying) containerized software applications onto a collection of virtual machines
- Cluster - a collection of virtual machines (i.e. nodes) on which Kubernetes is installed, and onto which Kubernetes in turn deploys pods
- Pod - one (or more) containers deployed to run within a Kubernetes cluster
- For deployed services/applications, Pods exist because of a Deployment
- For ephemeral workloads (think Airflow tasks or database backups), Pods may be managed directly or via a Job
- Deployment - a Kubernetes object that manages a set of Pods, such as multiple replicas of the same web application
- StatefulSet - similar to Deployments but provides guarantees (e.g. deterministic network identifiers) necessary for stateful applications such as databases
- Service - an abstraction around Pods that provides a network interface within the cluster
- For example, a Redis instance needs a Service to be usable by other Pods
- Ingress - exposes Services to the outside world
- For example, a Metabase Service needs an Ingress to be accessible from the internet
- Volume - an abstraction of storage that is typically mounted into the file system of Pods
- Secrets/ConfigMaps - an abstraction of configuration information, typically mounted into Pods as environment variables or files