|
1 | | -# terraform-octue-twined |
2 | | -A terraform module for creating an Octue service network. |
| 1 | +# terraform-octue-twined-cluster |
| 2 | +A terraform module for deploying a Kubernetes cluster for an Octue Twined service network to GCP. |
| 3 | + |
| 4 | + |
| 5 | +# Infrastructure |
| 6 | +This module is designed to manage multiple environments (e.g. testing, staging, production) in the same GCP project |
| 7 | +simultaneously. Environments provide isolated Twined service networks that can't easily interact with service networks |
| 8 | +in other environments. |
| 9 | + |
| 10 | +These resources are automatically deployed for each given environment: |
| 11 | +- A GKE Autopilot Kubernetes cluster for running Twined service containers. [Kueue](https://kueue.sigs.k8s.io/) is |
| 12 | + installed on the cluster to provide a queueing system where questions sent to Twined services are treated as jobs |
| 13 | +- A Kueue cluster queue, local queue, and default resource flavour to implement the job queueing system on the cluster |
| 14 | +- A Pub/Sub topic for all Twined service events to be published to |
| 15 | +- An event handler cloud function that stores all events in the event store and dispatches question events to the |
| 16 | + Kubernetes cluster as Kueue jobs |
| 17 | +- A service registry cloud function providing an HTTP endpoint for checking if an image exists in the artifact registry |
| 18 | + repository for any requested service revisions |
| 19 | +- An IAM service account and roles mapped to a Kubernetes service account for the cluster to use to access the resources |
| 20 | + deployed by the [terraform-octue-twined-core](https://github.com/octue/terraform-octue-twined-core) Terraform module |
| 21 | +- IAM roles for the relevant Google service agents |
| 22 | + |
| 23 | + |
| 24 | +# Installation and usage |
| 25 | + |
| 26 | +> [!IMPORTANT] |
| 27 | +> This Terraform module must be deployed **after** the |
| 28 | +> [terraform-octue-twined-core](https://github.com/octue/terraform-octue-twined-core) module in the same GCP project. |
| 29 | +> Both must be deployed to have a cloud-based Octue Twined services network. See |
| 30 | +> [a live example here](https://github.com/octue/twined-infrastructure). |
| 31 | +
|
| 32 | +> [!TIP] |
| 33 | +> Deploy this module in a separate Terraform configuration (directory/workspace) to the |
| 34 | +> [terraform-octue-twined-core](https://github.com/octue/terraform-octue-twined-core) |
| 35 | +> module. This lets you spin down the Kubernetes cluster while keeping the core resources that contain all |
| 36 | +> data produced by your Twined services. Spinning the cluster down entirely can save on running costs during |
| 37 | +> extended periods of non-use while keeping all data available. |
| 38 | +
|
| 39 | +Add the below blocks to your Terraform configuration and run: |
| 40 | +```shell |
| 41 | +terraform plan |
| 42 | +``` |
| 43 | + |
| 44 | +If you're happy with the plan, run: |
| 45 | +```shell |
| 46 | +terraform apply |
| 47 | +``` |
| 48 | +and approve the run. This will create resources whose names/IDs are prefixed with `<environment>-` where `<environment>` |
| 49 | +is `main` by default. |
| 50 | + |
| 51 | +## Environments |
| 52 | +The suggested way of managing environments is via [Terraform workspaces](https://developer.hashicorp.com/terraform/language/state/workspaces). |
| 53 | +You can get started right away with the `main` environment by removing the `environment` input to the module. |
| 54 | + |
| 55 | +To create and use other environments, see the example configuration below. It contains a `locals` block that |
| 56 | +automatically generates the environment name from the name of the current Terraform workspace by taking the text after |
| 57 | +the final hyphen. This supports uniquely named workspaces in Terraform Cloud (which must be unique within the |
| 58 | +organisation) while keeping the environment prefix short but unique within your GCP project. For this to work well, |
| 59 | +ensure your Terraform workspace names are slugified. |
| 60 | + |
| 61 | +For example, if your Terraform workspace was called `my-project-testing`, the environment would be called `testing` and |
| 62 | +your resources would be named like this: |
| 63 | +- Pub/Sub topic: `testing.octue.services` |
| 64 | +- Event handler: `testing-octue-twined-service-event-handler` |
| 65 | +- Service registry: `testing-octue-twined-service-registry` |
| 66 | +- Kubernetes cluster: `testing-octue-twined-cluster` |
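| | + |
| | +As a minimal sketch, an output like the one below can be added next to the `locals` block in the example |
| | +configuration to confirm which environment name a workspace resolves to. The output name `twined_environment` is a |
| | +hypothetical choice and is not part of the module. Note that in Terraform's `default` workspace the expression |
| | +evaluates to `default`, since that name contains no hyphen. |
| | +```terraform |
| | +# Hypothetical helper output: shows the environment name derived from the current workspace name. |
| | +output "twined_environment" { |
| | +  value = local.environment |
| | +} |
| | +``` |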
| 67 | + |
| 68 | + |
| 69 | +## Example configuration |
| 70 | + |
| 71 | +```terraform |
| 72 | +# main.tf |
| 73 | +
|
| 74 | +terraform { |
| 75 | + required_version = ">= 1.8.0" |
| 76 | + |
| 77 | + required_providers { |
| 78 | + google = { |
| 79 | + source = "hashicorp/google" |
| 80 | + version = "~>6.12" |
| 81 | + } |
| 82 | + kubernetes = { |
| 83 | + source = "hashicorp/kubernetes" |
| 84 | + version = "~>2.35" |
| 85 | + } |
| 86 | + kubectl = { |
| 87 | + source = "gavinbunney/kubectl" |
| 88 | + version = "~>1.19" |
| 89 | + } |
| 90 | + } |
| 91 | + |
| 92 | +} |
| 93 | +
|
| 94 | +
|
| 95 | +provider "google" { |
| 96 | + project = var.google_cloud_project_id |
| 97 | + region = var.google_cloud_region |
| 98 | +} |
| 99 | +
|
| 100 | +
|
| 101 | +data "google_client_config" "default" {} |
| 102 | +
|
| 103 | +
|
| 104 | +provider "kubernetes" { |
| 105 | + host = "https://${module.octue_twined_cluster.kubernetes_cluster.endpoint}" |
| 106 | + token = data.google_client_config.default.access_token |
| 107 | + cluster_ca_certificate = base64decode(module.octue_twined_cluster.kubernetes_cluster.master_auth[0].cluster_ca_certificate) |
| 108 | +} |
| 109 | +
|
| 110 | +
|
| 111 | +provider "kubectl" { |
| 112 | + load_config_file = false |
| 113 | + host = "https://${module.octue_twined_cluster.kubernetes_cluster.endpoint}" |
| 114 | + token = data.google_client_config.default.access_token |
| 115 | + cluster_ca_certificate = base64decode(module.octue_twined_cluster.kubernetes_cluster.master_auth[0].cluster_ca_certificate) |
| 116 | +} |
| 117 | +
|
| 118 | +
|
| 119 | +# Set the environment name to the last part of the workspace name when split on hyphens. |
| 120 | +locals { |
| 121 | + workspace_split = split("-", terraform.workspace) |
| 122 | + environment = element(local.workspace_split, length(local.workspace_split) - 1) |
| 123 | +} |
| 124 | +
|
| 125 | +
|
| 126 | +module "octue_twined_cluster" { |
| 127 | + source = "git::github.com/octue/terraform-octue-twined-cluster.git?ref=0.1.0" |
| 128 | + google_cloud_project_id = var.google_cloud_project_id |
| 129 | + google_cloud_region = var.google_cloud_region |
| 130 | + environment = local.environment |
| 131 | + cluster_queue = var.cluster_queue |
| 132 | +} |
| 133 | +``` |
| 134 | + |
| 135 | +```terraform |
| 136 | +# variables.tf |
| 137 | +
|
| 138 | +variable "google_cloud_project_id" { |
| 139 | + type = string |
| 140 | + default = "<google-cloud-project-id>" |
| 141 | +} |
| 142 | +
|
| 143 | +
|
| 144 | +variable "google_cloud_region" { |
| 145 | + type = string |
| 146 | + default = "<google-cloud-region>" |
| 147 | +} |
| 148 | +
|
| 149 | +
|
| 150 | +variable "cluster_queue" { |
| 151 | + type = object( |
| 152 | + { |
| 153 | + name = string |
| 154 | + max_cpus = number |
| 155 | + max_memory = string |
| 156 | + max_ephemeral_storage = string |
| 157 | + } |
| 158 | + ) |
| 159 | + default = { |
| 160 | + name = "cluster-queue" |
| 161 | + max_cpus = 100 |
| 162 | + max_memory = "256Gi" |
| 163 | + max_ephemeral_storage = "10Gi" |
| 164 | + } |
| 165 | +} |
| 166 | +``` |
| 167 | + |
| 168 | +## Dependencies |
| 169 | +- Terraform: `>= 1.8.0, <2` |
| 170 | +- Providers: |
| 171 | + - `hashicorp/google`: `~>6.12` |
| 172 | + - `hashicorp/kubernetes`: `~>2.35` |
| 173 | + - `gavinbunney/kubectl`: `~>1.19` |
| 174 | +- Google Cloud APIs: |
| 175 | + - The Cloud Resource Manager API must be [enabled manually](https://console.developers.google.com/apis/api/cloudresourcemanager.googleapis.com) |
| 176 | + before using the module |
| 177 | + - All other required Google Cloud APIs are enabled automatically by the module |
| 178 | + |
| 179 | +## Authentication |
| 180 | + |
| 181 | +> [!TIP] |
| 182 | +> You can use the same service account as created for the [terraform-octue-twined-core](https://github.com/octue/terraform-octue-twined-core?tab=readme-ov-file#authentication) |
| 183 | +> module to skip steps 1 and 2. |
| 184 | +
|
| 185 | +The module needs to authenticate with Google Cloud before it can be used: |
| 186 | + |
| 187 | +1. Create a service account for Terraform and assign it the `editor` and `owner` basic IAM roles |
| 188 | +2. Download a JSON key file for the service account |
| 189 | +3. If using Terraform Cloud, follow [these instructions](https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/provider_reference#using-terraform-cloud) |
| 190 | + before deleting the key file from your computer |
| 191 | +4. If not using Terraform Cloud, follow [these instructions](https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/provider_reference#authentication-configuration) |
| 192 | + or use another [authentication method](https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/provider_reference#authentication) |
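| | + |
| | +For the route in step 4, one option the Google provider supports is its `credentials` argument, which accepts the |
| | +path to or the contents of a JSON key file. The sketch below assumes a hypothetical `google_credentials_file` |
| | +variable that is not part of this module. It would replace the `provider "google"` block in the example |
| | +configuration rather than sit alongside it. |
| | +```terraform |
| | +# Hypothetical variable holding the local path to the service account key file. |
| | +variable "google_credentials_file" { |
| | +  type = string |
| | +} |
| | + |
| | +provider "google" { |
| | +  project     = var.google_cloud_project_id |
| | +  region      = var.google_cloud_region |
| | +  credentials = file(var.google_credentials_file) # Contents of the JSON key file. |
| | +} |
| | +``` |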
| 193 | + |
| 194 | +## Destruction |
| 195 | +> [!WARNING] |
| 196 | +> If the `deletion_protection` input is set to `true`, it must first be set to `false` and `terraform apply` run before |
| 197 | +> running `terraform destroy` or any other operation that would result in the destruction or replacement of the cluster. |
| 198 | +> Skipping this step can leave Terraform in a state that needs targeted commands and/or manual configuration changes |
| 199 | +> to recover from. |
| 200 | +
|
| 201 | +Disable `deletion_protection` and run: |
| 202 | +```shell |
| 203 | +terraform destroy |
| 204 | +``` |
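| | + |
| | +A sketch of the first step, assuming the module block from the example configuration above: set the input to |
| | +`false`, run `terraform apply`, and only then run `terraform destroy`. |
| | +```terraform |
| | +module "octue_twined_cluster" { |
| | +  source                  = "git::github.com/octue/terraform-octue-twined-cluster.git?ref=0.1.0" |
| | +  google_cloud_project_id = var.google_cloud_project_id |
| | +  google_cloud_region     = var.google_cloud_region |
| | +  environment             = local.environment |
| | +  cluster_queue           = var.cluster_queue |
| | +  deletion_protection     = false # Apply this change before running `terraform destroy`. |
| | +} |
| | +``` |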
| 205 | + |
| 206 | + |
| 207 | +# Input reference |
| 208 | + |
| 209 | +| Name | Type | Required | Default | |
| 210 | +|--------------------------------------|---------------|----------|----------------------------------------------------------------------------------------| |
| 211 | +| `google_cloud_project_id` | `string` | Yes | N/A | |
| 212 | +| `google_cloud_region` | `string` | Yes | N/A | |
| 213 | +| `maintainer_service_account_names` | `set(string)` | No | `["default"]` | |
| 214 | +| `environment` | `string` | No | `"main"` | |
| 215 | +| `maximum_event_handler_instances` | `number` | No | `100` | |
| 216 | +| `maximum_service_registry_instances` | `number` | No | `10` | |
| 217 | +| `deletion_protection` | `bool` | No | `true` | |
| 218 | +| `kueue_version` | `string` | No | `"v0.10.1"` | |
| 219 | +| `question_default_resources` | `object` | No | `{cpus=1, memory="512Mi", ephemeral_storage="1Gi"}` | |
| 220 | +| `cluster_queue` | `object` | No | `{name="cluster-queue", max_cpus=10, max_memory="10Gi", max_ephemeral_storage="10Gi"}` | |
| 221 | +| `local_queue` | `object` | No | `{name="local-queue"}` | |
| 222 | + |
| 223 | + |
| 224 | +See [`variables.tf`](/variables.tf) for descriptions. |
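| | + |
| | +As an illustration of overriding the optional inputs, the sketch below sets a few of them on the module block. The |
| | +values are arbitrary examples, and the object field names are assumed to match those shown in the defaults column |
| | +above. Check `variables.tf` for the authoritative types. |
| | +```terraform |
| | +module "octue_twined_cluster" { |
| | +  source                  = "git::github.com/octue/terraform-octue-twined-cluster.git?ref=0.1.0" |
| | +  google_cloud_project_id = var.google_cloud_project_id |
| | +  google_cloud_region     = var.google_cloud_region |
| | +  environment             = "staging" |
| | + |
| | +  # Example values only: cap the event handler at fewer instances than the default. |
| | +  maximum_event_handler_instances = 20 |
| | + |
| | +  # Example values only: give each question more resources than the default. |
| | +  question_default_resources = { |
| | +    cpus              = 2 |
| | +    memory            = "1Gi" |
| | +    ephemeral_storage = "2Gi" |
| | +  } |
| | +} |
| | +``` |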
| 225 | + |
| 226 | + |
| 227 | +# Output reference |
| 228 | + |
| 229 | +| Name | Type | |
| 230 | +|------------------------|----------| |
| 231 | +| `service_registry_url` | `string` | |
| 232 | +| `services_topic_name` | `string` | |
| 233 | +| `kubernetes_cluster` | `object` | |
| 234 | + |
| 235 | +See [`outputs.tf`](/outputs.tf) for descriptions. |
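| | + |
| | +A minimal sketch of re-exporting the module's outputs from a root configuration so that they appear in |
| | +`terraform output`. It assumes the module is named `octue_twined_cluster` as in the example configuration. |
| | +```terraform |
| | +# Expose the service registry endpoint created by the module. |
| | +output "service_registry_url" { |
| | +  value = module.octue_twined_cluster.service_registry_url |
| | +} |
| | + |
| | +# Expose the name of the Pub/Sub topic that Twined service events are published to. |
| | +output "services_topic_name" { |
| | +  value = module.octue_twined_cluster.services_topic_name |
| | +} |
| | +``` |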