Commit 2191847
[crdb] Upgrade CockroachDB to 24.1.3 (#1075)
As requested in #1070, in order to properly support indexes used by the DSS, this PR upgrades CockroachDB to version 24.1.3. A backward incompatibility impacting migrations has been addressed in #1079. Depends on #1077, #1076 and #1080 to properly upgrade the version.

This PR includes:
- Upgrade of all CRDB image references from 21.2.7 to 24.1.3.
- Migration notes and references to the CRDB documentation.
- Helm: instructions on how to adjust the CRDB Helm deployment instructions to map to our Helm chart practices.
- Tanka: manual upgrade.
- Kubernetes steps numbering has been changed to `1.` instead of incremented numbers to reflect the editing practices of other documents.
1 parent d887666 commit 2191847

File tree

10 files changed: +123 −37 lines changed


Makefile (+1 −1)

```diff
@@ -110,7 +110,7 @@ test-go-units:
 .PHONY: test-go-units-crdb
 test-go-units-crdb: cleanup-test-go-units-crdb
-	@docker run -d --name dss-crdb-for-testing -p 26257:26257 -p 8080:8080 cockroachdb/cockroach:v21.2.7 start-single-node --listen-addr=0.0.0.0 --insecure > /dev/null
+	@docker run -d --name dss-crdb-for-testing -p 26257:26257 -p 8080:8080 cockroachdb/cockroach:v24.1.3 start-single-node --insecure > /dev/null
 	@until [ -n "`docker logs dss-crdb-for-testing | grep 'nodeID'`" ]; do echo "Waiting for CRDB to be ready"; sleep 3; done;
 	go run ./cmds/db-manager/main.go --schemas_dir ./build/db_schemas/rid --db_version latest --cockroach_host localhost
 	go test -count=1 -v ./pkg/rid/store/cockroach --cockroach_host localhost --cockroach_port 26257 --cockroach_ssl_mode disable --cockroach_user root --cockroach_db_name rid
```
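The Makefile target above waits for CRDB with an `until … docker logs … grep 'nodeID'` loop. That is a generic poll-until-ready pattern; a minimal Python sketch of the same idea (the function name and the fake log source are illustrative, not part of the repository):

```python
import time

def wait_until(ready, timeout_s=60.0, interval_s=3.0):
    """Poll a zero-argument predicate until it returns True or the timeout expires.

    Mirrors the Makefile loop: `ready` would check the container logs
    for the 'nodeID' line; interval_s=3.0 matches its `sleep 3`.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if ready():
            return True
        time.sleep(interval_s)
    return False

# Illustrative stand-in for `docker logs dss-crdb-for-testing | grep 'nodeID'`:
fake_logs = ["starting", "nodeID: 1"]
print(wait_until(lambda: any("nodeID" in line for line in fake_logs),
                 timeout_s=1, interval_s=0.1))  # prints: True
```

The timeout bound is the one addition over the Makefile loop, which otherwise waits forever if the container never becomes healthy.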

build/dev/docker-compose_dss.yaml (+1 −1)

```diff
@@ -8,7 +8,7 @@ version: '3.8'
 services:
 
   local-dss-crdb:
-    image: cockroachdb/cockroach:v21.2.7
+    image: cockroachdb/cockroach:v24.1.3
     command: start-single-node --insecure
     ports:
       - "26257:26257"
```

cmds/core-service/README.md (+1 −1)

````diff
@@ -32,7 +32,7 @@ go run ./cmds/core-service \
 To run correctly, core-service must be able to [access](../../pkg/cockroach/flags/flags.go) a CockroachDB cluster. Provision of this cluster is handled automatically for a local development environment if following [the instructions for a standalone instance](../../build/dev/standalone_instance.md). Or, a CockroachDB instance can be created manually with:
 
 ```bash
-docker container run -p 26257:26257 -p 8080:8080 --rm cockroachdb/cockroach:v21.2.7 start-single-node --insecure
+docker container run -p 26257:26257 -p 8080:8080 --rm cockroachdb/cockroach:v24.1.3 start-single-node --insecure
 ```
 
 #### Database configuration
````

deploy/MIGRATION.md (+114 −28)

````diff
@@ -1,98 +1,184 @@
-# Kubernetes version migration
+# CockroachDB and Kubernetes version migration
 
-This page provides information on how to upgrade your Kubernetes cluster deployed using the
+This page provides information on how to upgrade your CockroachDB and Kubernetes cluster deployed using the
 tools from this repository.
 
+## CockroachDB upgrades
+
+CockroachDB must be upgraded on all DSS instances of the pool, one after the other. The rollout of the upgrades across
+the whole CRDB cluster must be carefully performed in sequence to keep the majority of nodes healthy during that period
+and prevent downtime.
+For a pooled deployment, one of the DSS instances must take the role of the upgrade "Leader" and coordinate the
+upgrade with the other "Follower" DSS instances.
+In general, a CockroachDB upgrade consists of:
+1. Upgrade preparation: Verify that the cluster is in a nominal state, ready for the upgrade.
+1. Decide how the upgrade will be finalized (for major upgrades only): As CockroachDB itself recommends, we suggest disabling auto-finalization.
+1. Perform the rolling upgrade: This step should be performed first by the Leader and then as quickly as possible by the Followers, **one after the other**. Note that during this period the performance of the cluster may be impacted since, as documented by CockroachDB, "a query that is sent to an upgraded node can be distributed only among other upgraded nodes. Data accesses that would otherwise be local may become remote, and the performance of these queries can suffer."
+1. Roll back the upgrade (optional): Like the rolling upgrade, this step should be carefully coordinated with all DSS instances to guarantee the minimum number of healthy nodes needed to keep the cluster available.
+1. Finish the upgrade: This step should be accomplished by the Leader.
````
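The finalization decision and the finish step above map to a CockroachDB cluster setting, `cluster.preserve_downgrade_option`, as described in the CockroachDB upgrade documentation. A sketch of the SQL involved for the first hop away from 21.2 (run via `cockroach sql` against any node; shown for illustration, not a repository script):

```sql
-- Leader, before the rolling upgrade: keep the option to roll back to 21.2
-- by disabling auto-finalization of the upgrade.
SET CLUSTER SETTING cluster.preserve_downgrade_option = '21.2';

-- Leader, once all nodes of all DSS instances run the new binary and are healthy:
-- finalize the upgrade (after this, rollback is no longer possible).
RESET CLUSTER SETTING cluster.preserve_downgrade_option;
```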
````diff
+
+The following sections provide links to the CockroachDB migration documentation depending on your deployment type, which can
+differ by DSS instance.
+
+**Important notes:**
+
+- Further work is required to test and evaluate the availability of the DSS during migrations.
+- We recommend carefully reviewing the instructions provided by CockroachDB and rehearsing all migrations in a test
+environment before applying them to production.
+
+### Terraform deployment
+
+If a DSS instance has been deployed with terraform, first upgrade the cluster using [Helm](MIGRATION.md#helm-deployment)
+or [Tanka](MIGRATION.md#tanka-deployment). Then, update the variable `crdb_image_tag` in your `terraform.tfvars` to
+align your configuration with the new state of the cluster.
+
+### Helm deployment
+
+If you deployed the DSS using the Helm chart and the instructions provided in this repository, follow the instructions
+provided by CockroachDB in `Cluster Upgrade with Helm` (see the specific links below). Note that the CockroachDB documentation
+suggests editing the values using `helm upgrade ... --set` commands. You will need to use the root key `cockroachdb`
+since the cockroachdb Helm chart is a dependency of the dss chart.
+For instance, setting the image tag and partition using the command line would look like this:
+```
+helm upgrade [RELEASE_NAME] [PATH_TO_DSS_HELM] --set cockroachdb.image.tag=v24.1.3 --reuse-values
+```
+```
+helm upgrade [RELEASE_NAME] [PATH_TO_DSS_HELM] --set cockroachdb.statefulset.updateStrategy.rollingUpdate.partition=0 --reuse-values
+```
+Alternatively, you can update `helm_values.yml` in your deployment and set the new image tag and rollout partition like this:
+```yaml
+cockroachdb:
+  image:
+    # ...
+    tag: # version
+  statefulset:
+    updateStrategy:
+      rollingUpdate:
+        partition: 0
+```
+New values can then be applied using `helm upgrade [RELEASE_NAME] [PATH_TO_DSS_HELM] -f [helm_values.yml]`.
+We recommend the second approach to keep your helm values in sync with the cluster state.
+
+#### 21.2.7 to 24.1.3
+
+CockroachDB requires upgrading one major version at a time, so the following migrations have to be performed in order:
+
+1. 21.2.7 to 22.1: see [CockroachDB Cluster upgrade for Helm](https://www.cockroachlabs.com/docs/v22.1/upgrade-cockroachdb-kubernetes?filters=helm).
+1. 22.1 to 22.2: see [CockroachDB Cluster upgrade for Helm](https://www.cockroachlabs.com/docs/v22.2/upgrade-cockroachdb-kubernetes?filters=helm).
+1. 22.2 to 23.1: see [CockroachDB Cluster upgrade for Helm](https://www.cockroachlabs.com/docs/v23.1/upgrade-cockroachdb-kubernetes?filters=helm).
+1. 23.1 to 23.2: see [CockroachDB Cluster upgrade for Helm](https://www.cockroachlabs.com/docs/v23.2/upgrade-cockroachdb-kubernetes?filters=helm).
+1. 23.2 to 24.1.3: see [CockroachDB Cluster upgrade for Helm](https://www.cockroachlabs.com/docs/v24.1/upgrade-cockroachdb-kubernetes?filters=helm).
````
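The one-major-version-at-a-time constraint determines the hop list mechanically: given the ordered release series, the path from the current series to the target is every intermediate series in order. A small Python sketch (the hard-coded release list is an assumption for illustration):

```python
# Ordered CockroachDB release series relevant to this migration (illustrative subset).
RELEASES = ["21.2", "22.1", "22.2", "23.1", "23.2", "24.1"]

def upgrade_path(current, target):
    """Return the series to step through, one major version at a time."""
    i, j = RELEASES.index(current), RELEASES.index(target)
    if j <= i:
        raise ValueError("target must be a later release than current")
    return RELEASES[i + 1 : j + 1]

print(upgrade_path("21.2", "24.1"))
# prints: ['22.1', '22.2', '23.1', '23.2', '24.1']
```

The five hops printed correspond one-to-one with the five numbered migrations listed above.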
````diff
+### Tanka deployment
+
+For deployments using the Tanka configuration, since no Tanka-specific instructions are provided,
+we recommend following the manual steps documented by CockroachDB in `Cluster Upgrade with Manual configs`
+(see the specific links below). To apply the changes to your cluster, follow the manual steps and reflect the new
+values in the *Leader* and *Followers* Tanka configurations, namely the new image version (see
+[`VAR_CRDB_DOCKER_IMAGE_NAME`](../build/README.md)), to ensure the new configuration is aligned with the cluster state.
+
+#### 21.2.7 to 24.1.3
+
+CockroachDB requires upgrading one major version at a time, so the following migrations have to be performed in order:
+
+1. 21.2.7 to 22.1: see [CockroachDB Cluster upgrade with Manual configs](https://www.cockroachlabs.com/docs/v22.1/upgrade-cockroachdb-kubernetes?filters=manual).
+1. 22.1 to 22.2: see [CockroachDB Cluster upgrade with Manual configs](https://www.cockroachlabs.com/docs/v22.2/upgrade-cockroachdb-kubernetes?filters=manual).
+1. 22.2 to 23.1: see [CockroachDB Cluster upgrade with Manual configs](https://www.cockroachlabs.com/docs/v23.1/upgrade-cockroachdb-kubernetes?filters=manual).
+1. 23.1 to 23.2: see [CockroachDB Cluster upgrade with Manual configs](https://www.cockroachlabs.com/docs/v23.2/upgrade-cockroachdb-kubernetes?filters=manual).
+1. 23.2 to 24.1.3: see [CockroachDB Cluster upgrade with Manual configs](https://www.cockroachlabs.com/docs/v24.1/upgrade-cockroachdb-kubernetes?filters=manual).
+
+## Kubernetes upgrades
+
 **Important notes:**
 
 - The migration plan below has been tested with the deployment of services using [Helm](services/helm-charts) and [Tanka](../build/deploy) without Istio enabled. Note that this configuration flag has been decommissioned since [#995](https://github.com/interuss/dss/pull/995).
 - Further work is required to test and evaluate the availability of the DSS during migrations.
 - It is highly recommended to rehearse such operations on a test cluster before applying them to a production environment.
 
-## Google - Google Kubernetes Engine
+### Google - Google Kubernetes Engine
 
 Migrations of GKE clusters are managed using terraform.
 
-### 1.27 to 1.28
+#### 1.27 to 1.28
 
 1. Change your `terraform.tfvars` to use `1.28` by adding or updating the `kubernetes_version` variable:
 ```terraform
 kubernetes_version = 1.28
 ```
-2. Run `terraform apply`. This operation may take more than 30min.
-3. Monitor the upgrade of the nodes in the Google Cloud console.
+1. Run `terraform apply`. This operation may take more than 30min.
+1. Monitor the upgrade of the nodes in the Google Cloud console.
 
-### 1.26 to 1.27
+#### 1.26 to 1.27
 
 1. Change your `terraform.tfvars` to use `1.27` by adding or updating the `kubernetes_version` variable:
 ```terraform
 kubernetes_version = 1.27
 ```
-2. Run `terraform apply`. This operation may take more than 30min.
-3. Monitor the upgrade of the nodes in the Google Cloud console.
+1. Run `terraform apply`. This operation may take more than 30min.
+1. Monitor the upgrade of the nodes in the Google Cloud console.
 
-### 1.25 to 1.26
+#### 1.25 to 1.26
 
 1. Change your `terraform.tfvars` to use `1.26` by adding or updating the `kubernetes_version` variable:
 ```terraform
 kubernetes_version = 1.26
 ```
-2. Run `terraform apply`
-3. Monitor the upgrade of the nodes in the Google Cloud console.
+1. Run `terraform apply`
+1. Monitor the upgrade of the nodes in the Google Cloud console.
 
-### 1.24 to 1.25
+#### 1.24 to 1.25
 
 1. Change your `terraform.tfvars` to use `1.25` by adding or updating the `kubernetes_version` variable:
 ```terraform
 kubernetes_version = 1.25
 ```
-2. Run `terraform apply`. This operation may take more than 30min.
-3. Monitor the upgrade of the nodes in the Google Cloud console.
+1. Run `terraform apply`. This operation may take more than 30min.
+1. Monitor the upgrade of the nodes in the Google Cloud console.
 
-## AWS - Elastic Kubernetes Service
+### AWS - Elastic Kubernetes Service
 
 Currently, upgrades of EKS can't be achieved reliably with terraform directly. The recommended workaround is to
 use the web console of AWS Elastic Kubernetes Service (EKS) to upgrade the cluster.
 Before proceeding, always check on the cluster page the *Upgrade Insights* tab which provides a report of the
 availability of Kubernetes resources in each version. The following sections omit this check if no resource is
 expected to be reported in the context of a standard deployment performed with the tools in this repository.
 
-### 1.27 to 1.28
+#### 1.27 to 1.28
 
 1. Upgrade the cluster (control plane) using the AWS console. It should take ~15 minutes.
-2. Update the *Node Group* in the *Compute* tab with *Rolling Update* strategy to upgrade the nodes using the AWS console.
-3. Change your `terraform.tfvars` to use `1.28` by adding or updating the `kubernetes_version` variable:
+1. Update the *Node Group* in the *Compute* tab with the *Rolling Update* strategy to upgrade the nodes using the AWS console.
+1. Change your `terraform.tfvars` to use `1.28` by adding or updating the `kubernetes_version` variable:
 ```terraform
 kubernetes_version = 1.28
 ```
 
-### 1.26 to 1.27
+#### 1.26 to 1.27
 
 1. Upgrade the cluster (control plane) using the AWS console. It should take ~15 minutes.
-2. Update the *Node Group* in the *Compute* tab with *Rolling Update* strategy to upgrade the nodes using the AWS console.
-3. Change your `terraform.tfvars` to use `1.27` by adding or updating the `kubernetes_version` variable:
+1. Update the *Node Group* in the *Compute* tab with the *Rolling Update* strategy to upgrade the nodes using the AWS console.
+1. Change your `terraform.tfvars` to use `1.27` by adding or updating the `kubernetes_version` variable:
 ```terraform
 kubernetes_version = 1.27
 ```
 
-### 1.25 to 1.26
+#### 1.25 to 1.26
 
 1. Upgrade the cluster (control plane) using the AWS console. It should take ~15 minutes.
-2. Update the *Node Group* in the *Compute* tab with *Rolling Update* strategy to upgrade the nodes using the AWS console.
-3. Change your `terraform.tfvars` to use `1.26` by adding or updating the `kubernetes_version` variable:
+1. Update the *Node Group* in the *Compute* tab with the *Rolling Update* strategy to upgrade the nodes using the AWS console.
+1. Change your `terraform.tfvars` to use `1.26` by adding or updating the `kubernetes_version` variable:
 ```terraform
 kubernetes_version = 1.26
 ```
 
-### 1.24 to 1.25
+#### 1.24 to 1.25
 
 1. Check for deprecated resources:
    - Click on the *Upgrade Insights* tab to see deprecation warnings on the cluster page.
   - Evaluate errors in *Deprecated APIs removed in Kubernetes v1.25*. Using `kubectl get podsecuritypolicies`,
    check if there is only one *Pod Security Policy* named `eks.privileged`. If it is the case,
    according to the [AWS documentation](https://docs.aws.amazon.com/eks/latest/userguide/pod-security-policy-removal-faq.html), you can proceed.
-2. Upgrade the cluster using the AWS console. It should take ~15 minutes.
-3. Change your `terraform.tfvars` to use `1.25` by adding or updating the `kubernetes_version` variable:
+1. Upgrade the cluster using the AWS console. It should take ~15 minutes.
+1. Change your `terraform.tfvars` to use `1.25` by adding or updating the `kubernetes_version` variable:
 ```terraform
 kubernetes_version = 1.25
 ```
````

deploy/infrastructure/modules/terraform-aws-dss/terraform.dev.example.tfvars (+1 −1)

```diff
@@ -26,7 +26,7 @@ authorization = {
 should_init = true
 
 # CockroachDB
-crdb_image_tag = "v21.2.7"
+crdb_image_tag = "v24.1.3"
 crdb_cluster_name = "interuss_example"
 crdb_locality = "interuss_dss-aws-ew1"
 crdb_external_nodes = []
```

deploy/infrastructure/modules/terraform-google-dss/terraform.dev.example.tfvars (+1 −1)

```diff
@@ -27,7 +27,7 @@ authorization = {
 should_init = true
 
 # CockroachDB
-crdb_image_tag = "v21.2.7"
+crdb_image_tag = "v24.1.3"
 crdb_cluster_name = "interuss_example"
 crdb_locality = "interuss_dss-dev-w6a"
 crdb_external_nodes = []
```

deploy/operations/Dockerfile (+1 −1)

```diff
@@ -1,6 +1,6 @@
 FROM ubuntu:22.04
 
-ENV COCKROACH_VERSION 21.2.7
+ENV COCKROACH_VERSION 24.1.3
 
 RUN apt-get update \
  && apt-get install -y unzip curl gnupg lsb-release apt-transport-https ca-certificates
```

deploy/operations/ci/aws-1/terraform.tfvars (+1 −1)

```diff
@@ -23,7 +23,7 @@ authorization = {
   public_key_pem_path = "/test-certs/auth2.pem"
 }
 should_init = true
-crdb_image_tag = "v21.2.7"
+crdb_image_tag = "v24.1.3"
 crdb_cluster_name = "interuss-ci"
 crdb_locality = "interuss_dss-ci-aws-ue1"
 crdb_external_nodes = []
```

deploy/services/helm-charts/dss/values.example.yaml (+1 −1)

```diff
@@ -13,7 +13,7 @@ dss:
 cockroachdb:
   # See https://github.com/cockroachdb/helm-charts/blob/master/cockroachdb/values.yaml
   image:
-    tag: v21.2.7
+    tag: v24.1.3
   fullnameOverride: dss-cockroachdb
   conf:
     join: []
```

test/migrations/clear_db.sh (+1 −1)

```diff
@@ -8,5 +8,5 @@ echo "Starting CRDB container"
 docker run -d --rm --name dss-crdb-for-migration-testing \
   -p 26257:26257 \
   -p 8080:8080 \
-  cockroachdb/cockroach:v21.2.7 start-single-node \
+  cockroachdb/cockroach:v24.1.3 start-single-node \
   --insecure > /dev/null
```
