RKE2 datastore bootstrap extract may fail due to copying in-use etcd db files

**Environmental Info:**
RKE2 Version: n/a

Node(s) CPU architecture, OS, and Version:
n/a

Cluster Configuration:
n/a

**Describe the bug:**

When RKE2 starts up, it copies the etcd DB files and uses the copy to start a temporary single-node etcd cluster with TLS disabled. It does this to extract the bootstrap data, which is needed to talk to the 'normal' etcd, which runs with TLS enabled.

On K3s this works fine: etcd runs in the main k3s process, so it can never be running when k3s is not. On RKE2, however, the etcd pod may still be running when rke2 starts, so the copied etcd db files may contain unsynced data, which causes the temporary etcd to fail on startup:

```
Nov  7 21:56:46 vraldap1359644 rke2[3335886]: {"msg":"opened backend db","path":"/var/lib/rancher/rke2/server/db/etcd-tmp/member/snap/db","took":"1.963400582s"}
Nov  7 21:56:48 vraldap1359644 rke2[3335886]: {"msg":"recovered v2 store from snapshot","snapshot-index":113352078,"snapshot-size":"15 kB"}
Nov  7 21:56:48 vraldap1359644 rke2[3335886]: {"msg":"failed to find [SNAPSHOT-INDEX].snap.db","snapshot-index":113352078,"snapshot-file-path":"/var/lib/rancher/rke2/server/db/etcd-tmp/member/snap/0000000006c19d8e.snap.db","error":"snap: snapshot file doesn't exist"}   
```

**Steps To Reproduce:**
This could happen any time rke2 is restarted, but it seems more likely to occur during cluster upgrades.

**Expected behavior:**
RKE2 is consistently able to start up without errors from temporary etcd.

**Actual behavior:**
RKE2 sometimes fails to start with missing snapshot errors.

**Additional context / logs:**
SURE-11067

RKE2 datastore bootstrap extract may fail due to copying in-use etcd db files #9427
