Commit 9502d27

Improve docs
1 parent 3426132 commit 9502d27

1 file changed

+88
-13
lines changed

website/docs/usage/upgrade.md

Lines changed: 88 additions & 13 deletions
@@ -2,25 +2,100 @@
22
title: Upgrade methods
33
---
44

5-
Use one of the following methods (depending on the changes) to upgrade a cluster to a newer version.
5+
# Upgrade methods
66

7-
### Rolling upgrade
7+
Use the upgrade method that matches the kind of change you are making. In most cases, a rolling upgrade is the safest option because it lets you replace nodes gradually while keeping the cluster available. Use backup and restore when you are building a fresh cluster, migrating to new infrastructure, or when release notes require a full rebuild instead of mixed-version operation.
88

9-
Use the following procedure to rotate all cluster nodes, one server at a time:
9+
## Before you start
1010

11-
1. Add a new server to the cluster with a configuration that joins them to the existing cluster.
12-
1. Stop dkron service on one of the old servers, if it was the leader allow a new leader to be elected. Note that it is better to remove the current leader at the end, to ensure a leader is elected from the new nodes.
13-
1. Use `dkron raft list-peers` to list current cluster nodes.
14-
1. Use `dkron raft remove-peer` to forcefully remove the old server.
15-
1. Repeat steps above until all old cluster nodes have been upgraded.
11+
Before upgrading any node:
1612

17-
### Backup & Restore
13+
1. Read the release notes for the target version and check whether mixed-version clusters are supported during the transition.
14+
2. Make sure the current cluster is healthy and has quorum.
15+
3. Export the current jobs so you have a recovery point:
1816

19-
Use the `/restore` API endpoint to restore a previously exported jobs file
17+
```bash
18+
curl -fsS http://localhost:8080/v1/jobs > backup.json
19+
```
20+
21+
4. Inspect the current Raft peers so you know which server is the leader and which peer IDs are registered:
2022

23+
```bash
24+
dkron raft list-peers
2125
```
22-
curl localhost:8080/v1/jobs > backup.json
23-
curl localhost:8080/v1/restore --form 'file=@backup.json'
26+
27+
:::tip
28+
When upgrading server nodes, it is usually best to leave the current leader for last. That reduces unnecessary leader elections while you rotate the rest of the cluster.
29+
:::
30+
31+
## Rolling upgrade
32+
33+
Use a rolling upgrade when you want to keep the cluster online and the target version supports a gradual transition.
34+
35+
### Recommended order
36+
37+
1. Upgrade agent-only nodes first.
38+
2. Upgrade follower server nodes one at a time.
39+
3. Upgrade the leader last.
40+
41+
### Server rotation procedure
42+
43+
Use the following procedure to replace server nodes one at a time:
44+
45+
1. Add a new server running the target version and configure it to join the existing cluster.
46+
2. Wait until the new server has joined successfully and the cluster is healthy.
47+
3. Stop Dkron on one old server.
48+
4. If that server was the leader, wait until a new leader is elected before continuing.
49+
5. List the current peers and identify the old server's peer ID:
50+
51+
```bash
52+
dkron raft list-peers
2453
```
2554

26-
This will restore all jobs and counters as they were in the export file.
55+
6. Remove the old server from the Raft configuration:
56+
57+
```bash
58+
dkron raft remove-peer --peer-id <peer-id>
59+
```
60+
61+
7. Confirm the cluster is healthy again.
62+
8. Repeat the process until every old server has been replaced.
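Steps 2, 4, and 7 above all involve waiting for the cluster to settle. A minimal sketch of a retry helper for that is shown here. The name `wait_until` is hypothetical and not part of Dkron; it re-runs whatever command you pass until it succeeds or a timeout elapses.

```bash
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it exits 0 or the timeout elapses.
# Returns 0 on success and 1 on timeout.
wait_until() {
  wu_timeout=$1
  wu_interval=$2
  shift 2
  wu_deadline=$(( $(date +%s) + wu_timeout ))
  # Discard the command's output; only its exit status matters here.
  until "$@" >/dev/null 2>&1; do
    if [ "$(date +%s)" -ge "$wu_deadline" ]; then
      return 1
    fi
    sleep "$wu_interval"
  done
  return 0
}
```

For example, `wait_until 120 5 dkron raft list-peers` would retry the peer listing for up to two minutes. A successful exit only shows that the command ran, so you should still read the output to confirm the leader and the peer set.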
63+
64+
:::warning
65+
Do not remove multiple server nodes at once. Dkron needs a healthy Raft quorum to continue scheduling jobs.
66+
:::
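The quorum requirement comes down to simple arithmetic: a Raft cluster needs a majority of its voting servers to be available. The sketch below computes that majority and the number of servers that can be down at once. The function names are illustrative only and are not part of the Dkron CLI.

```bash
# quorum_size N: votes needed for a majority among N Raft voters.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: voters that can be unavailable while a majority remains.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}
```

With three servers, `quorum_size 3` gives 2 and `tolerated_failures 3` gives 1. After you add one new server, the cluster has four voters and needs three of them, so stopping a single old server still leaves a majority, while stopping two does not.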
67+
68+
## Backup and restore
69+
70+
Use backup and restore when you need to recreate the cluster on new infrastructure or when a rolling upgrade is not appropriate.
71+
72+
### Export jobs from the existing cluster
73+
74+
```bash
75+
curl -fsS http://localhost:8080/v1/jobs > backup.json
76+
```
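Before relying on the export, it may be worth checking that the file is not empty and looks like a job list. The sketch below assumes that `/v1/jobs` returns a JSON array, so the first non-whitespace character should be `[`. The helper name `backup_looks_valid` is hypothetical, and the check is a coarse sanity test rather than full JSON validation.

```bash
# backup_looks_valid FILE
# Hypothetical sanity check: FILE must be non-empty and begin with "[",
# on the assumption that the jobs export is a JSON array.
backup_looks_valid() {
  bl_file=$1
  # Fail if the file is missing or has zero length.
  [ -s "$bl_file" ] || return 1
  # Strip whitespace and take the first remaining character.
  bl_first=$(tr -d ' \t\r\n' < "$bl_file" | cut -c1)
  [ "$bl_first" = "[" ]
}
```

A typical use would be `backup_looks_valid backup.json || echo "backup.json looks wrong" >&2`, run right after the export and before any old server is stopped.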
77+
78+
### Restore jobs into the new cluster
79+
80+
After the new cluster is running and has elected a leader, restore the exported jobs file:
81+
82+
```bash
83+
curl -fsS -X POST http://localhost:8080/v1/restore \
84+
--form 'file=@backup.json'
85+
```
86+
87+
The restore endpoint expects a multipart form field named `file`. If a job in the file already exists in the target cluster, it is overwritten with the definition from the backup.
88+
89+
:::warning
90+
This export and restore flow restores job definitions from the `/v1/jobs` payload. It should not be treated as a full cluster snapshot, and it does not recreate Raft state or execution history.
91+
:::
92+
93+
## After the upgrade
94+
95+
After either method completes:
96+
97+
1. Run `dkron raft list-peers` and confirm the expected server set is present.
98+
2. Verify that exactly one server is the leader and that leadership stays stable.
99+
3. Check the UI or API and confirm the expected jobs are present.
100+
4. Watch the next scheduled executions to ensure jobs are still running as expected.
101+
5. Keep the exported `backup.json` until you are confident the upgrade is complete.
