---
title: Upgrade methods
---

# Upgrade methods

Use the upgrade method that matches the kind of change you are making. In most cases, a rolling upgrade is the safest option because it lets you replace nodes gradually while keeping the cluster available. Use backup and restore when you are building a fresh cluster, migrating to new infrastructure, or when the release notes require a full rebuild instead of mixed-version operation.

## Before you start

Before upgrading any node:

1. Read the release notes for the target version and check whether mixed-version clusters are supported during the transition.
2. Make sure the current cluster is healthy and has quorum.
3. Export the current jobs so you have a recovery point:

   ```bash
   curl -fsS http://localhost:8080/v1/jobs > backup.json
   ```

4. Inspect the current Raft peers so you know which server is the leader and which peer IDs are registered:

   ```bash
   dkron raft list-peers
   ```
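Before relying on the export, it is worth a quick sanity check that the file is non-empty and looks like a JSON array. A minimal sketch using only standard shell tools; the sample `backup.json` contents below are hypothetical:

```shell
# Hypothetical sample of an exported jobs file; a real export is the
# JSON array returned by the /v1/jobs endpoint.
cat > backup.json <<'EOF'
[{"name":"job_a","schedule":"@every 1m"},{"name":"job_b","schedule":"@daily"}]
EOF

# Fail fast if the backup is empty or does not start as a JSON array.
[ -s backup.json ] || { echo "backup is empty" >&2; exit 1; }
head -c 1 backup.json | grep -q '\[' || { echo "not a JSON array" >&2; exit 1; }

# Rough job count: number of "name" keys in the file.
grep -o '"name"' backup.json | wc -l
```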

:::tip
When upgrading server nodes, it is usually best to leave the current leader for last. That reduces unnecessary leader elections while you rotate the rest of the cluster.
:::

## Rolling upgrade

Use a rolling upgrade when you want to keep the cluster online and the target version supports a gradual transition.

### Recommended order

1. Upgrade agent-only nodes first.
2. Upgrade follower server nodes one at a time.
3. Upgrade the leader last.

### Server rotation procedure

Use the following procedure to replace server nodes one at a time:

1. Add a new server running the target version and configure it to join the existing cluster.
2. Wait until the new server has joined successfully and the cluster is healthy.
3. Stop Dkron on one old server.
4. If that server was the leader, wait until a new leader is elected before continuing.
5. List the current peers and identify the old server's peer ID:

   ```bash
   dkron raft list-peers
   ```

6. Remove the old server from the Raft configuration:

   ```bash
   dkron raft remove-peer --peer-id <peer-id>
   ```

7. Confirm the cluster is healthy again.
8. Repeat the process until every old server has been replaced.

:::warning
Do not remove multiple server nodes at once. Dkron needs a healthy Raft quorum to continue scheduling jobs.
:::
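To avoid copy-paste mistakes when removing a peer, the peer ID can be extracted from the listing with standard tools. The column layout in this sample is an assumption for illustration only; check the actual output of `dkron raft list-peers` on your version:

```shell
# Hypothetical capture of `dkron raft list-peers` output; the real
# column layout may differ between Dkron versions.
cat > peers.txt <<'EOF'
Node       ID                                    Address        State     Voter
server-1   a1b2c3d4-0000-0000-0000-000000000001  10.0.0.1:6868  leader    true
server-2   a1b2c3d4-0000-0000-0000-000000000002  10.0.0.2:6868  follower  true
server-3   a1b2c3d4-0000-0000-0000-000000000003  10.0.0.3:6868  follower  true
EOF

# Extract the peer ID of the node being removed (server-2 here).
peer_id=$(awk '$1 == "server-2" { print $2 }' peers.txt)
echo "$peer_id"
```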

## Backup and restore

Use backup and restore when you need to recreate the cluster on new infrastructure or when a rolling upgrade is not appropriate.

### Export jobs from the existing cluster

```bash
curl -fsS http://localhost:8080/v1/jobs > backup.json
```

### Restore jobs into the new cluster

After the new cluster is running and has elected a leader, restore the exported jobs file:

```bash
curl -fsS http://localhost:8080/v1/restore \
  --form 'file=@backup.json'
```

The restore endpoint expects a multipart form field named `file`, and `--form` already makes curl send a POST request. If a job in the file already exists in the target cluster, it is overwritten with the definition from the backup.
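One way to verify a restore is to compare job names from the backup against a fresh export of the upgraded cluster. A minimal sketch with hypothetical job name lists; in practice, extract the names from the two `/v1/jobs` payloads:

```shell
# Hypothetical job name lists: one from the backup, one from a fresh
# export of the restored cluster.
printf 'job_a\njob_b\n' | sort > expected.txt
printf 'job_b\njob_a\n' | sort > actual.txt

# comm -3 prints lines unique to either file; empty output means the
# restored cluster has exactly the jobs that were backed up.
missing=$(comm -3 expected.txt actual.txt)
if [ -z "$missing" ]; then
  echo "restore verified"
else
  echo "job mismatch: $missing"
fi
```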

:::warning
This export and restore flow only covers job definitions from the `/v1/jobs` payload. It should not be treated as a full cluster snapshot: it does not recreate Raft state or execution history.
:::

## After the upgrade

After either method completes:

1. Run `dkron raft list-peers` and confirm the expected server set is present.
2. Verify that one node is the leader and the cluster remains stable.
3. Check the UI or API and confirm the expected jobs are present.
4. Watch the next scheduled executions to ensure jobs are still running as expected.
5. Keep the exported `backup.json` until you are confident the upgrade is complete.
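The first two checks can be scripted against a captured peer listing. The column layout in this sketch is hypothetical, so adapt the `awk` field numbers to the real `dkron raft list-peers` output:

```shell
# Hypothetical capture of `dkron raft list-peers`; real columns may differ.
cat > peers.txt <<'EOF'
Node       ID      Address        State     Voter
server-4   id-004  10.0.1.1:6868  leader    true
server-5   id-005  10.0.1.2:6868  follower  true
server-6   id-006  10.0.1.3:6868  follower  true
EOF

expected_servers=3

# Count data rows (skipping the header) and rows in the leader state.
servers=$(awk 'NR > 1' peers.txt | wc -l)
leaders=$(awk '$4 == "leader"' peers.txt | wc -l)

if [ "$servers" -eq "$expected_servers" ] && [ "$leaders" -eq 1 ]; then
  echo "cluster looks healthy"
else
  echo "unexpected state: $servers servers, $leaders leaders"
fi
```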