Commit be4925e

committed
[ci] test splice ci
1 parent fb5ad6b commit be4925e

50 files changed, 32939 additions and 0 deletions
Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
---
name: Cut release and deploy
about: Cut release and upgrade DevNet nodes
title: Cut and deploy release 0.x.y
labels: ""
assignees: ""
---

## Cut release

Note: Some commands assume you are using the [fish](https://fishshell.com/) shell.
If you are using other shells, you may need to adjust the commands accordingly.
For example, `foo (bar)` in `fish` is equivalent to `foo $(bar)` in `bash`.

The `VERSION` file specifies the release version to build, here referred to as `0.x.z`.
The previous release is referred to as `0.x.y` in these instructions.

Regular releases are started from `origin/main`, while bugfix / patch releases are started from a previous release line.
For the rest of this checklist, this will be called the _ancestor branch_.

Release versions can only be published from a _release line branch_. The _release line branch_ is `release-line-0.x.z` for release `0.x.z` in these instructions.
A _release line branch_ is branched from the _ancestor branch_.

- [ ] Choose the _ancestor branch_. This can be `origin/main` for all regular releases or `origin/release-line-0.x.y` for bugfix releases.
- [ ] Wait for everything to be merged in the _ancestor branch_ that we want in `0.x.z`.
  - [ ] ...
- [ ] Ensure all changes to the previous release branch `origin/release-line-0.x.y` are also included in both the _ancestor branch_ and `origin/main`.
      This should be the case but sometimes a change gets missed.
  - Use one of the following approaches to find changes applied to release line `0.x.y` after it was branched off its ancestor branch (which may be different from the ancestor branch of the new release).
    - Run `git diff (git merge-base origin/release-line-0.x.y ANCESTOR_BRANCH) origin/release-line-0.x.y` and compare it to the checked out source code of the release line you're upgrading to.
    - Run `git log (git merge-base origin/release-line-0.x.y ANCESTOR_BRANCH)..origin/release-line-0.x.y` and compare it to the log of the release line you're upgrading to.
    - Open https://github.com/DACH-NY/canton-network-node/compare/BRANCH_COMMIT...release-line-0.x.y to see the changes in the GitHub UI, where `BRANCH_COMMIT` is the commit that the release line was branched off from.
- [ ] Merge a PR into the _ancestor branch_ with the following changes:
  - [ ] Update the release notes (`docs/src/release_notes.rst`):
    - Replace `Upcoming` by the target version
    - Fix any spelling mistakes and make sure the RST rendering is not broken
    - Check whether any important changes are missing, for example by briefly comparing the release notes with `git log 0.x.y..` (replace `0.x.y` with the previous version)
- [ ] Create a release branch called `release-line-0.x.z` from the merged commit
  - Note: release branches are subject to branch protection rules. Once you push the branch, you need to open PRs to make further changes.
- [ ] Merge a PR into the release branch (`origin/release-line-0.x.z`) with the following changes:
  - [ ] Create an empty commit with `[release]` in the commit message so it gets published as a non-snapshot version. You may have to edit the commit message when pressing the merge button in the GitHub UI.
- [ ] Trigger a CircleCI pipeline from the DA-internal (on main) with `run-job: publish-release-artifacts` and `splice-git-ref: release-line-0.x.z`
- [ ] If _ancestor branch_ is not `origin/main`, forward port all changes made to the _ancestor branch_ as part of this release to `origin/main`
- [ ] Update the open source repos, see https://github.com/DACH-NY/canton-network-node/blob/main/OPEN_SOURCE.md
  - [ ] Merge the auto-generated PR in https://github.com/digital-asset/decentralized-canton-sync
  - [ ] Merge the auto-generated PR in https://github.com/hyperledger-labs/splice
  - [ ] After merging the PR on the DA OSS repo, go to Releases in that repo
        (https://github.com/digital-asset/decentralized-canton-sync/releases), find the draft
        release for the release you just created and publish it (click the edit pencil icon). This should be done after merging the PR because it will
        also automatically bundle the sources from the release-line branch.
- [ ] Merge a PR into the _ancestor branch_ with the following changes:
  - Update `VERSION` and `LATEST_RELEASE`. `VERSION` should be the next planned release (typically bumping the minor version), and `LATEST_RELEASE` should be the version of the newly created release line.
- [ ] Communicate to partners that a new version is available

## Upgrade our own nodes on DevNet

- [ ] If significant time has passed since cutting the release, ensure that there are no changes that need to be backported to the release branch.
      In particular, check for changes to the `cluster/configs` and `cluster/configs-private` submodules.
- [ ] Merge a PR into the release branch (`origin/release-line-0.x.z`) with the following changes:
  - [ ] Update the cluster `config.yaml` file by setting the new reference under `synchronizerMigration.active.releaseReference` and update the `synchronizerMigration.active.version` to version `0.x.y`.
  - [ ] Update `cluster/deployment/devnet/.envrc.vars`, bumping the release version.
    - Currently, the affected env vars are `OVERRIDE_VERSION`, `CHARTS_VERSION`, and `MULTI_VALIDATOR_IMAGE_VERSION`.
  - [ ] Before merging, open the `preview_pulumi_changes` CircleCI workflow and approve the jobs to generate `deployment` and `devnet` previews.
        Review the changes together with someone else, paying particular attention to deleted or newly created resources.
- [ ] Warn our partners on [#supervalidator-operations](https://daholdings.slack.com/archives/C085C3ESYCT): "We'll be upgrading the DA-2 and DA-Eng nodes on DevNet to test a new version. Some turbulence might be expected."
- [ ] Forward-port the changes to `config.yaml` and `cluster/deployment/devnet/.envrc.vars` to `main`. The `deployment` stack, which watches `main`, should pick that up
      and upgrade the other pulumi stacks.
- [ ] Wait for [the operator](https://github.com/DACH-NY/canton-network-node/tree/main/cluster#the-operator) to apply your changes
  - A good check: `kubectl get stack -n operator -o json | jq '.items | .[] | {name: .metadata.name, status: .status}'` should show all stacks as successful and on the right commit.
    Remember to check that the `lastSuccessfulCommit` field points to the release line that you expect.
- [ ] Confirm that we didn't break anything; for example:
  - [ ] The [SV Status Report Dashboard](https://grafana.dev.global.canton.network.digitalasset.com/d/caffa6f7-c421-4579-a839-b026d3b76826/sv-status-reports?orgId=1) looks green
  - [ ] There are no (unexpected) open alerts
  - [ ] The docs are reachable at both https://dev.network.canton.global/ and https://dev.global.canton.network.digitalasset.com/

## Upgrade our own nodes on TestNet and MainNet

- [ ] One week after DevNet: TestNet
- [ ] One week after TestNet: MainNet

## Follow up

- [ ] If you cut a release, remind the next person in the [rotation](https://docs.google.com/document/d/1f0nVeRnnxKQxwPi5nI2TiMq6qtHPwgiOjtUPUVJMKIk/edit?tab=t.0) who is up for cutting a release next week.
- [ ] Persist any lessons learned and fix (documentation) bugs hit
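The checklist above gives the merge-base comparison in `fish` syntax. For readers on `bash`, a minimal sketch of the same `git log` step could look like the following. The helper name `release_line_changes` is hypothetical and not part of the repository, and the refs passed to it are placeholders you would replace with the real release line and ancestor branch.

```shell
#!/usr/bin/env bash
# Sketch only: bash equivalent of the fish command
#   git log (git merge-base RELEASE ANCESTOR)..RELEASE
# `release_line_changes` is a hypothetical helper name, not part of the repo.

# Usage: release_line_changes <release-line-ref> <ancestor-ref>
# Prints one line per commit that was added to the release line after it
# branched off from the ancestor branch.
release_line_changes() {
  local release_ref="$1" ancestor_ref="$2" base
  # The merge base is the commit where the two histories diverged.
  base="$(git merge-base "$release_ref" "$ancestor_ref")" || return 1
  git log --oneline "${base}..${release_ref}"
}
```

It would be called as `release_line_changes origin/release-line-0.x.y origin/main`; an empty result suggests that nothing was committed to the release line after it was branched.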
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
---
name: 'Hard migration / Disaster recovery'
about: 'Perform a hard migration or disaster recovery on a production cluster'
title: 'NETWORK [Hard Synchronizer Migration|Disaster Recovery] DATE'
labels: ''
assignees: ''

---

Agenda [here](https://docs.google.com/document/d/1AEh9ZMLPxmc9tKn0L7I5S48xHOR4GN__VpP2_IHnL0A/edit#heading=h.9pjnt72egfzq). *PLEASE UPDATE LINK*

Tracking sheet [here](https://docs.google.com/spreadsheets/d/1AKAVhGqxFkhe7kBnbLf9L-nfnpr1H0QjZ5PHMaVkvc8/edit?gid=128511196). *PLEASE UPDATE LINK*

Internal runbook [here](https://github.com/DACH-NY/canton-network-node/blob/main/cluster/README.md#via-the-pulumi-operator).

## Checklist

### Prepare

- [ ] If you are upgrading to a new Canton major version, manually test compatibility of dev/test/mainnet snapshots as described in `cluster/README.md`
- [ ] open or create corresponding agenda and tracking sheet in [this folder](https://drive.google.com/drive/folders/1-HZPAiZ7wVei4nlp-AOyQ5TrZ-S5FVSB)
- [ ] prepare our staging nodes and tell partners
- [ ] (later) forward-port to branches that may serve as potential future release sources (e.g. for 0.2.8, 0.2 and `main`; `main` is always included)
- [ ] a sufficient number of partners have reported that they are ready / prepared (or look as if they are); check once and escalate if check failed
- [ ] (only if hard migration) vote on scheduled downtime
- [ ] disable periodic CI jobs (including sv and validator runbook resets) on `main`
- [ ] (only if DevNet) take down multi-validator stack (does not handle hard domain migrations in its current form): `cncluster pulumi multi-validator down` from release branch
- [ ] (only if disaster recovery) test the `cncluster take_disaster_recovery_dumps` step
- [ ] take backups with `cncluster backup_nodes` (all nodes in parallel!) as you would during the meeting - to confirm that the commands work for you and to have them ready
- [ ] (shortly before the call) request PAM in case you'll need it later

### Call with all SVs (hard migrations version; remove me if DR)

- [ ] wait for current synchronizer to pause and dumps to be taken (`Wrote domain migration dump` in SV app / validator app logs)
- [ ] ensure that apps are sufficiently caught up
- [ ] take backups with `cncluster backup_nodes`
- [ ] merge PRs for deployment branch & `main` to migrate to higher migration ID
- [ ] check: domain is healthy

### Call with all SVs (DR version; remove me if hard migration)

- [ ] everyone scales down their CometBFT nodes with `kubectl scale deployment --replicas=0 -n <namespace> global-domain-<old-migration-id>-cometbft`
- [ ] take backups with `cncluster backup_nodes`
- [ ] agree on a timestamp based on logs (e.g., ask everyone for the `toInclusive` value of their latest `Commitment correct for sender and period CommitmentPeriod(fromExclusive = X, toInclusive = THIS)` log entry on the participant and use the min of that)
- [ ] get the dumps with `cncluster take_disaster_recovery_dumps`
- [ ] copy the dumps into our PVCs with `cncluster copy_disaster_recovery_dumps`
- [ ] merge PRs for deployment branch & `main` to migrate to higher migration ID
- [ ] check: domain is healthy

### Cleanup

- [ ] unset `synchronizerMigration.active.migratingFrom` on the release branch so that future redeploys don't attempt to migrate
- [ ] (later) forward-port to branches that may serve as potential future release sources
- [ ] trigger periodic CI jobs manually once to make sure the updates worked
- [ ] re-enable periodic CI jobs on `main`
- [ ] recheck above forward-port items (e.g. versions, migration IDs)
- [ ] take down old synchronizer nodes (once we're allowed to based on agreement with other SVs)

### Follow-up

- [ ] make sure that the [next planned production network operation](https://docs.google.com/document/d/14gZQNdXLPUCfqxN4vLK_yGlsptfcHMJZR8e1oOKgqLc/edit) has assignees and will get done; escalate if this is not the case
- [ ] improve docs (collect ideas here) and other things
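The timestamp agreement step in the DR call ("use the min of that") can be sketched in `bash` as below. This assumes each SV pastes one log line of the form quoted in the checklist, and that the `toInclusive` values are UTC timestamps in one uniform ISO-8601 format, so that lexicographic order equals chronological order. The helper name `min_to_inclusive` and the sample timestamps in the usage note are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: pick the earliest `toInclusive` value from pasted log lines.
# Assumption: all timestamps share one ISO-8601 UTC format, so a plain
# byte-wise sort orders them chronologically.
# `min_to_inclusive` is a hypothetical helper name.

# Reads log lines on stdin, prints the smallest toInclusive value.
min_to_inclusive() {
  # Keep only lines that contain a CommitmentPeriod and extract the value
  # between "toInclusive = " and the closing parenthesis.
  sed -n 's/.*toInclusive = \([^)]*\)).*/\1/p' \
    | LC_ALL=C sort \
    | head -n 1
}
```

For example, piping two lines ending in `toInclusive = 2024-05-01T10:01:00Z)` and `toInclusive = 2024-05-01T10:00:00Z)` into `min_to_inclusive` yields `2024-05-01T10:00:00Z`. Lines without a `toInclusive` field are ignored.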
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
---
name: Reset production cluster
about: (Planned) reset of DevNet or TestNet
title: Reset production cluster
labels: ""
assignees: ""
---

Scheduled for: *date + time*

- [ ] (a few days before the scheduled time) Remind people on [#supervalidator-operations](https://daholdings.slack.com/archives/C085C3ESYCT) and [#validator-operations](https://daholdings.slack.com/archives/C08AP9QR7K4) that the reset is planned.
- [ ] Merge a PR to the correct release branch that unprotects the databases so they can be deleted. Wait for this change to be applied (you can check [grafana](https://grafana.test.global.canton.network.digitalasset.com/d/QP_wDqDnz/pulumi-operator-stacks-dashboard?orgId=1) for example). Example PR: #16323
- [ ] Prepare (don't merge yet!) a PR against the release branch for bootstrapping the new cluster (example: #16324). This includes but might not be limited to:
  - [ ] Set migration ID to 0
  - [ ] Increment `COMETBFT_CHAIN_ID_SUFFIX` by 1
  - [ ] Remove any `legacy` or `archive` migrations that might still be around.
  - [ ] Make sure that the current config of the running cluster matches what you will bootstrap (see `INITIAL_PACKAGE_CONFIG_JSON`)
- [ ] (before starting the actual reset) Send another update to SVs and validators as well as internal channels that could be relevant.
- [ ] (while pairing with someone) Uninstall the pulumi operator with `cncluster pulumi operator down`. Note that this does not take down the actual components, only the operator stack resources, so that the operator does not accidentally kick in at the wrong time.
- [ ] (while pairing with someone) Reset all the sv-canton stacks for **archived** migrations one day before the actual reset. You can also delete PVC snapshots and CloudSQL backups for the active migration. Expect this to be slow, which is why it's done one day in advance.
- [ ] (while pairing with someone) Reset all stacks **except** the `infra` stack manually. `cncluster reset` could work, `CI=1 cncluster pulumi XYZ down --yes --skip-preview` will certainly work (check `kubectl get stacks -A` for stacks you should down and don't forget to also down the `deployment` stack). Expect some slowness/timeouts/rate-limiting from GCP.
- [ ] Merge the PR you prepared above
- [ ] Forward-port the PR you prepared above to `main`.
- [ ] Once merged to main, redeploy the operator through CircleCI: on `main`, trigger `run-job: deploy-operator`, `cluster: ...`.
- [ ] Wait for the network to deploy and confirm that the `AmuletRules` `packageConfig` contains the expected DAR versions.
- [ ] Prepare and merge a second PR to the release branch that configures wallet sweeps to the DA-Wallet party (`SV1_SWEEP`, you need their wallet to be onboarded, you can ask in [#da-wallet](https://daholdings.slack.com/archives/C073K97TL3U)) and (unless you already did this earlier) sets the `cloudSql.protect` back to `false` (example PR: https://github.com/DACH-NY/canton-network-node/pull/16329).
- [ ] Tell SVs: "You are welcome to join now with migration ID 0 and chain ID X. Please reset your existing nodes completely, clearing out all databases and PVCs, and then onboard afresh." (example text, tweak as needed)
- [ ] Tell validators: "Please wait until bootstrapping has completed and join in 2h from now, using migration ID 0. Please reset your existing nodes completely, clearing out all databases, and then onboard afresh." (example text, tweak as needed)
- [ ] Forward port the final state on the release branch to main
- [ ] Fix anything in this template that you didn't like
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
---
name: Tech debt / Support issue
about: Create a tech debt or support issue (with some structure)
title: ''
labels: ''
assignees: ''

---

## What is this about?

*(your description here)*

*Remove this line once you have selected the correct milestone.*

## How important is this and why?

*(your estimate and thoughts here)*

.github/actionlint.yml

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
self-hosted-runner:
  labels:
    - self-hosted-docker-tiny
    - self-hosted-docker-medium
    - self-hosted-docker-large
    - self-hosted-k8s-x-small
    - self-hosted-k8s-small
    - self-hosted-k8s-medium
    - self-hosted-k8s-large
    - self-hosted-k8s-x-large
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
name: "Restore Daml artifacts"
description: "Restore the Daml artifacts cache"
inputs:
  cache_version:
    description: "Version of the cache"
    required: true
outputs:
  cache_hit:
    description: "Cache hit"
    value: ${{ steps.restore.outputs.cache-hit }}

runs:
  using: "composite"
  steps:
    - name: Restore Daml artifacts cache
      id: restore
      uses: actions/cache/restore@v4
      with:
        path: |
          /tmp/daml
          apps/common/frontend/daml.js
        key: daml-artifacts-${{ inputs.cache_version }} branch:${{ github.ref_name }} dependencies:${{ hashFiles('project/build.properties', 'project/BuildCommon.scala', 'project/DamlPlugin.scala', 'build.sbt', 'daml/dars.lock', 'nix/canton-sources.json') }} rev:${{ github.sha }}
        restore-keys: |
          daml-artifacts-${{ inputs.cache_version }} branch:${{ github.ref_name }} dependencies:${{ hashFiles('project/build.properties', 'project/BuildCommon.scala', 'project/DamlPlugin.scala', 'build.sbt', 'daml/dars.lock', 'nix/canton-sources.json') }}
          daml-artifacts-${{ inputs.cache_version }} branch:main dependencies:${{ hashFiles('project/build.properties', 'project/BuildCommon.scala', 'project/DamlPlugin.scala', 'build.sbt', 'daml/dars.lock', 'nix/canton-sources.json') }}
    - name: Extract Daml artifacts
      shell: bash
      run: |
        if [[ -e /tmp/daml/daml.tar.gz ]]; then
          tar --use-compress-program=pigz -xf /tmp/daml/daml.tar.gz
        else
          echo "No cached daml artifacts files found. Skipping..."
        fi

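How `key` and `restore-keys` interact in the action above can be illustrated with a small `bash` simulation. This is not how `actions/cache` is implemented; it only mimics the documented lookup order: the exact `key` is tried first, then each `restore-keys` entry is tried as a prefix, in the order listed. The function name `pick_cache` and the shortened keys in the usage note are hypothetical, and the list of existing keys is assumed to arrive newest first.

```shell
#!/usr/bin/env bash
# Sketch only: simulate the lookup order of `key` and `restore-keys`.
# `pick_cache` is a hypothetical helper, not part of actions/cache.

# Usage: pick_cache <key> <newline-separated restore keys> < existing-keys
# stdin: existing cache keys, one per line, assumed newest first.
# Prints "hit:<key>" for an exact match, "partial:<key>" for a prefix
# match, and returns 1 when nothing matches.
pick_cache() {
  local key="$1" restore_keys="$2" existing=() line prefix candidate
  while IFS= read -r line; do
    [ -n "$line" ] && existing+=("$line")
  done

  # 1. An exact match on the primary key wins.
  for candidate in "${existing[@]}"; do
    if [ "$candidate" = "$key" ]; then
      printf 'hit:%s\n' "$candidate"
      return 0
    fi
  done

  # 2. Otherwise try each restore key as a prefix, in the order given;
  #    the first (newest) existing key with that prefix is used.
  while IFS= read -r prefix; do
    [ -n "$prefix" ] || continue
    for candidate in "${existing[@]}"; do
      case "$candidate" in
        "$prefix"*) printf 'partial:%s\n' "$candidate"; return 0 ;;
      esac
    done
  done <<< "$restore_keys"

  return 1
}
```

With keys shaped like the ones above, a build on a new branch misses on its own `branch:<name>` prefix and falls through to the `branch:main` prefix, which is what the second `restore-keys` line is for. In the real action, the `cache-hit` output is `true` only for an exact match on `key`, so a prefix match still counts as "not a hit" for the `cache_hit` output.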
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
name: "Save Daml artifacts"
description: "Saves the Daml artifacts to the cache"
inputs:
  cache_version:
    description: "Version of the cache"
    required: true
  load_cache_hit:
    description: "Cache hit from the restore Daml artifacts job (should be the cache_hit output from the restore Daml artifacts job)"
    required: true

runs:
  using: "composite"
  steps:
    - name: Archive Daml artifacts
      if: ${{ ! fromJson(inputs.load_cache_hit) }}
      shell: bash
      run: |
        mkdir -p /tmp/daml
        find . -type d -name ".daml" | tar --use-compress-program=pigz -cf /tmp/daml/daml.tar.gz -T -
    - name: Not archiving Daml artifacts
      if: ${{ fromJson(inputs.load_cache_hit) }}
      shell: bash
      run: |
        echo "Skipping Daml artifacts cache, as there was a cache hit"
    - name: Cache precompiled classes
      if: ${{ ! fromJson(inputs.load_cache_hit) }}
      uses: actions/cache/save@v4
      with:
        path: |
          /tmp/daml
          apps/common/frontend/daml.js
        key: daml-artifacts-${{ inputs.cache_version }} branch:${{ github.ref_name }} dependencies:${{ hashFiles('project/build.properties', 'project/BuildCommon.scala', 'project/DamlPlugin.scala', 'build.sbt', 'daml/dars.lock', 'nix/canton-sources.json') }} rev:${{ github.sha }}
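The archive step here and the extract step in the restore action form a round trip: `find` lists every `.daml` directory, `tar` packs them into one tarball via `-T -`, and the restore side unpacks that tarball relative to the working directory. A minimal local sketch of that round trip follows, assuming GNU `tar`. The `gzip` fallback for machines without `pigz` and the function names are assumptions for illustration, not part of the actions.

```shell
#!/usr/bin/env bash
# Sketch only: local round trip of the archive/extract steps.
# Assumptions: GNU tar; gzip is used when pigz is not installed.
# All function names are hypothetical.

# Print the compression program to use.
compressor() {
  if command -v pigz >/dev/null 2>&1; then echo pigz; else echo gzip; fi
}

# Usage: archive_daml_dirs <archive-path>   (run from the repository root)
# Packs every directory named ".daml" below the current directory.
archive_daml_dirs() {
  local archive="$1"
  mkdir -p "$(dirname "$archive")"
  # -T - makes tar read the list of paths to archive from stdin.
  find . -type d -name ".daml" \
    | tar --use-compress-program="$(compressor)" -cf "$archive" -T -
}

# Usage: extract_daml_dirs <archive-path>   (run from the repository root)
# Unpacks the archive if it exists, otherwise reports and does nothing.
extract_daml_dirs() {
  local archive="$1"
  if [[ -e "$archive" ]]; then
    tar --use-compress-program="$(compressor)" -xf "$archive"
  else
    echo "No cached daml artifacts files found. Skipping..."
  fi
}
```

Because `find .` emits relative paths such as `./some/project/.daml`, the archive can be extracted into any checkout of the same repository layout, which is what lets a cache saved on one runner be restored on another.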

.github/actions/cache/frontend_node_modules/restore/action.yml

Lines changed: 32 additions & 0 deletions
