Releases · berops/claudie

24 Apr 13:45

Despire

v0.12.1

1244c01

v0.12.1 Latest

What's Changed

Introduced the 'upgrade-lock' node label. When set on a node, it tells Claudie to skip draining that node, which blocks the workflow of pending changes until the label is removed from the node. #2062

# Before triggering an update
kubectl label node <node-name> claudie.io/upgrade-lock=true

# Apply updated InputManifest
kubectl apply -f manifest.yaml

# Claudie drains unlabeled nodes, skips labeled ones, and retries
# Verify replication/health on your workload

# Release the node when safe
kubectl label node <node-name> claudie.io/upgrade-lock-
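
The skip-drain decision described above can be illustrated with a small Python sketch. It is not Claudie's implementation: the function name plan_drain and the dict-based node representation are hypothetical, and it assumes that the mere presence of the claudie.io/upgrade-lock label (the key used in the commands above) locks a node, regardless of its value.

```python
UPGRADE_LOCK_LABEL = "claudie.io/upgrade-lock"  # label key from the commands above


def plan_drain(nodes):
    """Split nodes into those to drain now and those to skip.

    `nodes` maps a node name to its label dict (hypothetical representation).
    Assumption: the presence of the label locks the node, whatever its value.
    """
    drain, skipped = [], []
    for name, labels in sorted(nodes.items()):
        (skipped if UPGRADE_LOCK_LABEL in labels else drain).append(name)
    return drain, skipped


nodes = {
    "worker-1": {"claudie.io/upgrade-lock": "true"},
    "worker-2": {},
    "worker-3": {"topology.kubernetes.io/zone": "a"},
}
drain, skipped = plan_drain(nodes)
print("drain now:", drain)
print("blocked until the label is removed:", skipped)
```

Nodes in the second list stay untouched until the label is removed, which matches the blocking behaviour described in the note.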

For some of the newly added providers (OpenStack), NAT hairpinning has been introduced as a workaround for networking shortcomings, so that Claudie works correctly. #2066
Duplicate taint definitions for nodepools will now be removed. #2070
For autoscaled nodepools, if a scale-up fails at least 3 times, Claudie will now treat it as a failure and stop autoscaling instead of retrying indefinitely. #2069
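
The retry cap for autoscaled nodepools can be pictured as a simple failure counter. The sketch below is illustrative only: ScaleUpTracker and its methods are hypothetical names, and the assumption that a successful scale-up resets the counter is not stated in the release notes.

```python
MAX_SCALE_UP_FAILURES = 3  # threshold taken from the note above


class ScaleUpTracker:
    """Counts failed scale-ups for one nodepool (hypothetical helper)."""

    def __init__(self, limit=MAX_SCALE_UP_FAILURES):
        self.limit = limit
        self.failures = 0

    def record(self, succeeded):
        # Assumption: a success resets the counter; the notes do not say.
        self.failures = 0 if succeeded else self.failures + 1

    @property
    def autoscaling_allowed(self):
        # Once the limit is reached, stop instead of retrying forever.
        return self.failures < self.limit


tracker = ScaleUpTracker()
for outcome in (False, False, False):
    tracker.record(outcome)
print(tracker.autoscaling_allowed)
```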

16 Apr 10:56

Despire

v0.12.0

ca5b777

What's Changed

Changing credentials for providers will now be correctly propagated within the reconciliation loop #2056

Updated MongoDB to version 6.0 #2053

After deploying, verify that the MongoDB version is 6.0:

kubectl exec -it <primary-mongo-pod> -n claudie -- mongosh \
  -u <username> -p <password> --authenticationDatabase admin \
  --eval "db.adminCommand({ buildInfo: 1 }).version"

Manually set the feature compatibility version to 6.0:

kubectl exec -it <primary-mongo-pod> -n claudie -- mongosh \
  -u <username> -p <password> --authenticationDatabase admin \
  --eval "db.adminCommand({ setFeatureCompatibilityVersion: '6.0' })"

This command must perform writes to an internal system collection. If for any reason the command does not complete successfully, you can safely retry the command as the operation is idempotent.

Verify that the update was processed. The following command should return 6.0 as the feature compatibility version:

kubectl exec -it <primary-mongo-pod> -n claudie -- mongosh \
  -u <username> -p <password> --authenticationDatabase admin \
  --eval "db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 })"

Bug fixes

Fixed deletion of zero-sized nodepools that would result in an endless reconciliation loop #2049

08 Apr 13:19

Despire

v0.11.2

da3b348

What's Changed

Added custom SSH port support for dynamic and static nodepools #2026

The requirement that the SSH port be open on 22 has been dropped. External templates can now define their own SSH port, to which Claudie will connect. The same applies to static nodepools, which expose the option in the InputManifest:

static:
  - name: control
    sshPort: 2222  # Optional: SSH port for connecting to static nodes. Defaults to 22.
    nodes:
      - endpoint: "192.168.10.1"
        secretRef:
          name: static-node-key
          namespace: <your-namespace>

Missing Cloudflare Load Balancing is now handled gracefully #2029
Dynamic nodes within a Kubernetes cluster are now health checked by Claudie. If a node is unhealthy for more than 12 minutes, Claudie triggers an auto-repair mechanism in which the node is replaced by first deleting it and subsequently joining a new node into the cluster. #2038
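
The auto-repair rule above amounts to a time-window check followed by a delete-then-join sequence. The following Python sketch shows that logic under stated assumptions: needs_repair and repair_steps are hypothetical helpers, and a healthy node is represented by None, a convention chosen only for this example.

```python
from datetime import datetime, timedelta

UNHEALTHY_GRACE = timedelta(minutes=12)  # window taken from the note above


def needs_repair(unhealthy_since, now):
    """Return True when a node has been unhealthy for longer than the window.

    `unhealthy_since` is None for a healthy node (hypothetical convention).
    """
    if unhealthy_since is None:
        return False
    return now - unhealthy_since > UNHEALTHY_GRACE


def repair_steps(node):
    # Order described in the release note: delete first, then join a replacement.
    return [f"delete {node}", f"join replacement for {node}"]


now = datetime(2025, 1, 1, 12, 0)  # arbitrary timestamp for the example
if needs_repair(now - timedelta(minutes=13), now):
    print(repair_steps("worker-2"))
```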

27 Mar 18:38

Despire

v0.11.1

fea63ed

What's Changed

General maintenance release that updates dependencies. #2020

24 Mar 14:56

Despire

v0.11.0

b9473d8

What's Changed

Added CloudRift cloud provider support #2000
Updated Longhorn to version v1.11.1 #2007
Before upgrading to this Claudie version from v0.10.2, detach all Longhorn volumes and follow the manual checks described here: https://longhorn.io/docs/1.11.1/deploy/upgrade/#manual-checks-before-upgrade
More validation of the InputManifest was moved into the operator's webhook, so that feedback is given immediately when kubectl apply is executed #2008
When a node is scheduled for deletion, its drain is now limited to a timeout of roughly 30 minutes, after which the node will be deleted #2011
During node deletion, disk scheduling at the Longhorn level will now be applied before the node is deleted #2012
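
The bounded drain from #2011 follows a common deadline pattern: keep draining until it finishes or the timeout passes, then delete the node either way. The sketch below is a minimal illustration, not Claudie's code. The callables drain_step and delete are hypothetical, and the clock and sleep parameters exist only so the example can run against a simulated clock.

```python
import time

DRAIN_TIMEOUT_SECONDS = 30 * 60  # roughly 30 minutes, per the note above


def drain_then_delete(drain_step, delete, timeout=DRAIN_TIMEOUT_SECONDS,
                      clock=time.monotonic, sleep=time.sleep, poll=5.0):
    """Poll `drain_step` until it reports completion or the timeout passes,
    then call `delete` either way. Returns True if the drain finished in time.

    `drain_step` and `delete` are hypothetical callables supplied by the caller.
    """
    deadline = clock() + timeout
    drained = False
    while clock() < deadline:
        if drain_step():
            drained = True
            break
        sleep(poll)
    delete()  # the node is removed even when the drain did not finish
    return drained


# Simulated run: the drain never finishes, so the timeout forces deletion.
fake_now = [0.0]
deleted = []
finished = drain_then_delete(
    drain_step=lambda: False,
    delete=lambda: deleted.append("worker-1"),
    clock=lambda: fake_now[0],
    sleep=lambda s: fake_now.__setitem__(0, fake_now[0] + s),
)
print(finished, deleted)
```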

06 Mar 13:28

Despire

v0.10.2

39c1bc4

What's Changed

Hetzner DNS is now considered part of the Hetzner Cloud (hcloud) provider within Claudie #1993
If you use Hetzner for DNS, you will also need to use the v0.9.19 templates, because from this Claudie version onwards the previous templates will not work with the old Hetzner DNS solution.
Claudie will now deploy Longhorn version 1.10.2 #1998

Before upgrading to this Claudie version from v0.10.1, detach all Longhorn volumes and follow the manual checks described here: https://longhorn.io/docs/1.10.2/deploy/upgrade/#manual-checks-before-upgrade

Additional manual steps may also be required to ensure Longhorn upgrades correctly. For the necessary steps, see the "Migration Requirement Before Longhorn v1.10 Upgrade" section in the Longhorn v1.10.1 release notes.

Bug fixes

Fixed API endpoint changes when the proxy is turned on #1996

23 Feb 19:25

samuelstolicny

v0.10.1

8f11256

What's Changed

Exoscale template version bumped to v0.9.18

Bug fixes

Fixed GCP autoscaler adapter crashing when the zone field is omitted from the InputManifest. The adapter now uses aggregated list requests to query machine types across all zones #1989

19 Feb 10:58

Despire

v0.10.0

b14fa17

Most notable changes (TL;DR)

This version introduces a regular loop that periodically ensures the created infrastructure matches the specs from the InputManifest and corrects it if it drifts.
This mechanism applies to all newly created clusters. Clusters imported from older versions of Claudie will be regularly reconciled after their first modification in the InputManifest.
Longhorn v1.9.2 will now be deployed for clusters built with Claudie. For existing clusters built with v0.9.16, manual steps need to be done before deploying Claudie v0.10.0:
- Please read about the manual steps here
The Builder service has been completely removed from Claudie. It is also recommended that you delete the Builder deployment after deploying the v0.10.x versions of Claudie. Claudie now uses NATS instead of the Builder to dispatch tasks among the other services.
The BuilderTTL field, which was internal to Claudie's task dispatching process, was completely removed in favor of a work queue. Previously, when the BuilderTTL reached 0, a new diff with the current desired state was made, even if the scheduled task did not finish. Thus, it was possible for another task to be dispatched. This is no longer possible, as the move to NATS requires an explicit acknowledgment of the task to progress the building of the cluster.
The identification and scheduling of tasks has been overhauled. Claudie now has an initial version of a reconciliation loop. In the v0.9.x versions of Claudie, whenever a change was detected after running kubectl apply -f <your-input-manifest>, Claudie stopped once building that change either failed or succeeded, and did not continue to health check or fix errors, even if the error was only a network inconvenience. Now, with the reconciliation loop, every kubectl apply -f <your-input-manifest> explicitly states the desired state of your clusters, and Claudie will try endlessly to reach that desired state. This means that, in the event of any errors, changes will be reverted and then reapplied, along with health checking, which helps identify potential misconfigurations or infrastructure issues. Claudie will then try to auto-repair these issues, if possible. The goal is to further improve the reconciliation loop with each release.
DynamoDB was removed in favor of native locking supported by newer versions of OpenTofu, which ship with Claudie v0.10.x.
Added support for Exoscale.
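
The general shape of a reconciliation loop with explicit task acknowledgment can be sketched in a few lines of Python. This is a simplified illustration, not Claudie's implementation: the real system dispatches tasks through NATS and also reverts failed changes, neither of which is modelled here. The functions diff and reconcile, the callback apply_task, and the nodepool-to-count state model are all hypothetical.

```python
def diff(desired, current):
    """Return the tasks needed to move `current` towards `desired`.

    Both arguments map a nodepool name to a node count (hypothetical model).
    """
    tasks = []
    for pool in sorted(set(desired) | set(current)):
        want, have = desired.get(pool, 0), current.get(pool, 0)
        if want != have:
            tasks.append((pool, want))
    return tasks


def reconcile(desired, current, apply_task, max_rounds=10):
    """Repeat diff-and-apply until the states match.

    `apply_task` returns True to acknowledge a task; an unacknowledged task
    leaves the state untouched, so the next round finds and retries it.
    Claudie retries endlessly; `max_rounds` only keeps this example finite.
    """
    for round_no in range(1, max_rounds + 1):
        tasks = diff(desired, current)
        if not tasks:
            return round_no - 1  # number of rounds that had work to do
        for pool, want in tasks:
            if apply_task(pool, want):
                if want:
                    current[pool] = want
                else:
                    current.pop(pool, None)
    return max_rounds


attempts = {"count": 0}


def flaky_apply(pool, want):
    attempts["count"] += 1
    return attempts["count"] > 1  # first attempt fails, later ones succeed


current = {"control": 1}
rounds = reconcile({"control": 1, "compute": 3}, current, flaky_apply)
print(rounds, current)
```

Because progress is recorded only after an acknowledgment, a failed task cannot be silently skipped, which is the property the removal of BuilderTTL is meant to guarantee.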

v0.10.0

What's Changed

Use native state locking provided by OpenTofu instead of relying on DynamoDB #1906
Upgraded KubeOne to v1.12.1. Claudie now supports building the following Kubernetes versions: v1.32, v1.33, v1.34 #1913
The Terraformer now uses a provider cache, which removes the time spent downloading a provider on a cache hit #1907
KubeOne is now prevented from overriding config.toml, which would collide with NVIDIA GPU operator overrides #1916
Longhorn will now be deployed with the best-effort data-locality setting #1933
The Ansibler stage has been tweaked to take less time overall #1917
Genesis Cloud provider support dropped #1941
The zone field is now optional for dynamic nodepools defined in the Input Manifest. If omitted, Claudie will automatically distribute the nodes across zones #1947
Claudie will now deploy Longhorn version 1.9.2 #1956. Manual steps need to be done for Longhorn before upgrading to Claudie v0.10.0.

Claudie now supports GPU guest accelerators for GCP nodepools #1952
Previously, it was not possible to communicate this information to the templates used to spawn the infrastructure. With the new changes, the GPU type and count are passed to the templates, which correctly spawn a VM with the requested GPU.

 nodePools:
   dynamic:
     - name: gpu-workers
       providerSpec:
         name: gcp-provider
         region: europe-west1
         zone: europe-west1-b
       count: 1
       serverType: n1-standard-4
       image: ubuntu-2204-lts
       machineSpec:
         nvidiaGpuCount: 1              # <-- specify number of gpus.
         nvidiaGpuType: nvidia-tesla-t4 # <-- specify gpu type

An initial version of the reconciliation loop was added to Claudie #1951
Claudie will now endlessly health check and try to fix errors on identified tasks. While this currently resolves only basic scenarios, such as unreachable nodes, the aim is to broaden this with every release.
Claudie will no longer expect NGINX to be installed on existing clusters #1980
As part of the reconciliation loop, the current state of the infrastructure is now refreshed periodically when no tasks have been identified #1979
Added support for a new provider, Exoscale
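
For the optional zone field (#1947), the notes say only that Claudie distributes nodes across zones automatically. One way to picture such a distribution is round-robin placement, sketched below. Round-robin is an assumption made purely for illustration, spread_across_zones is a hypothetical function, and the zone names are example values.

```python
from itertools import cycle


def spread_across_zones(node_count, zones):
    """Assign node indexes to zones in round-robin order.

    Assumption: round-robin is used here only for illustration; the release
    notes do not describe the strategy Claudie applies.
    """
    if not zones:
        raise ValueError("at least one zone is required")
    placement = {zone: [] for zone in zones}
    for index, zone in zip(range(node_count), cycle(zones)):
        placement[zone].append(index)
    return placement


example_zones = ["europe-west1-b", "europe-west1-c", "europe-west1-d"]
print(spread_across_zones(5, example_zones))
```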

Bug fixes

Deletion process was fixed for newer versions of Kubernetes #1919
Deploy kubelet-csr-approver to approve kubelet server CSRs #1934

26 Nov 10:55

Despire

v0.9.16

56e059b

What's Changed

The OpenStack provider will now use image names instead of image IDs, because IDs can be replaced by the provider and then become invalid #1902

Bug fixes

Fixed Cloudflare account ID propagation when updating to newer Claudie versions #1904

13 Nov 09:41

Despire

v0.9.15

97232b0

Bug fixes

Fixed an incompatibility with the Docker API in the Ansibler service that resulted in the error from #1885
