Mini Talos Cluster 1.0
Motivation
The goal was to move the cluster, which had been running on a pile of PowerEdge servers and a large file server, into a format that is easier to move and uses significantly less power. At the same time I rebuilt my entire network with MikroTik gear and simplified a few components. Aside from my desktop and the upstairs network switch, everything now runs in the mini rack.
Specs / Hardware
Networking
- Router: MikroTik RB5009UPr+S+IN. Provides PoE to the downstairs devices (AP, cameras, Zigbee) and to the modem (via a 12V PoE splitter).
- Core Switch: MikroTik CRS310-8G+2S+IN. Provides local 2.5GbE connectivity for the cluster and the uplink to the media center switch. If there were a version with 4x SFP+ it would be perfect.
- Cable Modem: Some Arris thing, powered off a PoE port via a USB splitter. If a 2.5GbE PoE splitter that supplied 12V existed, it would really simplify things.
Aux Services
- 8GB Raspberry Pi 4: dnsmasq, provides DNS/DHCP for the network. DHCP requests are relayed to it by the router. It serves DNS names for the local subnets and forwards other queries to upstream DNS servers as needed.
- 4GB Raspberry Pi 4: misc development, testing
- 8GB Raspberry Pi 5: Main developer instance
Talos Kubernetes Cluster
This is the core of the network. Could I replace this with a single chunky system and run everything in Docker? Maybe. OK, for sure, but clustering is fun. :) All cluster nodes are connected to the core switch via 2.5GbE.
- 4x Odroid H4 Ultra, 32GB RAM, 256GB NVMe boot/OS drive, 2x 800GB SSD, 2x 22TB HDD per node.
- 1x Intel Core Ultra 235, 32GB RAM, 256GB NVMe OS/boot drive
Powering
Looking at this, one might wonder how this is powered.
- Each rack is powered via a 600VA Tripp Lite UPS. Under load the total system can trip the 300W limit of a single UPS.
- Each pair of Odroids shares a single 19V/7A PSU via a Y-splitter. The Odroid devs suggest a PSU of that size when connecting 4x HDDs to a single node, so I figured one PSU could handle two nodes. So far, no issues.
- The Router and core switch share a single 48W PSU via a Y-splitter.
- The modem is powered off of a PoE Splitter.
- The Raspberry Pis are all powered via PoE HATs.
Cooling
- 2x Noctua 80mm fans per storage node. Probably should have saved money here.
- 1x 200mm Noctua fan venting the Core Ultra node.
Software Configuration
Kubernetes Core Components
A detailed walkthrough of these components is beyond the scope of this post, but I have been quite happy with the stack's functionality and stability. A few small components have been left out for clarity; this is just the core of the system.
- OS: Talos Linux. This lets me store each node's config as a YAML file, and it is very minimal, so there are no worries about drift or dependency hell. Three storage/control-plane nodes, one storage/worker, and one pure worker node. (A trimmed machine-config sketch follows this list.)
- Rook-Ceph (more on this below): at a high level, the three control-plane nodes are also Ceph control planes. The 4th Odroid has storage disks attached but handles workloads as well.
- argocd: GitOps. Everything is deployed via ArgoCD from a single monorepo stored in Git (Application sketch after this list).
- authentik: OIDC/LDAP SSO
- cert-manager: Handles local CA certs and Let's Encrypt certs (ClusterIssuer sketch below).
- cilium: Core networking layer and BGP-based load balancer (BGP/IP-pool sketch below).
- cloudnative-pg: PostgreSQL operator; deploy Postgres instances via Kubernetes resources (example Cluster below).
- crowdsec: Distributed blocklists, integrates with Traefik and base Linux nodes.
- external-dns: Updates various DNS servers as needed and serves DNS entries for cluster services.
- forgejo: Git instance/issue tracker
- monitoring: Grafana/Loki/Prometheus
- renovate: Periodically checks for updates to applications and opens PRs for each update.
- traefik: Reverse Proxy
- velero: Backups via offsite S3-compatible storage
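To make a few of these concrete, here are some illustrative sketches; every name, address, path, and version in them is a placeholder rather than a copy of my real manifests. First, a trimmed example of the kind of per-node YAML Talos consumes (real configs are generated with `talosctl gen config` and include certificates and tokens omitted here):

```yaml
# Hypothetical, trimmed Talos machine config for one control-plane node.
# Certificates, tokens, and most settings are omitted; values are placeholders.
version: v1alpha1
machine:
  type: controlplane                 # or "worker" for the pure worker node
  install:
    disk: /dev/nvme0n1               # placeholder: the 256GB NVMe boot drive
    image: ghcr.io/siderolabs/installer:v1.9.0   # example version, not mine
  network:
    hostname: storage-node-1         # placeholder hostname
cluster:
  clusterName: mini-talos            # placeholder
  controlPlane:
    endpoint: https://10.0.0.10:6443 # placeholder control-plane endpoint
```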
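Each deployable in the monorepo ends up as an ArgoCD Application roughly like this (repo URL, path, and namespaces are hypothetical):

```yaml
# Hypothetical ArgoCD Application pointing at one directory of a GitOps monorepo.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: traefik                      # example app name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.internal/homelab/monorepo.git  # placeholder
    targetRevision: main
    path: apps/traefik               # placeholder path within the monorepo
  destination:
    server: https://kubernetes.default.svc
    namespace: traefik
  syncPolicy:
    automated:
      prune: true                    # remove resources deleted from Git
      selfHeal: true                 # revert out-of-band changes
```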
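cert-manager's Let's Encrypt side is a single ClusterIssuer; a minimal sketch assuming an HTTP-01 solver through Traefik (contact email and names are placeholders):

```yaml
# Hypothetical cert-manager ClusterIssuer for Let's Encrypt via HTTP-01.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com             # placeholder contact email
    privateKeySecretRef:
      name: letsencrypt-prod-account     # secret holding the ACME account key
    solvers:
      - http01:
          ingress:
            class: traefik               # challenge answered via the Traefik ingress
```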
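The Cilium load balancer boils down to an IP pool that LoadBalancer Services draw from plus a BGP peering toward the router. A sketch using the v2alpha1 CRDs, with made-up ASNs and addresses (newer Cilium releases also offer a CiliumBGPClusterConfig API):

```yaml
# Hypothetical Cilium LB IP pool + BGP peering policy.
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: lb-pool
spec:
  blocks:
    - cidr: 10.0.20.0/24               # placeholder range handed out to Services
---
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: bgp-to-router
spec:
  nodeSelector:
    matchLabels: {}                    # empty selector: apply to all nodes
  virtualRouters:
    - localASN: 64512                  # placeholder private ASN for the cluster
      exportPodCIDR: false
      neighbors:
        - peerAddress: 10.0.0.1/32     # placeholder: the RB5009
          peerASN: 64513               # placeholder router ASN
      serviceSelector:                 # documented trick to match all Services
        matchExpressions:
          - {key: somekey, operator: NotIn, values: ["never-used-value"]}
```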
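With cloudnative-pg, a Postgres instance is just another manifest; a minimal hypothetical Cluster pointed at one of the Ceph-backed storage classes described in the next section:

```yaml
# Hypothetical CloudNativePG cluster: 3 Postgres instances on replicated SSD storage.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-db                         # example name
  namespace: databases                 # placeholder namespace
spec:
  instances: 3                         # one primary + two streaming replicas
  storage:
    size: 10Gi
    storageClass: ceph-ssd-block       # placeholder: replicas=3 SSD block class
```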
Storage Cluster
The storage cluster is a Ceph cluster managed by the Rook-Ceph operator running in Kubernetes. It provides multiple categories of storage.
- SSD-backed block storage, replicas=3, for DB volumes and other container volumes (pool/StorageClass sketch after this list)
- SSD-backed filesystem, replicas=2, for temp data
- HDD-backed filesystem, erasure-coded 2+2, for bulk data. Not the most space-efficient, but my storage pool is over-specced anyway (EC sketch after this list)
- HDD-backed filesystem, replicas=3, for critical large datasets
- Object storage backed by the above pools, for S3-compatible-ish storage
- (Experimental) Hybrid pool, where each stripe of data is stored on 1 SSD and 2 HDDs; this would give good space usage and read speed, but poor write speed
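As an example of the first category, a replicated SSD block pool under Rook-Ceph looks roughly like the sketch below; the pool, class, and secret names follow Rook's stock examples and are placeholders for whatever the real manifests use:

```yaml
# Hypothetical Rook-Ceph replicated block pool on the SSD device class,
# plus the StorageClass that PVCs reference.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: ssd-block
  namespace: rook-ceph
spec:
  deviceClass: ssd                     # only place data on SSD OSDs
  failureDomain: host                  # replicas land on different nodes
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-ssd-block                 # placeholder name referenced by PVCs
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: ssd-block
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
  # CSI secret names below are Rook's defaults.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
```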
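The erasure-coded bulk filesystem is similar in spirit: a CephFilesystem whose bulk data pool is EC 2+2 on the HDD device class. A hedged sketch with placeholder names (CephFS wants its default data pool replicated, so there are two data pools):

```yaml
# Hypothetical Rook-Ceph filesystem: replicated metadata on SSD,
# erasure-coded 2+2 data pool on HDD for bulk storage.
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: bulk-fs
  namespace: rook-ceph
spec:
  metadataPool:
    deviceClass: ssd
    replicated:
      size: 3
  dataPools:
    # CephFS requires a replicated default data pool; the EC pool holds the bulk data.
    - name: default
      deviceClass: ssd
      replicated:
        size: 3
    - name: bulk-ec
      deviceClass: hdd
      failureDomain: host              # each chunk on a different node (needs 4 hosts)
      erasureCoded:
        dataChunks: 2                  # usable data shards
        codingChunks: 2                # parity shards; survives two host failures
  metadataServer:
    activeCount: 1
    activeStandby: true
```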
A full Ceph overview is out of scope for this post, but it is very powerful and flexible. This is arguably one of the smallest reasonably reliable Ceph clusters: with only 3 nodes, a replicas=3 data pool couldn't heal after losing a single node, because there would be no third host left to rebuild the missing replica onto. And Ceph has the best Kubernetes support of any local network filesystem.
One last Ceph note: so far, 2.5GbE networking has been fine. For the most part the bottleneck has been disk read/write speed, because I only have 2 HDDs per node. Faster networking would not help unless I had significantly more drives, or faster arrays in general, and it would come with massive tradeoffs in power draw and price. I actually downgraded most of my networking with this build and it hasn't had any significant negative impact.
Future Plans
- Desktop in-rack: I was planning on building a new gaming desktop/workstation, and fitting it in 3U. The RAM prices these days have put this plan on pause.
- Additional Cluster Nodes: I don't need any at the moment. Maybe eventually an Odroid H5 or Raspberry Pi 6 will come out that changes this thinking. Or maybe a Strix Halo system for local machine learning nonsense. Who knows.