This is the repo I use for my personal cloud server, hosted on a VPS.
Features:
- Single-node K3s installation.
- Encryption at rest for Kubernetes secrets, etcd, and all container persistent volumes.
- Atomic upgrades by storing all stateful data on an external volume.
- Easily create local environments for testing.
- Automatic SSL certificates via Let's Encrypt.
- IPv4 and IPv6 support.
- A variety of workloads that I've deployed. Some highlights:
K3s
K3s is a lightweight Kubernetes distribution which includes useful single-node-cluster features like host path volumes, a LoadBalancer, and Traefik.
Jsonnet
Jsonnet is used to declare the desired Kubernetes workloads. The configuration boils down to a series of manifest files which are applied using either kubectl or kapp.
SOPS
Mozilla SOPS is used to encrypt Kubernetes secrets in this repository, combined with sops-secrets-operator to decrypt them on the cluster. We use Age as the encryption provider.
Choose the technology you will deploy to:
- Lima is available for quickly spinning up test clusters on a local macOS machine.
- Hetzner Cloud is available for creating a cloud-hosted cluster.
You'll need argc installed, as well as a variety of other utilities; their names will be printed when you run a command that requires them.
- The Lima driver requires that limactl is installed.
- The Hetzner driver requires that hcloud is installed. For Hetzner, you should also create an empty project to host the resources you will use. Set up hcloud using hcloud context or by setting HCLOUD_TOKEN (see the example after this list).
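For example, a minimal hcloud setup might look like this (the context name and token are placeholders):

# Create a named hcloud CLI context; you will be prompted for your project's API token.
hcloud context create my-cloud-server

# Alternatively, export the token directly instead of creating a context.
export HCLOUD_TOKEN=<your-hetzner-api-token>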
Generate an age key if you don't already have one: age-keygen -o development.key. The public key will be printed to the console; it should look like age1qal59j7k2hphhmnmurg4ymj9n32sz5dgnx5teks3ch72n4wjfevsupgahc.
Run argc init --driver=lima --age $AGE_PUBLIC_KEY local to create a cluster named local using the Lima driver. $AGE_PUBLIC_KEY should be your age public key. This command should take a few minutes to run and should stream logs throughout the process.
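Putting the key generation and cluster creation together, a first run might look roughly like this (the key file and cluster name follow the examples above):

# Generate an age key pair and capture the public key.
age-keygen -o development.key
AGE_PUBLIC_KEY=$(age-keygen -y development.key)

# Create a local test cluster using the Lima driver.
argc init --driver=lima --age $AGE_PUBLIC_KEY local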
At the end, the script will print the disk encryption password. It is important that you store this somewhere safe; you will need it to reboot or upgrade the server.
To use the Hetzner driver, run argc init --help and argc init --driver=hetzner --driver-help to see the arguments you need to pass. At a minimum, you'll need to use --location, --type, and --size.
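As a sketch, a Hetzner cluster creation could look like the following; the location, server type, and volume size values are purely illustrative, so check argc init --driver-help for what your account supports:

# Create a cloud-hosted cluster named production (values are placeholders).
argc init --driver=hetzner --age $AGE_PUBLIC_KEY \
  --location=fsn1 --type=cx22 --size=40 production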
You can create any number of clusters. Each stores its configuration in a subdirectory of env/. Looking at the local cluster in env/local/, we see these files:
- kubeconfig.yml is the kubeconfig you can use to access the cluster.
- sops-age-recipient.txt is the public key of the cluster's sops-secrets-operator.
- config.libsonnet contains the configuration for the workloads.
- secrets.yml contains the environment-specific SOPS-encrypted secrets. Each document in this YAML file should be a SopsSecret object, and you need a separate object for each namespace you want to add secrets to (see the example after this list).
- authelia-users.yml contains a sample Authelia users database.
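One way to populate secrets.yml, assuming you write the SopsSecret documents in plain text first, is to encrypt the file against the cluster's recipient key:

# Encrypt secrets.yml in place so only the cluster's sops-secrets-operator can decrypt it.
sops --encrypt --age "$(cat env/local/sops-age-recipient.txt)" --in-place env/local/secrets.yml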
The default cluster configuration is an empty k3s installation. Use argc sync to deploy the workloads from config.libsonnet to the cluster.
- Traefik Dashboard - accessible via the self-signed certificate. Log in with authelia / authelia.
eval "$(argc activate $ENVIRONMENT)"- set up theKUBECONFIGvariable and others in the current terminal session. Useful to put this in your.envrcfor use with direnv.kubectl- Useenv/local/kubeconfig.ymlto accesskapp- Useenv/local/kubeconfig.ymlto accessargc sync- Run this to sync all workloads inconfig.libsonnet. This is equivalent to runningargc apply $WORKLOADfor each workload configured.argc render $WORKLOAD- Show the rendered manifest for the given workload.argc diff $WORKLOAD- Show a diff of the rendered manifest and the current cluster state.argc apply $WORKLOAD- Apply the rendered manifest to the cluster.
Workloads are managed using kapp, and can be deleted using kapp delete. There is presently no support for automatically pruning workloads that you remove from config.libsonnet.
You can use kapp delete -a $NAME to delete all resources associated with a workload. Note that the default reclaim policy of dynamically-provisioned PersistentVolumes (e.g. local-path PVs) is "Delete". You may want to change this to "Retain". Since the PersistentVolume isn't specified in the jsonnet configuration, you should do this using kubectl.
To reuse this volume at a later date, you should patch it again to set a claimRef matching the original PersistentVolumeClaim, then deploy the workload as usual.
kubectl patch pv <your-pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
kubectl get pv # Verify that change has been applied.
# Untested commands
kubectl patch pv <your-pv-name> -p '{"spec":{"claimRef":{"namespace":"default","name":"your-pvc-name"}}}'Option A: My Server is a "pet"
You can follow the normal K3s upgrade guide, as well as the normal Ubuntu upgrade guide.
Option B: My server is "cattle"
It is also possible to simply swap out the server for a new one using the same data drive. This method gives a fresh install of k3s from a known-good image.
To use this second approach, see argc upgrade --help and argc upgrade --driver-help $ENVIRONMENT for the available options. The basic approach looks like this (a worked example follows the list):
- Create a snapshot of the current server to roll back to if something happens: hcloud server create-image
- Replace your server with a new one using argc upgrade
- Unseal the server with argc unseal
- Verify everything works. If you need to roll back to the previous version, use the snapshot you created in step 1 (e.g. argc upgrade $ENVIRONMENT --image=my-snapshot-id).
- Delete the snapshot once you are happy with the upgrade.
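As a concrete sketch of these steps (the server and snapshot names are placeholders; argc upgrade --driver-help lists the exact flags):

# 1. Snapshot the current server so you can roll back.
hcloud server create-image --type snapshot --description "pre-upgrade" my-server

# 2. Replace the server with a fresh image, reusing the existing data volume.
argc upgrade $ENVIRONMENT

# 3. Unlock the encrypted data volume on the new server.
argc unseal $ENVIRONMENT

# 4. If something is wrong, roll back to the snapshot created in step 1.
# argc upgrade $ENVIRONMENT --image=my-snapshot-id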
Once you no longer need an environment, use argc destroy to remove it. This will delete all local/cloud resources and remove the corresponding subdirectory of env/.
Here is a checklist of things you should do when you are ready to deploy your cluster to production.
- Turn on accidental deletion protection for the volume and primary IPs: hcloud volume enable-protection and hcloud primary-ip enable-protection (see the example after this list).
- Configure DNS for the main domain and subdomains.
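For reference, enabling protection might look like this (the resource names are placeholders):

# Protect the data volume and primary IPs against accidental deletion.
hcloud volume enable-protection my-data-volume delete
hcloud primary-ip enable-protection my-ipv4 delete
hcloud primary-ip enable-protection my-ipv6 delete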
You may want to set up your SSH config for access to the server. This is fine, but please note that argc upgrade will cause the SSH host key to change. You can avoid this by using the Hostname directive in your SSH config. The argc upgrade script will automatically update the host key when the upgrade is performed.
# Example configuration for SSH
Host my.cluster.dns
Hostname 188.245.147.159
User root

Here are the main directories in this repository:
- env/$ENVIRONMENT describes a single environment. My production deployment is checked in here, which you can see as an example.
- driver/ contains the scripts to manage the infrastructure powering the cluster. These are not meant to be run directly; instead they are accessed through the root Argcfile.sh.
- workloads/ is the main Jsonnet directory. Subdirectories here correspond to individual workloads which can be enabled and configured using the environment's config.libsonnet file.
These are some basic commands that can be used for troubleshooting:
# View node status
kubectl get nodes
# Check control plane components
kubectl get componentstatuses
# Review Kubernetes events
kubectl get events -A
# List deployments (check for anything not fully ready)
kubectl get deployments -A
# Look for failed jobs
kubectl get job -A

Unclear!
- If you change the server's primary IP addresses in the cloud provider console, it may be necessary to run cloud-init clean -c network and reboot in order for the server to detect the changes. Failing to do this may result in a partially updated network (e.g. IPv4 works but IPv6 does not).
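The commands for that recovery, untested here, would be along these lines:

# Regenerate the network configuration from cloud provider metadata on the next boot.
cloud-init clean -c network
reboot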
This is a toy project for personal use. As a result, the security model has been simplified from the one you would normally encounter in a production system. The key difference is that a single-node system will be fully compromised if root access is gained on that node: if a workload escapes its sandbox, everything is compromised. Specifically:
- Access to root privileges on the host system can be used to read the unencrypted contents of the cluster's drive.
- Access to kube-apiserver can be used to run an arbitrary pod with root privileges on the host system.
- Helm charts installed from URLs can be modified at any time in the future to run arbitrary pods with root privileges on the host system.
The steps required to make this setup "production-ready" are:
- Set up Pod Security Admission to prevent pods from accessing resources that they shouldn't (host system resources, the kube-system namespace, etc.); see the sketch after this list.
- Follow the K3s CIS Hardening Guide.
- Note: the Kubernetes-native secrets encryption is not used; instead the entire etcd store is encrypted using full disk encryption.
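For the Pod Security Admission item above, a minimal starting point could be to label application namespaces so the built-in admission controller enforces the restricted Pod Security Standard; the namespace name is only an example:

# Enforce the "restricted" Pod Security Standard for pods created in this namespace.
kubectl label namespace my-app pod-security.kubernetes.io/enforce=restricted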
- 2025-05-30: The infrastructure underwent a substantial change from Nomad to Kubernetes. The older version can be found here. It uses Nomad, Consul, and Vault, as well as Ansible for managing the configuration of the server.