k3s-oci

A production-ready k3s Terraform module for the OCI Always Free tier.

Features

  • HA control plane — 3 control-plane nodes with embedded etcd; survives 1 node failure
  • Full stack always deployed — cert-manager, Longhorn, ArgoCD + Image Updater, and kured are always installed; they keep the cluster active and prevent idle reclamation
  • Separate public/private subnets — k3s nodes have no public IP; only LBs and the optional bastion are internet-facing
  • Envoy Gateway ingress (Gateway API) — DaemonSet with system-cluster-critical priority and PodDisruptionBudget maxUnavailable: 1; standard HTTPRoute/Gateway resources; real client IP preservation via NLB transparent mode
  • Automatic security updates: unattended-upgrades + kured drain-reboot-uncordon cycle; zero manual intervention
  • k3s version pinned at plan time — resolved from the GitHub API during terraform plan, not at boot time
  • Cluster-scoped IAM — dynamic group and policy scoped to nodes tagged with the cluster name, not every instance in the compartment
  • Idempotent cloud-init — all kubectl operations use apply; re-provisioning is safe
  • OCI Vault (enable_vault = true) — cluster secrets in a free software-protected OCI Vault; fetched at boot via instance_principal, not embedded in user-data
  • Boot volume backups (enable_backup = true) — weekly full backups, 1-week retention, within the 5-backup Always Free limit
  • Object Storage state bucket (enable_object_storage_state = true) — versioned OCI Object Storage for Terraform state; S3-compatible endpoint in terraform_state_backend output
  • OCI Notifications + Alertmanager (enable_notifications = false) — opt-in OCI Notifications topic wired to Alertmanager as a webhook receiver
  • MySQL HeatWave (enable_mysql = false) — opt-in Always Free MySQL DB in the private subnet; credentials pre-created as a Kubernetes Secret
  • External DNS (enable_external_dns = false) — automatic Cloudflare DNS record management from HTTPRoute hostnames
  • External Secrets (enable_external_secrets = false) — sync OCI Vault secrets into Kubernetes Secrets via instance_principal; no credentials to rotate

Architecture

graph TD
    Internet(["🌐 Internet"])

    subgraph public["Public Subnet · 10.0.0.0/24"]
        NLB["🔀 Public NLB (Always Free)\nHTTP :80 · HTTPS :443\noptional: kubeapi :6443"]
    end

    subgraph private["Private Subnet · 10.0.1.0/24 · no public IPs"]
        ILB["⚖️ Internal Flex LB (Always Free)\nkubeapi VIP :6443"]

        subgraph cp["Control Plane × 3  ·  A1.Flex (1 OCPU / 6 GB each)\nk3s-server · etcd · Envoy Gateway · Longhorn · user workloads"]
            CP0["control-plane-0"]
            CP1["control-plane-1"]
            CP2["control-plane-2"]
        end

        W["worker-0  ·  A1.Flex (1 OCPU / 6 GB)\nk3s-agent · Envoy Gateway · Longhorn · user workloads"]
    end

    NAT["🌍 NAT Gateway (Always Free)"]
    Bastion["🔐 OCI Bastion Service\noptional · Always Free"]

    Internet -->|HTTP / HTTPS| NLB
    NLB -->|"Envoy Gateway NodePorts :30080 / :30443"| CP0 & CP1 & CP2 & W
    NLB -. "kubeapi :6443\nexpose_kubeapi=true" .-> ILB
    ILB --> CP0 & CP1 & CP2
    W -->|joins via kubeapi| ILB
    private -->|outbound| NAT --> Internet
    Bastion -. "SSH tunnel\nenable_bastion=true" .-> private

All four A1.Flex instances live in a private subnet with no public IPs. Internet traffic enters exclusively through two Always Free load balancers.

k3s naming note: k3s calls control-plane nodes "servers" (k3s server) and workers "agents" (k3s agent). Terraform resources follow k3s conventions (server/worker); in standard Kubernetes terminology these map to control-plane and worker nodes.

Public NLB forwards HTTP/HTTPS directly to Envoy Gateway NodePorts on all four nodes. is_preserve_source = true preserves real client IPs at the hypervisor level. The NLB optionally exposes the Kubernetes API on port 6443, restricted to your IP.

Internal Flex LB provides a stable private VIP across all three control-plane nodes. Workers join via this VIP so the cluster survives any single control-plane loss.

Longhorn runs on all four nodes with defaultReplicaCount=3 — each PVC is replicated across three nodes. Control-plane NoSchedule taints are removed after cluster init so user workloads schedule across all four identically-sized nodes.

HA ceiling: etcd runs on the 3 control-plane nodes (quorum = 2). The cluster tolerates 1 control-plane failure — the hard limit of a 4-node Always Free topology.

Always Free budget

| Resource | Free allowance | This module |
|---|---|---|
| A1.Flex compute | 4 OCPUs / 24 GB / 4 instances | 3 servers + 1 worker = 4 OCPUs / 24 GB |
| Block storage | 200 GB | 4 × 50 GB = 200 GB |
| Network Load Balancer | 1 NLB | 1 (public, HTTP/HTTPS) |
| Flexible Load Balancer | 2 × 10 Mbps | 1 (private, kubeapi) |
| E2.1.Micro instances | 2 | 0 (bastion uses OCI Bastion Service: managed, no VM) |
| NAT Gateway | 1 per VCN | 1 (outbound-only for private nodes) |
| Object Storage | 20 GB | 2 versioned buckets: Terraform state + Longhorn PVC backups (enable_object_storage_state, enable_longhorn_backup) |
| Vault (shared) | Software keys + 150 secrets | 3 secrets: k3s_token, longhorn_ui_password, grafana_admin_password (enable_vault = true) |
| Volume backups | 5 total | 4: one per node, weekly, 1-week retention (enable_backup = true) |
| Notifications | 1M HTTPS + 3K email/month | 1 topic wired to Alertmanager (enable_notifications = false, opt-in) |
| MySQL HeatWave | 1 standalone DB, 50 GB | 1 DB system in private subnet (enable_mysql = false, opt-in) |
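The compute and storage rows of the budget can be re-checked with a few lines of shell arithmetic. This is only a sketch: the variable names and the check_budget helper are illustrative and not part of the module, and the numbers are copied from the table rather than read from your Terraform variables.

```shell
#!/bin/sh
# Sketch: check the module's default sizing against the Always Free limits.
# All numbers are copied from the budget table; adjust them if you change
# server_ocpus, worker_memory_in_gbs, boot_volume_size_in_gbs, etc.

NODES=4              # 3 servers + 1 worker
OCPUS_PER_NODE=1
RAM_GB_PER_NODE=6
BOOT_GB_PER_NODE=50

# check_budget <label> <used> <limit>
# Prints one line per resource and returns non-zero when the limit is exceeded.
check_budget() {
  if [ "$2" -le "$3" ]; then
    echo "$1: $2 of $3 (ok)"
  else
    echo "$1: $2 of $3 (OVER the free allowance)"
    return 1
  fi
}

check_budget "OCPUs" $(( NODES * OCPUS_PER_NODE )) 4
check_budget "RAM (GB)" $(( NODES * RAM_GB_PER_NODE )) 24
check_budget "Block storage (GB)" $(( NODES * BOOT_GB_PER_NODE )) 200
```

With the defaults every line reports "ok" at exactly the limit; raising NODES to 5 pushes all three resources over.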

⚠️ Idle reclamation: OCI reclaims Always Free instances where CPU, network, and memory stay below 20% for 7 consecutive days. The full stack (Longhorn, ArgoCD, cert-manager, kured) generates enough background activity to keep the cluster alive.

Why this topology

With a hard cap of 4 A1.Flex instances, the binding constraint is etcd quorum: HA etcd needs at minimum 3 nodes (quorum = ⌊n/2⌋+1 = 2). The result is a 3-server HA cluster plus 1 standalone worker that saturates every Always Free resource class with nothing left unused and nothing that costs money.
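The quorum arithmetic can be tabulated for the member counts that fit in four instances. The function names below are illustrative; the formula is the ⌊n/2⌋+1 majority rule quoted above.

```shell
#!/bin/sh
# Majority quorum for an etcd cluster of n voting members: floor(n/2) + 1.
# Shell integer division already rounds down.
etcd_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Members that can fail while a majority is still reachable: n - quorum.
etcd_fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4; do
  echo "members=$n quorum=$(etcd_quorum "$n") tolerated_failures=$(etcd_fault_tolerance "$n")"
done
```

Two members tolerate no failure and four tolerate the same single failure as three, which is why the fourth instance is more useful as a worker than as another etcd member.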

Topology comparison

| Topology | etcd HA | Nodes for workloads | Effective RAM for workloads† | Assessment |
|---|---|---|---|---|
| 3 CP + 1 worker (this module) | ✅ 1-node fault | 4 (taints removed) | ~15 GB | Optimal: HA etcd, all 4 nodes contribute to workloads |
| 1 CP + 3 workers | ❌ CP is total SPOF | 4 | ~18 GB | More capacity but control-plane loss = complete cluster death |
| 2 CP + 2 workers | ❌ Invalid | | | 2-node etcd needs both members for quorum (⌊2/2⌋+1 = 2), so it tolerates no failures; worse than 1 node |
| 4 CP + 0 workers | ✅ 1-node fault | 4 (taints removed) | ~12 GB | Fewer resources for workloads; more etcd overhead |

†etcd + kubeapi consume ~300–500 MB RAM and ~100–200m CPU per control-plane node.

4 × 1 OCPU even split prevents any single etcd node from becoming a hot-spot, creates 4 equal fault domains, and allows workloads to spread evenly.

Why not use the 2 free E2.1.Micro instances as extra workers?

Always Free also includes 2 AMD E2.1.Micro instances. They are not worth adding:

  1. Storage budget exhausted — 4 × 50 GB boot volumes already consume the full 200 GB Always Free block storage allowance; two additional instances would require at least 100 GB more
  2. 1 GB RAM — k3s agent + Longhorn DaemonSet alone consume ~700–800 MB, leaving ~200 MB for user workloads
  3. 1/8 OCPU — negligible compute; adds operational complexity for near-zero workload benefit

Previously rejected alternatives

| Alternative | Why it was rejected |
|---|---|
| nginx stream proxy in front of Envoy Gateway | Extra latency and complexity; NLB already preserves source IPs directly |
| OCI Bastion VM (E2.1.Micro) | OCI Bastion Service provides managed SSH proxying for free with no VM, no OS to patch, and no boot volume consuming storage budget |
| Boot volumes < 50 GB | OCI hard minimum is 50 GB per shape; 4 × 50 GB = 200 GB exactly exhausts the free block storage allowance |
| Additional NLB for kubeapi | Only 1 NLB is Always Free; the existing NLB conditionally exposes port 6443 via expose_kubeapi = true |

Failure tolerance

| Component | Tolerance | What happens on failure |
|---|---|---|
| Any single node (any role) | ✅ 1 node | Workloads reschedule to remaining 3 nodes; Longhorn (3 replicas) keeps storage up; Envoy Gateway DaemonSet keeps ingress up on remaining nodes |
| 2 nodes simultaneously | ⚠️ Partial | Workloads and ingress continue on 2 surviving nodes; if both failed nodes are control-planes, etcd quorum is lost and the API server stops accepting writes (running pods keep running, no new scheduling) |
| etcd / control-plane quorum | ❌ 2 control-planes | Cluster becomes read-only; recovery requires etcd snapshot restore |
| Worker node | ✅ Full | With taints removed, workloads reschedule to control-planes; no SPOF |
| HTTP/HTTPS ingress | ✅ 3 node losses | Envoy Gateway DaemonSet; NLB health-checks remove unhealthy backends automatically |
| Kubernetes API | ✅ 1 control-plane | ILB routes to remaining 2 control-planes |
| PVC data (Longhorn) | ✅ 1 node | 3 replicas across 4 nodes; 1 replica lost, 2 remain serving |
| cert-manager | ⚠️ Soft | Pod reschedules within minutes; TLS serving unaffected (certs live in Secrets); only new issuance/renewal is paused |
| ArgoCD | ⚠️ Soft | GitOps sync pauses until rescheduled; running workloads unaffected |
| MySQL (if enabled) | ❌ None | Always Free tier = single OCI-managed instance; no HA failover |

Node roles and workload placement

Each A1.Flex instance has identical resources (1 OCPU / 6 GB RAM). The k3s role (server vs agent) affects which system processes run, not how much resource is available for workloads.

| What | control-plane-0/1/2 | worker-0 | Scheduling mechanism |
|---|---|---|---|
| etcd | ✅ | ❌ | k3s built-in; servers only |
| Kubernetes API server | ✅ | ❌ | k3s built-in; servers only |
| Envoy Gateway (ingress) | ✅ | ✅ | DaemonSet: 1 pod per node |
| Longhorn (storage daemon) | ✅ | ✅ | DaemonSet: 1 pod per node |
| cert-manager | ✅ | ✅ | Deployment: schedules on any node |
| ArgoCD | ✅ | ✅ | Deployment: schedules on any node |
| kube-prometheus-stack | ✅ | ✅ | Deployment/StatefulSet: any node |
| kured | ✅ | ✅ | DaemonSet: 1 pod per node |
| User workloads | ✅ | ✅ | No restrictions: schedules on all 4 nodes |

Why control-planes run user workloads: kubeadm-style clusters taint control-plane nodes with NoSchedule, whereas stock k3s servers are schedulable unless a taint is configured. This setup removes any control-plane NoSchedule taints at cluster init so all 4 identically-sized nodes are available. With only one worker, keeping such a taint would make it a single point of failure for all user workloads.

Recommendation: use replicas ≥ 2 with topologySpreadConstraints (see gitops/README.md) to spread pods across nodes and survive any single-node failure.

Quickstart

# 1. Clone and enter the example directory
git clone https://github.com/mbologna/k3s-oci.git
cd k3s-oci/example

# 2. Copy and edit the variables file
cp terraform.tfvars.example terraform.tfvars
$EDITOR terraform.tfvars

# 3. Init and apply (terraform or tofu both work)
terraform init && terraform apply
# tofu init && tofu apply

kubeconfig

After terraform apply, run:

terraform output kubeconfig_hint

This prints the exact steps for your configuration. If enable_bastion = true (recommended), the fastest path is the included helper script:

cd example && ./get-kubeconfig.sh
export KUBECONFIG=~/.kube/k3s-oci.yaml
kubectl get nodes

enable_bastion defaults to true. It uses OCI Bastion Service, a managed SSH proxy with no VM, no boot volume, and no cost. Without it, nodes are only reachable via OCI serial console (terraform output kubeconfig_hint explains all options).

Automatic updates & reboots (unattended-upgrades + kured)

unattended-upgrades applies Ubuntu security patches daily and sets /var/run/reboot-required when a kernel update needs a reboot.

kured watches every node for /var/run/reboot-required and, when found:

  1. Acquires a cluster-wide lock (only one node reboots at a time)
  2. Cordons + drains the node
  3. Reboots
  4. Waits for the node to return and uncordons it

This keeps the cluster fully patched with zero manual intervention and no concurrent downtime.
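The trigger for this cycle is nothing more than the presence of a sentinel file, which you can inspect by hand on any node. The helper below is a hypothetical illustration of that check, not kured's actual implementation; the path is a parameter only so the function can be tried against a scratch file.

```shell
#!/bin/sh
# reboot_status [sentinel-path]
# Prints "reboot-required" if the sentinel file exists, "ok" otherwise.
# Defaults to the path that unattended-upgrades writes and kured watches.
reboot_status() {
  sentinel="${1:-/var/run/reboot-required}"
  if [ -f "$sentinel" ]; then
    echo "reboot-required"
  else
    echo "ok"
  fi
}

# Check the machine this script runs on.
reboot_status
```

Run over SSH on a node, this reports whether kured will pick that node for its next drain and reboot.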

GitOps — App of Apps

The gitops/ directory contains ArgoCD Application manifests managed with the App of Apps pattern.

After the cluster is running, bootstrap it:

kubectl apply -n argocd -f gitops/apps/app-of-apps.yaml

ArgoCD will then continuously reconcile every manifest under gitops/apps/.

Adding your own applications

This repo is designed to be forked. To add your own apps on top of the built-in stack:

  1. Fork this repo on GitHub.

  2. Update all repoURL references to point to your fork:

    bash gitops/update-repo-url.sh https://github.com/your-org/your-fork.git
    git add gitops/apps/ && git commit -m "chore: update gitops repoURL"
    git push
  3. Add your ArgoCD Application manifests to gitops/apps/ — ArgoCD syncs them automatically. Each app can point at any Helm chart registry or any Git repository.

Deploying for the first time? Also set gitops_repo_url in terraform.tfvars before running tofu apply, so cloud-init writes the correct fork URL at bootstrap:

gitops_repo_url = "https://github.com/your-org/your-fork.git"

Already have a running cluster? Patch the App of Apps directly:

argocd app set app-of-apps --repo https://github.com/your-org/your-fork.git

Private repos: configure ArgoCD repository credentials (argocd repo add) before adding manifests that pull from private repositories.

Deploying a web application

Why TLS is terminated at Envoy Gateway, not at the OCI load balancer

OCI provides two load balancer products with very different capabilities:

| | OCI Network Load Balancer (NLB) | OCI Flexible Load Balancer |
|---|---|---|
| OSI layer | L4: TCP passthrough | L7: HTTP/HTTPS aware |
| TLS termination | ❌ Not possible | ✅ Yes |
| Always Free | 1 NLB | 2 × 10 Mbps |
| Used here | nlb.tf: public internet traffic | lb.tf: internal kubeapi HA VIP |

The public-facing load balancer is the NLB. It forwards raw TCP streams with protocol = "TCP" — it has no knowledge of TLS, HTTP headers, or certificates. TLS must be terminated by something behind it.

The Flexible LB could terminate TLS, but the one free allocation is already consumed by the kubeapi HA load balancer. Even if it were available, using OCI to manage certificates would break the automatic cert-manager + Let's Encrypt renewal cycle.

The current flow is: Internet → NLB (TCP passthrough, preserves client IPs) → Envoy Gateway NodePort → TLS terminate → route to app pod.

Minimal example: HTTP-only

No domain needed. Requests to the NLB IP are served directly.

# hello-web.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web
  namespace: hello-web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello-web
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: hello-web
      containers:
        - name: hello-web
          image: httpd:alpine
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: hello-web
  namespace: hello-web
spec:
  selector:
    app: hello-web
  ports:
    - port: 80
      targetPort: 80
---
# HTTPRoute — no hostname filter = matches all requests on the http listener
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: hello-web
  namespace: hello-web
spec:
  parentRefs:
    - name: eg
      namespace: envoy-gateway-system
      sectionName: http
  rules:
    - backendRefs:
        - name: hello-web
          port: 80

kubectl create namespace hello-web
kubectl apply -f hello-web.yaml
# public_nlb_ip is the module output holding the NLB's public address
NLB_IP=$(cd example && tofu output -raw public_nlb_ip)
curl "http://$NLB_IP/"

Minimal example: HTTPS with sslip.io (no domain purchase required)

sslip.io is a public DNS service that resolves <anything>.<ip>.sslip.io directly to <ip>. Combined with cert-manager + Let's Encrypt HTTP-01, this gives a trusted TLS certificate with zero infrastructure cost.

Replace <NLB_IP> with the value of tofu output -raw public_nlb_ip.
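The substitution can also be scripted instead of edited by hand. A minimal sketch, assuming the manifest below is saved as hello-web-tls.yaml with the <NLB_IP> placeholder left in place; render_nlb_ip is a hypothetical helper name and 203.0.113.10 is a documentation-only address.

```shell
#!/bin/sh
# render_nlb_ip <ip>
# Reads text on stdin and replaces every <NLB_IP> placeholder with <ip>.
render_nlb_ip() {
  sed "s/<NLB_IP>/$1/g"
}

# Demo on a single line, using a documentation-range address:
printf '%s\n' 'hello-web.<NLB_IP>.sslip.io' | render_nlb_ip 203.0.113.10

# Against the real manifest (not run here; needs cluster access):
#   render_nlb_ip "$NLB_IP" < hello-web-tls.yaml | kubectl apply -f -
```

Piping the rendered manifest to kubectl keeps the placeholder version in Git, so the same file works unchanged if the NLB address ever changes.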

# hello-web-tls.yaml
---
# 1. Certificate — cert-manager issues this via HTTP-01 challenge through Envoy Gateway
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: hello-web-tls
  namespace: envoy-gateway-system   # must be in the same namespace as the Gateway
spec:
  secretName: hello-web-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - hello-web.<NLB_IP>.sslip.io
---
# 2. HTTPS listener on the Gateway (add this to gitops/gateway/gateway.yaml for GitOps management)
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
  namespace: envoy-gateway-system
spec:
  gatewayClassName: eg
  listeners:
    - name: http
      port: 80
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: All
    - name: https-hello-web
      port: 443
      protocol: HTTPS
      hostname: hello-web.<NLB_IP>.sslip.io
      tls:
        mode: Terminate
        certificateRefs:
          - name: hello-web-tls
      allowedRoutes:
        namespaces:
          from: All
---
# 3. HTTP→HTTPS redirect (add hostname to gitops/gateway/redirect.yaml)
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: http-to-https-redirect
  namespace: envoy-gateway-system
spec:
  parentRefs:
    - name: eg
      sectionName: http
  hostnames:
    - hello-web.<NLB_IP>.sslip.io
  rules:
    - filters:
        - type: RequestRedirect
          requestRedirect:
            scheme: https
            statusCode: 301
---
# 4. HTTPRoute for the app: attaches to the HTTPS listener only; plain HTTP is handled by the redirect above
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: hello-web
  namespace: hello-web
spec:
  parentRefs:
    - name: eg
      namespace: envoy-gateway-system
      sectionName: https-hello-web
  hostnames:
    - hello-web.<NLB_IP>.sslip.io
  rules:
    - backendRefs:
        - name: hello-web
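          port: 80

kubectl apply -f hello-web-tls.yaml
# Wait for certificate issuance (typically 1–2 minutes)
kubectl wait --for=condition=Ready certificate/hello-web-tls -n envoy-gateway-system --timeout=5m
# Quoted so the shell does not treat <NLB_IP> as a redirection before you substitute it
curl "https://hello-web.<NLB_IP>.sslip.io/"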

With a real domain: set enable_external_dns = true and annotate the HTTPRoute with external-dns.alpha.kubernetes.io/hostname: myapp.example.com. External DNS will create the A record automatically, then cert-manager issues the certificate. Alternatively, set enable_dns01_challenge = true to use DNS-01 (supports wildcard certs and does not require inbound port 80).

Resilience: spread replicas across nodes

Use topologySpreadConstraints to ensure pod replicas land on different nodes:

spec:
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: <your-app>

With 4 identically-sized nodes, 2 replicas survive any single node failure. Envoy Gateway runs as a DaemonSet with maxUnavailable: 1, so ingress remains up on the other 3 nodes throughout any single-node drain or failure.

Dependency updates (Renovate)

Renovate tracks Terraform providers, k3s, all stack component versions (via # renovate: inline comments in vars.tf and gitops/apps/*.yaml), and GitHub Actions. Enable with the Renovate GitHub App or the self-hosted workflow at .github/workflows/renovate.yml (requires a RENOVATE_TOKEN secret with repo scope).

Remote Terraform state (OCI Object Storage)

With enable_object_storage_state = true (the default), a versioned OCI Object Storage bucket is created automatically. After terraform apply, get the ready-to-use backend config:

terraform output -json terraform_state_backend

Use it in your terraform { backend "s3" {} } block (requires an OCI Customer Secret Key for S3 credentials):

terraform {
  backend "s3" {
    bucket                      = "<cluster_name>-terraform-state"
    key                         = "terraform.tfstate"
    region                      = "<your-region>"                     # e.g. eu-frankfurt-1
    endpoint                    = "https://<namespace>.compat.objectstorage.<region>.oraclecloud.com"
    skip_region_validation      = true
    skip_credentials_validation = true
    skip_metadata_api_check     = true
    force_path_style            = true
  }
}

Generate OCI Customer Secret Keys under Identity → Users → your user → Customer Secret Keys. The bucket name and namespace endpoint are in terraform output terraform_state_backend.
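The endpoint follows a fixed pattern, so it can be composed from the Object Storage namespace and the region. The sketch below uses a hypothetical helper name and placeholder values; the credential variables in the comments are the standard ones the S3 backend reads, filled from the Customer Secret Key.

```shell
#!/bin/sh
# oci_s3_endpoint <namespace> <region>
# Prints the S3-compatible Object Storage endpoint in the format used above.
oci_s3_endpoint() {
  echo "https://$1.compat.objectstorage.$2.oraclecloud.com"
}

# Placeholder namespace; take the real one from the terraform_state_backend output.
oci_s3_endpoint mynamespace eu-frankfurt-1

# Before running terraform init with the s3 backend, export the key pair:
#   export AWS_ACCESS_KEY_ID="<customer secret key access key>"
#   export AWS_SECRET_ACCESS_KEY="<customer secret key secret>"
```

Keeping the credentials in environment variables avoids writing them into the backend block or into version control.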

License

MIT. See LICENSE.

Variables

Inputs

| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| alertmanager_email | Optional email address to subscribe to the OCI Notifications topic. The subscriber must confirm via an OCI confirmation email. | string | null | no |
| argocd_chart_version | ArgoCD Helm chart version used for the bootstrap install. Must match gitops/apps/argocd.yaml targetRevision. Managed by Renovate. | string | "9.5.14" | no |
| availability_domain | Availability domain name, e.g. 'Uocm:EU-FRANKFURT-1-AD-1' | string | n/a | yes |
| boot_volume_size_in_gbs | Boot volume size in GB for k3s nodes (servers + workers). OCI minimum is 50 GB for all shapes. With 4 k3s nodes at 50 GB each the total is 200 GB (exactly at the Always Free limit). The bastion uses OCI Bastion Service (no VM, no boot volume). | number | 50 | no |
| certmanager_chart_version | cert-manager Helm chart version used for the bootstrap install. Must match gitops/apps/cert-manager.yaml targetRevision. Managed by Renovate. | string | "v1.20.2" | no |
| certmanager_email_address | Email address for Let's Encrypt ACME registration. Must be a real address. | string | n/a | yes |
| cloudflare_api_token | Cloudflare API token. Required when enable_external_dns = true or enable_dns01_challenge = true. Create a scoped token at https://dash.cloudflare.com/profile/api-tokens with Zone:DNS:Edit permissions. | string | null | no |
| cloudflare_zone_id | Cloudflare Zone ID for the managed domain. Required when enable_external_dns = true. | string | null | no |
| cluster_name | Logical name for the cluster. Used in display names and freeform tags. | string | n/a | yes |
| compartment_ocid | OCID of the compartment where all resources are created | string | n/a | yes |
| compute_shape | OCI compute shape for k3s nodes | string | "VM.Standard.A1.Flex" | no |
| dockerhub_password | Docker Hub access token (PAT) for ArgoCD OCI Helm chart pulls. Paired with dockerhub_username. | string | "" | no |
| dockerhub_username | Docker Hub username for ArgoCD to authenticate when pulling OCI Helm charts (e.g. Envoy Gateway from registry-1.docker.io). If empty, anonymous pulls are attempted and may be rate-limited. Create a PAT at https://app.docker.com/settings/personal-access-tokens | string | "" | no |
| enable_backup | Enable weekly boot volume backups for all k3s nodes (Always Free: 5 total backups). With 4 nodes at weekly-1-week-retention there are at most 4 active backups. | bool | true | no |
| enable_bastion | Provision an OCI Bastion Service resource (managed SSH proxy, Always Free, no storage). When enabled, a STANDARD bastion is created and associated with the private subnet. Use example/get-kubeconfig.sh to retrieve kubeconfig via a Bastion session. Strongly recommended; without it, nodes are reachable only via serial console. | bool | true | no |
| enable_dns01_challenge | Configure cert-manager ClusterIssuers to use DNS-01 ACME challenge via Cloudflare instead of HTTP-01. Enables wildcard certificates (*.example.com) and works even without inbound port 80. Requires cloudflare_api_token. | bool | false | no |
| enable_external_dns | Deploy external-dns (kubernetes-sigs) configured for Cloudflare. Automatically creates/updates DNS A records when Services or Ingresses are annotated. Requires cloudflare_api_token and cloudflare_zone_id. | bool | false | no |
| enable_external_secrets | Deploy the External Secrets Operator and create a ClusterSecretStore backed by OCI Vault (instance_principal auth). Requires enable_vault = true. Workloads can then create ExternalSecret resources to sync any OCI Vault secret into a Kubernetes Secret without hard-coding values. | bool | false | no |
| enable_longhorn_backup | Provision a dedicated Always Free OCI Object Storage bucket for Longhorn PVC backups (S3-compatible). See longhorn_backup_setup output for connection instructions. Shares the 20 GB free allowance with the Terraform state bucket. | bool | true | no |
| enable_mysql | Provision an Always Free MySQL HeatWave DB system (single node, 50 GB). Creates a Kubernetes Secret 'mysql-credentials' in the default namespace. | bool | false | no |
| enable_notifications | Create an OCI Notifications topic and wire it to Alertmanager as a webhook receiver (Always Free: 1M HTTPS + 3K email/month). | bool | false | no |
| enable_object_storage_state | Provision an Always Free OCI Object Storage bucket for storing Terraform/OpenTofu state (S3-compatible API). See the terraform_state_backend output for the backend configuration snippet. | bool | true | no |
| enable_oci_logging | Enable OCI Logging for cloud-init logs. Ships /var/log/k3s-cloud-init.log to OCI Logging Service via the Unified Monitoring Agent (Always Free: 10 GB/month). | bool | true | no |
| enable_vault | Store cluster secrets (k3s_token, longhorn_ui_password, grafana_admin_password) in OCI Vault (Always Free: software keys + 150 secrets). Nodes fetch secrets via OCI CLI instance_principal at boot; plaintext values are removed from cloud-init user-data. | bool | true | no |
| environment | Deployment environment label (e.g. staging, production) | string | "staging" | no |
| expose_kubeapi | Expose the Kubernetes API server via the public NLB (restricted to my_public_ip_cidr) | bool | false | no |
| expose_ssh | Expose SSH (port 22) via the public NLB to all cluster nodes (restricted to my_public_ip_cidr). Eliminates the need for OCI Bastion sessions for day-to-day access. | bool | false | no |
| external_dns_domain_filter | Domain filter for external-dns: only DNS records under this domain are managed (e.g. 'k3s.example.com'). Required when enable_external_dns = true. | string | null | no |
| external_secrets_chart_version | External Secrets Operator Helm chart version used for the bootstrap install. Must match gitops/apps/external-secrets.yaml targetRevision. Managed by Renovate. | string | "2.4.1" | no |
| fault_domains | Fault domains to spread the instance pool across | list(string) | ["FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3"] | no |
| gateway_api_version | Kubernetes Gateway API CRDs version (experimental channel) installed at bootstrap. Experimental channel is a superset of standard and includes GRPCRoute, TCPRoute, TLSRoute, etc. required by Envoy Gateway. Must exist before ArgoCD syncs gateway-config. | string | "v1.5.1" | no |
| github_ssh_keys_username | GitHub username whose published SSH keys (https://github.com/<username>.keys) are added to every instance's authorized_keys at plan time, in addition to the primary public_key / public_key_path. Leave empty to skip. | string | "" | no |
| gitops_repo_url | Git repository URL for the ArgoCD App of Apps (e.g. https://github.com/your-org/k3s-oci.git). Set this to your fork so ArgoCD pulls from the right repo. | string | "https://github.com/mbologna/k3s-oci.git" | no |
| grafana_hostname | Fully-qualified hostname for the Grafana UI (e.g. grafana.example.com). When set, a Gateway API HTTPRoute with a cert-manager TLS certificate is created in gitops/monitoring/. | string | null | no |
| http_lb_port | Public HTTP port on the NLB frontend (default 80). | number | 80 | no |
| https_lb_port | Public HTTPS port on the NLB frontend (default 443). | number | 443 | no |
| ingress_controller_http_nodeport | NodePort on workers that the ingress controller binds for HTTP traffic | number | 30080 | no |
| ingress_controller_https_nodeport | NodePort on workers that the ingress controller binds for HTTPS traffic | number | 30443 | no |
| k3s_server_pool_size | Number of k3s control-plane nodes in the instance pool. Use 3 for HA (etcd quorum). Must be an odd number >= 1. | number | 3 | no |
| k3s_standalone_worker | When true (default), provisions one worker node as a plain oci_core_instance resource. This is the recommended approach for OCI Always Free tenancies: instance pools route requests through OCI Capacity Management which can fail for A1.Flex shapes, whereas a direct oci_core_instance reliably claims the free allocation. Default topology: 3 control-plane nodes (pool) + 1 standalone worker = 4 OCPUs / 24 GB. | bool | true | no |
| k3s_subnet | Subnet name used to derive the flannel interface. Leave 'default_route_table' to let k3s auto-detect. | string | "default_route_table" | no |
| k3s_version | k3s version to install. 'latest' resolves the current stable release at plan time via the GitHub API. | string | "latest" | no |
| k3s_worker_pool_size | Number of k3s worker nodes managed by the OCI Instance Pool. Set to 0 (default) when using k3s_standalone_worker = true, which is the recommended Always Free topology. The pool is kept to allow future scaling beyond the free tier. | number | 0 | no |
| kube_api_port | Port the k3s API server listens on | number | 6443 | no |
| longhorn_hostname | Fully-qualified hostname for the Longhorn UI (e.g. longhorn.example.com). When set, a Gateway API HTTPRoute with BasicAuth (Envoy Gateway SecurityPolicy) and a cert-manager TLS certificate is created. | string | null | no |
| longhorn_ui_username | Username for Longhorn UI BasicAuth (only used when longhorn_hostname is set). | string | "admin" | no |
| my_public_ip_cidr | Your workstation public IP in CIDR notation (e.g. 1.2.3.4/32). Restricts OCI Bastion Service session creation (enable_bastion = true) and kubeapi access via the public NLB (expose_kubeapi = true). k3s nodes are in a private subnet and are only reachable via OCI Bastion sessions. | string | n/a | yes |
| mysql_admin_username | Admin username for the MySQL HeatWave DB system. | string | "admin" | no |
| mysql_shape | MySQL HeatWave shape. 'MySQL.Free' is the Always Free shape. | string | "MySQL.Free" | no |
| oci_core_vcn_cidr | CIDR block for the VCN | string | "10.0.0.0/16" | no |
| oci_core_vcn_dns_label | n/a | string | "k3svcn" | no |
| oci_identity_dynamic_group_name | Name for the OCI dynamic group granting instances access to the OCI API | string | "k3s-cluster-dynamic-group" | no |
| oci_identity_policy_name | Name for the OCI IAM policy attached to the dynamic group | string | "k3s-cluster-policy" | no |
| os_image_id | OCID of the Ubuntu 24.04 LTS (Noble) aarch64 image for A1.Flex nodes. If null, the latest matching image is resolved automatically from the tenancy. Find OCIDs at https://docs.oracle.com/en-us/iaas/images/ | string | null | no |
| private_subnet_cidr | CIDR for the private subnet (k3s nodes) | string | "10.0.1.0/24" | no |
| private_subnet_dns_label | n/a | string | "k3sprivate" | no |
| public_key | SSH public key content placed on every instance. Preferred over public_key_path; pass the key string directly for CI pipelines where ~/.ssh does not exist. When null, the key is read from public_key_path at plan time. | string | null | no |
| public_key_path | Path to SSH public key file. Used as fallback when public_key is null. | string | "~/.ssh/id_ed25519.pub" | no |
| public_subnet_cidr | CIDR for the public subnet (load balancers and optional bastion) | string | "10.0.0.0/24" | no |
| public_subnet_dns_label | n/a | string | "k3spublic" | no |
| region | OCI region identifier (e.g. 'eu-frankfurt-1'). Required when enable_external_secrets = true for the ClusterSecretStore to locate the OCI Vault endpoint. | string | null | no |
| server_memory_in_gbs | RAM in GB per control-plane node. Total RAM must not exceed 24 GB (Always Free). | number | 6 | no |
| server_ocpus | OCPUs per control-plane node. Total OCPUs across all nodes must not exceed 4 (Always Free). | number | 1 | no |
| tenancy_ocid | OCID of the tenancy | string | n/a | yes |
| unique_tag_key | Freeform tag key applied to every resource for identification | string | "k3s-provisioner" | no |
| unique_tag_value | Freeform tag value applied to every resource for identification | string | "https://github.com/mbologna/k3s-oci" | no |
| worker_memory_in_gbs | RAM in GB per worker node. | number | 6 | no |
| worker_ocpus | OCPUs per worker node. | number | 1 | no |

Outputs

| Name | Description |
|---|---|
| argocd_initial_password_hint | Command to retrieve the ArgoCD initial admin password (run after cluster is up) |
| bastion_ocid | OCID of the OCI Bastion Service resource (null if enable_bastion = false). Use with example/get-kubeconfig.sh or oci bastion session create-managed-ssh. |
| grafana_admin_credentials | Grafana admin credentials (only available after cluster bootstrap) |
| internal_lb_ip | Private IP of the internal load balancer (used by agents to join the cluster) |
| k3s_servers_private_ips | Private IPs of k3s control-plane nodes |
| k3s_standalone_worker_private_ip | Private IP of the standalone worker node (oci_core_instance, not pool-managed) |
| k3s_token | k3s cluster join token (sensitive) |
| k3s_workers_private_ips | Private IPs of k3s worker nodes (instance pool) |
| kubeconfig_hint | How to retrieve kubeconfig after cluster is up |
| longhorn_backup_setup | Instructions to connect Longhorn to the OCI Object Storage backup bucket. Null if enable_longhorn_backup = false. |
| longhorn_ui_credentials | Longhorn UI credentials (only set when longhorn_hostname is configured) |
| mysql_admin_credentials | MySQL HeatWave admin credentials (sensitive). Null if enable_mysql = false. |
| mysql_endpoint | MySQL HeatWave connection endpoint (hostname:port). Null if enable_mysql = false. |
| notification_topic_endpoint | OCI Notifications HTTPS endpoint for the Alertmanager webhook receiver (null if enable_notifications = false). |
| oci_log_group_id | OCI Log Group OCID for k3s cloud-init logs (null if enable_oci_logging = false) |
| public_nlb_ip | Public IP address of the NLB (point your DNS here) |
| ssh_command | SSH command to connect to a cluster node via the public NLB (null if expose_ssh = false). Routes to any available server. |
| terraform_state_backend | S3-compatible backend config snippet for storing Terraform state in the provisioned OCI Object Storage bucket. Replace the placeholders and add S3 credentials (OCI Customer Secret Key). |
| vault_id | OCI Vault OCID (null if enable_vault = false) |
