terraform-infra-reconcilers

"Reconciling the world not as it is, but as it should be." — Matt Moore, CTO

Terraform modules for deploying reconciliation systems on Google Cloud Platform using the DriftlessAF framework.

What is a Reconciler?

A reconciler implements a continuous feedback loop that makes systems self-healing and resilient. Adapted from Kubernetes controllers, the pattern is deceptively simple:

Watch: Observe the current state of the world
Compare: Compute the delta between desired and actual state
Act: Make changes to close the gap
Repeat: Forever

Level-Based vs Event-Driven

Traditional event-driven systems are fragile: lost events mean lost actions, crashed services leave work incomplete, and state drifts over time. They're edge-triggered, reacting to moments of change without remembering the desired end state.

Reconciliation systems are level-based. The workqueue holds keys, not events—multiple events about the same resource collapse into a single reconciliation of its current state. Events are just hints to check state; the reconciler compares actual to desired and takes idempotent actions that are safe to repeat. This makes the system naturally resilient to event storms, duplicate notifications, and processing delays.

Architecture Overview

                                    ┌────────────────────────────────────────────────────────┐
                                    │                    WORKQUEUE                           │
┌──────────────┐                    │  ┌─────────────┐    ┌─────────┐    ┌──────────────┐    │
│   Triggers   │                    │  │  Receiver   │───▶│   GCS   │───▶│  Dispatcher  │    │
│              │   enqueue keys     │  │  (Cloud Run)│    │ Bucket  │    │  (Cloud Run) │    │
│ • Cron Jobs  │───────────────────▶│  └─────────────┘    └─────────┘    └──────┬───────┘    │
│ • CloudEvents│                    │                                           │            │
│ • GitHub     │                    └───────────────────────────────────────────┼────────────┘
│   Webhooks   │                                                                │
└──────────────┘                                                                │ dispatch
                                                                                ▼
                                                                    ┌────────────────────┐
                                                                    │     Reconciler     │
                                                                    │    (Cloud Run)     │
                                                                    │                    │
                                                                    │  Your Go code that │
                                                                    │  processes each key│
                                                                    └────────────────────┘

How It Works

Triggers (cron jobs, CloudEvents, GitHub webhooks) enqueue keys to the workqueue
Receiver accepts keys via gRPC and stores them in a GCS bucket
Dispatcher polls the bucket, dispatches keys to your reconciler with concurrency control
Reconciler (your code) processes each key and can requeue with delays on failure
Dead Letter Queue captures keys that exceed retry limits for manual inspection

Key Properties

Deduplication by key: Multiple events for the same resource (e.g., PR URL) collapse into one reconciliation
Key exclusion guarantee: The same key is never processed concurrently across instances
Idempotent actions: Reconcilers are safe to run repeatedly on the same key
Automatic retry: Failed items are requeued with exponential backoff

Reconciliation Patterns

This module supports two primary reconciliation patterns:

Pattern	Key Type	Trigger	Use Cases
PR Reconciler	PR URL (`github.com/org/repo/pull/123`)	CloudEvents from GitHub webhooks	PR validation, policy checks, automated fixes
Path-Based Reconciler	File path (`github.com/org/repo/blob/main/config.yaml`)	Push events + periodic resync	Config sync, GitOps, file monitoring

Modules

Module	Description	Pattern
`regional-go-reconciler`	Complete reconciler with workqueue + regional Go service	PR Reconciler
`cloudevents-workqueue`	Bridge CloudEvents to workqueue based on event extensions	PR Reconciler
`github-path-reconciler`	File path reconciliation with push events + periodic resync	Path-Based
`workqueue`	Core workqueue infrastructure (receiver, dispatcher, GCS bucket)	Both
`dashboard/workqueue`	Cloud Monitoring dashboard for workqueue metrics	Both
`dashboard/reconciler`	Cloud Monitoring dashboard for reconciler services	Both

Getting Started

Prerequisites

GCP project with Cloud Run, Cloud Storage, and Pub/Sub APIs enabled
Terraform >= 1.0
Go 1.21+ for writing reconciler code

Quick Start: Regional Go Reconciler

The regional-go-reconciler module is the easiest way to deploy a complete reconciler:

module "my-reconciler" {
  source  = "driftlessaf/reconcilers/infra//modules/regional-go-reconciler"
  version = "~> 1.0"

  project_id = var.project_id
  name       = "my-reconciler"
  regions    = var.regions
  team       = "platform"

  service_account = google_service_account.reconciler.email

  # Build reconciler from source using ko
  containers = {
    "reconciler" = {
      source = {
        working_dir = path.module
        importpath  = "./cmd/reconciler"
      }
      ports = [{ container_port = 8080 }]
    }
  }

  # Workqueue configuration
  concurrent-work = 20  # Process 20 keys concurrently
  max-retry       = 100 # Move to DLQ after 100 failures

  notification_channels = var.notification_channels
}

Implementing the Reconciler (Go)

Your reconciler implements the workqueue gRPC service. The key principle is idempotent, level-based reconciliation:

package main

import (
    "context"
    "log"
    "net"
    "os"

    "github.com/driftlessaf/go-driftlessaf/workqueue"
    "github.com/driftlessaf/go-driftlessaf/reconcilers/githubreconciler"
    "google.golang.org/grpc"
)

type Reconciler struct {
    workqueue.UnimplementedWorkqueueServiceServer
    statusManager *statusmanager.StatusManager
}

func (r *Reconciler) Process(ctx context.Context, req *workqueue.ProcessRequest) (*workqueue.ProcessResponse, error) {
    // 1. Parse the key (e.g., PR URL)
    res, err := githubreconciler.ParseResource(req.Key)
    if err != nil {
        return nil, err
    }

    // 2. Fetch current state (cheap API call)
    pr, _, err := gh.PullRequests.Get(ctx, res.Owner, res.Repo, res.Number)
    if err != nil {
        return nil, err
    }
    sha := pr.GetHead().GetSHA()

    // 3. Check observed generation (idempotency)
    session := r.statusManager.NewSession(gh, res, sha)
    status, _ := session.ObservedState(ctx)
    if status != nil && status.Status == "completed" {
        return &workqueue.ProcessResponse{}, nil  // Already processed this SHA
    }

    // 4. Only now fetch expensive data (diff, file contents, etc.)
    // 5. Compute desired state
    // 6. Take action to align actual with desired

    return &workqueue.ProcessResponse{}, nil
}

func main() {
    lis, _ := net.Listen("tcp", ":"+os.Getenv("PORT"))
    srv := grpc.NewServer()
    workqueue.RegisterWorkqueueServiceServer(srv, &Reconciler{})
    log.Fatal(srv.Serve(lis))
}

Key properties:

Idempotent: Running twice on the same SHA does nothing
Level-based: Checks entire resource state, not just the triggering event
Lazy evaluation: Expensive operations only when reconciliation is needed

Enqueuing Work

Services can enqueue work to your reconciler:

client, err := workqueue.NewWorkqueueClient(ctx, os.Getenv("WORKQUEUE_SERVICE"))
if err != nil {
    return err
}
defer client.Close()

// Enqueue a key for processing
_, err = client.Process(ctx, &workqueue.ProcessRequest{
    Key: "resource-id-123",
})

Common Patterns

GitHub Path Reconciler

Reconcile files in a GitHub repository when they change:

module "config-sync" {
  source  = "driftlessaf/reconcilers/infra//modules/github-path-reconciler"
  version = "~> 1.0"

  project_id     = var.project_id
  name           = "config-sync"
  regions        = var.regions
  primary-region = "us-central1"
  team           = "platform"

  # Repository to watch
  github_owner      = "my-org"
  github_repo       = "config-repo"
  octo_sts_identity = "config-sync"

  # Match YAML files in the configs directory
  path_patterns = ["(configs/.+\\.yaml)"]

  # Resync all files every 6 hours
  resync_period_hours = 6

  # CloudEvents broker for push notifications
  broker = var.github_events_broker

  service_account       = google_service_account.reconciler.email
  containers            = { /* ... */ }
  notification_channels = var.notification_channels
}

CloudEvents to Workqueue

Process GitHub pull requests via CloudEvents:

module "pr-processor" {
  source  = "driftlessaf/reconcilers/infra//modules/cloudevents-workqueue"
  version = "~> 1.0"

  project_id = var.project_id
  name       = "pr-processor"
  regions    = var.regions
  team       = "platform"

  broker = module.cloudevent-broker.broker

  # Subscribe to PR-related events
  filters = [
    { "type" = "dev.chainguard.github.pull_request" },
    { "type" = "dev.chainguard.github.pull_request_review" },
  ]

  # Use PR URL as workqueue key (deduplicates concurrent events)
  extension_key = "pullrequesturl"

  workqueue = {
    name = module.workqueue.receiver.name
  }

  notification_channels = var.notification_channels
}

Standalone Workqueue

For custom integrations, use the workqueue module directly:

module "workqueue" {
  source  = "driftlessaf/reconcilers/infra//modules/workqueue"
  version = "~> 1.0"

  project_id = var.project_id
  name       = "my-workqueue"
  regions    = var.regions
  team       = "platform"

  concurrent-work = 10
  max-retry       = 5
  scope           = "global"  # Single multi-regional queue

  reconciler-service = {
    name = module.my-service.name
  }

  notification_channels = var.notification_channels
}

Key Concepts

Generation vs Observed Generation

Borrowed from Kubernetes, this concept enables idempotent reconciliation:

Concept	Kubernetes	GitHub Reconcilers
Generation	`metadata.generation`	HEAD commit SHA
Observed Generation	`status.observedGeneration`	SHA recorded in Check Run
Change trigger	Spec change	New commit pushed

If generations match, the status already reflects your reconciliation of the current state—skip reprocessing. This ensures no-op reconciliations are fast and cheap.

Workqueue Scope

global (recommended): Single multi-regional GCS bucket provides accurate deduplication and concurrency control across all regions
regional: Per-region buckets offer lower latency but cannot prevent concurrent processing of the same key across regions

Dead Letter Queue

Keys that fail more than max-retry times are moved to a dead letter queue (dead-letter/ prefix in the GCS bucket). To reprocess after fixing the underlying issue:

gcloud run jobs execute <workqueue-name>-reenqueue --region <region>

Priority

Workqueue items support priority levels. Higher values are processed first. Use this to prioritize event-driven work (priority 100) over periodic resyncs (priority 0).

Reconciler Efficiency

A well-designed reconciler should be exceptionally cheap when nothing needs to change. Use lazy evaluation: fetch only enough state to determine if work is needed, then fetch additional data only if reconciliation is actually required.

Monitoring

Deploy dashboards for observability:

module "workqueue-dashboard" {
  source  = "driftlessaf/reconcilers/infra//modules/dashboard/workqueue"
  version = "~> 1.0"

  name            = "my-reconciler"
  max_retry       = 100
  concurrent_work = 20
  scope           = "global"
}

Dashboard includes:

Queue depth (work in progress, queued, added)
Processing and wait latency
Retry patterns and completion rates
Dead letter queue monitoring

Go Dependencies

Reconciler code imports from github.com/driftlessaf/go-driftlessaf:

import (
    "github.com/driftlessaf/go-driftlessaf/workqueue"
    "github.com/driftlessaf/go-driftlessaf/reconcilers/githubreconciler"
)

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.chainguard		.chainguard
.github/chainguard		.github/chainguard
modules		modules
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
go.mod		go.mod
go.sum		go.sum

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

terraform-infra-reconcilers

What is a Reconciler?

Level-Based vs Event-Driven

Architecture Overview

How It Works

Key Properties

Reconciliation Patterns

Modules

Getting Started

Prerequisites

Quick Start: Regional Go Reconciler

Implementing the Reconciler (Go)

Enqueuing Work

Common Patterns

GitHub Path Reconciler

CloudEvents to Workqueue

Standalone Workqueue

Key Concepts

Generation vs Observed Generation

Workqueue Scope

Dead Letter Queue

Priority

Reconciler Efficiency

Monitoring

Go Dependencies

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

terraform-infra-reconcilers

What is a Reconciler?

Level-Based vs Event-Driven

Architecture Overview

How It Works

Key Properties

Reconciliation Patterns

Modules

Getting Started

Prerequisites

Quick Start: Regional Go Reconciler

Implementing the Reconciler (Go)

Enqueuing Work

Common Patterns

GitHub Path Reconciler

CloudEvents to Workqueue

Standalone Workqueue

Key Concepts

Generation vs Observed Generation

Workqueue Scope

Dead Letter Queue

Priority

Reconciler Efficiency

Monitoring

Go Dependencies

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages