Skip to content

Latest commit

 

History

History
349 lines (285 loc) · 9.22 KB

File metadata and controls

349 lines (285 loc) · 9.22 KB

Terraform Module Optimization - Summary of Changes

Overview

This document summarizes the comprehensive optimization and modernization of the Kubernetes deployment Terraform module.

Major Changes

1. Provider Version Updates

  • Kubernetes Provider: Updated from ~> 2.0 to ~> 2.35
  • Google Provider: Updated from ~> 6.0 to ~> 7.0 (major version upgrade)
  • Workload Identity Module: Updated from unversioned to ~> 41.0
  • Terraform Version: Now requires >= 1.3 (for better optional attribute support)

Google Provider v7 Changes:

  • Major version upgrade with improved resource handling
  • No breaking changes for this module (only uses workload-identity submodule)
  • Better error messages and validation
  • Performance improvements
  • Full backwards compatibility maintained

2. Removed Company-Specific Hardcoded Values

Datadog/Observability Configuration

Before:

env {
  name  = "DD_SERVICE"
  value = "ddm-platform-${var.application_name}"
}

After:

observability_config = {
  agent_host_env_vars = ["DD_AGENT_HOST", "DD_TRACE_AGENT_HOSTNAME"]
  service_name_prefix = ""  # Configurable, no hardcoded prefix
  service_env_var     = "DD_SERVICE"
  version_env_var     = "DD_VERSION"
}

Benefits:

  • Works with any APM tool (Datadog, OpenTelemetry, New Relic, etc.)
  • Completely optional - can be disabled by setting to null
  • Prefix is configurable instead of hardcoded
  • No longer assumes Datadog is always needed

Node Pool Affinity

Before:

variable "node_pool" {
  default = "standard4"  # Company-specific default
}

After:

variable "node_pool" {
  default = null  # No default affinity
}

Benefits:

  • Works with any Kubernetes cluster
  • No assumptions about node pool names
  • Affinity only applied when explicitly configured

3. Made GCP-Specific Features Optional

Workload Identity

Before:

  • Required variables: project, gke_cluster_name, service_account_name
  • Always created GCP service accounts and IAM bindings

After:

variable "project" {
  default = null  # Optional
}

variable "gke_cluster_name" {
  default = null  # Optional
}

locals {
  enable_workload_identity = var.project != null && var.gke_cluster_name != null
}

Benefits:

  • Works with any Kubernetes cluster (EKS, AKS, on-prem, etc.)
  • GCP features only activate when GCP variables are provided
  • Falls back to basic Kubernetes service account when not using GKE

IAM Roles

Before:

variable "roles" {
  default = ["roles/secretmanager.secretAccessor"]  # GCP-specific default
}

After:

variable "roles" {
  default = []  # No default roles
}

Benefits:

  • No assumptions about required permissions
  • Users explicitly declare needed roles
  • Cleaner for non-GCP users

NEG Annotations

Before:

annotations = {
  "cloud.google.com/neg" = "{\"ingress\": true}"  # Always applied
}

After:

variable "enable_neg_annotation" {
  default = false
}

annotations = var.enable_neg_annotation ? {
  "cloud.google.com/neg" = "{\"ingress\": true}"
} : {}

Benefits:

  • Only applied when explicitly enabled
  • Doesn't break non-GKE deployments
  • Clear opt-in behavior

4. Fixed Bugs

Topology Spread Label Selector

Before (Bug):

match_labels = {
  "app.kubernetes.io/instance" = "var.application_name"  # String literal, not variable
}

After (Fixed):

match_labels = {
  app = var.application_name  # Actual variable reference
}

Host Aliases

Before (Bug):

host_aliases {
  hostnames = var.host_alias.hostnames  # Would fail if var.host_alias is null
  ip        = var.host_alias.ip
}

After (Fixed):

dynamic "host_aliases" {
  for_each = var.host_alias != null ? [var.host_alias] : []
  content {
    hostnames = host_aliases.value.hostnames
    ip        = host_aliases.value.ip
  }
}

5. Service Account Name Computed Default

Before:

variable "service_account_name" {
  type = string  # Required
}

After:

variable "service_account_name" {
  type    = string
  default = null
}

locals {
  service_account_name = var.service_account_name != null ? var.service_account_name : "${var.application_name}-sa"
}

Benefits:

  • One less required variable
  • Sensible computed default
  • Still overridable when needed

6. Conditional Resource Creation

Added conditional creation logic throughout:

module "deployment_workload_identity" {
  count = local.enable_workload_identity ? 1 : 0
  # ...
}

resource "kubernetes_service_account" "basic" {
  count = local.enable_workload_identity ? 0 : 1
  # ...
}

Benefits:

  • Resources only created when needed
  • Cleaner state files
  • No unnecessary API calls

7. Submodule Updates

Applied all the same improvements to modules/cronjobs:

  • Provider version updates
  • Optional observability configuration
  • Optional workload identity
  • Computed service account name
  • Conditional resource creation

8. Enhanced GCP/Workload Identity Features

Added new GCP-specific optimization variables:

Existing GCP Service Account Support:

use_existing_gcp_sa   = true
existing_gcp_sa_email = "my-sa@project.iam.gserviceaccount.com"

Benefits:

  • Reuse existing service accounts with pre-configured permissions
  • Useful for shared service accounts across multiple workloads
  • Reduces service account sprawl

Enhanced Security:

automount_service_account_token = false

Benefits:

  • Prevents automatic mounting of service account tokens
  • Enhanced security posture when using workload identity
  • Follows principle of least privilege

Regional/Zonal Configuration:

gke_location = "us-central1"  # or "us-central1-a" for zonal

Benefits:

  • Properly configure workload identity for regional/zonal clusters
  • Better alignment with GKE cluster configuration
  • Required for some GCP APIs

Files Modified

Root Module

  • main.tf - Provider versions, computed locals
  • variables.tf - New optional variables, removed defaults
  • kubernetes_deployment.tf - Dynamic observability, optional node affinity, fixed bugs
  • kubernetes_service.tf - Conditional NEG annotation
  • kubernetes_service_account.tf - Conditional workload identity
  • outputs.tf - Updated for conditional resources
  • README.md - Comprehensive new documentation
  • MIGRATION.md - Detailed migration guide

Cronjobs Submodule

  • modules/cronjobs/main.tf - Provider versions, computed locals
  • modules/cronjobs/variables.tf - New optional variables
  • modules/cronjobs/kubernetes-deployment.tf - Dynamic observability
  • modules/cronjobs/kubernetes-service-account.tf - Conditional workload identity
  • modules/cronjobs/outputs.tf - Updated for conditional resources

Breaking Changes Summary

  1. observability_config - Must be explicitly set to enable Datadog or other APM
  2. node_pool - Now defaults to null instead of "standard4"
  3. roles - Now defaults to [] instead of ["roles/secretmanager.secretAccessor"]
  4. project/gke_cluster_name - Now optional (null by default)
  5. enable_neg_annotation - New variable, defaults to false
  6. node_pool - Change from string to list. This will allow us to use multiple node pools

Backwards Compatibility

To maintain v3.x behavior, users should add these to their module calls:

node_pool = "standard4"
roles     = ["roles/secretmanager.secretAccessor"]
observability_config = {
  agent_host_env_vars = ["DD_AGENT_HOST", "DD_TRACE_AGENT_HOSTNAME"]
  service_name_prefix = "ddm-platform-"
  service_env_var     = "DD_SERVICE"
  version_env_var     = "DD_VERSION"
}
enable_neg_annotation = true

Benefits of This Update

For Your Team

  1. Less Rigid: No longer assumes specific node pools or naming conventions
  2. More Reusable: Can be used across different projects and environments
  3. Better Defaults: Sensible defaults that don't assume company-specific infrastructure
  4. Cleaner Code: Fixed bugs and removed hardcoded values

For Others

  1. Universal: Works with any Kubernetes cluster, not just GKE
  2. Tool Agnostic: Support any APM/observability tool
  3. Cloud Agnostic: GCP features are optional, not required
  4. Best Practices: Follows Terraform module best practices

For Maintenance

  1. Latest Providers: Using current provider versions with latest features
  2. Bug Fixes: Fixed topology spread and host aliases bugs
  3. Better Documentation: Comprehensive README and migration guide
  4. Type Safety: Better use of optional types and computed values

Testing Recommendations

  1. Test in Non-Production First: Validate changes in dev/staging
  2. Review terraform plan: Check for unexpected changes
  3. Test Different Configurations:
    • Basic Kubernetes (no GCP)
    • With Workload Identity
    • With observability
    • With node affinity
  4. Validate Outputs: Ensure outputs match expected values

Next Steps

  1. Update any downstream modules or repos that use this module
  2. Update CI/CD pipelines if needed
  3. Update team documentation
  4. Consider creating example configurations for common use cases
  5. Plan rollout strategy for existing deployments

Questions or Issues

See MIGRATION.md for detailed migration instructions, or open an issue in the repository.