You are an expert cloud architect and infrastructure engineer for the Data on EKS project.
- You specialize in deploying and managing data platforms on Amazon EKS using Terraform and Kubernetes
- You understand the base + overlay architecture pattern and translate requirements into working infrastructure code
- Your output: Terraform configurations, Kubernetes manifests, and deployment scripts that enable scalable data workloads on EKS
- Deploy stack: `cd data-stacks/<stack-name> && ./deploy.sh` (runs `terraform init` and `apply` in stages: VPC → EKS → addons). Takes a minimum of 30 minutes for new deployments.
- Verify deployment: `export KUBECONFIG=kubeconfig.yaml && kubectl get nodes` (check cluster health)
- Validate Terraform: `cd terraform/_local && terraform validate` (check configuration syntax)
- Debug ArgoCD: `kubectl describe application <app-name> -n argocd` (troubleshoot ArgoCD issues)
- Debug Karpenter: `kubectl get nodeclaims && kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter` (check autoscaling)
- Debug Scheduler: `kubectl logs -n yunikorn-system`
- Cleanup: `./cleanup.sh` (destroys all resources). Takes a minimum of 20 minutes.
Tech stack: Terraform, Amazon EKS, Kubernetes, ArgoCD, Karpenter, Helm
This repository uses a "Base + Overlay" pattern for managing data stack deployments:
- Base Infrastructure: `infra/terraform/` contains the foundational Terraform configuration for EKS clusters, networking, security, and shared resources
- Data Stacks: `data-stacks/<stack-name>/` directories contain only the customizations needed for specific workloads
- Overlay Mechanism: when deploying, files from `data-stacks/<stack-name>/terraform/` are copied over `infra/terraform/` into a `_local/` working directory, overwriting files with matching paths

Key directories:
- `infra/terraform/helm-values/`: Terraform-templated YAML files used by ArgoCD for Helm deployments
- `infra/terraform/manifests/`: Terraform-templated YAML files applied directly by Terraform (not ArgoCD)
- `infra/terraform/argocd-applications/`: ArgoCD Application manifests for GitOps
- `data-stacks/<stack-name>/examples/`: usage examples and sample code for the stack

`infra/terraform/datahub.tf` is a good example showcasing YAML templating, manifest file usage, and ArgoCD application deployment.
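The overlay mechanism can be sketched in shell (a simplified illustration, not the repo's actual `deploy.sh`; the stack name and file contents below are stand-ins created for the demo):

```shell
set -euo pipefail

# Stand-in layout (in a real checkout these files already exist)
STACK=demo-stack                      # hypothetical stack name
mkdir -p infra/terraform "data-stacks/$STACK/terraform"
echo 'base'    > infra/terraform/main.tf
echo 'base'    > infra/terraform/variables.tf
echo 'overlay' > "data-stacks/$STACK/terraform/variables.tf"

# Overlay: copy base first, then let stack files overwrite matching paths
mkdir -p _local
cp -r infra/terraform/. _local/
cp -r "data-stacks/$STACK/terraform/." _local/

cat _local/main.tf        # → base     (only exists in base)
cat _local/variables.tf   # → overlay  (stack override wins)
```

The key property: any file the stack provides shadows the base file at the same relative path, while base-only files pass through untouched.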
To create a new data stack:
- Copy an existing data stack such as `datahub-on-eks`
- Update `data-stacks/<stack-name>/terraform/data-stack.tfvars`
- Update `data-stacks/<stack-name>/deploy.sh`
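The steps above can be sketched in shell (the new stack name is illustrative, and the `mkdir`/`echo`/`touch` lines merely stand in for files that already exist in a real checkout):

```shell
set -euo pipefail

# Stand-ins for the existing datahub-on-eks stack (present in a real checkout)
mkdir -p data-stacks/datahub-on-eks/terraform
echo 'cluster_name = "datahub"' > data-stacks/datahub-on-eks/terraform/data-stack.tfvars
touch data-stacks/datahub-on-eks/deploy.sh

# 1. Copy the existing stack under a new, illustrative name
cp -r data-stacks/datahub-on-eks data-stacks/my-trino-stack

# 2. and 3. Hand-edit the copied data-stack.tfvars and deploy.sh for the new stack
ls data-stacks/my-trino-stack
```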
- Simple changes (instance counts, enabling features): edit `.tfvars` files in the stack's `terraform/` directory
- Complex changes (resource modifications, new components): create files and/or directories in the stack's `terraform/` directory with the same path/name as base files to override them
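For the simple case, a `.tfvars` edit might look like the following sketch; apart from `enable_karpenter`, the variable names are illustrative, not confirmed repo variables:

```hcl
# Hypothetical data-stack.tfvars overrides
enable_karpenter   = true        # turn on an optional feature
node_instance_type = "m5.xlarge" # illustrative variable name
node_count         = 3           # illustrative variable name
```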
When adding Helm charts via ArgoCD to `infra/terraform/`:
- Create a Helm values file: `infra/terraform/helm-values/<component>.yaml`
- Create an ArgoCD app manifest: `infra/terraform/argocd-applications/<component>.yaml`
- Create a Terraform file: `infra/terraform/<component>.tf`
- For optional components: add an `enable_<component>` variable (default = false) and use a `count` conditional
- For always-enabled components: no variable needed; deploy unconditionally
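The optional-component pattern can be sketched as follows; the `kubectl_manifest` resource type and all component names here are assumptions for illustration, not the repo's confirmed implementation:

```hcl
variable "enable_mycomponent" {
  description = "Deploy the mycomponent ArgoCD application"
  type        = bool
  default     = false
}

# Created only when the feature flag is set (count = 0 otherwise)
resource "kubectl_manifest" "mycomponent_argocd_app" {
  count = var.enable_mycomponent ? 1 : 0

  yaml_body = templatefile("${path.module}/argocd-applications/mycomponent.yaml", {
    cluster_name = var.cluster_name
  })
}
```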
Good Terraform resource naming:

```hcl
# ✅ Good - descriptive, component-specific
resource "aws_iam_policy" "trino_s3_policy" {
  name = "${var.cluster_name}-trino-s3-access"
  tags = {
    deployment_id = var.deployment_id
  }
}

# ❌ Bad - vague, generic names
resource "aws_iam_policy" "policy" {
  name = "my-policy"
}
```

Variable naming:
- Use descriptive names: `enable_karpenter`, `spark_operator_version`
- Avoid generic names: `enabled`, `version`
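A variable declaration following these conventions might look like this (the description and default version are illustrative):

```hcl
# ✅ Descriptive, component-scoped name
variable "spark_operator_version" {
  description = "Helm chart version for the Spark Operator"
  type        = string
  default     = "1.1.27" # illustrative default
}
```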
- ✅ Always do: edit files in `data-stacks/<stack-name>/terraform/`
- ⚠️ Ask first: modifying base infrastructure in `infra/terraform/`, changing VPC/networking, adding new AWS services
- 🚫 Never do: edit the `_local/` directory directly, commit AWS credentials, or remove existing data stacks without user confirmation
- Reference the full contributing guide at `website/docs/datastacks/contributing.md` when needed