hitsub2/starrocks-on-eks

StarRocks and ClickHouse on EKS with Karpenter - China Region

This Terraform configuration deploys an Amazon EKS cluster optimized for StarRocks and ClickHouse workloads in AWS China regions (cn-north-1 or cn-northwest-1). The deployment includes Karpenter for auto-scaling, the StarRocks operator, the ClickHouse operator, and an optional monitoring stack with Prometheus and Grafana (disabled by default; see Customization).

Architecture Overview

  • EKS Cluster: Kubernetes 1.33 with managed node groups (m6i.2xlarge x4)
  • Karpenter: Auto-scaling with Graviton compute node pools
  • StarRocks Operator: Custom operator for managing StarRocks clusters
  • ClickHouse Operator: Altinity ClickHouse operator for managing ClickHouse clusters
  • Monitoring: Kube-Prometheus stack with Grafana dashboards (disabled by default)
  • Storage: EBS CSI driver with GP3 encrypted storage
  • Load Balancing: AWS Load Balancer Controller
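
The GP3 storage mentioned above is normally provided through a StorageClass backed by the EBS CSI driver. The sketch below shows what such a class might look like; the class name `gp3` matches the `storageClassName` used in the examples later in this README, but the exact parameters applied by this Terraform module may differ (check addons.tf):

```yaml
# Sketch of a gp3 encrypted StorageClass for the EBS CSI driver.
# Parameter values are illustrative, not the module's exact configuration.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```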

Key Features

✅ China Region Optimized

  • Region: Configured for cn-north-1/cn-northwest-1
  • ECR Images: Uses public ECR images accessible from China
  • Registry IDs: Correct account IDs for China regions

✅ Removed Addons

  • aws-cloudwatch-metrics
  • cluster-proportional-autoscaler
  • cluster-autoscaler-aws-cluster-autoscaler
  • aws-for-fluent-bit
  • kubecost
  • eks-pod-identity-agent

✅ Added Addons

  • Karpenter: Auto-scaling with Graviton compute node pools
  • Metrics Server: Custom image from public.ecr.aws/bitnami/metrics-server:0.8.0
  • EBS CSI Driver: For persistent storage
  • AWS Load Balancer Controller: For ingress and load balancing
  • StarRocks Operator: v1.10.2 from public.ecr.aws/dong-registry with embedded CRD
  • ClickHouse Operator: v0.25.2 from public.ecr.aws/altinity with full CRD support

✅ Managed Node Group Configuration

  • Instance Type: m6i.2xlarge (8 vCPU, 32 GiB RAM)
  • Node Count: 4 nodes (min: 4, max: 8, desired: 4)
  • Storage: 100GB GP3 root volumes

✅ Karpenter Node Pools

  • Graviton Compute: ARM64 instances (c6g, c7g, m6g, m7g, r6g, r7g families)
  • Auto-scaling: Spot and On-Demand instances
  • Instance Store: RAID0 configuration for high performance
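
A Karpenter NodePool matching the description above might look like the following sketch. Resource names, label values, and the CPU limit are assumptions for illustration; the actual definitions live in addons.tf:

```yaml
# Illustrative NodePool for ARM64 (Graviton) Spot/On-Demand capacity;
# names and values are assumptions, not the exact resources from addons.tf.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: graviton-compute
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["c6g", "c7g", "m6g", "m7g", "r6g", "r7g"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: graviton-compute
  limits:
    cpu: "256"
```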

Prerequisites

  1. AWS CLI: Configured with China region credentials
  2. Terraform: Version >= 1.3.2
  3. kubectl: For cluster management
  4. Permissions: EKS, VPC, IAM, and ECR permissions

Quick Start

1. Configure Variables

cp terraform.tfvars.example terraform.tfvars

Edit terraform.tfvars with your specific configuration:

# Basic Configuration
name   = "starrocks-eks-karpenter"
region = "cn-north-1"  # or cn-northwest-1

# EKS Configuration
eks_cluster_version = "1.33"

# StarRocks Configuration
starrocks_namespace      = "starrocks"
starrocks_operator_image = "public.ecr.aws/dong-registry/starrocks-operator:v1.10.2"

# ClickHouse Configuration
enable_clickhouse_operator         = true
clickhouse_namespace               = "clickhouse"
clickhouse_operator_image          = "public.ecr.aws/altinity/clickhouse-operator:0.25.2"
clickhouse_metrics_exporter_image  = "public.ecr.aws/altinity/metrics-exporter:0.25.2"

# Tags
tags = {
  Environment = "dev"
  Project     = "starrocks-eks"
  Owner       = "platform-team"
}

2. Deploy Infrastructure

# Initialize Terraform
terraform init

# Plan the deployment
terraform plan

# Apply the configuration
terraform apply

3. Configure kubectl

# Update kubeconfig (use the output from terraform apply)
aws eks --region cn-north-1 update-kubeconfig --name starrocks-eks-karpenter

4. Verify Deployment

# Check cluster status
kubectl get nodes

# Check StarRocks operator
kubectl get deployment kube-starrocks-operator -n starrocks

# Check ClickHouse operator
kubectl get deployment clickhouse-operator -n kube-system

# Check Karpenter
kubectl get deployment karpenter -n karpenter

# Check monitoring stack (if enabled)
kubectl get pods -n monitoring

# Check all namespaces
kubectl get pods --all-namespaces

StarRocks Cluster Deployment

After the infrastructure is deployed, you can deploy StarRocks clusters using the operator:

# Create a StarRocks cluster (example)
cat <<EOF | kubectl apply -f -
apiVersion: starrocks.com/v1
kind: StarRocksCluster
metadata:
  name: starrocks-cluster
  namespace: starrocks
spec:
  starRocksFeSpec:
    image: starrocks/fe-ubuntu:3.2-latest
    replicas: 1
    requests:
      cpu: "1"
      memory: "2Gi"
    limits:
      cpu: "2"
      memory: "4Gi"
  starRocksBeSpec:
    image: starrocks/be-ubuntu:3.2-latest
    replicas: 3
    requests:
      cpu: "2"
      memory: "4Gi"
    limits:
      cpu: "4"
      memory: "8Gi"
    storageVolumes:
      - name: be-storage
        storageClassName: gp3
        storageSize: 100Gi
EOF
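
To reach the FE query port from outside the cluster, the FE spec can request a load balancer through the AWS Load Balancer Controller. A hedged fragment (the annotation values are illustrative and depend on your load balancer setup):

```yaml
# Fragment of starRocksFeSpec exposing the FE service via an NLB.
# Annotation values are examples, not a tested configuration.
starRocksFeSpec:
  service:
    type: LoadBalancer
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: external
      service.beta.kubernetes.io/aws-load-balancer-scheme: internal
```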

ClickHouse Cluster Deployment

After the infrastructure is deployed, you can deploy ClickHouse clusters using the operator with Karpenter node pools:

# Create a ClickHouse cluster using Karpenter Graviton nodes
cat <<EOF | kubectl apply -f -
apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  name: clickhouse-cluster
  namespace: clickhouse
spec:
  configuration:
    clusters:
      - name: "cluster"
        layout:
          shardsCount: 1
          replicasCount: 1
    users:
      admin/password: admin123
      admin/networks/ip:
        - "0.0.0.0/0"
  templates:
    podTemplates:
      - name: clickhouse-karpenter-pod-template
        spec:
          nodeSelector:
            type: karpenter
            provisioner: graviton-compute
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:23.8
              resources:
                requests:
                  memory: "2Gi"
                  cpu: "1000m"
                limits:
                  memory: "8Gi"
                  cpu: "4000m"
    volumeClaimTemplates:
      - name: data-volume-template
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 20Gi
          storageClassName: gp3
  defaults:
    templates:
      podTemplate: clickhouse-karpenter-pod-template
      dataVolumeClaimTemplate: data-volume-template
EOF

Alternative: Using Pre-built Templates

You can also use the pre-built Karpenter templates:

# Create a ClickHouse cluster using pre-built Karpenter templates
cat <<EOF | kubectl apply -f -
apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  name: clickhouse-cluster-simple
  namespace: clickhouse
spec:
  useTemplates:
    - name: karpenter-graviton-pod-template
    - name: karpenter-storage-template-10Gi
  configuration:
    clusters:
      - name: "cluster"
        layout:
          shardsCount: 1
          replicasCount: 1
    users:
      admin/password: admin123
      admin/networks/ip:
        - "0.0.0.0/0"
EOF
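
By default the operator creates ClusterIP services for an installation. To expose ClickHouse externally, a serviceTemplate can be added to the spec; the fragment below is a sketch, and the template name is illustrative:

```yaml
# Fragment: a serviceTemplate that makes the operator create a
# LoadBalancer service (sketch; the template name is an assumption).
templates:
  serviceTemplates:
    - name: lb-service-template
      spec:
        type: LoadBalancer
        ports:
          - name: http
            port: 8123
          - name: tcp
            port: 9000
defaults:
  templates:
    serviceTemplate: lb-service-template
```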

File Structure

terraform-starrocks-eks/
├── main.tf                    # Main Terraform configuration
├── variables.tf               # Variable definitions
├── versions.tf                # Provider versions
├── vpc.tf                     # VPC configuration
├── eks.tf                     # EKS cluster configuration
├── addons.tf                  # Kubernetes addons and StarRocks operator
├── clickhouse-operator.tf     # ClickHouse operator configuration
├── outputs.tf                 # Output values
├── terraform.tfvars.example   # Example variable values
└── README.md                  # This file

Customization

Node Pool Configuration

The Graviton compute node pool can be customized in addons.tf:

# Modify instance families, sizes, or limits
karpenter_resources_helm_config = {
  graviton-compute = {
    values = [
      <<-EOT
      # Customize instance types, limits, etc.
      EOT
    ]
  }
}

StarRocks Operator Configuration

Modify the operator deployment in addons.tf:

# Update image, resources, or security settings
resource "kubernetes_deployment" "starrocks_operator" {
  # Configuration here
}

Monitoring Stack

The monitoring stack (kube-prometheus-stack) has been disabled. If monitoring is needed, it can be enabled by setting enable_kube_prometheus_stack = true in the EKS blueprints addons configuration.

Troubleshooting

Common Issues

1. ECR Image Pull Issues

# Test ECR public access
docker pull public.ecr.aws/dong-registry/starrocks-operator:v1.10.2

2. Karpenter Node Provisioning

# Check Karpenter logs
kubectl logs -n karpenter deployment/karpenter

# Check node pools
kubectl get nodepools
kubectl get ec2nodeclasses

3. StarRocks Operator Issues

# Check operator logs
kubectl logs -n starrocks deployment/kube-starrocks-operator

# Check RBAC permissions
kubectl auth can-i create starrocksclusters --as=system:serviceaccount:starrocks:starrocks

4. ClickHouse Operator Issues

# Check operator logs
kubectl logs -n kube-system deployment/clickhouse-operator

# Check ClickHouse installations
kubectl get chi -A

# Check ClickHouse operator status
kubectl get pods -n kube-system -l app=clickhouse-operator

5. Monitoring Stack Issues

# Check prometheus operator
kubectl get pods -n monitoring | grep prometheus-operator

# Check CRDs
kubectl get crd | grep monitoring

Monitoring Commands

# Cluster Status
kubectl get nodes -o wide
kubectl top nodes

# StarRocks Status
kubectl get starrocksclusters -n starrocks
kubectl get pods -n starrocks

# ClickHouse Status
kubectl get chi -n clickhouse
kubectl get pods -n clickhouse

# Karpenter Status
kubectl get nodepools
kubectl describe nodepool graviton-compute

# Monitoring Status (if the monitoring stack is enabled)
kubectl get pods -n monitoring
kubectl get svc -n monitoring

Cleanup

To remove all resources:

# Delete StarRocks clusters before destroying the infrastructure
kubectl delete starrocksclusters --all -n starrocks

# Delete ClickHouse clusters as well
kubectl delete chi --all -n clickhouse

# Wait for cleanup
kubectl get pods -n starrocks --watch
kubectl get pods -n clickhouse --watch

# Destroy Terraform resources
terraform destroy

Security Considerations

  • Node Security: All containers run as non-root with read-only filesystems
  • RBAC: Minimal required permissions for each component
  • Network: Private subnets with NAT gateway for outbound access
  • Storage: Encrypted EBS volumes with GP3 performance
  • Secrets: Grafana password stored in AWS Secrets Manager

Performance Optimization

  • Instance Types: m6i.2xlarge for consistent performance
  • Storage: GP3 with encryption for optimal I/O
  • Networking: Secondary CIDR blocks for pod networking
  • Auto-scaling: Karpenter with Graviton instances for cost optimization

Support

For issues related to:

  • Terraform: Check terraform plan output and state files
  • EKS: Verify cluster health and node group status
  • StarRocks: Check operator logs and cluster status
  • Monitoring: Verify prometheus operator and CRDs

Version Compatibility

  • Terraform: >= 1.3.2
  • EKS: 1.33
  • Kubernetes: 1.33
  • StarRocks Operator: v1.10.2
  • Karpenter: 1.2.1
  • Metrics Server: 0.8.0
