
Kubernetes Deployment Guide

Overview

This directory contains Kubernetes manifests for deploying IronSys to a Kubernetes cluster.

Structure

k8s/
├── base/                      # Base configuration (environment-agnostic)
│   ├── namespace.yaml         # Namespace definition
│   ├── configmap.yaml         # Application configuration
│   ├── secret.yaml            # Secrets (example - use sealed secrets in prod)
│   ├── python-api-deployment.yaml
│   ├── python-worker-deployment.yaml
│   ├── ingress.yaml           # Ingress configuration
│   └── hpa.yaml               # Horizontal Pod Autoscaler
├── overlays/
│   ├── dev/                   # Development environment overlays
│   └── prod/                  # Production environment overlays
└── README.md                  # This file

Prerequisites

  1. Kubernetes Cluster (v1.24+)

    • Minikube (local development)
    • EKS, GKE, AKS (cloud)
    • Self-managed
  2. kubectl configured

    kubectl version --client
  3. Infrastructure Services

    • PostgreSQL (RDS, Cloud SQL, or self-hosted)
    • Redis (ElastiCache, MemoryStore, or self-hosted)
    • Kafka (MSK, Confluent Cloud, or self-hosted via Strimzi)
  4. Ingress Controller (optional but recommended)

    kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/cloud/deploy.yaml
  5. Cert Manager (for TLS certificates)

    kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.12.0/cert-manager.yaml

Quick Start

1. Update Configuration

Edit base/configmap.yaml and base/secret.yaml with your environment-specific values:

# For production, use sealed secrets or external secret management
kubectl create secret generic ironsys-secrets \
  --from-literal=DATABASE_URL="postgresql://user:pass@host:5432/db" \
  --from-literal=REDIS_URL="redis://redis-host:6379/0" \
  --from-literal=KAFKA_BROKERS="kafka-broker1:9092,kafka-broker2:9092" \
  --namespace ironsys \
  --dry-run=client -o yaml > secret.yaml
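The command above renders a Secret manifest. As a sketch, an equivalent declarative manifest (placeholder values only; keep real secrets out of git) uses stringData, which Kubernetes base64-encodes on write:

```yaml
# Illustrative only -- values are placeholders
apiVersion: v1
kind: Secret
metadata:
  name: ironsys-secrets
  namespace: ironsys
type: Opaque
stringData:
  DATABASE_URL: "postgresql://user:pass@host:5432/db"
  REDIS_URL: "redis://redis-host:6379/0"
  KAFKA_BROKERS: "kafka-broker1:9092,kafka-broker2:9092"
```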

2. Deploy Infrastructure (if needed)

If you don't have external PostgreSQL, Redis, or Kafka:

# Deploy PostgreSQL
kubectl apply -f infra/postgres.yaml

# Deploy Redis
kubectl apply -f infra/redis.yaml

# Deploy Kafka (using Strimzi operator)
kubectl create -f 'https://strimzi.io/install/latest?namespace=ironsys'
kubectl apply -f infra/kafka.yaml

3. Deploy Application

# Create namespace
kubectl apply -f base/namespace.yaml

# Deploy configurations
kubectl apply -f base/configmap.yaml
kubectl apply -f base/secret.yaml

# Deploy applications
kubectl apply -f base/python-api-deployment.yaml
kubectl apply -f base/python-worker-deployment.yaml

# Deploy ingress (optional)
kubectl apply -f base/ingress.yaml

# Deploy autoscaling
kubectl apply -f base/hpa.yaml

4. Verify Deployment

# Check pods
kubectl get pods -n ironsys

# Check services
kubectl get svc -n ironsys

# Check logs
kubectl logs -n ironsys -l component=api --tail=100

# Port forward for local access
kubectl port-forward -n ironsys svc/python-api 8000:8000

Environment-Specific Deployments

Development

kubectl apply -k overlays/dev

Production

kubectl apply -k overlays/prod
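The overlays are standard Kustomize. As an illustration (the actual overlay files may differ, and this assumes base/ has its own kustomization.yaml listing the manifests), a dev overlay that drops replica counts might look like:

```yaml
# overlays/dev/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: ironsys
resources:
- ../../base
patches:
- target:
    kind: Deployment
    name: python-api
  patch: |-
    - op: replace
      path: /spec/replicas
      value: 1
```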

Monitoring

Prometheus Integration

The application exposes metrics at the /metrics endpoint. With the Prometheus Operator installed, scrape them via a ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ironsys-metrics
  namespace: ironsys
spec:
  selector:
    matchLabels:
      app: ironsys
  endpoints:
  - port: metrics
    interval: 30s

Grafana Dashboards

Import the Grafana dashboard from ../infra/grafana/dashboards/ironsys-overview.json

Scaling

Manual Scaling

# Scale API
kubectl scale deployment python-api -n ironsys --replicas=5

# Scale Worker
kubectl scale deployment python-worker -n ironsys --replicas=4

Autoscaling

HPA (Horizontal Pod Autoscaler) is configured in base/hpa.yaml:

  • API: 3-10 replicas based on CPU/memory
  • Worker: 2-8 replicas based on CPU/memory
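base/hpa.yaml is not reproduced here, but a CPU-based HPA matching the API bounds above would look roughly like this (the 70% utilization target is an assumption):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: python-api-hpa
  namespace: ironsys
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: python-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```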

To scale on custom metrics (e.g. Kafka consumer lag), an external metrics provider such as prometheus-adapter or KEDA must also be installed:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: python-worker-hpa
  namespace: ironsys
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: python-worker
  minReplicas: 2
  maxReplicas: 8
  metrics:
  - type: External
    external:
      metric:
        name: kafka_consumer_lag
      target:
        type: AverageValue
        averageValue: "100"

High Availability

Pod Disruption Budget

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: python-api-pdb
  namespace: ironsys
spec:
  minAvailable: 2
  selector:
    matchLabels:
      component: api

Multi-AZ Deployment

Use pod anti-affinity to spread across availability zones:

spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: component
              operator: In
              values:
              - api
          topologyKey: topology.kubernetes.io/zone
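Alternatively, topologySpreadConstraints (stable since Kubernetes v1.19) expresses the same intent more directly:

```yaml
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        component: api
```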

Troubleshooting

Check Pod Status

kubectl get pods -n ironsys
kubectl describe pod <pod-name> -n ironsys

View Logs

# Recent logs
kubectl logs -n ironsys <pod-name> --tail=100

# Follow logs
kubectl logs -n ironsys <pod-name> -f

# Previous container logs (if crashed)
kubectl logs -n ironsys <pod-name> --previous

Debug Container

kubectl exec -it -n ironsys <pod-name> -- /bin/sh

Check Resource Usage

kubectl top pods -n ironsys
kubectl top nodes

Security

Network Policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ironsys-network-policy
  namespace: ironsys
spec:
  podSelector:
    matchLabels:
      app: ironsys
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8000
  egress:
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53    # DNS (required once Egress is restricted)
    - protocol: TCP
      port: 53    # DNS
    - protocol: TCP
      port: 5432  # PostgreSQL
    - protocol: TCP
      port: 6379  # Redis
    - protocol: TCP
      port: 9092  # Kafka

Pod Security Standards

apiVersion: v1
kind: Namespace
metadata:
  name: ironsys
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
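With enforce: restricted, pods must run as non-root with a hardened security context; each container in the deployments needs roughly the following to be admitted:

```yaml
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault
```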

Backup and Disaster Recovery

Database Backups

# Create backup
kubectl exec -n ironsys <postgres-pod> -- pg_dump -U dev ironsys > backup.sql

# Restore backup
kubectl exec -i -n ironsys <postgres-pod> -- psql -U dev ironsys < backup.sql
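The manual backup above can be scheduled in-cluster with a CronJob. This is a sketch: the postgres service name, the DB_PASSWORD secret key, and the postgres-backup-pvc claim are assumptions, not names defined elsewhere in this repo:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
  namespace: ironsys
spec:
  schedule: "0 2 * * *"   # nightly at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: pg-dump
            image: postgres:16
            command: ["/bin/sh", "-c"]
            args:
            - pg_dump -h postgres -U dev ironsys > /backup/ironsys-$(date +%F).sql
            env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: ironsys-secrets
                  key: DB_PASSWORD   # assumed key name
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: postgres-backup-pvc   # assumed PVC
```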

Configuration Backups

# Export all resources (note: Secret data is only base64-encoded; store the file securely)
kubectl get all,configmap,secret -n ironsys -o yaml > ironsys-backup.yaml

CI/CD Integration

GitHub Actions

See .github/workflows/cd.yml for deployment pipeline.

ArgoCD

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ironsys
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/IronSys
    targetRevision: HEAD
    path: k8s/overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: ironsys
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Cost Optimization

  1. Right-size resources: Adjust requests/limits based on actual usage
  2. Use spot/preemptible instances for workers
  3. Enable cluster autoscaler
  4. Use pod priority and preemption

Performance Tuning

Resource Limits

Adjust based on load testing:

resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "1Gi"
    cpu: "1000m"

Connection Pooling

Update ConfigMap:

REDIS_POOL_MAX_CONNECTIONS: "100"
DB_POOL_SIZE: "20"

References