BidderGod Infrastructure

This directory contains the Terraform code for the BidderGod C2C auction platform, which runs as a set of microservices on AWS ECS Fargate.

Architecture Overview

Internet → Kong API Gateway (ECS) → Microservices (ECS + Service Discovery)
                                   → Infrastructure Services (Kafka, DBs on ECS + EFS)

Key Components

  • VPC: Custom VPC with public subnets in 1 availability zone (cost-optimized)
  • ECS Fargate Spot: Serverless containers at up to ~70% lower cost than on-demand Fargate
  • Kong API Gateway: Single entry point with load balancing, rate limiting, and authentication
  • AWS Cloud Map: Private DNS service discovery (biddergod-dev.local)
  • EFS: Persistent storage for stateful services (databases, Kafka)
  • ECR: Private container registries for 17 microservices
  • CloudWatch: Centralized logging (3-day retention)
  • No ALB: Direct public IP access via Kong (saves $16/month)
  • No NAT Gateway: Public subnets only (saves ~$32/month)

Deployed Services (17 Total)

Application Microservices (8)

Service             Port  Technology       Purpose
user-service        8080  Spring Boot      User management, authentication
auction-service     4000  Node.js/Express  Auction lifecycle management
bid-command         8082  Go/Gin           Write operations for bidding (CQRS)
bid-query           8083  Go/Gin           Read operations for bid history (CQRS)
auction-projector   8084  Go               Kafka consumer for auction events
bid-projector       8085  Go               Kafka consumer for bid events
payment-service     3000  NestJS           Stripe payment processing
sse-stream-service  8086  Node.js          Real-time event streaming to frontend

Infrastructure Services (5)

Service   Port   Technology             Purpose
kafka     9092   Confluent Kafka 7.9.4  Event streaming (KRaft mode)
postgres  5432   PostgreSQL 18          User, auction, and bidding data
mysql     3306   MySQL 8.0              Payment service data
mongo     27017  MongoDB                Bid query read model (CQRS)
redis     6379   Redis 8                Auction metadata cache

Monitoring & Gateway (4)

Service     Port  Technology    Purpose
kong        8000  Kong Gateway  API gateway with LB and rate limiting
prometheus  9090  Prometheus    Metrics collection
grafana     3001  Grafana       Metrics visualization
kafka-ui    8080  Provectus     Kafka management UI

Service Communication

External Traffic:

Internet → Kong (public IP) → Backend services (via Cloud Map)

Internal Traffic:

Services use AWS Cloud Map DNS:
- kafka.biddergod-dev.local:9092
- postgres.biddergod-dev.local:5432
- redis.biddergod-dev.local:6379

Data Persistence:

  • Databases and Kafka use EFS volumes
  • Data survives task restarts
  • Automatic backups via AWS Backup (optional)

Prerequisites

  1. AWS CLI with configured credentials

    aws configure
  2. Terraform >= 1.10 (S3 native state locking was introduced in 1.10)

    terraform --version
  3. Docker for building images

  4. S3 Bucket for Terraform state (see setup below)

  5. Stripe API Key (for payment service)

    export TF_VAR_stripe_secret_key="sk_test_..."
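
The prerequisites can be checked in one pass before running anything else. The following is a minimal sketch, not part of the repository: the function names version_ge and preflight are made up for this example, and the minimum Terraform version is passed in as an argument rather than hard-coded. It assumes the first line of terraform version has the usual "Terraform vX.Y.Z" form.

```shell
# version_ge A B -> returns 0 when dotted version A >= B (numeric compare per field)
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# preflight MIN_TERRAFORM_VERSION -> prints each problem found, returns 1 if any
preflight() {
  min_tf="$1"
  failures=0
  for tool in aws terraform docker; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      failures=1
    fi
  done
  if command -v terraform >/dev/null 2>&1; then
    tf_version=$(terraform version | sed -n '1s/^Terraform v//p')
    if ! version_ge "$tf_version" "$min_tf"; then
      echo "terraform $tf_version is older than $min_tf"
      failures=1
    fi
  fi
  if [ -z "${TF_VAR_stripe_secret_key:-}" ]; then
    echo "TF_VAR_stripe_secret_key is not set"
    failures=1
  fi
  return "$failures"
}

# Usage (pass the minimum Terraform version from the list above):
#   preflight <minimum-terraform-version> && echo "all prerequisites met"
```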

Quick Start

1. Set Up Terraform State Backend

# Create S3 bucket for Terraform state
aws s3api create-bucket \
  --bucket biddergod-terraform-state \
  --region ap-southeast-1 \
  --create-bucket-configuration LocationConstraint=ap-southeast-1

# Enable versioning
aws s3api put-bucket-versioning \
  --bucket biddergod-terraform-state \
  --versioning-configuration Status=Enabled

# Enable encryption
aws s3api put-bucket-encryption \
  --bucket biddergod-terraform-state \
  --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'

# Block public access
aws s3api put-public-access-block \
  --bucket biddergod-terraform-state \
  --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
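
To confirm that the three settings above took effect, the bucket can be read back with the matching get-* calls. This is a sketch under assumptions: verify_state_bucket is a hypothetical helper name, and it relies on the AWS CLI printing booleans as True in text output.

```shell
# verify_state_bucket BUCKET -> prints the observed settings,
# returns 0 only if versioning, AES256 encryption, and the ACL block are all on
verify_state_bucket() {
  bucket="$1"
  versioning=$(aws s3api get-bucket-versioning --bucket "$bucket" \
    --query 'Status' --output text)
  algorithm=$(aws s3api get-bucket-encryption --bucket "$bucket" \
    --query 'ServerSideEncryptionConfiguration.Rules[0].ApplyServerSideEncryptionByDefault.SSEAlgorithm' \
    --output text)
  block_acls=$(aws s3api get-public-access-block --bucket "$bucket" \
    --query 'PublicAccessBlockConfiguration.BlockPublicAcls' --output text)
  echo "versioning=$versioning encryption=$algorithm block_public_acls=$block_acls"
  case "$block_acls" in
    True|true) ;;
    *) return 1 ;;
  esac
  [ "$versioning" = "Enabled" ] && [ "$algorithm" = "AES256" ]
}

# Usage:
#   verify_state_bucket biddergod-terraform-state
```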

2. Deploy Infrastructure

cd infrastructure

# Initialize Terraform
terraform init

# Review configuration
terraform plan

# Deploy (creates 17 ECS services, VPC, EFS, etc.)
terraform apply

# Get outputs
terraform output

Note: After the initial apply, the ECS services will not start successfully because their Docker images don't exist in ECR yet. Tasks will not reach the RUNNING state until you push the images in step 3.

3. Build and Push All Docker Images

Build and push the 9 custom images below: the 8 application microservices plus Kong. The frontend is excluded because it uses AWS Amplify.

# Get AWS account ID
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION="ap-southeast-1"

# Authenticate Docker to ECR
aws ecr get-login-password --region $AWS_REGION | \
  docker login --username AWS --password-stdin ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com

# Get ECR URLs from Terraform outputs
cd infrastructure
export USER_ECR=$(terraform output -raw user_service_ecr_url)
export AUCTION_ECR=$(terraform output -raw auction_service_ecr_url)
export BID_CMD_ECR=$(terraform output -raw bid_command_ecr_url)
export BID_QRY_ECR=$(terraform output -raw bid_query_ecr_url)
export AUC_PROJ_ECR=$(terraform output -raw auction_projector_ecr_url)
export BID_PROJ_ECR=$(terraform output -raw bid_projector_ecr_url)
export PAYMENT_ECR=$(terraform output -raw payment_service_ecr_url)
export SSE_ECR=$(terraform output -raw sse_stream_service_ecr_url)
export KONG_ECR=$(terraform output -raw kong_ecr_url)
cd ..

# Build and push each service
# 1. User Service
cd user-service
docker build -t user-service .
docker tag user-service:latest $USER_ECR:latest
docker push $USER_ECR:latest
cd ..

# 2. Auction Service
cd auction-service/src
docker build -t auction-service .
docker tag auction-service:latest $AUCTION_ECR:latest
docker push $AUCTION_ECR:latest
cd ../..

# 3-6. Bidding Services (4 services from one codebase)
cd bidding-service
docker build -f services/bid-command/Dockerfile -t bid-command .
docker tag bid-command:latest $BID_CMD_ECR:latest
docker push $BID_CMD_ECR:latest

docker build -f services/bid-query/Dockerfile -t bid-query .
docker tag bid-query:latest $BID_QRY_ECR:latest
docker push $BID_QRY_ECR:latest

docker build -f services/auction-projector/Dockerfile -t auction-projector .
docker tag auction-projector:latest $AUC_PROJ_ECR:latest
docker push $AUC_PROJ_ECR:latest

docker build -f services/bid-projector/Dockerfile -t bid-projector .
docker tag bid-projector:latest $BID_PROJ_ECR:latest
docker push $BID_PROJ_ECR:latest
cd ..

# 7. Payment Service
cd payment-service
docker build -t payment-service .
docker tag payment-service:latest $PAYMENT_ECR:latest
docker push $PAYMENT_ECR:latest
cd ..

# 8. SSE Stream Service
cd sse-stream-service
docker build -t sse-stream-service .
docker tag sse-stream-service:latest $SSE_ECR:latest
docker push $SSE_ECR:latest
cd ..

# 9. Kong API Gateway
cd docker-compose/config  # Assuming Kong config is here
docker build -t kong .
docker tag kong:latest $KONG_ECR:latest
docker push $KONG_ECR:latest
cd ../..
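
The nine build/tag/push sequences above follow one pattern, so they can be folded into a helper. This is a sketch rather than a script shipped with the repository: build_and_push is a hypothetical name, and the usage lines reuse the directories and ECR variables from the steps above.

```shell
# build_and_push NAME ECR_URL CONTEXT_DIR [DOCKERFILE]
# Builds in CONTEXT_DIR, tags the image for ECR, and pushes it.
# DOCKERFILE is optional and relative to CONTEXT_DIR.
build_and_push() {
  name="$1"
  ecr_url="$2"
  context="$3"
  dockerfile="${4:-}"
  if [ -n "$dockerfile" ]; then
    (cd "$context" && docker build -f "$dockerfile" -t "$name" .) || return 1
  else
    (cd "$context" && docker build -t "$name" .) || return 1
  fi
  docker tag "$name:latest" "$ecr_url:latest" || return 1
  docker push "$ecr_url:latest"
}

# Usage, run from the repository root:
#   build_and_push user-service    "$USER_ECR"    user-service
#   build_and_push auction-service "$AUCTION_ECR" auction-service/src
#   build_and_push bid-command     "$BID_CMD_ECR" bidding-service services/bid-command/Dockerfile
```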

4. Verify Deployment

# Check ECS service status
aws ecs list-services --cluster biddergod-dev-cluster --region ap-southeast-1

# Check running tasks
aws ecs describe-services \
  --cluster biddergod-dev-cluster \
  --services biddergod-dev-user-service \
  --region ap-southeast-1 \
  --query 'services[0].{Running:runningCount,Desired:desiredCount,Status:status}'

# View logs
aws logs tail /ecs/biddergod-dev-user-service --follow --region ap-southeast-1

# Get Kong public IP (task ARN -> network interface ID -> public IP)
KONG_TASK=$(aws ecs list-tasks --cluster biddergod-dev-cluster --service-name biddergod-dev-kong --region ap-southeast-1 --query 'taskArns[0]' --output text)
KONG_ENI=$(aws ecs describe-tasks \
  --cluster biddergod-dev-cluster \
  --tasks "$KONG_TASK" \
  --region ap-southeast-1 \
  --query 'tasks[0].attachments[0].details[?name==`networkInterfaceId`].value' \
  --output text)
aws ec2 describe-network-interfaces --network-interface-ids "$KONG_ENI" --region ap-southeast-1 --query 'NetworkInterfaces[0].Association.PublicIp' --output text

# Test via Kong
curl http://<kong-public-ip>:8000/health
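
Checking services one at a time gets tedious with 17 of them. The sketch below loops over a list of service names and prints running versus desired counts. The helper names are hypothetical, and it assumes every ECS service follows the biddergod-dev-<service-name> pattern used in the commands above.

```shell
# service_counts CLUSTER SERVICE -> prints "<running><TAB><desired>"
service_counts() {
  aws ecs describe-services \
    --cluster "$1" \
    --services "$2" \
    --region "${AWS_REGION:-ap-southeast-1}" \
    --query 'services[0].[runningCount,desiredCount]' \
    --output text
}

# report_services CLUSTER NAME... -> one status line per service
report_services() {
  cluster="$1"
  shift
  for name in "$@"; do
    counts=$(service_counts "$cluster" "biddergod-dev-$name")
    running=$(printf '%s\n' "$counts" | awk '{print $1}')
    desired=$(printf '%s\n' "$counts" | awk '{print $2}')
    if [ -n "$running" ] && [ "$running" = "$desired" ]; then
      state="OK"
    else
      state="NOT READY"
    fi
    printf '%-20s %s/%s %s\n' "$name" "$running" "$desired" "$state"
  done
}

# Usage:
#   report_services biddergod-dev-cluster user-service auction-service kong kafka
```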

Cost Estimate

Monthly Cost (Single AZ, Fargate Spot, 17 services)

ECS Fargate Spot Costs:

  • 17 services at 256 CPU units (0.25 vCPU) and 512-1024 MB of memory each
  • Kafka (512 CPU units, 1024 MB) is the largest task
  • Estimated: ~$0.30-0.40/hour = $216-288/month

Additional AWS Costs:

  • EFS storage (5-10 GB): $1.50-3/month
  • ECR storage (10 GB): $1/month
  • CloudWatch Logs (3-day retention): $5-10/month
  • Data transfer: $5-10/month

Total Estimated Cost: $228-312/month
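
The arithmetic behind this estimate can be reproduced with a short script. It assumes a 720-hour month (30 days x 24 hours), which is what the compute figures above imply, and it uses the line items exactly as listed. The low end sums to 228.50, which the total above reports as $228.

```shell
# monthly_cost HOURLY_RATE -> monthly cost at 720 hours (30 days x 24 hours)
monthly_cost() {
  awk -v rate="$1" 'BEGIN { printf "%.2f\n", rate * 720 }'
}

# sum_costs ITEM... -> sum of the monthly line items
sum_costs() {
  awk 'BEGIN { total = 0; for (i = 1; i < ARGC; i++) total += ARGV[i]; printf "%.2f\n", total }' "$@"
}

compute_low=$(monthly_cost 0.30)    # 216.00
compute_high=$(monthly_cost 0.40)   # 288.00

# Line items in order: compute, EFS, ECR, CloudWatch Logs, data transfer
total_low=$(sum_costs "$compute_low" 1.50 1 5 5)      # 228.50
total_high=$(sum_costs "$compute_high" 3 1 10 10)     # 312.00

echo "Compute: $compute_low - $compute_high USD/month"
echo "Total:   $total_low - $total_high USD/month"
```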

Cost Savings vs ALB Architecture:

  • No ALB: Saves $16/month
  • No NAT Gateway: Saves $32/month
  • Fargate Spot: Saves ~70% vs on-demand
  • Single AZ: Reduces cross-AZ data transfer

To Minimize Costs:

# Stop all services when not in use
aws ecs update-service --cluster biddergod-dev-cluster --service biddergod-dev-<service-name> --desired-count 0 --region ap-southeast-1

# Or destroy everything
terraform destroy

Architecture Details

Networking

VPC Configuration:

  • CIDR: 10.0.0.0/16
  • Single AZ: ap-southeast-1a
  • Public subnets only (no private subnets)
  • Internet Gateway for ECR/internet access
  • NO NAT Gateway (cost optimization)

Security Groups:

  • ECS Tasks SG: Allows self-referencing (all ports) + specific ports from internet (80, 443, 8000 for Kong)
  • Kong: Exposed to internet on ports 80, 443, 8000, 8001
  • Other services: Only accessible internally via Cloud Map DNS

Service Discovery

AWS Cloud Map provides DNS-based service discovery:

  • Namespace: biddergod-dev.local
  • Example: kafka.biddergod-dev.local, postgres.biddergod-dev.local
  • TTL: 10 seconds
  • Health checks: ECS task health status

Persistent Storage

EFS File System:

  • Performance mode: General Purpose
  • Throughput mode: Bursting
  • Encryption at rest: Yes
  • Mount targets in public subnet

EFS Access Points (one per stateful service):

  • /postgres-data - PostgreSQL data directory
  • /mysql-data - MySQL data directory
  • /mongo-data - MongoDB data directory
  • /redis-data - Redis AOF persistence
  • /kafka-data - Kafka logs
  • /prometheus-data - Prometheus time series
  • /grafana-data - Grafana dashboards

Auto-Scaling (Configured but Disabled)

Each microservice supports auto-scaling:

  • Target tracking on CPU (default: 70%)
  • Target tracking on memory (default: 80%)
  • Min: 1 task, Max: 5 tasks (configurable)
  • Currently disabled (enable_auto_scaling = false)

To enable auto-scaling:

# In main.tf
module "user_service" {
  # ...
  enable_auto_scaling = true
  auto_scaling_min_capacity = 1
  auto_scaling_max_capacity = 5
  auto_scaling_cpu_target = 70
  auto_scaling_memory_target = 80
}

Monitoring

CloudWatch Logs

View logs for any service:

aws logs tail /ecs/biddergod-dev-<service-name> --follow --region ap-southeast-1

Prometheus + Grafana

Access Grafana dashboard:

  1. Get Grafana public IP (similar to Kong IP retrieval)
  2. Open http://<grafana-public-ip>:3001
  3. Login: admin/admin
  4. Datasource: Pre-configured Prometheus
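
Step 1 can reuse the same lookup as the Kong IP retrieval. The sketch below generalizes it to any service; service_public_ip is a hypothetical helper name, and the service name biddergod-dev-grafana is an assumption based on the naming pattern used elsewhere in this README.

```shell
# service_public_ip CLUSTER SERVICE -> prints the public IP of the service's first task
service_public_ip() {
  cluster="$1"
  service="$2"
  region="${AWS_REGION:-ap-southeast-1}"

  # 1. Find the first task of the service.
  task_arn=$(aws ecs list-tasks --cluster "$cluster" --service-name "$service" \
    --region "$region" --query 'taskArns[0]' --output text) || return 1
  if [ -z "$task_arn" ] || [ "$task_arn" = "None" ]; then
    echo "no running task found for $service" >&2
    return 1
  fi

  # 2. Read the network interface ID from the task's attachment details.
  eni_id=$(aws ecs describe-tasks --cluster "$cluster" --tasks "$task_arn" \
    --region "$region" \
    --query 'tasks[0].attachments[0].details[?name==`networkInterfaceId`].value' \
    --output text) || return 1

  # 3. Resolve the interface to its public IP.
  aws ec2 describe-network-interfaces --network-interface-ids "$eni_id" \
    --region "$region" \
    --query 'NetworkInterfaces[0].Association.PublicIp' --output text
}

# Usage:
#   service_public_ip biddergod-dev-cluster biddergod-dev-grafana
```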

Available Metrics:

  • Go service metrics (bid-command, bid-query, projectors)
  • Custom business metrics (bid counts, auction states)
  • System metrics (CPU, memory, network)

Terraform Outputs

terraform output  # View all outputs

Key Outputs:

  • ecs_cluster_name - ECS cluster name
  • ecs_cluster_arn - ECS cluster ARN
  • vpc_id - VPC ID
  • efs_file_system_id - EFS file system ID
  • service_discovery_namespace - Cloud Map namespace
  • *_ecr_url - ECR repository URLs for each service

Common Operations

Update a Service

# Rebuild and push new image
cd user-service
docker build -t user-service .
docker tag user-service:latest $USER_ECR:latest
docker push $USER_ECR:latest

# Force new deployment
aws ecs update-service \
  --cluster biddergod-dev-cluster \
  --service biddergod-dev-user-service \
  --force-new-deployment \
  --region ap-southeast-1

Scale a Service

# Scale up
aws ecs update-service \
  --cluster biddergod-dev-cluster \
  --service biddergod-dev-bid-command \
  --desired-count 3 \
  --region ap-southeast-1

View Service Events

aws ecs describe-services \
  --cluster biddergod-dev-cluster \
  --services biddergod-dev-user-service \
  --region ap-southeast-1 \
  --query 'services[0].events[0:5]'

Access Databases

# Get database task private IP
aws ecs describe-tasks \
  --cluster biddergod-dev-cluster \
  --tasks $(aws ecs list-tasks --cluster biddergod-dev-cluster --service-name biddergod-dev-postgres --query 'taskArns[0]' --output text) \
  --query 'tasks[0].attachments[0].details[?name==`privateIPv4Address`].value' \
  --output text

# Connect via bastion or VPN
psql -h <postgres-private-ip> -U postgres -d postgres

Troubleshooting

Service Won't Start

Check task stopped reason:

aws ecs describe-tasks \
  --cluster biddergod-dev-cluster \
  --tasks <task-arn> \
  --region ap-southeast-1 \
  --query 'tasks[0].stoppedReason'

Common issues:

  • Image not in ECR: Build and push image first
  • Health check failing: Check application health endpoint
  • Missing environment variables: Check task definition
  • EFS mount failing: Check security groups allow NFS (port 2049)
  • Resource limits: Increase CPU/memory in Terraform

EFS Not Mounting

Check EFS mount targets:

aws efs describe-mount-targets \
  --file-system-id $(terraform output -raw efs_file_system_id) \
  --region ap-southeast-1

Check security group:

# ECS tasks SG must allow inbound NFS from itself
aws ec2 describe-security-groups \
  --group-ids $(terraform output -raw ecs_security_group_id) \
  --region ap-southeast-1

Service Discovery Not Working

Test DNS resolution from a running task:

# Get a task ARN
TASK_ARN=$(aws ecs list-tasks --cluster biddergod-dev-cluster --service-name biddergod-dev-user-service --query 'taskArns[0]' --output text)

# Execute command in task
aws ecs execute-command \
  --cluster biddergod-dev-cluster \
  --task $TASK_ARN \
  --container user-service \
  --command "nslookup kafka.biddergod-dev.local" \
  --interactive \
  --region ap-southeast-1
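
Note that aws ecs execute-command only works if ECS Exec is enabled on the task and the Session Manager plugin is installed locally. Whether this Terraform configuration enables ECS Exec is not stated here, so it is worth checking first. A minimal sketch, with exec_enabled as a hypothetical helper name:

```shell
# exec_enabled CLUSTER TASK_ARN -> returns 0 if ECS Exec is enabled on the task
exec_enabled() {
  enabled=$(aws ecs describe-tasks \
    --cluster "$1" \
    --tasks "$2" \
    --region "${AWS_REGION:-ap-southeast-1}" \
    --query 'tasks[0].enableExecuteCommand' \
    --output text) || return 1
  case "$enabled" in
    True|true) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   exec_enabled biddergod-dev-cluster "$TASK_ARN" || echo "ECS Exec is not enabled on this task"
```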

Cleanup

Stop All Services

# Stop all services (doesn't delete infrastructure)
for service in user-service auction-service bid-command bid-query auction-projector bid-projector payment-service sse-stream-service kong postgres mysql mongo redis kafka prometheus grafana kafka-ui; do
  aws ecs update-service \
    --cluster biddergod-dev-cluster \
    --service biddergod-dev-$service \
    --desired-count 0 \
    --region ap-southeast-1
done

Destroy Infrastructure

cd infrastructure

# Preview what will be destroyed
terraform plan -destroy

# Destroy everything
terraform destroy

# Note: EFS data will be deleted. Back it up first if needed!

Warning: This deletes:

  • All ECS services and tasks
  • EFS file system (all database data!)
  • ECR repositories (if empty)
  • VPC, subnets, security groups
  • CloudWatch log groups
  • Service discovery namespace

Production Considerations

This infrastructure is optimized for development/demo purposes. For production:

  1. High Availability:

    • Deploy across 2-3 AZs
    • Increase desired count for services
    • Use RDS Multi-AZ instead of PostgreSQL/MySQL on ECS
    • Use AWS MSK instead of self-hosted Kafka
    • Use ElastiCache instead of Redis on ECS
    • Use DocumentDB instead of MongoDB on ECS
  2. Security:

    • Enable private subnets with NAT Gateway
    • Use AWS Secrets Manager for sensitive env vars
    • Enable VPC Flow Logs
    • Configure WAF rules for Kong
    • Use TLS/SSL certificates
    • Restrict security groups further
  3. Monitoring:

    • Enable Container Insights
    • Set up CloudWatch alarms
    • Configure log aggregation (e.g., ELK stack)
    • Enable X-Ray tracing
    • Increase log retention
  4. Scalability:

    • Enable auto-scaling for all services
    • Configure appropriate CPU/memory targets
    • Use Application Auto Scaling for databases
    • Implement caching strategies
    • Consider API rate limiting at Kong
  5. Cost Optimization:

    • Use Savings Plans or Reserved Instances
    • Right-size resources after load testing
    • Implement lifecycle policies for logs/images
    • Use Fargate on-demand for critical services
    • Enable EFS Infrequent Access tier

Support

For issues:

  1. Check CloudWatch logs first
  2. Review ECS service events
  3. Verify security groups and network connectivity
  4. Check service discovery DNS resolution
  5. Validate environment variables

Common commands reference:

# List all services
aws ecs list-services --cluster biddergod-dev-cluster --region ap-southeast-1

# Describe service
aws ecs describe-services --cluster biddergod-dev-cluster --services biddergod-dev-<service-name> --region ap-southeast-1

# List tasks
aws ecs list-tasks --cluster biddergod-dev-cluster --service-name biddergod-dev-<service-name> --region ap-southeast-1

# Describe task
aws ecs describe-tasks --cluster biddergod-dev-cluster --tasks <task-arn> --region ap-southeast-1

# Tail logs
aws logs tail /ecs/biddergod-dev-<service-name> --follow --region ap-southeast-1

# View Terraform state
terraform show

# View specific output
terraform output <output-name>
