A proof-of-concept implementation of a scalable, cost-effective multi-tenant logging pipeline on AWS, built around a "Centralized Ingestion, Decentralized Delivery" architecture.
- Collects logs from Kubernetes/OpenShift clusters using Vector agents
- Stores logs centrally in S3 with intelligent compression and partitioning
- Delivers logs to multiple customer AWS accounts simultaneously
- Supports multiple delivery types per tenant (CloudWatch Logs + S3)
- Reduces costs by ~90% compared to direct CloudWatch Logs ingestion
```mermaid
graph LR
    K8s[Kubernetes Clusters] --> Vector[Vector Agents]
    Vector --> S3[Central S3 Storage]
    S3 --> SNS[Event Processing]
    SNS --> Lambda[Log Processor]
    Lambda --> CW1[Customer 1<br/>CloudWatch Logs]
    Lambda --> CW2[Customer 2<br/>CloudWatch Logs]
    Lambda --> S3_1[Customer 1<br/>S3 Bucket]
    Lambda --> S3_2[Customer 2<br/>S3 Bucket]
```
Key Benefits:
- Multi-Delivery: Each tenant can receive logs via CloudWatch Logs AND S3 simultaneously
- Direct S3 Writes: Eliminates Kinesis Firehose costs (~$50/TB saved)
- Cross-Account Security: Secure delivery using IAM role assumption (see the sketch after this list)
- Container-Based Processing: Modern Lambda functions using ECR containers
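
The cross-account hop can be expressed with the AWS SDK for Go v2. The following is a minimal sketch, not this repository's processor code: the role ARN, ExternalId value, and helper names are illustrative placeholders, and it shows a single assumption hop of what the pipeline describes as a double-hop chain.

```go
// Sketch: assume a customer-managed role before writing to their
// CloudWatch Logs. All identifiers below are hypothetical examples.
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/credentials/stscreds"
	"github.com/aws/aws-sdk-go-v2/service/cloudwatchlogs"
	"github.com/aws/aws-sdk-go-v2/service/sts"
)

func customerLogsClient(ctx context.Context, roleARN, externalID string) (*cloudwatchlogs.Client, error) {
	// Base credentials come from the processor's own account (e.g. the Lambda role).
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return nil, err
	}

	// Assume the customer role; the ExternalId must match the condition in the
	// customer's trust policy, which guards against confused-deputy misuse.
	provider := stscreds.NewAssumeRoleProvider(sts.NewFromConfig(cfg), roleARN,
		func(o *stscreds.AssumeRoleOptions) {
			o.ExternalID = aws.String(externalID)
		})
	cfg.Credentials = aws.NewCredentialsCache(provider)

	return cloudwatchlogs.NewFromConfig(cfg), nil
}

func main() {
	client, err := customerLogsClient(context.Background(),
		"arn:aws:iam::111122223333:role/log-delivery", "tenant-1-external-id")
	if err != nil {
		log.Fatal(err)
	}
	_ = client // deliver a batch with client.PutLogEvents(...)
}
```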
- 5-Minute Setup - Get running quickly
- Architecture Deep Dive - Comprehensive system design
- Development Guide - Local development and testing
- Terraform Infrastructure - LocalStack development environment
- Kubernetes Deployment - Vector and processor deployment
- API Management - Tenant configuration API
- Troubleshooting - Common issues and solutions
- Podman for container builds and LocalStack
- Go 1.21+ for log processor development
- Terraform for infrastructure as code
- Make for development workflow automation
- kubectl (optional, for cluster deployments)
```bash
# Start LocalStack
make start

# Build the log processor container
make build

# Deploy infrastructure to LocalStack
make deploy

# Run integration tests
make test-e2e

# View all available commands
make help
```

```bash
# Create logging namespace
kubectl create namespace logging

# Deploy Vector collector (OpenShift with specific overlay)
kubectl apply -k k8s/collector/overlays/cuppett

# Verify deployment
kubectl get pods -n logging
```

```bash
# Tenant configurations are automatically created by Terraform
# View tenant configs in LocalStack
TABLE_NAME=$(cd terraform/local && terraform output -raw central_dynamodb_table)
aws --endpoint-url=http://localhost:4566 dynamodb scan --table-name $TABLE_NAME
```

Complete Deployment Guide | Development Guide
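
For orientation, a tenant record might look roughly like the Go model below. The field names and item shape are assumptions for illustration only; the authoritative schema is whatever the Terraform under terraform/local actually creates.

```go
// Hypothetical tenant configuration model, not the repository's real schema.
package main

import "fmt"

type DeliveryConfig struct {
	Type         string `dynamodbav:"type"`           // "cloudwatch" or "s3"
	TargetRegion string `dynamodbav:"target_region"`
	RoleARN      string `dynamodbav:"role_arn"`       // customer role to assume
	ExternalID   string `dynamodbav:"external_id"`
	LogGroup     string `dynamodbav:"log_group,omitempty"` // CloudWatch delivery
	Bucket       string `dynamodbav:"bucket,omitempty"`    // S3 delivery
}

type TenantConfig struct {
	TenantID   string           `dynamodbav:"tenant_id"`
	Namespaces []string         `dynamodbav:"namespaces"` // namespaces owned by the tenant
	Deliveries []DeliveryConfig `dynamodbav:"deliveries"` // multiple targets per tenant
}

func main() {
	// Example: one tenant receiving logs via CloudWatch Logs AND S3.
	cfg := TenantConfig{
		TenantID:   "acme",
		Namespaces: []string{"acme-prod"},
		Deliveries: []DeliveryConfig{
			{Type: "cloudwatch", TargetRegion: "us-east-1",
				RoleARN: "arn:aws:iam::111122223333:role/log-delivery",
				ExternalID: "acme-ext-id", LogGroup: "/tenants/acme"},
			{Type: "s3", TargetRegion: "us-east-1",
				RoleARN: "arn:aws:iam::111122223333:role/log-delivery",
				Bucket: "acme-log-archive"},
		},
	}
	fmt.Printf("%+v\n", cfg)
}
```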
```bash
# View all available commands
make help

# Full workflow: start LocalStack, build, deploy, test
make start
make build
make deploy
make test-e2e

# Run processor in scan mode
make run-scan

# Validate Vector log flow
make validate-vector-flow
```

- Collector Container: Vector binary for log collection
- Processor Container: Go-based processor with multi-stage build
- Multi-Mode Support: Lambda, scan mode, and manual testing (see the entry-point sketch below)
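
One way a single container image can serve both Lambda and scan modes is a mode switch at startup. This is a minimal sketch, assuming a hypothetical PROCESSOR_MODE environment variable and stub handlers; the actual processor's dispatch mechanism may differ.

```go
// Sketch of a dual-mode entry point; names here are illustrative.
package main

import (
	"context"
	"log"
	"os"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

// handleSNSEvent would process S3 object notifications relayed through SNS.
func handleSNSEvent(ctx context.Context, evt events.SNSEvent) error {
	return nil
}

// runScan would walk the central bucket directly, which is useful for
// local testing against LocalStack and for manual backfills.
func runScan(ctx context.Context) error {
	return nil
}

func main() {
	if os.Getenv("PROCESSOR_MODE") == "scan" {
		if err := runScan(context.Background()); err != nil {
			log.Fatal(err)
		}
		return
	}
	// Default: run as a container-image Lambda behind the SNS topic.
	lambda.Start(handleSNSEvent)
}
```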
Full Development Guide | Makefile Reference
- Vector log collection with namespace filtering and intelligent parsing
- Direct S3 storage with GZIP compression and dynamic partitioning (see the sketch after these lists)
- Multi-delivery support - CloudWatch Logs + S3 per tenant
- Application filtering with individual apps and pre-defined groups (API, Authentication, etc.)
- Container-based Lambda processing with ECR images
- Cross-account security via double-hop role assumption
- Cost optimization with S3 lifecycle policies and compression
- Development tools with fake log generator and local testing
- API management for tenant configuration via REST API
- Basic monitoring - AWS native services only (no custom metrics/dashboards)
- Simple error handling - DLQ and retry logic without advanced workflow
- Regional deployment - Manual multi-region setup required
- Minimal UI - Configuration via API/CLI only
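
To make the compression and partitioning levers above concrete, here is a small Go sketch that GZIP-compresses a batch of newline-delimited events and builds a time- and tenant-partitioned S3 key. The key layout and function names are assumptions for illustration, not the pipeline's actual scheme.

```go
// Sketch of batch compression and partitioned key construction.
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"time"
)

// partitionKey builds a key like
// logs/tenant=acme/dt=2024-01-02/hour=15/batch-1704207600.json.gz
// so downstream consumers can prune by tenant and time range.
func partitionKey(tenant string, t time.Time) string {
	return fmt.Sprintf("logs/tenant=%s/dt=%s/hour=%02d/batch-%d.json.gz",
		tenant, t.Format("2006-01-02"), t.Hour(), t.Unix())
}

// compressBatch GZIP-compresses newline-delimited JSON log events;
// repetitive log lines are where high compression ratios come from.
func compressBatch(events [][]byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	for _, e := range events {
		if _, err := zw.Write(e); err != nil {
			return nil, err
		}
		if _, err := zw.Write([]byte("\n")); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := compressBatch([][]byte{[]byte(`{"msg":"hello"}`)})
	if err != nil {
		panic(err)
	}
	fmt.Println(partitionKey("acme", time.Now()), len(data), "bytes")
}
```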
- This Pipeline: ~$50/month (S3 + Lambda + supporting services)
- Direct CloudWatch: ~$500/month (ingestion costs)
- Kinesis Firehose: ~$100/month (additional processing costs)
- Throughput: ~20,000 events/second per cluster node
- Latency: ~2-5 minutes from log generation to delivery
- Compression: ~30:1 ratio with GZIP
- Scalability: Horizontal scaling via multiple processor instances
- Namespace Isolation: Vector only collects from labeled namespaces
- Cross-Account Access: Customer roles with ExternalId validation
- Encryption: SSE-S3/KMS encryption for all data at rest
- Least Privilege: Minimal IAM permissions with resource restrictions
- Audit Trail: All role assumptions logged in CloudTrail
- Check Development Guide for local setup
- Review Architecture Design for system understanding
- Test changes in development environment first
- Submit pull requests with detailed descriptions
This project is licensed under the MIT License - see the LICENSE file for details.
POC Status: This project demonstrates core functionality with minimal complexity. Advanced monitoring, alerting, and management features should be added incrementally after pipeline validation.