Commit ea1beaa

Initial commit: Complete AWS Data Platform template
- Comprehensive infrastructure as code using AWS CDK
- Real-time streaming with Kinesis, Lambda, and DynamoDB
- Batch processing with EMR, Glue, and Athena
- Data warehousing with Redshift and S3 data lake
- Machine learning platform with SageMaker
- Business intelligence with QuickSight
- Complete CI/CD pipeline with GitHub Actions
- Deployment scripts and environment configuration
- Production-ready architecture for analytics and ML workloads
0 parents  commit ea1beaa

12 files changed: +3071 -0 lines changed

.env.example

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+# AWS Data Platform Environment Configuration
+# Copy this file to .env.dev, .env.staging, or .env.prod and update values
+
+# AWS Configuration
+AWS_ACCOUNT_ID=123456789012
+AWS_REGION=us-east-1
+ENVIRONMENT=dev
+
+# Data Lake Configuration
+DATA_LAKE_BUCKET_PREFIX=my-company-data-lake
+RAW_BUCKET_NAME=data-platform-raw
+PROCESSED_BUCKET_NAME=data-platform-processed
+CURATED_BUCKET_NAME=data-platform-curated
+
+# Redshift Configuration
+REDSHIFT_CLUSTER_IDENTIFIER=data-platform-cluster
+REDSHIFT_DATABASE_NAME=analytics
+REDSHIFT_MASTER_USER=admin
+REDSHIFT_MASTER_PASSWORD=ChangeMePlease123!
+REDSHIFT_NODE_TYPE=dc2.large
+REDSHIFT_NUMBER_OF_NODES=2
+
+# EMR Configuration
+EMR_CLUSTER_NAME=data-platform-emr
+EMR_MASTER_INSTANCE_TYPE=m5.xlarge
+EMR_WORKER_INSTANCE_TYPE=m5.xlarge
+EMR_WORKER_INSTANCE_COUNT=3
+EMR_RELEASE_LABEL=emr-6.9.0
+
+# Kinesis Configuration
+KINESIS_MAIN_STREAM_SHARDS=2
+KINESIS_CLICKSTREAM_SHARDS=2
+KINESIS_IOT_STREAM_SHARDS=4
+KINESIS_RETENTION_DAYS=7
+
+# DynamoDB Configuration
+DYNAMODB_READ_CAPACITY=5
+DYNAMODB_WRITE_CAPACITY=5
+DYNAMODB_BILLING_MODE=PAY_PER_REQUEST
+
+# SageMaker Configuration
+SAGEMAKER_NOTEBOOK_INSTANCE_TYPE=ml.t3.medium
+SAGEMAKER_TRAINING_INSTANCE_TYPE=ml.m5.xlarge
+SAGEMAKER_ENDPOINT_INSTANCE_TYPE=ml.t2.medium
+
+# Monitoring Configuration
+NOTIFICATION_EMAIL=data-team@company.com
+SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL
+PAGERDUTY_INTEGRATION_KEY=your-pagerduty-key
+
+# Security Configuration
+ENABLE_ENCRYPTION=true
+ENABLE_VPC_FLOW_LOGS=true
+ENABLE_CLOUDTRAIL=true
+KMS_KEY_ALIAS=alias/data-platform
+
+# Network Configuration
+VPC_CIDR=10.0.0.0/16
+PUBLIC_SUBNET_CIDRS=10.0.1.0/24,10.0.2.0/24
+PRIVATE_SUBNET_CIDRS=10.0.10.0/24,10.0.11.0/24
+DATABASE_SUBNET_CIDRS=10.0.20.0/24,10.0.21.0/24
+
+# Cost Management
+ENABLE_AUTO_SHUTDOWN=false
+AUTO_SHUTDOWN_TIME=19:00
+AUTO_STARTUP_TIME=07:00
+BUDGET_ALERT_THRESHOLD=1000
+
+# Feature Flags
+ENABLE_ML_PIPELINE=true
+ENABLE_REAL_TIME_ANALYTICS=true
+ENABLE_DATA_QUALITY_CHECKS=true
+ENABLE_COST_OPTIMIZATION=true
+
+# Tags
+PROJECT_NAME=DataPlatform
+COST_CENTER=DataEngineering
+OWNER=data-team@company.com
+COMPLIANCE=GDPR
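
Since the file is meant to be copied and edited per environment, a small pre-deploy check can catch copy-paste mistakes. The following is a minimal sketch, not part of this commit: the helper names `parse_env` and `check_config` are hypothetical, and how the template's own deployment scripts read these variables is not shown in this diff. It assumes plain `KEY=VALUE` lines as above and uses only the Python standard library. It checks that every subnet CIDR sits inside `VPC_CIDR` without overlapping another subnet, that the provisioned DynamoDB capacity values are not set alongside `PAY_PER_REQUEST` (where provisioned capacity does not apply), and that the placeholder Redshift password has been replaced.

```python
import ipaddress


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_config(cfg):
    """Return a list of human-readable problems found in cfg (hypothetical checks)."""
    problems = []

    # Every subnet must lie inside the VPC range and must not overlap another.
    vpc = ipaddress.ip_network(cfg["VPC_CIDR"])
    subnets = []
    for key in ("PUBLIC_SUBNET_CIDRS", "PRIVATE_SUBNET_CIDRS", "DATABASE_SUBNET_CIDRS"):
        for cidr in cfg.get(key, "").split(","):
            cidr = cidr.strip()
            if not cidr:
                continue
            net = ipaddress.ip_network(cidr)
            if not net.subnet_of(vpc):
                problems.append(f"{key}: {net} is outside VPC_CIDR {vpc}")
            subnets.append((key, net))
    for i, (key_a, net_a) in enumerate(subnets):
        for key_b, net_b in subnets[i + 1:]:
            if net_a.overlaps(net_b):
                problems.append(f"{key_a} {net_a} overlaps {key_b} {net_b}")

    # Provisioned capacity values do not apply in on-demand billing mode.
    has_capacity = "DYNAMODB_READ_CAPACITY" in cfg or "DYNAMODB_WRITE_CAPACITY" in cfg
    if cfg.get("DYNAMODB_BILLING_MODE") == "PAY_PER_REQUEST" and has_capacity:
        problems.append("DYNAMODB_*_CAPACITY is set but billing mode is PAY_PER_REQUEST")

    # The example password is a placeholder and should never reach a real deploy.
    if cfg.get("REDSHIFT_MASTER_PASSWORD", "").startswith("ChangeMe"):
        problems.append("REDSHIFT_MASTER_PASSWORD still holds the placeholder value")

    return problems


# A subset of the values from .env.example above, used as sample input.
SAMPLE = """
# Network Configuration
VPC_CIDR=10.0.0.0/16
PUBLIC_SUBNET_CIDRS=10.0.1.0/24,10.0.2.0/24
PRIVATE_SUBNET_CIDRS=10.0.10.0/24,10.0.11.0/24
DATABASE_SUBNET_CIDRS=10.0.20.0/24,10.0.21.0/24
DYNAMODB_READ_CAPACITY=5
DYNAMODB_WRITE_CAPACITY=5
DYNAMODB_BILLING_MODE=PAY_PER_REQUEST
REDSHIFT_MASTER_PASSWORD=ChangeMePlease123!
"""

if __name__ == "__main__":
    for problem in check_config(parse_env(SAMPLE)):
        print("WARN:", problem)
```

With the example values, the network ranges pass and the sketch warns only about the DynamoDB capacity settings and the placeholder password. A real password would be better kept out of any `.env` file and in a secrets store, though that choice is outside what this commit shows.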
