| Storage | JSONL → Parquet | Same (+ optional compaction job) |
| Queryability | S3 only | Optional Glue Catalog/Crawler + Athena |
| Data quality | None | Optional Glue Job + GE gate |
| Observability | Logs only | Optional CloudWatch dashboards + alarms |
| CI/CD | Local apply | CI + manual Terraform workflow |

## Quickstart
@@ -124,3 +124,27 @@ Recommendation: keep `ge_emit_events_from_transform=false` and `ge_eventbridge_e
## License

MIT. See `LICENSE`.

## Why “optional” is emphasized
This repo is designed so you can keep a minimal, low-cost baseline (the core S3→Lambda→SQS→Lambda→S3 pipeline) and enable “enterprise” capabilities via Terraform toggles. Marking modules as **optional** avoids misunderstandings when:

- You intentionally keep a capability off (cost, complexity, or permissions).
- Your org policies block certain APIs (for example, CloudWatch dashboard/alarm writes).
- You want to demonstrate the architecture and the toggles without implying every deployment has every module enabled.

If you enable a module in your environment, it is valid to describe it as “included in my deployment” in interviews.
## Resume-ready project summary (copy/paste)

**One-liner**

Built a production-lite, serverless ELT framework on AWS (S3 bronze JSONL → Lambda ingest → SQS (+ DLQ) → Lambda transform → S3 silver Parquet) with optional orchestration, catalog/query, and data quality gates.

**Highlights**

- Implemented object-level idempotency using AWS Lambda Powertools Idempotency backed by DynamoDB (conditional writes + TTL) to prevent duplicate ingestion across retries and duplicate events.
- Designed for reliability with SQS partial batch failure handling, DLQ + redrive tooling, and S3-copy replay for backfills without direct queue access.
- Produced query-ready Parquet outputs and integrated optional Glue Data Catalog/Crawler so Athena can query the silver layer as tables.
- Added optional operational workflows (Step Functions) to orchestrate replay/backfill and downstream readiness/quality checks (with optional EventBridge auto-triggering).
- Delivered infrastructure as code (Terraform modules) and CI automation (pytest + terraform fmt checks; manual Terraform plan/apply workflow supporting OIDC or access keys).
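
**Illustration: SQS partial batch failures**

If you are asked what "SQS partial batch failure handling" means in practice, the sketch below may help. It is a minimal, dependency-free illustration and not the code in this repo, which may rely on a library such as AWS Lambda Powertools instead. The names `handle_batch`, `transform`, and `sample_event` are hypothetical. It assumes the Lambda event source mapping has `ReportBatchItemFailures` enabled, so that only the message IDs returned under `batchItemFailures` are retried and can eventually reach the DLQ.

```python
import json
from typing import Any, Callable


def handle_batch(
    event: dict[str, Any],
    process: Callable[[dict[str, Any]], None],
) -> dict[str, Any]:
    """Process each SQS record and report only the failed ones back to Lambda."""
    failures = []
    for record in event.get("Records", []):
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Only this message is retried; successful ones are deleted from the queue.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


def transform(message: dict[str, Any]) -> None:
    # Hypothetical stand-in for the real JSONL -> Parquet transform step.
    if "key" not in message:
        raise ValueError("message is missing the S3 object key")


# Hypothetical two-message batch: the second message lacks a "key" field.
sample_event = {
    "Records": [
        {"messageId": "m-1", "body": json.dumps({"key": "bronze/a.jsonl"})},
        {"messageId": "m-2", "body": json.dumps({"bucket": "only"})},
    ]
}

result = handle_batch(sample_event, transform)
print(result)  # {'batchItemFailures': [{'itemIdentifier': 'm-2'}]}
```

Returning an empty `batchItemFailures` list tells Lambda the whole batch succeeded, which is why one bad message no longer forces a retry of its neighbors.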