Commit afe95ee

docs: add resume summary and clarify optional
1 parent 11955ff commit afe95ee

1 file changed: +34 -10 lines changed

README.md

Lines changed: 34 additions & 10 deletions
@@ -29,16 +29,16 @@ No VPC/EC2 is required for the minimal path.

 ## v1 vs v2.0

-| Aspect | v1 (Minimal) | v2.0 (Enterprise) |
-|---|---|----|
-| Pipeline | S3 → Lambda → SQS → Lambda → S3 | EventBridge / Step Functions + Glue + GE |
-| Idempotency | DDB object-level lock| Powertools (DynamoDB TTL) + replay / backfill |
-| Recovery | Manual | Replay + DLQ redrive `scripts/.sh` helpers |
-| Queryability | S3 only | Glue Catalog / Crawler + Athena |
-| Data quality || Glue Job + Great Expectations gate |
-| Storage | JSONL → Parquet | Parquet + Glue tables (compaction) + Athena |
-| Observability | Logs only | CloudWatch Dashboards + Alarms |
-| CI/CD | Local apply | CI + manual Terraform plan/apply (keys/OIDC) |
+| Aspect | v1 (Minimal) | v2.0 (Enterprise track) |
+|---|---|---|
+| Pipeline | S3 → Lambda → SQS → Lambda → S3 | Same + optional workflows |
+| Idempotency | DynamoDB object-level | Powertools Idempotency (DDB TTL) |
+| Recovery | Basic | Replay + DLQ redrive helpers |
+| Storage | JSONL → Parquet | Same (+ optional compaction job) |
+| Queryability | S3 only | Optional Glue Catalog/Crawler + Athena |
+| Data quality || Optional Glue Job + GE gate |
+| Observability | Logs only | Optional CloudWatch dashboards + alarms |
+| CI/CD | Local apply | CI + manual Terraform workflow |

 ## Quickstart

@@ -124,3 +124,27 @@ Recommendation: keep `ge_emit_events_from_transform=false` and `ge_eventbridge_e
 ## License

 MIT — see `LICENSE`.
+
+## Why “optional” is emphasized
+
+This repo is designed so you can keep a minimal, low-cost baseline (the core S3→Lambda→SQS→Lambda→S3 pipeline) and enable “enterprise” capabilities via Terraform toggles. Marking modules as **optional** avoids misunderstandings when:
+
+- You intentionally keep a capability off (cost, complexity, or permissions).
+- Your org policies block certain APIs (for example, CloudWatch dashboard/alarm writes).
+- You want to demonstrate the architecture and the toggles without implying every deployment has every module enabled.
+
+If you enable a module in your environment, it is valid to describe it as “included in my deployment” in interviews.
+
+## Resume-ready project summary (copy/paste)
+
+**One-liner**
+
+Built a production-lite, serverless ELT framework on AWS (S3 bronze JSONL → Lambda ingest → SQS (+ DLQ) → Lambda transform → S3 silver Parquet) with optional orchestration, catalog/query, and data quality gates.
+
+**Highlights**
+
+- Implemented object-level idempotency using AWS Lambda Powertools Idempotency backed by DynamoDB (conditional writes + TTL) to prevent duplicate ingestion across retries and duplicate events.
+- Designed for reliability with SQS partial batch failure handling, DLQ + redrive tooling, and S3-copy replay for backfills without direct queue access.
+- Produced query-ready Parquet outputs and integrated optional Glue Data Catalog/Crawler so Athena can query the silver layer as tables.
+- Added optional operational workflows (Step Functions) to orchestrate replay/backfill and downstream readiness/quality checks (with optional EventBridge auto-triggering).
+- Delivered infrastructure as code (Terraform modules) and CI automation (pytest + terraform fmt checks; manual Terraform plan/apply workflow supporting OIDC or access keys).
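
The first two highlights (conditional-write idempotency with TTL, and SQS partial batch failure handling) can be illustrated with a small sketch. This is not the repo's code and does not use Powertools or DynamoDB: an in-memory dict stands in for the table, and `InMemoryIdempotencyStore`, `handle_batch`, and the assumption that each message body carries the S3 object key are all hypothetical. The only real contract shown is the `batchItemFailures` response shape that Lambda accepts for SQS event sources when `ReportBatchItemFailures` is enabled.

```python
import time
from typing import Any, Callable, Dict, List


class InMemoryIdempotencyStore:
    """Hypothetical stand-in for a DynamoDB table with a TTL attribute."""

    def __init__(self, ttl_seconds: int = 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self._items: Dict[str, float] = {}  # idempotency key -> expiry time
        self._ttl = ttl_seconds
        self._clock = clock

    def put_if_absent(self, key: str) -> bool:
        """Mimic a conditional write: succeed only if no unexpired item exists."""
        now = self._clock()
        expiry = self._items.get(key)
        if expiry is not None and expiry > now:
            return False  # key already claimed and not yet expired
        self._items[key] = now + self._ttl
        return True

    def delete(self, key: str) -> None:
        """Release a key so that a later retry can claim it again."""
        self._items.pop(key, None)


def handle_batch(event: Dict[str, Any],
                 store: InMemoryIdempotencyStore,
                 process: Callable[[str], None]) -> Dict[str, List[Dict[str, str]]]:
    """Process SQS records once per object key and report only the failed ones."""
    failures: List[Dict[str, str]] = []
    for record in event.get("Records", []):
        key = record["body"]  # assumption: the body is the S3 object key
        if not store.put_if_absent(key):
            continue  # duplicate delivery: skip without failing the message
        try:
            process(key)
        except Exception:
            store.delete(key)  # let the redelivered message run again
            failures.append({"itemIdentifier": record["messageId"]})
    # Only the listed messages return to the queue; the rest are deleted.
    return {"batchItemFailures": failures}
```

Releasing the key on failure is a design choice in this sketch: it lets the redelivered message be processed again instead of being treated as a duplicate until the TTL expires.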
