Commit 0a319ff

docs: refine v1 vs v2 table
1 parent d667dee commit 0a319ff

1 file changed: +15 -12 lines changed

README.md

Lines changed: 15 additions & 12 deletions
@@ -67,20 +67,24 @@ No VPC/EC2 is required for the minimal path.
 This repo is designed so you can keep a minimal, low-cost baseline (the core S3→Lambda→SQS→Lambda→S3 pipeline) and enable “enterprise” capabilities via Terraform toggles. Marking modules as **optional** avoids misunderstandings when:
 
 - You intentionally keep a capability off (cost, complexity, or permissions).
+- Your org policies block certain APIs (for example, CloudWatch dashboard/alarm writes).
+- You want to demonstrate the architecture and toggles without implying every deployment has every module enabled.
 
 ## v1 vs v2.0
 
-| Aspect | v1 (Minimal) | v2.0 (Enterprise track) |
-|---|---|----|
-| Core pipeline | S3 → Lambda → SQS → Lambda → S3 (Parquet) | Same + production options |
-| Orchestration || EventBridge → Step Functions (replay/backfill + ops/DQ stages) |
-| Idempotency | DDB object-level lock (`bucket/key#etag`) + TTL | Powertools (DynamoDB + TTL) |
-| Recovery | Manual / ad-hoc | Replay scripts + DLQ redrive helpers (repeatable recovery) |
-| Queryability | S3 only | Glue Catalog/Crawler → Athena tables on Silver Parquet |
-| Data quality || Step Functions → Glue Job (+ Great Expectations gate) |
-| Storage / compute | JSONL → Parquet | Parquet + Glue job for compaction/recompute |
-| Observability | CloudWatch logs | Powertools logs + metrics; CloudWatch dashboards + alarms |
-| CI/CD | Local deploy / manual apply | GitHub Actions CI + Terraform workflow (keys/OIDC) |
+| Aspect | v1 (Minimal) | v2.0 (Production-ready / Enterprise track) |
+|---|---|---|
+| Core pipeline | S3 (bronze JSONL) → Lambda ingest → SQS → Lambda transform → S3 (silver Parquet) | Same core pipeline (keeps it simple & scalable) |
+| Triggers & orchestration | S3 trigger + SQS event source mapping | Same, plus optional Step Functions workflows (manual by default; optional EventBridge schedule/auto-trigger) |
+| Idempotency | Object-level idempotency (DynamoDB + TTL; key = `s3://bucket/key#etag`) | Powertools Idempotency backed by DynamoDB (conditional writes + TTL; same key) |
+| Failure handling | Default retries | SQS partial batch failure handling + DLQ (optional) + replay/redrive scripts (`scripts/replay.sh`, `scripts/redrive.sh`) |
+| Recovery / backfill | Manual replay (ad-hoc) | Repeatable replay/backfill loop (S3-based replay + DLQ redrive), designed for safe reprocessing |
+| Storage format | JSONL → Parquet | JSONL → Parquet with partitioned silver layout (query-friendly) |
+| Queryability | S3 files only | Optional Glue Catalog/Crawler → Athena tables over `silver/<record_type>/dt=.../*.parquet` |
+| Data quality || Optional Step Functions task → Glue Job (+ optional Great Expectations gate) |
+| Observability | Logs only | Powertools Logger + Metrics + optional CloudWatch Dashboard + Alarms |
+| IaC / deployment | Terraform apply locally | Terraform modules + CI checks (pytest + terraform fmt) + manual Terraform plan/apply workflow (OIDC preferred; access keys supported) |
+| Extensibility | Manual wiring per dataset | Dataset scaffold (`make scaffold DATASET=...`) generates config/handler/DQ/sample skeletons for new datasets |
 
 ## Quickstart
 
@@ -171,4 +175,3 @@ Recommendation: keep `ge_emit_events_from_transform=false` and `ge_eventbridge_e
 
 MIT — see `LICENSE`.
 
-
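The "Idempotency" and "Failure handling" rows of the new table can be illustrated together. The following is a minimal sketch, not the repo's actual handler: it builds the `s3://bucket/key#etag` key named in the table, uses a hypothetical in-memory class (`InMemoryLockTable`) in place of the DynamoDB conditional write + TTL, and returns the `batchItemFailures` response that Lambda expects when partial batch responses are enabled on the SQS event source mapping. The message body fields (`bucket`, `key`, `etag`) and the function names are assumptions made for the example.

```python
import json
import time
from typing import Any, Callable, Dict, List, Optional


def idempotency_key(bucket: str, key: str, etag: str) -> str:
    """Build the object-level key in the form the table gives: s3://bucket/key#etag."""
    return f"s3://{bucket}/{key}#{etag}"


class InMemoryLockTable:
    """Hypothetical stand-in for the DynamoDB table (conditional put + TTL)."""

    def __init__(self, ttl_seconds: int = 3600) -> None:
        self._ttl = ttl_seconds
        self._expires_at: Dict[str, float] = {}

    def try_claim(self, key: str, now: Optional[float] = None) -> bool:
        """Return True if the key was free (or expired) and is now claimed."""
        now = time.time() if now is None else now
        expires = self._expires_at.get(key)
        if expires is not None and expires > now:
            return False  # an unexpired claim already exists
        self._expires_at[key] = now + self._ttl
        return True

    def release(self, key: str) -> None:
        """Drop a claim so a retry of a failed record can be processed."""
        self._expires_at.pop(key, None)


def handle_batch(
    event: Dict[str, Any],
    table: InMemoryLockTable,
    process: Callable[[Dict[str, Any]], None],
) -> Dict[str, List[Dict[str, str]]]:
    """Process SQS records one by one and report only the failed ones."""
    failures: List[Dict[str, str]] = []
    for record in event["Records"]:
        try:
            # Assumed body shape: {"bucket": ..., "key": ..., "etag": ...}
            body = json.loads(record["body"])
            key = idempotency_key(body["bucket"], body["key"], body["etag"])
            if not table.try_claim(key):
                continue  # duplicate delivery: treat as success, do nothing
            try:
                process(body)
            except Exception:
                table.release(key)  # let the redelivered message retry
                raise
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Only the message IDs listed in `batchItemFailures` are returned to the queue, so one bad record does not force the whole batch to be retried. The sketch claims a key before processing and releases it on failure. Powertools Idempotency additionally tracks in-progress versus completed state, which this simplification omits.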