sahilgundu
diff --git a/.editorconfig b/.editorconfig
Lines changed: 8 additions & 0 deletions
diff --git a/.markdownlint-cli2.jsonc b/.markdownlint-cli2.jsonc
Lines changed: 3 additions & 0 deletions
diff --git a/.markdownlint.jsonc b/.markdownlint.jsonc
Lines changed: 3 additions & 0 deletions
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
Lines changed: 5 additions & 0 deletions
diff --git a/CODEOWNERS b/CODEOWNERS
Lines changed: 4 additions & 0 deletions
diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
Lines changed: 6 additions & 0 deletions
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
Lines changed: 9 additions & 0 deletions
diff --git a/ETHICS.md b/ETHICS.md
Lines changed: 16 additions & 0 deletions
diff --git a/LICENSE b/LICENSE
Lines changed: 21 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 213 additions & 0 deletions
@@ -0,0 +1,8 @@
+root = true
+
+[*]
+charset = utf-8
+end_of_line = lf
+insert_final_newline = true
+indent_style = space
+indent_size = 2
@@ -0,0 +1,3 @@
+{
+  "config": { "default": true }
+}
@@ -0,0 +1,3 @@
+{
+  "default": true
+}
@@ -0,0 +1,5 @@
+repos:
+  - repo: https://github.com/DavidAnson/markdownlint-cli2
+    rev: v0.13.0
+    hooks:
+      - id: markdownlint-cli2
@@ -0,0 +1,4 @@
+# CODEOWNERS – for illustration only
+
+*   @your-github-handle
+docs/*  @your-github-handle
@@ -0,0 +1,6 @@
+# CODE OF CONDUCT
+
+This is a small, sanitized portfolio repository.
+
+- Be respectful when opening issues or discussing ideas.
+- No harassment, hate speech, or abusive behaviour.
@@ -0,0 +1,9 @@
+# CONTRIBUTING
+
+This repository is primarily a **docs-only case study**.
+
+If you want to extend it:
+
+1. Open an issue describing the improvement.
+2. Follow the existing docs structure in the `docs/` folder.
+3. Keep all examples **fully sanitized** – no real client data or secrets.
@@ -0,0 +1,16 @@
+# ETHICS.md – Sanitization & Responsible Use
+
+This repository is a **sanitized case study**.
+
+- No real client code or client data is included.
+- Bank names, volumes, and SLOs are illustrative.
+- JSON schemas and table names are synthetic.
+
+The patterns shown here – Streaming Pipelines, Batch Pipelines, ETL/ELT, and ML governance –
+are intended for **learning, interview discussions, and portfolio demonstration only**.
+
+When using similar patterns in a real environment:
+
+- Follow your organisation's security, privacy, and model risk policies.
+- Do not expose PII or confidential business metrics.
+- Engage risk, legal, and compliance teams before deploying ML-based risk scoring to production.
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -0,0 +1,213 @@
+# ML-Based Risk Scoring – Tier-1 UK Retail Bank (GCP + BigQuery ML)
+
+> Sanitized case study — ML-based Risk Scoring for a Tier-1 UK Retail Bank on GCP  
+> (Streaming Pipeline + Batch Pipeline using Pub/Sub · Dataflow · BigQuery · BigQuery ML · Cloud Composer · GCS).  
+> Patterns only; no client code or client data.
+
+---
+
+## 🔍 Quick Facts
+
+- **Domain:** Retail Banking · BFSI · Fraud & Credit Risk · Audit/Compliance  
+- **Pipelines:** Streaming Pipeline (ETL) + Batch Pipeline (ELT) on GCP  
+- **Stack:** Cloud Pub/Sub, Dataflow (Apache Beam – Python), BigQuery, BigQuery ML, Cloud Composer, GCS, Power BI / Looker Studio  
+- **Throughput (simulated):**  
+  - ~50–100 transactions per second (steady)  
+  - Up to ~5–10 million transactions per day  
+- **SLOs (simulated):**
+  - p95 end-to-end risk-score latency: **< 90 seconds** from transaction to score  
+  - Data Quality (DQ) pass rate: **≥ 95%** for reportable features and risk scores  
+  - Streaming Pipeline availability: **≥ 99.5%**
+
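The simulated throughput and latency figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the function names and the nearest-rank percentile method are assumptions, not something the case study prescribes.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_volume(tps: int) -> int:
    """Transactions per day implied by a constant transactions-per-second rate."""
    return tps * SECONDS_PER_DAY


def p95(latencies_s: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies (seconds)."""
    ordered = sorted(latencies_s)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_slo(latencies_s: list[float], threshold_s: float = 90.0) -> bool:
    """True when p95 latency is strictly below the threshold (the '< 90 seconds' SLO)."""
    return p95(latencies_s) < threshold_s


print(daily_volume(50), daily_volume(100))  # 4320000 8640000
```

A steady 50 to 100 TPS works out to roughly 4.3 to 8.6 million transactions per day, so the upper end of the daily figure (10 million, about 116 TPS on average) implies peaks above the steady range.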
+---
+
+## 1. What this project is about
+
+This project shows how a **Tier-1 UK Retail Bank** could implement an **ML-Based Risk Scoring platform on GCP** using a mix of:
+
+- **Streaming Pipeline (ETL Pipeline)** for near-real-time ingestion and scoring of transactions  
+- **Batch Pipeline (ELT Pipeline)** for daily aggregates, model training, and re-scoring
+
+The goal is to:
+
+- Continuously ingest **card & account transactions + customer behaviour events**
+- Build/maintain **feature tables** in BigQuery
+- Train **BigQuery ML** models for fraud risk / credit risk
+- Generate **risk scores** that are **auditable**, **governed**, and easy to consume by downstream systems and dashboards
+
+The repo is **docs-only** (no client data, no production code).  
+It focuses on architecture, contracts, DQ, SLOs, ML governance, and operational patterns.
+
+---
+
+## 2. Inputs and outputs
+
+### 2.1 Inputs (simulated)
+
+1. **Transactional events**  
+   - Card payments, ATM withdrawals, online banking transactions  
+   - Ingested via **Cloud Pub/Sub** – topic: `transactions.realtime`  
+   - Payload schema: `contracts/transactions.schema.json`  
+
+2. **Customer & account attributes**  
+   - Static/dimensional data (KYC, limits, risk bands)  
+   - Landed as **Batch Pipeline** loads into BigQuery staging tables  
+   - Schema: `contracts/customers.schema.json`  
+
+3. **Behavioural / device events (optional)**  
+   - Login attempts, device fingerprints, channel usage  
+   - Either ingested through a separate Pub/Sub topic or batch tables  
+
+### 2.2 Outputs
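To make the input contract concrete, here is a minimal sketch of parsing one transaction message into a typed record. The field names (`transaction_id`, `customer_id`, `amount`, `currency`, `event_time`) are hypothetical; the real contract is whatever `contracts/transactions.schema.json` defines, which is not shown in this diff.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical required fields; the real contract is contracts/transactions.schema.json.
REQUIRED_FIELDS = ("transaction_id", "customer_id", "amount", "currency", "event_time")


@dataclass(frozen=True)
class Transaction:
    transaction_id: str
    customer_id: str
    amount: Decimal
    currency: str
    event_time: datetime


def parse_transaction(raw: bytes) -> Transaction:
    """Decode one message body, raising ValueError on any contract violation."""
    payload = json.loads(raw)  # malformed JSON raises JSONDecodeError, a ValueError
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    try:
        amount = Decimal(str(payload["amount"]))
    except InvalidOperation as exc:
        raise ValueError("amount is not numeric") from exc
    if not amount.is_finite() or amount <= 0:
        raise ValueError("amount must be a positive, finite number")
    currency = payload["currency"]
    if not (isinstance(currency, str) and len(currency) == 3
            and currency.isalpha() and currency.isupper()):
        raise ValueError("currency must be a 3-letter upper-case code")
    event_time = datetime.fromisoformat(str(payload["event_time"]))  # ValueError if malformed
    if event_time.tzinfo is None:
        raise ValueError("event_time must carry a UTC offset")
    return Transaction(
        transaction_id=str(payload["transaction_id"]),
        customer_id=str(payload["customer_id"]),
        amount=amount,
        currency=currency,
        event_time=event_time,
    )
```

Using `Decimal` rather than `float` for the amount is a design choice of this sketch, made so that monetary values are not subject to binary rounding.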
+
+1. **Feature tables (BigQuery)**  
+   - Streaming + Batch ETL/ELT pipelines create curated **feature tables**:  
+     - `bq_feats.transaction_features`  
+     - `bq_feats.customer_features`  
+   - Partitioned by **event_date**, clustered by **customer_id / account_id**  
+
+2. **Risk score tables (BigQuery ML predictions)**  
+   - `bq_scores.transaction_risk_scores`  
+   - Columns: transaction_id, customer_id, model_version, risk_score, risk_band, decision_flags, metadata  
+   - CMEK-encrypted, with row-level security (RLS) scoped per team  
+
+3. **Aggregated risk views for dashboards**  
+   - `bq_marts.daily_risk_summary`  
+   - Used by **Power BI / Looker Studio** for operational risk monitoring  
+
+4. **Audit & DQ evidence**  
+   - DQ run results with **run_id**, **rules_passed/failed**, and **DQ score**  
+   - DLQ tables/topics for rejected messages with replay capability
+
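The DQ evidence described above could be modelled as a small record per run. This is a sketch under one assumption that the README does not state: the DQ score is taken to be the fraction of rules that passed, compared against the 95% pass-rate SLO from the Quick Facts. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DQRunResult:
    """One data-quality run, keyed by run_id for audit purposes."""

    run_id: str
    rules_passed: int
    rules_failed: int

    @property
    def dq_score(self) -> float:
        """Fraction of rules that passed (assumed definition); 0.0 when no rules ran."""
        total = self.rules_passed + self.rules_failed
        return self.rules_passed / total if total else 0.0

    def meets_slo(self, threshold: float = 0.95) -> bool:
        """True when the score is at or above the threshold (the '>= 95%' SLO)."""
        return self.dq_score >= threshold
```

Treating a run with zero rules as a failure is deliberate in this sketch: an empty run provides no evidence, so it should not count as passing.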
+---
+
+## 3. High-level business logic (simplified)
+
+1. **Ingest** every transaction in near real time through a **Streaming Pipeline (ETL)**.  
+2. **Enrich** transaction events with customer/account attributes and historical aggregates.  
+3. **Engineer features** (per-customer, per-card, per-device) in BigQuery.  
+4. **Train** ML models (fraud/credit risk) using **BigQuery ML** on daily snapshots via the **Batch Pipeline (ELT)**.  
+5. **Score** new transactions:
+   - Streaming path: low-latency scoring using latest approved model  
+   - Batch path: end-of-day/offline re-scoring or challenger models  
+6. **Serve** risk scores to downstream systems (decision engines, case management tools, dashboards).  
+7. **Govern** everything with **CMEK, VPC-SC, IAM/RBAC, Policy Tags, RLS/CLS, and full lineage**.
+
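Steps 3 and 5 above (feature engineering and scoring) can be illustrated in plain Python. This is an in-memory sketch, not the BigQuery implementation: the one-hour window, the feature names, and the band thresholds (0.3 and 0.7) are all hypothetical values chosen for the example.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone


class RollingCustomerFeatures:
    """Trailing-window transaction count and amount per customer (in-memory sketch)."""

    def __init__(self, window: timedelta = timedelta(hours=1)) -> None:
        self.window = window
        self._events = defaultdict(deque)  # customer_id -> deque of (event_time, amount)

    def update(self, customer_id: str, event_time: datetime, amount: float) -> dict:
        events = self._events[customer_id]
        events.append((event_time, amount))
        cutoff = event_time - self.window
        # Drop events that fell out of the window (assumes in-order arrival per customer).
        while events and events[0][0] <= cutoff:
            events.popleft()
        return {
            "txn_count": float(len(events)),
            "txn_amount": sum(value for _, value in events),
        }


def risk_band(score: float, medium_from: float = 0.3, high_from: float = 0.7) -> str:
    """Map a model score in [0, 1] to a band; thresholds here are illustrative only."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be within [0, 1]")
    if score >= high_from:
        return "HIGH"
    if score >= medium_from:
        return "MEDIUM"
    return "LOW"
```

In a real streaming job, late or out-of-order events would need explicit handling (for example windowing with allowed lateness); this sketch sidesteps that by assuming ordered input.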
+---
+
+## 4. Architecture diagram (L2 – GCP components)
+
+> Final PNG committed as `assets/architecture_l2.png`.  
+> Mermaid version kept here for readability.
+
+```mermaid
+flowchart LR
+    subgraph VPC_SC[VPC-SC Protected Boundary]
+        TX["Client Channels<br/>(Card, ATM, Online)"]
+        PUB["Cloud Pub/Sub<br/>transactions.realtime"]
+        DF_STREAM["Dataflow<br/>Streaming Pipeline (ETL)"]
+        BQ_RAW["BigQuery<br/>raw_transactions"]
+        BQ_FEAT["BigQuery<br/>feature tables"]
+        BQ_ML["BigQuery ML<br/>models"]
+        DF_BATCH["Dataflow<br/>Batch Pipeline (ELT)"]
+        COMP["Cloud Composer<br/>(Orchestration)"]
+        GCS["GCS<br/>Model & DQ Artifacts"]
+        BQ_SCORES["BigQuery<br/>risk_scores tables"]
+    end
+
+    TX --> PUB
+    PUB --> DF_STREAM
+    DF_STREAM --> BQ_RAW
+    DF_STREAM --> BQ_FEAT
+
+    COMP --> DF_BATCH
+    DF_BATCH --> BQ_FEAT
+    DF_BATCH --> BQ_ML
+    BQ_ML --> BQ_SCORES
+
+    BQ_SCORES -->|BI / Ops| BI[("Power BI / Looker Studio")]
+    BQ_SCORES --> DOWNSTREAM[("Downstream<br/>Risk Engines")]
+
+    BQ_FEAT --> GCS
+    BQ_ML --> GCS
+```
+
+---
+
+## 5. Dataflow / lifecycle diagram – from transaction to ML
+
+```mermaid
+sequenceDiagram
+    participant Channel as Channel (POS/ATM/Online)
+    participant PubSub as Cloud Pub/Sub
+    participant DFStream as Dataflow<br/>Streaming Pipeline (ETL)
+    participant BQRaw as BigQuery<br/>raw_transactions
+    participant BQFeat as BigQuery<br/>feature tables
+    participant Composer as Cloud Composer
+    participant BQML as BigQuery ML
+    participant BQScores as BigQuery<br/>risk_scores
+    participant BI as Dashboards / Risk Ops
+
+    Channel->>PubSub: Publish transaction event
+    PubSub->>DFStream: Deliver message (pull subscription)
+    DFStream->>DFStream: Validate + ETL transforms<br/>(schema, enrichment, DQ checks)
+    DFStream->>BQRaw: Insert raw record (partitioned)
+    DFStream->>BQFeat: Update streaming feature tables
+    DFStream-->>BQScores: (optional) Low-latency scoring call
+
+    Composer->>BQRaw: Nightly Batch Pipeline (ELT) query
+    Composer->>BQFeat: Build training features
+    Composer->>BQML: Train / retrain model<br/>(tag model_version)
+    Composer->>BQScores: Batch scoring jobs
+
+    BQScores-->>BI: Risk dashboards, alerts, queues
+```
+
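The validation step in the sequence above, together with the DLQ and replay capability listed under outputs, can be sketched as a small router. Everything here is hypothetical (`Router`, `DeadLetter`, and the toy validator `require_transaction_id`); a real pipeline would write rejects to a Pub/Sub topic or BigQuery table rather than an in-memory list.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DeadLetter:
    raw: bytes
    error: str
    attempts: int = 1


@dataclass
class Router:
    """Routes each message to `accepted` or `dead_letters` based on a validator."""

    validate: Callable[[bytes], dict]
    accepted: list = field(default_factory=list)
    dead_letters: list = field(default_factory=list)

    def route(self, raw: bytes) -> bool:
        try:
            self.accepted.append(self.validate(raw))
            return True
        except ValueError as exc:
            self.dead_letters.append(DeadLetter(raw=raw, error=str(exc)))
            return False

    def replay(self, validate: Optional[Callable[[bytes], dict]] = None) -> int:
        """Re-run dead letters (e.g. after a validator fix); returns how many recovered."""
        check = validate or self.validate
        pending, self.dead_letters = self.dead_letters, []
        recovered = 0
        for item in pending:
            try:
                self.accepted.append(check(item.raw))
                recovered += 1
            except ValueError as exc:
                self.dead_letters.append(
                    DeadLetter(raw=item.raw, error=str(exc), attempts=item.attempts + 1)
                )
        return recovered


def require_transaction_id(raw: bytes) -> dict:
    """Toy validator: the body must be a JSON object with a transaction_id."""
    payload = json.loads(raw)  # JSONDecodeError is a ValueError
    if not isinstance(payload, dict) or "transaction_id" not in payload:
        raise ValueError("transaction_id is required")
    return payload
```

Keeping the raw bytes and the error reason with each reject is what makes replay and audit possible; the `attempts` counter is an extra assumption that lets repeated failures be spotted.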
+---
+
+## 6. Docs index
+
+Detailed documentation lives under `docs/`:
+
+- `docs/01-context-and-usecase.md`
+- `docs/02-architecture-overview.md`
+- `docs/03-streaming-pipeline-event-flow.md`
+- `docs/04-batch-pipeline-elt-and-ml-training.md`
+- `docs/05-data-models-and-feature-store.md`
+- `docs/06-data-quality-and-risk-metrics.md`
+- `docs/07-security-and-governance.md`
+- `docs/08-lineage-and-auditability.md`
+- `docs/09-slos-observability-and-dashboards.md`
+- `docs/10-cost-and-scaling-guardrails.md`
+- `docs/11-ml-governance-and-model-risk.md`
+- `docs/12-roadmap-and-future-work.md`
+
+---
+
+## 7. Repository map
+
+```text
+- README.md
+- RUNBOOK.md
+- SECURITY.md
+- ETHICS.md
+- LICENSE
+- CODEOWNERS
+- CODE_OF_CONDUCT.md
+- CONTRIBUTING.md
+- .pre-commit-config.yaml
+- .markdownlint.jsonc
+- .markdownlint-cli2.jsonc
+- .editorconfig
+- docs/
+- contracts/
+- adr/
+- assets/
+- qc_examples.sql
+```
+
+---
+
+## 8. Status
+
+This is a **documentation-only** case study designed for LinkedIn, GitHub, and portfolio review.