Skip to content

sahilgundu/tier1-uk-bank-fx-streaming-gcp

Tier-1 UK Bank — BARX-style FX Streaming (GCP) · Sanitized Case Study

CI Release Docs License: MIT

SLO p95 <90s DLQ enabled VPC--SC CMEK

A sanitized data-engineering case study demonstrating a real-time FX streaming pipeline on GCP.

What is “BARX-style”? A reference to a well-known FX trading platform at a Tier-1 UK bank.
This repo simulates those event-stream patterns (quotes, trades, fills/confirmations).
No client code or data.

Quick Facts
Use-case: real-time BARX-style FX quotes/trades/confirmations → enrichment → analytics Events: quotes, trades, fills/confirmations; exactly-once via insertId; late data via watermarks
Stack: GCP (Pub/Sub → Dataflow/Beam → BigQuery) + Composer; VPC-SC, CMEK
Throughput: ~6–7 M events/day (~250–400 events/sec); E2E p95 < 90 s
Patterns-only: no client code/data; fully sanitized Ops: markdownlint + pre-commit; protected main; semver releases
SLOs: success ≥99.5%, p95 lat <90s, DLQ <0.5%
Cost guardrails: BQ partition/cluster, Dataflow autoscaling, logs-based budgets

L2 Architecture – BARX-style Real-Time FX Streaming on GCP

Illustrates end-to-end ingestion, validation, enrichment, and analytics flow across GCP services (Pub/Sub, Dataflow, BigQuery, Composer). L2 Architecture – Real-Time FX Streaming Pipeline on GCP

What is FX Streaming?

Real-time ingestion of FX trades/quotes/confirms from OMS/EMS into GCP for validation, enrichment, and analytics with T+0 visibility (risk, P&L, compliance). This repo is a sanitized docs-only case study; no client code or data.

Business KPIs

  • Trade ingestion success ≥99.5%
  • End-to-end p95 latency < 90s (trades → mart)
  • DLQ rate < 0.5% per day
  • Replay SLA < 30 min for P1 incidents
  • Dashboard freshness ≤ 2 min
  • Cost per 1M events (target guardrail)
  • Data quality failures per 10k events
  • Duplicate rate after dedup < 0.1%

Failure Handling & Replay Flow

  1. Detect: Cloud Monitoring alert + Error Reporting signature; failed events land in DLQ.
  2. Triage: Classify (schema drift, late/dup, enrichment miss, transient infra).
  3. Fix: Patch rule/config; ensure idempotency via insertId semantics.
  4. Replay: Composer DAG drains DLQ → re-enqueue → Dataflow reprocess.
  5. Verify: QC queries across raw → stage → mart; close incident with notes.

Scaling Strategy (Throughput Guide)

Peak msg/s Daily events Dataflow autoscale (workers) BigQuery partition/cluster Notes
100 8.6M 4–8 DATE(event_ts); cluster instrument, side baseline
300 26M 12–20 same watch shuffle; stream insert quotas
500 43M 20–30 same consider regionalization; template upgrades

See SLOs & Observability, Security Boundary, and RUNBOOK pages for deeper details.

Operations


Docs Index

Note: Sanitized case study from my Cognizant engagement; patterns only—no client code/data.

TL;DR. Real-time FX event ingestion, validation, enrichment, and analytics on GCP using Pub/Sub → Dataflow (Apache Beam) → BigQuery, orchestrated by Composer, with VPC-SC/CMEK governance. Targets p95 < 90s E2E latency at ~2–2.5M events/day.
Security model: see SECURITY.md.


📁 Repository Overview

  • Real-time ingestion (Pub/Sub → Dataflow → BigQuery)
  • Orchestration with Cloud Composer
  • Governance via VPC-SC & CMEK
  • Observability & cost-control docs

SLO & Cost Summary

Metric Target Monitoring / Notes
E2E latency (p95) < 90s Dataflow metrics (latency, watermark skew)
Delivery success ≥ 99.5% Pub/Sub delivery metrics
DLQ rate < 0.5% of daily volume Dataflow error-handling counters
Daily cost (est.) ~$8–12/day BigQuery slot + Dataflow job cost (GCP free-tier optimized)
Storage footprint ~3–5 GB/day GCS, lifecycle policies applied

Why this architecture

Risk & compliance require T+0 visibility into FX trades. The platform captures trades/quotes, validates & enriches them, and serves curated BigQuery marts and near-real-time dashboards with auditable lineage and low ops overhead.

Key Outcomes

  • Manual interventions ↓ ~95% via strong DQ + DLQ flows
  • Reporting 2× faster, intra-day risk views
  • p95 E2E latency < 90s at peak 350–500 msg/s
  • Infra cost ↓ ~35–40% using BQ partition/cluster + autoscaling

L2 Architecture Overview

flowchart LR
  subgraph Producers
    A["OMS/EMS"] -->|Trades| P1["Pub/Sub: trades"]
    B["Pricing Engine"] -->|Quotes| P2["Pub/Sub: quotes"]
    C["Custodian"] -->|Confirms| P3["Pub/Sub: confirms"]
  end

  subgraph GCP_SecBoundary["GCP Security Boundary (VPC-SC, CMEK, SA-IAM)"]
    DF["Dataflow (Beam)\nvalidate+dedup\nwatermarks/late\nenrich joins\nDLQ routing"]
    BQ["BigQuery\nraw -> stage -> mart\n(partition+cluster)"]
    CMP["Composer\nDAGs: replay, compaction, exports, QC"]
    OBS["Cloud Monitoring & Error Reporting\nSLO p95 < 90s"]

    P1 --> DF
    P2 --> DF
    P3 --> DF
    DF -->|insertId upserts| BQ
    CMP --> BQ
    DF -->|DLQ| DQ["DLQ topics"]
    BQ --> OBS
  end

  BQ --> BI["Power BI / Looker\nDashboards & Alerts"]
Loading

Event Life-cycle (Trades)

sequenceDiagram
  participant Prod as Producer (OMS/EMS)
  participant PS as Pub/Sub (trades)
  participant DF as Dataflow (Beam)
  participant BQ1 as BigQuery (raw_fx_events)
  participant BQ2 as BigQuery (mart_fx_risk_snapshot)
  participant BI as BI/Exports

  Prod->>PS: publish trade event
  PS->>DF: pull message
  DF->>DF: schema validate, dedup (trade_id+version)
  DF->>DF: enrich from ref data (desk limits, KYC)
  alt valid
    DF->>BQ1: write with insertId (exactly-once)
    BQ1->>BQ2: transform (dbt/SQL, partitions & clusters)
    BQ2-->>BI: dashboards / alerts
  else invalid
    DF-->>PS: route to DLQ with reason
  end
Loading

Data Model (curated highlights)

  • fx_trade_fact(trade_id PK, version, side, symbol, qty, px, notional_usd, trader, desk, status, event_ts)
  • fx_limit_breach(desk, symbol, window_notional_usd, limit_usd, breach_flag, breach_ts)
  • fx_risk_snapshot_1min(ts, desk, symbol, notional_usd, pnl)

Partition by trade_date; cluster by desk, symbol. Materialize common aggregates for BI.


SLOs & Observability

  • Delivery success ≥ 99.5%, p95 E2E < 90s, DLQ rate < 0.5%
  • Cloud Monitoring dashboards on latency, backlog, watermark skew; logs‑based alerts to on‑call.

Author’s responsibilities (what I owned)

  • Streaming design on GCP; Beam pipelines for dedup, late-data, enrichment, DLQ.
  • BigQuery modeling (raw → stage → mart), partition/cluster strategy, and cost tuning.
  • Security/governance: VPC‑SC, CMEK, least‑privilege SA IAM; PII masking UDFs.
  • Ops: Composer DAGs for replay/compaction/exports; SLOs, alerts, and runbooks.

Repo Map

tier1-uk-bank-fx-streaming-gcp/
├─ README.md
├─ RUNBOOK.md
├─ SECURITY.md
├─ ETHICS.md
├─ LICENSE
├─ CODEOWNERS
├─ CODE_OF_CONDUCT.md
├─ CONTRIBUTING.md
├─ .pre-commit-config.yaml
├─ .markdownlint.jsonc
├─ .markdownlint-cli2.jsonc
├─ docs/
│  ├─ 01-context.md
│  ├─ 02-architecture-overview.md
│  ├─ 03-sequence-streaming.md
│  ├─ 04-security-boundary.md
│  ├─ 05-data-models.md
│  ├─ 06-slos-observability.md
│  └─ 07-cost-controls.md
├─ contracts/
│  ├─ trades.schema.json
│  ├─ quotes.schema.json
│  └─ confirms.schema.json
├─ qc_examples.sql        # row-count reconciliation example
├─ adr/
│  └─ 0001-record-architecture-decisions.md
└─ 0001-use-pubsub-dataflow-bq.md

Sanitization Note: Public artifacts are generic; client code/data are intentionally excluded. See ETHICS.md.


Reuse this pattern

Minimal commit plan (copy these messages)

  1. docs(readme): add Quick Facts, real CI + Release badges, link SECURITY.md; remove extra horizontal rule under Quick Facts
  2. docs(readme): fix links to diagrams (use .md pages with ```mermaid blocks); add links to ADR and Release Notes
  3. docs: create required stubs (or remove dead links):
    • docs/05-data-models.md
    • docs/06-observability-and-slos.md
    • docs/07-cost-controls.md
    • docs/08-ci-cd.md
    • docs/09-license-notes.md

Option A (rename files): replace any docs/*.mmd with Markdown pages, keeping Mermaid code blocks unchanged.

Commands

# Normalize diagram pages to .md and preserve history
git ls-files 'docs/*.mmd' | while read f; do git mv "$f" "${f%.mmd}.md"; done
git grep -n '\.mmd' -- docs | cut -d: -f1 | sort -u | while read f; do sed -i 's/\.mmd/\.md/g' "$f"; done

# Remove stray placeholder if present
sed -i '/contentReference\[oaicite:0\]{index=0}/d' README.md

FAQ

Is this affiliated with any bank or with Barclays/“BARX”?
No. This is a sanitized case-study for learning and portfolio demonstration. No client code/data, no access to any bank systems.

What does “BARX-style” mean, and why not write the bank name?
“BARX-style” describes the pattern we simulate—real-time FX quotes/trades → Pub/Sub → Dataflow → BigQuery—common in Tier-1 banks. We keep the repo client-agnostic and avoid real client names in public GitHub.

Is any production or proprietary dataset used here?
No. All data is synthetic and generated locally; artifacts are generic.

Can this be used in production as-is?
This is an educational example. Review/harden per your org’s standards and the license in LICENSE.

Where is the sanitization policy?
See the Sanitization Note above and ETHICS.md.

About

Sanitized case study — Tier-1 UK bank FX streaming on GCP (Pub/Sub → Dataflow → BigQuery, Composer, VPC-SC/CMEK). Patterns only; no client code/data.

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors