Changelog

All notable changes to the K2 Reference Data Platform will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.


[1.1.1] - 2026-01-24

Changed - CI/CD Improvements

Separated GitHub Actions Workflows

  • Split CI into two workflows for faster feedback and clearer results:
    • lint.yml - Code quality checks (ruff, black, isort, mypy)
    • test.yml - Unit tests and coverage reporting
  • Benefits:
    • Faster feedback (~2-3 min each vs ~5 min combined)
    • Clearer failure diagnosis (lint vs test failures)
    • Each workflow can be re-run individually
    • Better CI metrics and insights
  • Updated status badges in README to show both workflows
  • Updated CI-CD.md documentation to reflect new structure

Code Formatting Fixes

  • Formatted 6 files with black (models.py, ingest.py, duckdb_pool.py, conftest.py, test files)
  • All code now passes black formatting checks
  • All code now passes isort import sorting checks

[1.1.0] - 2026-01-23

Added - CI/CD Configuration

GitHub Actions Workflows

  • CI workflow (.github/workflows/ci.yml):
    • Automated linting with ruff
    • Code formatting checks (black + isort)
    • Type checking with mypy
    • Unit test execution with pytest
    • Coverage reporting to Codecov
    • Runs on push to main/develop and all PRs
    • Runtime: ~3-5 minutes
  • Status badges added to README:
    • CI build status
    • Code coverage
    • Python version
    • License
  • Pre-commit hooks configured for local development
  • Dependency caching for faster CI runs

Documentation

  • Created docs/development/CI-CD.md:
    • How to run checks locally
    • Common CI failure scenarios and fixes
    • Troubleshooting guide
    • Best practices for CI/CD
    • Instructions for adding new checks

Changed

  • Updated README.md with CI status badges
  • Enhanced development workflow with automated checks

[1.0.0] - 2026-01-23

Added - Phase 1: Foundation

Phase 1A: Project Foundation (Week 1)

  • Project scaffolding with proper Python package structure
  • pyproject.toml with uv dependency management
  • Comprehensive Makefile for development workflows
  • pytest configuration with markers (unit, integration, e2e, bitemporal, scd2)
  • Pre-commit hooks (black, isort, ruff, mypy)
  • Docker Compose overlay for refdata services
  • 5 Architecture Decision Records (ADRs):
    • ADR-001: Bitemporal Modeling
    • ADR-002: Ingestion Strategy
    • ADR-003: DBT vs Spark
    • ADR-004: Symbology Mapping
    • ADR-005: Schema Evolution

Phase 1B: Bronze Ingestion (Week 2)

  • Configuration management with Pydantic Settings
  • Structured logging with structlog
  • PostgreSQL state store for change detection
  • Base exchange client with rate limiting and retries
  • Binance REST client (/api/v3/exchangeInfo)
  • Kraken REST client (/0/public/AssetPairs)
  • Avro schemas for both exchanges (BACKWARD compatibility)
  • Kafka producer with idempotent publishing
  • CLI commands for ingestion (refdata ingest --source all)
  • Initialization scripts for Iceberg catalog and Schema Registry
  • Comprehensive unit tests (18 tests covering both clients)
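
The rate limiting and retry behaviour of the base exchange client can be illustrated with a minimal standard-library sketch. Every name here (`RateLimiter`, `call_with_retries`, `min_interval`, `base_delay`) is hypothetical and does not come from the K2 codebase; the exponential backoff schedule is an assumption, since the changelog does not state which retry policy the client uses.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimiter:
    """Hypothetical limiter: at most one call every `min_interval` seconds."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> None:
        # Sleep only for the part of the interval that has not elapsed yet.
        if self._last_call is not None:
            remaining = self.min_interval - (self._clock() - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
        self._last_call = self._clock()


def call_with_retries(fn: Callable[[], T], *, max_attempts: int = 3,
                      base_delay: float = 0.5,
                      limiter: Optional[RateLimiter] = None,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fn`, retrying on ConnectionError with exponential backoff (assumed policy)."""
    for attempt in range(1, max_attempts + 1):
        if limiter is not None:
            limiter.wait()
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1.0s, 2.0s, ...
    raise ValueError("max_attempts must be at least 1")
```

Injecting `sleep` and `clock` keeps the sketch testable without real waiting; a real client would presumably also retry on HTTP 429 and 5xx responses.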

Phase 1C: DBT Silver Transformations (Weeks 3-5)

  • DBT project configuration (dbt_project.yml, profiles.yml)
  • Bronze source definitions (models/bronze/sources.yml)
  • Silver instruments model with SCD Type 2 + bitemporal logic
  • Custom macros:
    • normalize_asset() - Handle exchange-specific quirks (XBT→BTC, USDT→USD)
    • bitemporal_scd2() - Reusable SCD Type 2 implementation
  • Gold symbology master model with canonical ID generation
  • Data quality tests (unique, not_null, temporal consistency, no overlaps)
  • Integration tests for DBT transformations
  • Comprehensive DBT documentation:
    • DBT-GUIDE.md (18,000+ words)
    • DBT-QUICKSTART.md (quick reference)
    • DBT-WORKFLOW-DIAGRAM.md (visual diagrams)
    • DBT-EXERCISES.md (10 hands-on exercises)
    • dbt/README.md (project guide)
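
As a rough illustration of what asset normalization and canonical ID generation do, here is a Python sketch. The real `normalize_asset()` is a DBT macro, so this is an analogue rather than the project's implementation. The two alias mappings (XBT→BTC, USDT→USD) come from this changelog; the `BASE-QUOTE-TYPE` layout is inferred from the `BTC-USD-SPOT` example, and the `canonical_id` helper name is hypothetical.

```python
# Alias table taken from the two examples named in this changelog.
ASSET_ALIASES = {"XBT": "BTC", "USDT": "USD"}


def normalize_asset(symbol: str) -> str:
    """Map an exchange-specific asset code to its normalized form."""
    cleaned = symbol.strip().upper()
    return ASSET_ALIASES.get(cleaned, cleaned)


def canonical_id(base: str, quote: str, instrument_type: str = "SPOT") -> str:
    """Build a canonical ID; the BASE-QUOTE-TYPE layout is an assumption."""
    return f"{normalize_asset(base)}-{normalize_asset(quote)}-{instrument_type.upper()}"


# A Kraken-style pair (XBT/USDT) and a Binance-style pair (BTC/USDT)
# resolve to the same canonical instrument.
print(canonical_id("XBT", "USDT"))
print(canonical_id("BTC", "USDT"))
```

Collapsing both spellings to one ID is what lets the gold symbology model join instruments across exchanges.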

Phase 1D: API Query Layer (Week 4)

  • DuckDB connection pool (5-50 connections) with Iceberg support
  • Query utilities for bitemporal lookups:
    • query_current_instruments() - List active instruments
    • query_instrument_as_of() - Point-in-time queries
    • query_instrument_history() - Audit trail
    • query_symbology_by_canonical() - Canonical → exchange
    • query_symbology_reverse() - Exchange → canonical
  • Pydantic models for type-safe request/response validation
  • FastAPI middleware stack:
    • Request logging with structured fields
    • Correlation ID tracking
    • Request size limiting (10MB)
    • HTTP caching (5 minutes)
    • CORS support
  • API routers:
    • Instruments router (list, history endpoints)
    • Symbology router (lookup, resolve endpoints)
  • Main FastAPI application with lifespan management
  • Health check endpoint (/health)
  • Auto-generated OpenAPI documentation (/docs, /redoc)
  • Integration tests for all endpoints (14 tests)
  • Comprehensive API documentation (API-GUIDE.md)
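
The point-in-time lookups listed above rely on filtering by two time axes at once. The sketch below shows that logic in plain Python over in-memory rows. It does not reproduce `query_instrument_as_of()`, whose signature is not given here; the `InstrumentVersion` fields and the `instrument_as_of` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class InstrumentVersion:
    """One bitemporal row; all field names are assumptions."""
    instrument_id: str
    tick_size: str
    valid_from: datetime             # business time: when the fact became true
    valid_to: Optional[datetime]     # None = still true
    system_from: datetime            # system time: when the row was recorded
    system_to: Optional[datetime]    # None = current belief


def _covers(start: datetime, end: Optional[datetime], point: datetime) -> bool:
    # Half-open interval [start, end); an open end covers everything after start.
    return start <= point and (end is None or point < end)


def instrument_as_of(rows: Iterable[InstrumentVersion], instrument_id: str,
                     business_time: datetime,
                     system_time: datetime) -> Optional[InstrumentVersion]:
    """Return the version valid at `business_time` as known at `system_time`."""
    for row in rows:
        if (row.instrument_id == instrument_id
                and _covers(row.valid_from, row.valid_to, business_time)
                and _covers(row.system_from, row.system_to, system_time)):
            return row
    return None


# A late correction: the tick size recorded on 1 Jan is corrected on 10 Jan,
# retroactively from 1 Jan. The old row is closed in system time only.
jan = lambda day: datetime(2026, 1, day)
rows = [
    InstrumentVersion("BTC-USD-SPOT", "0.01", jan(1), None, jan(1), jan(10)),
    InstrumentVersion("BTC-USD-SPOT", "0.10", jan(1), None, jan(10), None),
]
before = instrument_as_of(rows, "BTC-USD-SPOT", jan(5), jan(5))
after = instrument_as_of(rows, "BTC-USD-SPOT", jan(5), jan(15))
```

Here `before` holds the value the platform believed on 5 January and `after` holds the corrected value, both describing the same business date. This is the audit property that bitemporal modeling is meant to preserve.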

Phase 1F: Documentation & Operational Readiness (Week 6)

  • Operational runbooks:
    • MANUAL-OVERRIDE.md - Data correction procedures
    • DEPLOYMENT.md - Production deployment guide
  • DEPLOYMENT-CHECKLIST.md - Comprehensive go-live checklist
  • Phase implementation summaries:
    • PHASE-1D-SUMMARY.md
  • Updated README.md with production roadmap

Technical Specifications

Data Model:

  • Bitemporal tracking (business time + system time)
  • SCD Type 2 with late correction support
  • Canonical instrument IDs (BTC-USD-SPOT)
  • Cross-exchange symbology mapping

Technology Stack:

  • Python 3.11+, FastAPI, DuckDB
  • Apache Iceberg Format Version 2
  • DBT with dbt-duckdb adapter
  • Apache Kafka + Schema Registry (Avro)
  • PostgreSQL (state store)
  • MinIO/S3 (data warehouse)

Performance:

  • API latency: p95 < 100ms, p99 < 200ms
  • Connection pooling: 5-50 concurrent connections
  • HTTP caching: 5 minutes for GET requests
  • Incremental DBT models for efficiency
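
One way to check recorded request timings against the latency targets above is sketched below using only the standard library. The function names and the choice of the inclusive quantile method are assumptions; the changelog does not say how the p95 and p99 figures are measured.

```python
import statistics
from typing import Dict, Sequence


def latency_percentiles(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Compute p95 and p99 from latency samples in milliseconds."""
    # n=100 yields 99 cut points; index 94 is the 95th percentile,
    # index 98 is the 99th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p95": cuts[94], "p99": cuts[98]}


def meets_latency_targets(samples_ms: Sequence[float],
                          p95_target_ms: float = 100.0,
                          p99_target_ms: float = 200.0) -> bool:
    """True when both percentiles are strictly below their targets."""
    p = latency_percentiles(samples_ms)
    return p["p95"] < p95_target_ms and p["p99"] < p99_target_ms
```

`statistics.quantiles` needs at least two samples, and percentile estimates from small samples are noisy, so a check like this is only meaningful over a reasonably large window of requests.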

Data Quality:

  • 15+ DBT tests for temporal consistency
  • Unique constraint validation
  • Not-null validation
  • Custom temporal overlap tests
  • Audit trail for all changes
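
The temporal overlap tests are implemented as DBT tests in the project; the Python sketch below only illustrates the underlying check. It assumes half-open validity intervals `[valid_from, valid_to)` with `None` meaning open-ended, and the `find_overlaps` name and tuple layout are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple

Version = Tuple[str, datetime, Optional[datetime]]  # (key, valid_from, valid_to)


def find_overlaps(versions: Iterable[Version]) -> List[Tuple[str, datetime, datetime]]:
    """Return (key, earlier_start, later_start) for adjacent overlapping versions."""
    by_key: Dict[str, List[Tuple[datetime, Optional[datetime]]]] = {}
    for key, start, end in versions:
        by_key.setdefault(key, []).append((start, end))

    overlaps = []
    for key, spans in by_key.items():
        spans.sort(key=lambda span: span[0])
        # After sorting by start, any overlap in the history shows up
        # between at least one pair of neighbours.
        for (start_a, end_a), (start_b, _end_b) in zip(spans, spans[1:]):
            if end_a is None or start_b < end_a:
                overlaps.append((key, start_a, start_b))
    return overlaps
```

An empty result means each key has a gap-tolerant, non-overlapping version history. The function reports neighbouring pairs only, so a long overlapping version may conflict with more rows than are listed.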

[Unreleased] - Phase 2: Expansion

Planned Features

  • Bybit exchange integration
  • Coinbase exchange integration
  • Manual override workflow API
  • Grafana dashboards for monitoring
  • Data quality alerting
  • Options and futures support
  • Historical data backfill
  • GraphQL API
  • Real-time change notifications (WebSocket)

Version History

  • 1.0.0 (2026-01-23) - Phase 1 Complete: Foundation established
    • Bronze ingestion (Binance + Kraken)
    • Silver transformations (bitemporal SCD Type 2)
    • Gold symbology mapping
    • FastAPI query layer
    • Complete documentation
