All notable changes to the K2 Reference Data Platform will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Split CI into two workflows for faster feedback and clearer results:
- `lint.yml` - Code quality checks (ruff, black, isort, mypy)
- `test.yml` - Unit tests and coverage reporting
- Benefits:
- Faster feedback (~2-3 min each vs ~5 min combined)
- Clearer failure diagnosis (lint vs test failures)
- Each workflow can be re-run individually
- Better CI metrics and insights
- Updated status badges in README to show both workflows
- Updated CI-CD.md documentation to reflect new structure
- Formatted 6 files with black (models.py, ingest.py, duckdb_pool.py, conftest.py, test files)
- All code now passes black formatting checks
- All code now passes isort import sorting checks
- CI workflow (`.github/workflows/ci.yml`):
- Automated linting with ruff
- Code formatting checks (black + isort)
- Type checking with mypy
- Unit test execution with pytest
- Coverage reporting to Codecov
- Runs on push to main/develop and all PRs
- Runtime: ~3-5 minutes
- Status badges added to README:
- CI build status
- Code coverage
- Python version
- License
- Pre-commit hooks configured for local development
- Dependency caching for faster CI runs
- Created `docs/development/CI-CD.md`:
- How to run checks locally
- Common CI failure scenarios and fixes
- Troubleshooting guide
- Best practices for CI/CD
- Instructions for adding new checks
- Updated README.md with CI status badges
- Enhanced development workflow with automated checks
- Project scaffolding with proper Python package structure
- pyproject.toml with uv dependency management
- Comprehensive Makefile for development workflows
- pytest configuration with markers (unit, integration, e2e, bitemporal, scd2)
- Pre-commit hooks (black, isort, ruff, mypy)
- Docker Compose overlay for refdata services
- 5 Architecture Decision Records (ADRs):
- ADR-001: Bitemporal Modeling
- ADR-002: Ingestion Strategy
- ADR-003: DBT vs Spark
- ADR-004: Symbology Mapping
- ADR-005: Schema Evolution
- Configuration management with Pydantic Settings
- Structured logging with structlog
- PostgreSQL state store for change detection
- Base exchange client with rate limiting and retries
- Binance REST client (`/api/v3/exchangeInfo`)
- Kraken REST client (`/0/public/AssetPairs`)
- Avro schemas for both exchanges (BACKWARD compatibility)
- Kafka producer with idempotent publishing
- CLI commands for ingestion (`refdata ingest --source all`)
- Initialization scripts for Iceberg catalog and Schema Registry
- Comprehensive unit tests (18 tests covering both clients)
- DBT project configuration (`dbt_project.yml`, `profiles.yml`)
- Bronze source definitions (`models/bronze/sources.yml`)
- Silver instruments model with SCD Type 2 + bitemporal logic
- Custom macros:
- `normalize_asset()` - Handle exchange-specific quirks (XBT→BTC, USDT→USD)
- `bitemporal_scd2()` - Reusable SCD Type 2 implementation
- Gold symbology master model with canonical ID generation
- Data quality tests (unique, not_null, temporal consistency, no overlaps)
- Integration tests for DBT transformations
- Comprehensive DBT documentation:
- DBT-GUIDE.md (18,000+ words)
- DBT-QUICKSTART.md (quick reference)
- DBT-WORKFLOW-DIAGRAM.md (visual diagrams)
- DBT-EXERCISES.md (10 hands-on exercises)
- dbt/README.md (project guide)
- DuckDB connection pool (5-50 connections) with Iceberg support
- Query utilities for bitemporal lookups:
- `query_current_instruments()` - List active instruments
- `query_instrument_as_of()` - Point-in-time queries
- `query_instrument_history()` - Audit trail
- `query_symbology_by_canonical()` - Canonical → exchange
- `query_symbology_reverse()` - Exchange → canonical
- Pydantic models for type-safe request/response validation
- FastAPI middleware stack:
- Request logging with structured fields
- Correlation ID tracking
- Request size limiting (10MB)
- HTTP caching (5 minutes)
- CORS support
- API routers:
- Instruments router (list, history endpoints)
- Symbology router (lookup, resolve endpoints)
- Main FastAPI application with lifespan management
- Health check endpoint (`/health`)
- Auto-generated OpenAPI documentation (`/docs`, `/redoc`)
- Integration tests for all endpoints (14 tests)
- Comprehensive API documentation (API-GUIDE.md)
- Operational runbooks:
- MANUAL-OVERRIDE.md - Data correction procedures
- DEPLOYMENT.md - Production deployment guide
- DEPLOYMENT-CHECKLIST.md - Comprehensive go-live checklist
- Phase implementation summaries:
- PHASE-1D-SUMMARY.md
- Updated README.md with production roadmap
Data Model:
- Bitemporal tracking (business time + system time)
- SCD Type 2 with late correction support
- Canonical instrument IDs (`BTC-USD-SPOT`)
- Cross-exchange symbology mapping
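
To illustrate how bitemporal tracking and late corrections fit together, here is a minimal Python sketch of a point-in-time lookup. It is not the platform's implementation: the `InstrumentVersion` fields, the `as_of()` helper, the `tick_size` attribute, and the sample dates are all assumptions made for illustration. Only the canonical ID format (`BTC-USD-SPOT`) comes from this changelog.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class InstrumentVersion:
    """One row of a bitemporal history (hypothetical layout)."""
    canonical_id: str                 # e.g. "BTC-USD-SPOT"
    tick_size: str                    # illustrative attribute, not from the source
    valid_from: datetime              # business time: when the fact became true
    valid_to: Optional[datetime]      # None means "still true"
    recorded_from: datetime           # system time: when the row was written
    recorded_to: Optional[datetime]   # None means "current belief"


def _covers(start: datetime, end: Optional[datetime], instant: datetime) -> bool:
    # Half-open interval [start, end); an open end covers everything after start.
    return start <= instant and (end is None or instant < end)


def as_of(
    versions: List[InstrumentVersion],
    canonical_id: str,
    business_time: datetime,
    system_time: datetime,
) -> Optional[InstrumentVersion]:
    """Return the version true at business_time, as known at system_time."""
    matches = [
        v for v in versions
        if v.canonical_id == canonical_id
        and _covers(v.valid_from, v.valid_to, business_time)
        and _covers(v.recorded_from, v.recorded_to, system_time)
    ]
    if len(matches) > 1:
        raise ValueError("overlapping versions: history is inconsistent")
    return matches[0] if matches else None


# Sample history: a row recorded on Jan 1 is superseded on Jan 10 by a late
# correction stating that the tick size actually changed on Jan 5.
HISTORY = [
    InstrumentVersion("BTC-USD-SPOT", "0.01",
                      datetime(2025, 1, 1), None,
                      datetime(2025, 1, 1), datetime(2025, 1, 10)),
    InstrumentVersion("BTC-USD-SPOT", "0.01",
                      datetime(2025, 1, 1), datetime(2025, 1, 5),
                      datetime(2025, 1, 10), None),
    InstrumentVersion("BTC-USD-SPOT", "0.10",
                      datetime(2025, 1, 5), None,
                      datetime(2025, 1, 10), None),
]

before = as_of(HISTORY, "BTC-USD-SPOT", datetime(2025, 1, 7), datetime(2025, 1, 8))
after = as_of(HISTORY, "BTC-USD-SPOT", datetime(2025, 1, 7), datetime(2025, 1, 12))
```

In this sketch, `before` reflects what was believed on Jan 8 about Jan 7, while `after` reflects the corrected view of the same business date. The superseded row is closed in system time rather than deleted, so the earlier belief stays queryable for audit purposes.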
Technology Stack:
- Python 3.11+, FastAPI, DuckDB
- Apache Iceberg Format Version 2
- DBT with dbt-duckdb adapter
- Apache Kafka + Schema Registry (Avro)
- PostgreSQL (state store)
- MinIO/S3 (data warehouse)
Performance:
- API latency: p95 < 100ms, p99 < 200ms
- Connection pooling: 5-50 concurrent connections
- HTTP caching: 5 minutes for GET requests
- Incremental DBT models for efficiency
Data Quality:
- 15+ DBT tests for temporal consistency
- Unique constraint validation
- Not-null validation
- Custom temporal overlap tests
- Audit trail for all changes
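
The temporal overlap rule can be sketched outside DBT as well. The project's tests are DBT tests; the Python below is only an illustration of the check they are described as enforcing, and the `Interval` type and `find_overlaps()` function are hypothetical names, not part of the codebase.

```python
from datetime import datetime
from typing import Dict, List, NamedTuple, Optional, Tuple


class Interval(NamedTuple):
    """Validity range of one SCD Type 2 row (hypothetical layout)."""
    key: str                      # e.g. a canonical instrument ID
    valid_from: datetime
    valid_to: Optional[datetime]  # None means the row is still open


def find_overlaps(rows: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return adjacent row pairs whose validity ranges overlap, per key.

    Ranges are treated as half-open [valid_from, valid_to), so a row ending
    exactly when the next one starts is not an overlap. Only neighbours in
    start order are compared: that is enough to detect whether any overlap
    exists, but it does not enumerate every overlapping pair.
    """
    by_key: Dict[str, List[Interval]] = {}
    for row in rows:
        by_key.setdefault(row.key, []).append(row)

    overlaps: List[Tuple[Interval, Interval]] = []
    for key_rows in by_key.values():
        ordered = sorted(key_rows, key=lambda r: r.valid_from)
        for prev, nxt in zip(ordered, ordered[1:]):
            # An open-ended row followed by a later row is also an overlap.
            if prev.valid_to is None or nxt.valid_from < prev.valid_to:
                overlaps.append((prev, nxt))
    return overlaps


clean = [
    Interval("BTC-USD-SPOT", datetime(2025, 1, 1), datetime(2025, 1, 5)),
    Interval("BTC-USD-SPOT", datetime(2025, 1, 5), None),
]
broken = [
    Interval("BTC-USD-SPOT", datetime(2025, 1, 1), datetime(2025, 1, 6)),
    Interval("BTC-USD-SPOT", datetime(2025, 1, 5), None),
]
```

Calling `find_overlaps(clean)` yields an empty list, while `find_overlaps(broken)` reports the single offending pair. Rows with different keys are never compared with each other.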
- Add Bybit exchange integration
- Add Coinbase exchange integration
- Implement manual override workflow API
- Grafana dashboards for monitoring
- Data quality alerting
- Options and futures support
- Historical data backfill
- GraphQL API
- Real-time change notifications (WebSocket)
- 1.0.0 (2026-01-23) - Phase 1 Complete: Foundation established
- Bronze ingestion (Binance + Kraken)
- Silver transformations (bitemporal SCD Type 2)
- Gold symbology mapping
- FastAPI query layer
- Complete documentation
- Repository: https://github.com/k2/k2-reference-data-platform
- Documentation: https://docs.k2.com/refdata
- Issues: https://github.com/k2/k2-reference-data-platform/issues
- API: https://refdata-api.k2.com (production)