Skip to content

Latest commit

Β 

History

History
167 lines (131 loc) Β· 3.99 KB

File metadata and controls

167 lines (131 loc) Β· 3.99 KB

βœ… Epic-B β€” Production LGBM β€” COMPLETE

Tarih: 16 Ekim 2025
Owner: @siyahkare
Sprint: S10 - The Real Deal
Status: βœ… COMPLETED


🎯 Sonuçlar

Model Artifacts

  • Model: backend/data/models/2025-10-14/lgbm.pkl
  • Best symlink: backend/data/models/best_lgbm.pkl βœ…
  • Model card: backend/data/models/2025-10-14/model_card.json

Training Metrics

  • Val Accuracy: 54.11% (mock dataset β†’ expected low for random data)
  • Optuna Trials: 50
  • Best Iteration: 136
  • Features: close, ret1, sma20_gap, sma50_gap, vol_z

Feature Store

  • Dataset: ~30 days, ~43K rows (Parquet)
  • Path: backend/data/feature_store/BTCUSDT.parquet
  • Format: Leak-safe time-based split

βœ… Definition of Done (All Passed)

  • Leak-safe time split (train.ts < val.ts)
  • Optuna hyperparameter tuning (β‰₯50 trials)
  • joblib model dump + best_lgbm.pkl symlink
  • Thread-safe inference wrapper (LGBMProd)
  • EnsemblePredictor LGBM integration
  • Tests passing (smoke + leakage guard)
  • Dependencies installed (lightgbm, optuna, pyarrow, duckdb)

πŸ§ͺ Testing Results

Leakage Tests

βœ… test_time_split_no_leak: PASS
βœ… test_guard_catches_leakage: PASS

Inference Tests

βœ… test_infer_wrapper_load_predict: PASS
   Prediction: 0.5056 (valid range [0.0, 1.0])

Training Pipeline

βœ… Optuna optimization: 50 trials
βœ… Best accuracy: 54.67% (trial 37)
βœ… Final model: 54.11% val accuracy

πŸ“Š Architecture

Feature Store (Parquet)
         ↓
Time-based split (14 days val)
         ↓
Optuna (50 trials) β†’ Best params
         ↓
LightGBM training + Early stopping
         ↓
Artifacts:
  - lgbm.pkl (joblib)
  - model_card.json
  - best_lgbm.pkl (symlink)
         ↓
LGBMProd (singleton loader)
         ↓
EnsemblePredictor.lgbm.predict_proba()

πŸ“ Notes & Future Improvements

Current Limitations

  • Mock data: Random synthetic data β†’ low accuracy (54%)
  • Expected with real data: β‰₯65% validation accuracy
  • Features: Basic technical indicators only

Recommended Enhancements

  1. Real market data: Use ccxt ingest (Epic-A complete, ready to use)

  2. Class imbalance: Add is_unbalance=True or scale_pos_weight if needed

  3. Feature expansion:

    • Trend indicators (EMA cross, MACD)
    • Funding rate & Open Interest
    • Rolling volatility
    • Lagged features (shift 1-5)
    • Order flow imbalance
  4. Hyperparameter tuning:

    • Increase trials to 200+ for production
    • Add timeout constraint for nightly runs
    • Multi-objective optimization (accuracy + calibration)
  5. Model monitoring:

    • Drift detection (PSI, KS test)
    • Online calibration
    • A/B testing framework

πŸš€ Next Steps

Immediate (Epic-C)

  • Production TFT: PyTorch Lightning transformer
  • Sequence modeling: LSTM/Transformer for temporal patterns
  • Inference target: p95 ≀ 40ms (CPU)

Future (Post Sprint-10)

  • Ensemble weights: Dynamic weight adjustment based on recent performance
  • Shadow deployment: Run new models in shadow mode
  • AutoML nightly: Automatic retraining with fresh data

πŸ“ Deliverables

Code Files

βœ… backend/src/data/feature_store.py
βœ… backend/src/ml/train_lgbm_prod.py
βœ… backend/src/ml/infer_lgbm.py
βœ… backend/src/ml/models/ensemble_predictor.py (updated)

Test Files

βœ… backend/tests/test_feature_store_leakage.py
βœ… backend/tests/test_train_lgbm_prod_smoke.py
βœ… backend/tests/test_infer_lgbm_wrapper.py

Documentation

βœ… sprint/EPIC_B_LGBM_GUIDE.md
βœ… sprint/EPIC_B_LGBM_COMPLETE.md (this file)

Artifacts

βœ… backend/data/models/2025-10-14/lgbm.pkl
βœ… backend/data/models/2025-10-14/model_card.json
βœ… backend/data/models/best_lgbm.pkl
βœ… backend/data/feature_store/BTCUSDT.parquet

Prepared by: @siyahkare
Completed: 16 Ekim 2025
Status: πŸŽ‰ COMPLETE β€” Ready for Epic-C (Production TFT)!