Tarih: 16 Ekim 2025
Owner: @siyahkare
Sprint: S10 - The Real Deal
Status: β
COMPLETED
- Model:
backend/data/models/2025-10-14/lgbm.pkl - Best symlink:
backend/data/models/best_lgbm.pklβ - Model card:
backend/data/models/2025-10-14/model_card.json
- Val Accuracy: 54.11% (mock dataset β expected low for random data)
- Optuna Trials: 50
- Best Iteration: 136
- Features: close, ret1, sma20_gap, sma50_gap, vol_z
- Dataset: ~30 days, ~43K rows (Parquet)
- Path:
backend/data/feature_store/BTCUSDT.parquet - Format: Leak-safe time-based split
- Leak-safe time split (train.ts < val.ts)
- Optuna hyperparameter tuning (β₯50 trials)
- joblib model dump +
best_lgbm.pklsymlink - Thread-safe inference wrapper (
LGBMProd) - EnsemblePredictor LGBM integration
- Tests passing (smoke + leakage guard)
- Dependencies installed (lightgbm, optuna, pyarrow, duckdb)
β
test_time_split_no_leak: PASS
β
test_guard_catches_leakage: PASSβ
test_infer_wrapper_load_predict: PASS
Prediction: 0.5056 (valid range [0.0, 1.0])β
Optuna optimization: 50 trials
β
Best accuracy: 54.67% (trial 37)
β
Final model: 54.11% val accuracyFeature Store (Parquet)
β
Time-based split (14 days val)
β
Optuna (50 trials) β Best params
β
LightGBM training + Early stopping
β
Artifacts:
- lgbm.pkl (joblib)
- model_card.json
- best_lgbm.pkl (symlink)
β
LGBMProd (singleton loader)
β
EnsemblePredictor.lgbm.predict_proba()
- Mock data: Random synthetic data β low accuracy (54%)
- Expected with real data: β₯65% validation accuracy
- Features: Basic technical indicators only
-
Real market data: Use ccxt ingest (Epic-A complete, ready to use)
-
Class imbalance: Add
is_unbalance=Trueorscale_pos_weightif needed -
Feature expansion:
- Trend indicators (EMA cross, MACD)
- Funding rate & Open Interest
- Rolling volatility
- Lagged features (shift 1-5)
- Order flow imbalance
-
Hyperparameter tuning:
- Increase trials to 200+ for production
- Add timeout constraint for nightly runs
- Multi-objective optimization (accuracy + calibration)
-
Model monitoring:
- Drift detection (PSI, KS test)
- Online calibration
- A/B testing framework
- Production TFT: PyTorch Lightning transformer
- Sequence modeling: LSTM/Transformer for temporal patterns
- Inference target: p95 β€ 40ms (CPU)
- Ensemble weights: Dynamic weight adjustment based on recent performance
- Shadow deployment: Run new models in shadow mode
- AutoML nightly: Automatic retraining with fresh data
β
backend/src/data/feature_store.py
β
backend/src/ml/train_lgbm_prod.py
β
backend/src/ml/infer_lgbm.py
β
backend/src/ml/models/ensemble_predictor.py (updated)
β
backend/tests/test_feature_store_leakage.py
β
backend/tests/test_train_lgbm_prod_smoke.py
β
backend/tests/test_infer_lgbm_wrapper.py
β
sprint/EPIC_B_LGBM_GUIDE.md
β
sprint/EPIC_B_LGBM_COMPLETE.md (this file)
β
backend/data/models/2025-10-14/lgbm.pkl
β
backend/data/models/2025-10-14/model_card.json
β
backend/data/models/best_lgbm.pkl
β
backend/data/feature_store/BTCUSDT.parquet
Prepared by: @siyahkare
Completed: 16 Ekim 2025
Status: π COMPLETE β Ready for Epic-C (Production TFT)!