Last updated: 2026-02-20 (UTC)
Workspace: e:\Projects\Python\sktime
Build a separate sktime_quant/ extension that uses sktime for:
- walk-forward backtesting (stocks, indices, commodities)
- forecasting with confidence intervals (95%+ gating)
- TimescaleDB-first ingestion (plus CSV/folder)
- portfolio optimization + risk-aware rebalancing
- offline broker order CSV export
- Streamlit UI
- CLI runner
- tests from the start (unit + integration where feasible)
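A minimal sketch of the 95% interval-gating idea from the scope list: only emit a trade signal when the whole prediction interval clears zero. The function name and gating rule are illustrative assumptions, not the project's actual API.

```python
def gate_signal(lower: float, upper: float) -> int:
    """Illustrative 95% gate: trade only when the whole prediction
    interval for the next-period return sits on one side of zero."""
    if lower > 0:
        return 1   # interval entirely positive -> long
    if upper < 0:
        return -1  # interval entirely negative -> short
    return 0       # interval straddles zero -> stay flat
```

With a 95% interval of (0.2%, 1.8%) this yields a long signal; a wider interval crossing zero produces no trade.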
- Package scope: separate module `sktime_quant/`
- Primary DB: TimescaleDB/PostgreSQL
- Optimization objective: risk-adjusted return
- Order format: broker CSV v1
- Backtest standard: walk-forward via `sktime.split` / model-evaluation style
- Daily operation: update mode when supported, fallback to refit
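The walk-forward standard can be illustrated with a pure-Python sketch of the expanding-window index pattern; in the real pipeline this role is played by `sktime.split` splitters such as `ExpandingWindowSplitter`, and the parameter names here are illustrative.

```python
def expanding_walkforward(n_obs: int, initial_window: int, step: int, horizon: int):
    """Yield (train_indices, test_indices) pairs for an expanding-window
    walk-forward scheme: train always starts at 0 and grows, test moves ahead."""
    cutoff = initial_window
    while cutoff + horizon <= n_obs:
        yield list(range(cutoff)), list(range(cutoff, cutoff + horizon))
        cutoff += step
```

Because each test window lies strictly after its training window, the scheme enforces the leakage boundary that the backtest tests check.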
- `sktime_quant/config`: dataclass config + YAML load/save profiles
- `sktime_quant/data`: connectors + schema validation + provider
- `sktime_quant/features`: lagged/exogenous/technical/zone feature tooling
- `sktime_quant/models`: registry, health checks, metadata/overview
- `sktime_quant/backtest`: walk-forward orchestration
- `sktime_quant/forecast`: forecast engine with update/refit handling
- `sktime_quant/risk`: risk metrics
- `sktime_quant/portfolio`: confidence/risk-aware allocation
- `sktime_quant/execution`: order generation/export and constraints
- `sktime_quant/pipelines/orchestrator.py`: end-to-end runner
- `sktime_quant/ui/streamlit_app.py`: initial UI
- `sktime_quant/ui/streamlit_app_uplift.py`: new redesigned UI
- `sktime_quant/tests`: unit/integration-style coverage (non-container paths)
- Incremental ingestion mode with state checkpoint (last timestamp)
- Data quality artifact/report generation
- Exogenous handling:
- one-step lag for regressors
- categorical exogenous encoding
- null-row drops before modeling
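The three exogenous-handling steps above can be sketched together in pandas; the column names are illustrative assumptions.

```python
import pandas as pd

def prepare_exogenous(df: pd.DataFrame, lag_cols: list, cat_cols: list) -> pd.DataFrame:
    """One-step lag for numeric regressors, one-hot encode categoricals,
    drop rows left incomplete by the lagging."""
    out = df.copy()
    for col in lag_cols:
        out[col] = out[col].shift(1)             # only past information enters the model
    out = pd.get_dummies(out, columns=cat_cols)  # categorical exogenous encoding
    return out.dropna()                          # null-row drop before modeling
```

The `shift(1)` is the same one-day lag applied to DB-raw timestamps in the modeling pipeline (see the ingestion recommendation below in this document).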
- Holiday table integration for Prophet-like usage from DB
- Model registry expansion + dependency/availability health reporting
- Daily update exclusions per model where unsupported
- Ensemble-related support and selection controls
- Model selection rationale artifact + tie-break behavior support
- Governance artifacts:
  - per-run governance report
  - stability history over runs (`results/governance/model_stability_history.json`)
- Execution realism:
- no-trade band
- min/max notional constraints
- lot size constraints
- per-asset turnover caps
- Broker order CSV validation + deterministic sorting
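A rough sketch of how the execution-realism constraints above might compose into one order filter; the defaults and the truncation rule are assumptions, not sktime_quant's actual logic.

```python
def apply_execution_constraints(target_qty: float, current_qty: float, price: float,
                                band_pct: float = 0.005, min_notional: float = 100.0,
                                lot_size: float = 1.0) -> float:
    """Illustrative order filter: suppress trades inside the no-trade band,
    round to lot size, and drop orders below the minimum notional."""
    delta = target_qty - current_qty
    # no-trade band: skip rebalances smaller than band_pct of current position value
    if current_qty and abs(delta * price) < band_pct * abs(current_qty * price):
        return 0.0
    # lot-size rounding, truncated toward zero so we never overshoot the target
    delta = int(delta / lot_size) * lot_size
    # minimum-notional check
    if abs(delta * price) < min_notional:
        return 0.0
    return delta
```

A per-asset turnover cap would sit on top of this, clamping the summed `abs(delta * price)` over the rebalance window.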
- CLI runner support (`python -m sktime_quant.run --config ...`)
- CSV/folder support enhanced to accept Yahoo-style raw OHLCV patterns
- External ingestion helper code exists under:
  - `sktime_quant/Ingest-outside-code/symbols.inf`
  - `sktime_quant/Ingest-outside-code/yahoo_downloader.py`
- Timescale pipeline direction established (including OHLCV storage + optional exogenous tables)
- Recommendation accepted: keep DB raw timestamps; apply one-day lag in modeling pipeline
- Added run history, profile load/save, advanced config YAML editor
- Added model picker and model overviews
- Added run progress events and artifact browsing
- Added per-run visualization, but UX complexity still accumulated:
- reports perceived as mixed/merged
- weak run-state visibility during execution
- non-intuitive controls (radio-driven, "bookish" JSON-heavy outputs)
- Arrow serialization issues in compare views with mixed dtypes
- model selection behavior confusing when unavailable models were grayed out
- Separate app, same backend logic
- Tabs: `Run Studio`, `Run Explorer` (strict single-run scope), `Governance`, `Orders`, `Config Lab`
- cleaner styling, clearer operator workflow, reduced report mixing risk
- Unit tests substantially expanded around:
- schema, features, model registry health/params
- backtest leakage boundaries
- forecast interval/confidence behavior
- execution CSV schema/compliance
- Smoke tests executed for orchestration paths
- Docs generation scoped for quant docs area; Sphinx installed and used
- Non-integration test runs reported healthy in prior cycle
- Profiles:
  - `profiles/quant_profile.yaml`
  - `profiles/quant_profile2.yaml`
  - `profiles/en_profile.yaml`
- Example generated artifacts:
  - `results/reports/*_summary.json`
  - `results/reports/*_model_selection.json`
  - `results/reports/*_model_governance.json`
  - `results/reports/*_data_quality.json`
  - `results/orders/orders_YYYYMMDD.csv`
  - `results/governance/model_stability_history.json`
- Avoid downgrading core libraries to satisfy optional models
- Ignore unstable/unmaintained models if dependency risk is high (example: TBATS)
- Keep warning noise reduced ("warning floods")
- Focus on daily update practicality and explicit exclusion reporting
- Keep UI informative for model choice and run status/progress
- Keep broker execution offline via validated file exports
- Uplift UI file created and compile-checked: `sktime_quant/ui/streamlit_app_uplift.py`
- Existing UI remains available: `sktime_quant/ui/streamlit_app.py`
- Backend pipeline logic and artifact generation already integrated
- Project is in an advanced prototype / initial-version-plus stage, with strong scaffolding and iterative hardening completed across multiple areas
- Make `streamlit_app_uplift.py` the default UI entry in docs/scripts.
- Add explicit run lifecycle persistence (queued/running/completed/failed) as a small run registry file.
- [IMPLEMENTED] Add Performance Analytics tab with risk metrics visualization (Sortino, Sharpe, Calmar, drawdown curves).
- Add UI smoke tests (basic render + artifact parsing guards).
- Harden Arrow-safe dataframe conversion in all compare/report tables.
- Add Timescale containerized integration test path into CI (if not already wired in active pipeline).
- Prepare release notes and semantic tag for current milestone.
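For the Arrow-hardening item, one common fix is coercing mixed-dtype object columns to strings before handing frames to Streamlit's table widgets; this is a generic sketch, not the project's implementation.

```python
import pandas as pd

def arrow_safe(df: pd.DataFrame) -> pd.DataFrame:
    """Coerce mixed-dtype object columns to strings so pyarrow
    serialization (used when rendering dataframes) does not fail."""
    out = df.copy()
    for col in out.columns:
        if out[col].dtype == object:
            out[col] = out[col].astype(str)
    return out
```

Applying this at every compare/report table boundary removes the class of Arrow errors noted in the pain points above.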
- New tab: Performance Analytics in uplift UI (3rd tab in main navigation)
- Risk metrics wired: Sortino, Sharpe, Calmar, max drawdown
- Data source: integrates with walk-forward backtest output (`fold_predictions.parquet`)
- Storage specification (VERIFIED):
  - all backtest artifacts stored under `{output_dir}/backtests/{run_id}/`
  - fold predictions (per-fold trade data): `results/backtests/{run_id}/fold_predictions.parquet`
  - metrics (per-asset/model aggregates): `results/backtests/{run_id}/metrics.parquet`
  - orchestrator path construction verified (lines 64-65 in orchestrator.py)
  - documented at lines 523-525 with explicit comments
- Aggregation: sums daily fold_returns across all assets/models for a portfolio P&L view
- Visualizations: equity curve, rolling Sortino (30-day), drawdown underwater plot
- Export: saves computed metrics to JSON per run with null-safe serialization
- Error messaging: improved with debugging guidance (path substitution, common causes)
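The aggregation step can be sketched in pandas; the `date` and `fold_return` column names follow the description above but are otherwise assumptions.

```python
import pandas as pd

def aggregate_portfolio_returns(fold_predictions: pd.DataFrame) -> pd.Series:
    """Sum daily fold_return across all assets/models into a single
    portfolio return series for the P&L view."""
    return (fold_predictions
            .groupby("date")["fold_return"]
            .sum()
            .sort_index())
```

The resulting series feeds directly into the equity-curve and drawdown plots.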
- Loader robustness (PRODUCTION-READY):
  - attempts `fold_predictions.parquet` first (per-fold detail)
  - falls back to `metrics.parquet` if fold_predictions is empty/missing
  - gracefully handles excluded models (when all asset-model combinations fail filters)
  - better error messaging with debugging guidance
- Functions added:
  - `_load_backtest_results()`: loads fold_predictions.parquet with fallback to metrics
  - `_compute_risk_metrics()`: calculates Sortino/Sharpe/Calmar/drawdown on aggregated returns
  - `_plot_equity_curve()`, `_plot_rolling_sortino()`, `_plot_drawdown()`: Plotly visualizations
  - `sortino_ratio()`: annualized downside-volatility-adjusted return
  - `sharpe_ratio()`: risk-free-rate-adjusted return per unit volatility
  - `calmar_ratio()`: annual return divided by absolute max drawdown
  - `cumulative_returns()`: total return from start to end equity
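Textbook-style sketches of the ratio functions named above; the annualization conventions (252 trading days, sample standard deviation) are assumptions and may differ from the project's implementations, and degenerate inputs (e.g. no downside days) are not handled.

```python
import numpy as np

TRADING_DAYS = 252  # assumed annualization basis

def sharpe_ratio(returns, rf_daily: float = 0.0) -> float:
    """Annualized mean excess return per unit of total volatility."""
    ex = np.asarray(returns, dtype=float) - rf_daily
    return np.sqrt(TRADING_DAYS) * ex.mean() / ex.std(ddof=1)

def sortino_ratio(returns, rf_daily: float = 0.0) -> float:
    """Like Sharpe, but penalizes only downside volatility."""
    ex = np.asarray(returns, dtype=float) - rf_daily
    downside = ex[ex < 0]
    return np.sqrt(TRADING_DAYS) * ex.mean() / downside.std(ddof=1)

def max_drawdown(returns) -> float:
    """Most negative peak-to-trough dip of the compounded equity curve."""
    equity = np.cumprod(1.0 + np.asarray(returns, dtype=float))
    peak = np.maximum.accumulate(equity)
    return float(((equity - peak) / peak).min())

def calmar_ratio(returns) -> float:
    """Annualized return divided by the absolute max drawdown."""
    r = np.asarray(returns, dtype=float)
    annual = np.prod(1.0 + r) ** (TRADING_DAYS / r.size) - 1.0
    return annual / abs(max_drawdown(r))
```

All four operate on the aggregated daily portfolio return series produced by the aggregation step.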
- Test coverage (8 integration tests, all PASSED):
  - `test_risk_metrics_basics()`: existing VaR/CVaR/volatility/drawdown
  - `test_walkforward_produces_fold_returns_for_metrics()`: verifies fold_predictions structure
  - `test_risk_metrics_on_walkforward_returns()`: full backtest -> aggregate -> metrics pipeline
  - `test_sortino_ratio_financial_interpretation()`: realistic strategy scenario
  - `test_sharpe_ratio_risk_adjusted_return()`: multi-strategy comparison
  - `test_calmar_ratio_return_over_drawdown()`: drawdown impact testing
  - `test_cumulative_returns_end_to_end_calculation()`: return math validation
  - `test_aggregate_daily_portfolio_returns_from_folds()`: multi-asset daily aggregation (UI workflow)
- Backtests folder: `results/backtests/` (segregated by run_id, not date)
- Forecasts folder: `results/forecasts/` (date-based for daily batches: `YYYYMMDD`)
- Reports folder: `results/reports/` (per-run summary artifacts)
- Reason: backtests are historical; forecasts are operational daily; reports are run-scoped
- No overwrites: each run_id gets a clean folder; daily forecast dates handle multiple runs on the same day
- Converted from pure math unit tests to integration tests
- Matches the project's testing pattern (see `test_walkforward.py`, `test_orders.py`)
- Tests actual business workflows: data -> walkforward -> aggregate -> metrics
- Uses realistic DataFrames (market data, fold predictions)
- Tests edge cases AND financial interpretation
- All 8 tests pass; framework now aligned with project standards
- Run CLI: `python -m sktime_quant.run --config profiles/quant_profile.yaml`
- Run uplift UI: `streamlit run sktime_quant/ui/streamlit_app_uplift.py`
- Run original UI: `streamlit run sktime_quant/ui/streamlit_app.py`
- Environment: project virtualenv `.\.venv\Scripts\python.exe`
- Syntax validation: `python -m py_compile sktime_quant/ui/streamlit_app_uplift.py` -> pass
- Targeted tests:
  - `.\.venv\Scripts\python.exe -m pytest sktime_quant/tests/test_risk_metrics.py -q` -> 8 passed
  - `.\.venv\Scripts\python.exe -m pytest sktime_quant/tests/test_risk_metrics.py sktime_quant/tests/test_forecast_engine_update.py -q` -> 9 passed
- Full quant test suite: `.\.venv\Scripts\python.exe -m pytest sktime_quant/tests -q` -> 57 passed, 1 skipped
- Realignment applied:
  - removed non-ASCII/emoji strings from uplift UI/status text
  - removed unused `plotly.express` import in uplift UI
  - normalized non-ASCII characters in this context document and a risk-metrics test comment
- Makefile targets: `make test_quant_core`, `make test_quant`
- CI workflow update: `.github/workflows/test_quant.yml` now includes an explicit core regression step covering `test_walkforward.py`, `test_risk_metrics.py`, `test_forecast_engine_update.py`
- Latest local validation:
  - `.\.venv\Scripts\python.exe -m pytest sktime_quant/tests/test_walkforward.py sktime_quant/tests/test_risk_metrics.py sktime_quant/tests/test_forecast_engine_update.py -o addopts="" -q` -> 13 passed
  - `.\.venv\Scripts\python.exe -m pytest sktime_quant/tests -o addopts="" -m "not integration" -q` -> 57 passed, 1 deselected
- Freeze snapshot created before changes:
  - branch: `freeze/20260220_101657`
  - tag: `freeze-20260220_101657`
  - implementation branch: `reeng/uplift-studio-bg-20260220_101657`
- Requirement intake validation documented: `docs/quant/uplift_requirement_intake_validation.md`
- Background-run runtime added for Studio UX: `sktime_quant/pipelines/studio_runtime.py`
  - persisted run registry: `results/reports/run_registry.json`
  - lifecycle states: `queued`, `running`, `completed`, `no_new_data`, `failed`
- Run-level report artifact added: `sktime_quant/reporting/run_report.py`
  - orchestrator now emits `*_report.md` and includes `report_path` in summary/CLI output
- Uplift UI adjusted to foreground Studio + background execution:
- queue background run
- refresh/poll run registry
- run/event visibility in Run Studio
- Timescale container integration harness added:
  - `sktime_quant/tests/integration/timescale_container/docker-compose.yml`
  - `sktime_quant/tests/integration/timescale_container/init.sql`
  - test: `sktime_quant/tests/test_timescale_container_integration.py`
- Targeted re-engineering tests: `.\.venv\Scripts\python.exe -m pytest sktime_quant/tests/test_orchestrator_integration.py sktime_quant/tests/test_cli_runner.py sktime_quant/tests/test_run_report.py sktime_quant/tests/test_studio_runtime.py -o addopts="" -q` -> 10 passed
- Non-integration quant suite: `.\.venv\Scripts\python.exe -m pytest sktime_quant/tests -o addopts="" -m "not integration" -q` -> 60 passed, 2 deselected
- Integration batch (Docker + Timescale):
  - `test_timescale_integration.py` + `test_orchestrator_integration.py` -> 6 passed
  - `test_timescale_container_integration.py` -> 1 passed
  - combined integration batch -> 7 passed
- Logs/report:
  - `TEST_RESULTS.md` updated with a consolidated integration section
  - `results/reports/integration_batch_1.log`
  - `results/reports/integration_batch_2.log`
- Promote uplift UI as default entry path in docs/scripts.
- Add CI job for containerized Timescale integration (gated on Docker availability).
- Add UI smoke tests for Studio lifecycle + registry rendering.
- Next-level strategy scope (planned at this checkpoint):
- rule-chaining visual builder (UI + config serialization)
- classifier engine (tree/RF) over indicators + internals
- blended execution path wiring rule engine and classifier outputs
- Prepare release notes + semantic milestone tag for this uplift implementation.
- Implemented complete next-level strategy scope:
  - rule DSL with validation and YAML persistence: `sktime_quant/strategy/rule_dsl.py`
  - classifier-based signal engine (decision tree / random forest): `sktime_quant/strategy/classifier.py`
  - blend policy engine (`and`, `or`, `weighted_vote`): `sktime_quant/strategy/blender.py`
  - strategy package exports: `sktime_quant/strategy/__init__.py`
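The blend policies named above (`and`, `or`, `weighted_vote`) might be sketched as follows; the exact semantics in `blender.py` are not shown in this document, so treat this as an illustrative assumption.

```python
def blend_signals(signals: dict, policy: str = "weighted_vote",
                  weights: dict = None) -> int:
    """Combine -1/0/+1 signals from the forecast/rule/classifier engines."""
    values = list(signals.values())
    if policy == "and":
        # trade only on unanimous non-zero agreement
        return values[0] if len(set(values)) == 1 and values[0] != 0 else 0
    if policy == "or":
        # first non-zero signal wins
        return next((v for v in values if v != 0), 0)
    if policy == "weighted_vote":
        weights = weights or {k: 1.0 for k in signals}
        score = sum(weights[k] * v for k, v in signals.items())
        return 1 if score > 0 else (-1 if score < 0 else 0)
    raise ValueError(f"unknown policy: {policy}")
```

In `weighted_vote` mode, weighting the classifier by its reported confidence would be a natural use of the persisted `classifier_confidence` column.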
- Config surface extended:
  - added `StrategyConfig` to `AppConfig` with mode/rules/classifier/blend parameters
  - file: `sktime_quant/config/schema.py`
- Walk-forward integration:
  - `WalkForwardEngine.run(..., strategy_config=...)`
  - supports modes: `forecast_only`, `rule_only`, `classifier_only`, `blended`
  - per-fold columns persisted: `signal_forecast`, `signal_rule`, `signal_classifier`, `signal_blended`, `classifier_confidence`
  - metrics now carry strategy metadata (`strategy_mode`, `blend_policy`, `classifier_type`)
  - file: `sktime_quant/backtest/walkforward.py`
- Orchestration/reporting integration:
  - strategy config artifact: `*_strategy_config.json`
  - strategy rules artifact: `*_strategy_rules.yaml`
  - summary includes strategy fields/path references
  - run report includes strategy metadata
  - files: `sktime_quant/pipelines/orchestrator.py`, `sktime_quant/reporting/run_report.py`
- Studio UX integration:
  - Run Studio includes Strategy controls
  - new Rule Builder tab with YAML load/apply/save
  - file: `sktime_quant/ui/streamlit_app_uplift.py`
- Targeted suite: `.\.venv\Scripts\python.exe -m pytest sktime_quant/tests/test_strategy_engine.py sktime_quant/tests/test_walkforward.py sktime_quant/tests/test_orchestrator_integration.py sktime_quant/tests/test_cli_runner.py -o addopts="" -q` -> 15 passed
- Full non-integration quant suite: `.\.venv\Scripts\python.exe -m pytest sktime_quant/tests -o addopts="" -m "not integration" -q` -> 64 passed, 2 deselected
- Timescale container integration smoke: `RUN_TIMESCALE_CONTAINER_TESTS=1 .\.venv\Scripts\python.exe -m pytest sktime_quant/tests/test_timescale_container_integration.py -o addopts="" -q` -> 1 passed
- Completed:
- Freeze + re-engineering branch/tag checkpointing
- Requirement intake validation against `Uplift.md`
- Background run lifecycle runtime + registry
- Run-level markdown reporting
- Timescale container integration harness + tests
- Pending item 4 full implementation (rule DSL, classifier engine, blended strategy, Studio integration)
- Pending:
- Promote uplift UI as default in docs/scripts
- Add CI wiring for containerized Timescale integration
- Add Studio UI smoke tests
- Add classifier calibration report and strategy-mode comparison artifact
- Add visual rule-chain form builder (beyond YAML editor)
- Prepare release notes and release push flow for new tags