📈 NEPSE AI Trading Research Platform

An advisory-only quantitative research platform for the Nepal Stock Exchange (NEPSE) — combining data-quality assurance, technical analysis, realistic backtesting, and explainable ML signal fusion.

Important

This project is research-only. It does not provide financial advice, guarantee profit, or execute live trades. All outputs are advisory and require human review. There is no live broker execution and no autonomous trading.

🔍 Overview

The NEPSE AI Trading Research Platform is a comprehensive quantitative research system built specifically for the Nepal Stock Exchange. It delivers an end-to-end workflow — from raw market-data ingestion through strategy development, realistic backtesting, and advisory signal generation — designed for individual traders, finance students, and small quant teams.

NEPSE is a small, illiquid market where data quality and execution assumptions are critical. This platform addresses that by enforcing trust-score data-quality gates, modelling realistic transaction costs, and making every signal explainable.
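
To make the cost assumption concrete, here is a minimal sketch of a fill model using the 0.5% fee and 5 bps slippage listed under Key Features. The names `simulate_fill` and `round_trip_return` are hypothetical, and charging the fee on each side of the trade is an assumption; the platform's backtester may model costs differently.

```python
from dataclasses import dataclass

FEE_RATE = 0.005   # 0.5% fee; applying it per side is an assumption
SLIPPAGE_BPS = 5   # 5 basis points of adverse price movement


@dataclass(frozen=True)
class Fill:
    price: float  # effective price after slippage
    fee: float    # fee charged on the traded notional
    cash: float   # signed cash flow: negative for buys, positive for sells


def simulate_fill(side: str, quoted_price: float, quantity: int) -> Fill:
    """Move the price against the trader by the slippage, then charge the fee."""
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    slip = quoted_price * SLIPPAGE_BPS / 10_000
    price = quoted_price + slip if side == "buy" else quoted_price - slip
    notional = price * quantity
    fee = notional * FEE_RATE
    cash = -(notional + fee) if side == "buy" else notional - fee
    return Fill(price=price, fee=fee, cash=cash)


def round_trip_return(entry: float, exit_: float, quantity: int = 100) -> float:
    """Net return of buying at `entry` and selling at `exit_`, after costs."""
    buy = simulate_fill("buy", entry, quantity)
    sell = simulate_fill("sell", exit_, quantity)
    return (sell.cash + buy.cash) / -buy.cash
```

Under these assumptions a round trip at an unchanged price loses roughly 1.1% of the capital deployed, which is why cost modelling matters for short-horizon strategies on an illiquid exchange.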

MVP Success Targets

These are research/evaluation gates — not guarantees of trading profit.

Metric	Target
Sharpe Ratio	> 1.2
Maximum Drawdown	< 20%
Win Rate	> 55%
Data Trust (90% of symbols)	≥ 0.7
Feature Freshness	< 24h
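
The first three targets can be checked with a few lines of standard-library Python. This is a sketch under stated assumptions, not the platform's evaluation code: the function names are hypothetical, the Sharpe ratio is assumed to be annualized, and `periods_per_year=252` is a conventional placeholder rather than the actual NEPSE trading calendar.

```python
import math
from statistics import mean, stdev

# Thresholds copied from the MVP Success Targets table.
TARGETS = {"sharpe": 1.2, "max_drawdown": 0.20, "win_rate": 0.55}


def sharpe_ratio(returns, periods_per_year=252, risk_free_rate=0.0):
    """Annualized Sharpe ratio of per-period returns (sample standard deviation)."""
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    sd = stdev(excess)
    if sd == 0:
        return 0.0
    return mean(excess) / sd * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def win_rate(trade_pnls):
    """Fraction of closed trades with strictly positive profit."""
    if not trade_pnls:
        return 0.0
    return sum(1 for p in trade_pnls if p > 0) / len(trade_pnls)


def passes_gates(returns, trade_pnls):
    """Compare each metric with its target and report pass/fail per gate."""
    return {
        "sharpe": sharpe_ratio(returns) > TARGETS["sharpe"],
        "max_drawdown": max_drawdown(returns) < TARGETS["max_drawdown"],
        "win_rate": win_rate(trade_pnls) > TARGETS["win_rate"],
    }
```

`passes_gates` takes a list of per-period strategy returns and a list of per-trade profits, and returns one boolean per gate so a failing metric is easy to spot.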

✨ Key Features

Multi-source data ingestion: ShareSansar and Merolagani scrapers, with manual CSV fallbacks.
Data-quality gating: automated trust scoring (completeness, consistency, freshness, volume, cross-source) with NORMAL / DEGRADED / SAFE_MODE system states.
Realistic backtesting: fees (0.5%), slippage (5 bps), liquidity filters, partial fills, execution delay, stop-loss / take-profit / trailing-stop exits, and benchmark comparison.
Explainable signal fusion: combines technical, ML, and sentiment signals with calibrated confidence and SHAP feature attribution.
Machine learning suite: baseline models (logistic regression / random forest / XGBoost), LSTM forecasting, XLM-R sentiment, and an experimental meta-learning research module (RL/GNN planned, not yet implemented).
MLOps & governance: model registry, drift monitoring, automated retraining, and human-approval promotion gates.
Interfaces: a 15-page Streamlit dashboard, a notebook-driven research workflow, and a full REST API with auto-generated Swagger docs.
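
The data-quality gate can be pictured as a composite score plus a mapping to a system state. The sketch below is illustrative only: the equal weights, the `degraded_share` cut-off, and the function names are assumptions, while the five component names, the 0.7 threshold, and the three states come from this README.

```python
from enum import Enum

SAFE_THRESHOLD = 0.7  # per-symbol trust threshold stated in this README

# Hypothetical equal weights over the five trust components.
WEIGHTS = {
    "completeness": 0.2,
    "consistency": 0.2,
    "freshness": 0.2,
    "volume": 0.2,
    "cross_source": 0.2,
}


class SystemMode(str, Enum):
    NORMAL = "NORMAL"
    DEGRADED = "DEGRADED"
    SAFE_MODE = "SAFE_MODE"


def trust_score(components):
    """Weighted average of component scores, each clamped to [0, 1]."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    total = sum(WEIGHTS[n] * min(max(components[n], 0.0), 1.0) for n in WEIGHTS)
    return total / sum(WEIGHTS.values())


def system_mode(scores_by_symbol, normal_share=0.9, degraded_share=0.5):
    """Map the share of symbols at or above SAFE_THRESHOLD to a system state.

    The 0.9 share mirrors the MVP target; the 0.5 cut-off is an assumption.
    """
    if not scores_by_symbol:
        return SystemMode.SAFE_MODE
    safe = sum(1 for s in scores_by_symbol.values() if s >= SAFE_THRESHOLD)
    share = safe / len(scores_by_symbol)
    if share >= normal_share:
        return SystemMode.NORMAL
    if share >= degraded_share:
        return SystemMode.DEGRADED
    return SystemMode.SAFE_MODE
```

Keeping the per-symbol score separate from the system-wide state means one stale symbol lowers its own score without forcing the whole platform into SAFE_MODE.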

🏗 Architecture

Data Sources → Ingestion → Validation → Database → Feature Engineering
  → Data Quality → Trust Scoring → Backtesting → Dashboard / API / Alerts

The system is organized into three layers:

User interfaces — Streamlit dashboard, Jupyter notebooks, and the FastAPI REST API.
Backend services — ingestion, data quality, feature engineering, backtesting, signal fusion, risk management, ML training/inference, and MLflow tracking.
Data & storage — PostgreSQL (optionally TimescaleDB), Redis cache/queue, and disk artifacts for models, datasets, and backtest results.

See docs/ARCHITECTURE.md for the full system design and data model.

🧰 Tech Stack

Layer	Technologies
Backend & API	Python 3.12, FastAPI, Uvicorn, Pydantic v2
Database & ORM	PostgreSQL 13+ (TimescaleDB optional), SQLAlchemy 2.0, Alembic
Cache & Queue	Redis
ML & Tracking	scikit-learn, XGBoost, PyTorch (LSTM), MLflow, NumPy/Pandas
Dashboard & Research	Streamlit, Jupyter, Plotly
DevOps	Docker & Docker Compose, GitHub Actions, Kubernetes (`infra/k8s/`)
Quality	pytest, ruff, mypy

🚀 Getting Started

Prerequisites

Python 3.12
PostgreSQL 13+ and Redis (or use Docker Compose, which provisions both)
Git

Installation

**Windows (PowerShell)**

# 1. Clone the repository
git clone https://github.com/Aashish-po/Nepse-AI-Trading-System.git
cd "Nepse-AI-Trading-System"

# 2. Create and activate a virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1

# 3. Install dependencies
python -m pip install -U pip
pip install -r requirements-dev.txt

# 4. Create your environment file
copy .env.example .env

**macOS / Linux (bash)**

# 1. Clone the repository
git clone https://github.com/Aashish-po/Nepse-AI-Trading-System.git
cd Nepse-AI-Trading-System

# 2. Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# 3. Install dependencies
python -m pip install -U pip
pip install -r requirements-dev.txt

# 4. Create your environment file
cp .env.example .env

Configuration

Copy .env.example to .env and fill in the required values. Key variables:

Variable	Description
`DATABASE_URL`	PostgreSQL connection string
`REDIS_URL`	Redis connection string
`JWT_SECRET_KEY`	Secret for JWT auth (min. 32 chars)
`MLFLOW_TRACKING_URI`	MLflow tracking server URI
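
A quick pre-flight check can catch a missing or malformed value before the API starts. The helper below is a hypothetical sketch, not part of the repository. It only verifies that the four variables above are present, that the two connection strings use plausible URL schemes, and that the JWT secret meets the 32-character minimum.

```python
import os
from urllib.parse import urlparse

REQUIRED = ("DATABASE_URL", "REDIS_URL", "JWT_SECRET_KEY", "MLFLOW_TRACKING_URI")


def check_env(env=None):
    """Return a list of problems; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]

    db = env.get("DATABASE_URL", "")
    if db and not urlparse(db).scheme.startswith("postgresql"):
        problems.append("DATABASE_URL should use a postgresql scheme")

    redis = env.get("REDIS_URL", "")
    if redis and urlparse(redis).scheme not in ("redis", "rediss"):
        problems.append("REDIS_URL should use a redis:// or rediss:// scheme")

    secret = env.get("JWT_SECRET_KEY", "")
    if secret and len(secret) < 32:
        problems.append("JWT_SECRET_KEY must be at least 32 characters")

    return problems
```

Calling `check_env()` with no arguments inspects the current process environment; passing a plain dict makes the function easy to unit-test.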

Warning

Never commit your .env file or any secrets. See Documents/7_Security.md for the full secrets-management policy.

Running with Docker

The fastest way to bring up the full stack (PostgreSQL, Redis, API, dashboard, MLflow):

docker compose -f infra/docker-compose.yml up --build

📖 Usage

Run the API

uvicorn backend.app.main:app --host 127.0.0.1 --port 8000 --app-dir .

Health check: http://127.0.0.1:8000/health
Interactive Swagger UI: http://127.0.0.1:8000/docs

Run the Dashboard

pip install -r dashboard/requirements-dashboard.txt
streamlit run dashboard/app.py

The dashboard runs at http://localhost:8501 with 15 pages: Market Overview, Strategies, Backtesting, Signals, Live Signals, Features, Data Sources, Alerts, System Status, ML Models, Analytics, MLOps, Explainability, Paper Trading, and Factor Analysis.

Seed Symbol Data

python scripts/seed_symbols.py

Populates the database with NEPSE stock symbols for initial testing.

Scrape & Ingest Market Data

Price data is scraped independently of the backend and ingested from CSV:

python nepse_data/scraper.py --today          # scrape ShareSansar + Merolagani -> nepse_data/data/
# then ingest the scraped CSVs into the prices table:
curl -X POST "http://localhost:8000/market/ingest/batch?symbol=NABIL" \
     -H "Content-Type: application/json" -d '{"source": "csv_ingestion"}'

The scraper writes one CSV per source per day to nepse_data/data/{sharesansar,merolagani}/YYYY-MM-DD.csv. CsvIngestionService (backend/app/services/csv_ingestion.py) reads those files, normalizes both sources to OHLCV (Merolagani floor-sheet ticks are aggregated per symbol), merges them on (symbol, date), and upserts into prices. Symbols must already exist in stocks (seed first); unresolved symbols are skipped and reported. Omit symbol from the request to ingest all symbols.
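
The normalization steps described above can be sketched in plain Python. This is an illustration, not the actual `CsvIngestionService`: the field names (`symbol`, `date`, `price`, `quantity`), the assumption that ticks arrive in time order, and the rule that the primary source wins on conflicts are all hypothetical.

```python
def ticks_to_ohlcv(ticks):
    """Aggregate time-ordered floor-sheet ticks into one bar per (symbol, date)."""
    bars = {}
    for t in ticks:
        key = (t["symbol"], t["date"])
        price, qty = float(t["price"]), int(t["quantity"])
        bar = bars.get(key)
        if bar is None:
            # First tick of the day sets every price field.
            bars[key] = {"open": price, "high": price, "low": price,
                         "close": price, "volume": qty}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # the last tick seen becomes the close
            bar["volume"] += qty
    return bars


def merge_sources(primary, secondary):
    """Merge two {(symbol, date): bar} maps; primary wins on conflicts (assumed)."""
    merged = dict(secondary)
    merged.update(primary)
    return merged


def filter_known(bars, known_symbols):
    """Keep bars for seeded symbols and report the symbols that were skipped."""
    known = set(known_symbols)
    kept = {k: v for k, v in bars.items() if k[0] in known}
    skipped = sorted({k[0] for k in bars} - known)
    return kept, skipped
```

Keying every bar on `(symbol, date)` is what makes the final upsert idempotent: re-running ingestion for the same day overwrites rows instead of duplicating them.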

Database Migrations

alembic upgrade head                              # apply latest schema
alembic revision --autogenerate -m "description"  # create a new migration
alembic downgrade -1                              # roll back one revision

Research Notebooks

The platform supports a notebook-driven research cycle:

1. Idea → 2. Notebook Experiment → 3. Backtest → 4. Strategy Integration → 5. Dashboard/Report

Start with research/notebooks/ (01_idea_to_backtest.ipynb, 02_backtest_and_export.ipynb, 03_integrate_bundle.ipynb). All experimental results must be validated through the tested backtesting pipeline before integration.

🔌 API Reference

The REST API exposes auth, market data, features, data quality, strategies/backtests, signals, ML, portfolio, explainability, governance, MLOps, and analytics routes. Full interactive docs are available at /docs when the API is running.

**View full endpoint table**

Method	Endpoint	Description
GET	`/health`	Health check (status, environment, version, scope)
GET	`/health/live`	Liveness probe (process up)
GET	`/health/ready`	Readiness probe (database reachable)
GET	`/metrics`	Prometheus metrics (request counts, latency histogram)
POST	`/auth/register`	Register a new user
POST	`/auth/login`	Login and receive access token
GET	`/market/prices`	List price data with optional symbol filter
POST	`/market/ingest`	Ingest a single price record for a stock
POST	`/market/ingest/batch`	Batch ingest OHLCV data for a date range
POST	`/features/generate`	Compute features for a single symbol/date
POST	`/features/generate-batch`	Compute features for a date range (single symbol)
POST	`/features/generate-multi`	Compute features across multiple symbols
GET	`/data-quality/trust/{symbol}/{date}`	Get trust score and quality details
GET	`/data-quality/safe/{symbol}/{date}`	Check if data is safe to use (trust >= 0.7)
GET	`/data-quality/summary/{symbol}`	Symbol quality summary (avg trust, unsafe days, issues)
POST	`/data-quality/reports/daily`	Generate daily data quality report
GET	`/data-quality/alerts`	List data quality alerts
POST	`/data-quality/alerts/{alert_id}/acknowledge`	Acknowledge an alert
GET	`/data-quality/trends/{symbol}`	Trust score trend over 30 days
GET	`/data-quality/freshness/{symbol}/{date}`	Check data freshness (last update vs expected)
GET	`/data-quality/system-mode`	Get system mode (NORMAL / DEGRADED / SAFE_MODE)
GET	`/data-quality/cross-validate/{symbol}/{date}`	Cross-validate price across active data sources
GET	`/data-quality/source-accuracy/{source_id}`	Accuracy score for a data source
GET	`/data-quality/weighted-price/{symbol}/{date}`	Source-weighted average price
GET	`/data-quality/source-drift/{source_id}`	Detect drift in a data source's record volume
GET	`/data-quality/mode-history`	System mode history
POST	`/data-quality/sources/recover-blacklisted`	Attempt to recover blacklisted sources
POST	`/data-quality/trust/apply-decay`	Apply time-based decay to old trust scores
POST	`/strategies/`	Create a new strategy
GET	`/strategies/`	List all strategies
GET	`/strategies/{strategy_id}`	Get strategy details
POST	`/strategies/backtests`	Run backtest for a strategy
GET	`/strategies/backtests/{backtest_id}`	Get backtest results
POST	`/strategies/benchmarks/compare`	Compare strategy vs buy-and-hold / NEPSE
POST	`/ml/train`	Train an ML model
GET	`/ml/models`	List trained models
GET	`/ml/predict/{symbol}`	Get model prediction for a symbol
POST	`/portfolio/account`	Create/reset a portfolio account
GET	`/portfolio/account/snapshot`	Portfolio snapshot (equity, cash, positions)
POST	`/portfolio/optimize`	Optimize allocation (equal / risk-parity / mean-variance)
GET	`/explain/models/{model_id}/importance`	Global feature importance (SHAP + fallbacks)
POST	`/explain/models/{model_id}/predict`	Local attribution + trade explanation
GET	`/governance/models`	List models by governance state
POST	`/governance/models/{model_id}/submit`	Submit a model for approval
POST	`/governance/models/{model_id}/approve`	Approve a model
POST	`/governance/models/{model_id}/production`	Mark an approved model production-ready
GET	`/mlops/champion`	Select best model by metric
GET	`/mlops/rank`	Rank registered models by metric
GET	`/mlops/models/{model_id}/retrain-assessment`	Assess whether a model needs retraining
POST	`/mlops/retrain`	Trigger a retraining run
POST	`/mlops/evolve`	Evolutionary hyperparameter search
GET	`/analytics/market-overview`	Market overview with top gainers/losers
GET	`/analytics/signals`	Signal explorer with filters and summary
POST	`/analytics/portfolio`	Portfolio analytics from an equity curve
POST	`/alerts/evaluate`	Evaluate rule-based alerts
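
As a usage illustration, the read-only data-quality routes can be called from Python with the standard library alone. The helper names below are hypothetical, the ISO `YYYY-MM-DD` date format in the path is an assumption, and the response schema is deliberately not assumed; check the Swagger UI for the actual contract and for whether a route requires an access token.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # local address used in "Run the API"


def safe_check_url(symbol, date, base_url=BASE_URL):
    """Build the URL for GET /data-quality/safe/{symbol}/{date}."""
    return (f"{base_url}/data-quality/safe/"
            f"{quote(symbol, safe='')}/{quote(date, safe='')}")


def fetch_json(url, timeout=10.0):
    """GET a URL and decode the JSON body without assuming its schema."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With the API running locally, `fetch_json(safe_check_url("NABIL", "2024-01-15"))` returns the decoded response. The symbol is the one used in the ingestion example and the date is a placeholder.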

📂 Project Structure

**View directory tree**

.
├── .github/workflows/        # CI (ci.yml), image publish (docker-publish.yml), Pages
├── AGENTS.md                 # Common dev commands
├── pyproject.toml            # Project + ruff/mypy config
├── requirements.txt          # Runtime dependencies
├── requirements-dev.txt      # Dev/test dependencies
├── alembic.ini               # Alembic config
├── pytest.ini                # pytest config
│
├── backend/
│   ├── app/
│   │   ├── main.py           # FastAPI application entry point
│   │   ├── core/             # config, logging, security, dependencies
│   │   ├── api/routes/       # auth, health, market, features, data_quality, strategies,
│   │   │                     #   signals, ml, lstm, portfolio, explainability, mlops, analytics
│   │   ├── models/           # SQLAlchemy ORM models
│   │   ├── schemas/          # Pydantic request/response contracts
│   │   ├── services/         # Business logic (ingestion, backtest, signal_fusion, ...)
│   │   └── db/migrations/    # Alembic revisions 0001 … 0011
│   └── tests/                # pytest suite (incl. phase gates)
│
├── ml/                       # ML / research modules (training, inference, lstm, sentiment, meta_learning, ...)
├── strategies/               # Strategy definitions and experiments
├── backtesting/              # Backtesting helpers
├── features/                 # Technical indicators (features/indicators.py)
├── scripts/                  # seed_symbols.py, smoke_test.py, backup_db.sh
│
├── dashboard/                # Streamlit dashboard (single app, 15 pages)
├── research/notebooks/       # Idea → backtest → integrate notebook workflow
├── docs/                     # Reference specs (ARCHITECTURE, PHASES, SUCCESS_METRICS, ...)
├── infra/                    # docker-compose.yml, docker/, k8s/
└── models/                   # Trained model artifacts *.joblib (gitignored)

🗺 Implementation Roadmap

Phases 0–13 are implemented in code. Phase 14 (experimental AI) is only partially present: ml/meta_learning.py and the sentiment module exist, while the planned PPO, DQN, GNN, and ensemble models are not yet implemented. Phase numbering matches the phase-gated tests (test_phase6_validation.py, test_phase8_gate.py, test_phase10_integration.py).

Phase	Focus Area	Key Deliverables
0	Foundation	Project structure, env config, CI, `/health` endpoint
1	Backend & Database	FastAPI skeleton, SQLAlchemy ORM, Alembic migrations, JWT auth/RBAC, symbol seeding
2	Data Ingestion & Quality	Ingestion service, validation rules, trust scoring, data-quality gate, alerting
3	Feature Engineering	Technical indicators (RSI/SMA/EMA/MACD/ATR), returns/volatility, point-in-time feature store
4	Data Quality & Reliability	Trust model, daily reports, quality alerts, system modes (NORMAL/DEGRADED/SAFE_MODE)
5	Strategy & Backtesting	Strategy registry, realistic backtesting, benchmark comparison
6	Research Workflow & Dashboard	Streamlit dashboard, notebook workflow, export validation
7	Baseline ML	Logistic regression, random forest, XGBoost, walk-forward training, promotion gates
8	LSTM & Sentiment	LSTM next-day forecasting, XLM-R/lexicon sentiment, NEPSE market calendar
9	Signal Fusion & Risk	Signal fusion engine, risk manager, position sizing (advisory; `enforced: false`)
10	Portfolio Optimization	In-memory account simulation, allocation methods (equal/risk-parity/mean-variance)
11	Explainability (SHAP)	Feature importance, local attribution, trade explanations, model governance
12	MLOps / Monitoring / Retraining	Model selection, auto-retraining, hyperparameter evolution, drift monitoring
13	Production Hardening & Deployment	Docker images, Kubernetes manifests, CI/CD, Prometheus monitoring, DB backups
14	Experimental AI	Meta-learning research module (`ml/meta_learning.py`); PPO/DQN/GNN/ensembles not yet implemented

The Phase 14 RL/GNN modules are planned as experimental research code and are not yet implemented. Live broker execution and autonomous trading remain out of scope.
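
Phase 9 in the table above describes a signal fusion engine whose output is advisory (`enforced: false`). One simple way to picture such a fusion step is a confidence-weighted average of the technical, ML, and sentiment signals. The sketch below is not the platform's engine: the weights, thresholds, and function name are hypothetical, and confidence calibration is left out.

```python
# Hypothetical weights for the three signal families named in this README.
WEIGHTS = {"technical": 0.4, "ml": 0.4, "sentiment": 0.2}


def fuse_signals(signals, buy_above=0.2, sell_below=-0.2):
    """Blend per-source (score, confidence) pairs into one advisory signal.

    Each score lies in [-1, 1] (bearish to bullish); each confidence in [0, 1].
    """
    weighted = {}
    weight_sum = 0.0
    for name, (score, confidence) in signals.items():
        w = WEIGHTS[name] * confidence
        weighted[name] = w * score
        weight_sum += w

    if weight_sum == 0:
        # No source is confident, so there is nothing to act on.
        fused, contributions = 0.0, {name: 0.0 for name in signals}
    else:
        contributions = {name: v / weight_sum for name, v in weighted.items()}
        fused = sum(contributions.values())

    if fused > buy_above:
        action = "BUY"
    elif fused < sell_below:
        action = "SELL"
    else:
        action = "HOLD"
    # Per-source contributions sum to the fused score, which keeps it explainable.
    return {"action": action, "score": fused,
            "contributions": contributions, "enforced": False}
```

Returning the per-source contributions alongside the action lets a human reviewer see which signal family drove the recommendation, in line with the advisory-only scope.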

✅ Testing & Quality

# Run the full test suite
python -m pytest backend/tests/ -v

# Lint
python -m ruff check backend/

# Type check
mypy backend/ --ignore-missing-imports --explicit-package-bases

The CI pipeline (.github/workflows/ci.yml) runs ruff, the full pytest suite, import smoke tests, and mypy on every push and pull request to main and develop.

🤝 Contributing

Contributions are welcome! To get started:

Fork the repository and create a feature branch from main (e.g. git checkout -b feature/your-feature).
Set up your environment (see Installation).
Make your changes, following the existing code style.

Validate locally before pushing:

python -m ruff check backend/
python -m pytest backend/tests/ -v

Commit with a clear message and open a pull request against main, describing the change and linking any related issues.

Please ensure all tests pass and lint is clean — CI must be green before a PR can be merged. Common development commands are documented in AGENTS.md.

🛡 Research Boundary

This platform is intentionally limited to ingestion, data quality, features, realistic backtesting, strategy research, dashboards, and advisory outputs. The following are explicitly out of scope:

❌ Live broker execution or order placement
❌ Autonomous / automated trading
❌ Financial advice or guaranteed returns
❌ Options, derivatives, or forex (equities only)

All advisory outputs require human review. See docs/RISK_DISCLAIMER.md for the full disclaimer.

📜 License

This project is licensed under the MIT License — see the LICENSE file for details.

👤 Maintainer

Aashish Paudel

GitHub: @Aashish-po
Repository: Nepse-AI-Trading-System

Built for the NEPSE quantitative-research community. ⭐ Star the repo if you find it useful!
