Skip to content

Liu-Ming-Yu/alpha-forge

Repository files navigation

English · Current 简体中文


Alpha Forge

Agentic AI infrastructure for systematic trading.

Alpha Forge brings LLM-native research, representation learning, portfolio governance, and IBKR-ready execution into one quiet, auditable operator system.

GitHub stars GitHub forks GitHub issues GitHub last commit GitHub license

Python 3.11 Docker ready IBKR ready FastAPI operator API

Agentic LLM intelligence / representation learning / autonomous research / governed execution

Demo Command Deck Architecture Map LLM Layer Research Factory Execution


Demo

Alpha Forge operator console demo showing live NAV, broker status, readiness, and strategy controls

Live operator console for NAV, broker health, regime state, strategy readiness, and execution controls.


Command Deck

Discover
Explore the product surface.

Showcase
Architecture Map
Technology Stack
Investigate
Open the core intelligence systems.

Agentic LLM Layer
Text-Event-v2
Hybrid ML Engine
Operate
Understand deployment and safety.

V2 Orchestrator
Execution Fortress
Governance Fabric
Open Mission Control — Alpha Forge at a glance
Mission System Response
Convert language into alpha Agentic LLM layer turns narrative into text-event-v2 features
Prevent uncontrolled AI behavior Manifests, evals, startup assertions, audits, and human-review gates
Search across feature space Campaigns, walk-forward validation, IC diagnostics, and bootstrap evidence
Merge many strategies safely V2 account-level orchestrator resolves proposals into one portfolio target
Block unsafe execution Fail-closed broker path, reconciliation, kill switches, and durable journals
Open Alpha Lifecycle — From raw signal to controlled live deployment
Language + Market Data
  -> Agentic Intelligence
  -> Feature Families
  -> Campaign Research
  -> Walk-Forward Validation
  -> Evidence Package
  -> Production-Candidate Gate
  -> Shadow Mode
  -> Paper Trading
  -> Paper Soak
  -> Live Preflight
  -> Controlled Live

Showcase

System Signature Capability
Agentic LLM Intelligence Layer Multi-agent text-event-v2 alpha generation with tool use, manifests, traceable reasoning, evals, live assertions, and human-review gates
Hybrid ML + Representation Learning Engine XGBoost, learned embeddings, PCA representations, formulaic alpha mining, microstructure, ownership, estimates, options, macro, and text-event intelligence
Autonomous Research Factory Campaign-driven alpha discovery, walk-forward validation, IC diagnostics, bootstrap evidence, regime testing, and production-candidate packaging
V2 Shared-Account Orchestrator Multi-engine portfolio target fusion with governed Shadow → Paper → Live promotion
Production Execution Fortress Fail-closed broker execution, durable state, reconciliation, kill switches, strict service boundaries, and automated verification
Governance & Observability Fabric Manifest registry, startup assertions, model provenance, event journals, readiness reports, promotion gates, and operator visibility

Architecture Map

Alpha Forge architecture poster

Explore Architecture Layers — tap into the stack
Layer What It Controls Why It Matters
Language Intelligence Agents, text-event-v2 extraction, manifests, evals, assertions Converts narrative into governed alpha inputs
Research Graph Feature families, campaigns, walk-forward validation, diagnostics Turns hypotheses into reproducible evidence
Portfolio Fusion Multi-engine proposals, account budgets, risk overlays Prevents strategy modules from fighting over the same account
Execution Safety Pre-trade checks, broker adapters, kill switches, reconciliation Stops unsafe action before it reaches the broker
Governance Fabric Readiness reports, promotion gates, event journals, operator visibility Makes AI-native trading auditable and controllable
Open Service Boundary Console — how the monolith stays disciplined
Boundary Rule
core Owns contracts, domain models, and events
application Owns use cases and read models
services Own domain logic behind service boundaries
infrastructure Owns durable adapters and external systems
bootstrap Composes concrete runtime dependencies
cli / views Stay thin and operator-facing

Agentic LLM Intelligence Layer

From language to governed alpha.

The platform treats language as a first-class market data substrate.

Filings, earnings calls, macro commentary, news, transcripts, social narratives, central-bank language, analyst revisions, and event-driven text are converted into structured text-event-v2 features. These features are not uncontrolled LLM outputs. They are manifest-bound, versioned, evaluated, audited, and promotion-gated before they can influence a portfolio.

At the center is a governed multi-agent intelligence layer: a swarm of specialized AI agents that reason over market context, challenge hypotheses, inspect risks, and prepare candidate signals for quantitative validation.

Open Agent Console — specialized intelligence roles
Agent Role
Market Oracle Converts market events, regime changes, and narrative shifts into structured research hypotheses
Sentiment Synthesizer Extracts sentiment, uncertainty, surprise, and directional pressure from noisy language streams
Strategy Debater Runs adversarial critique against candidate alpha ideas before they reach formal research
Risk Guardian Detects overfitting, regime fragility, crowding, stale data, and hidden exposure
Evidence Auditor Checks whether a signal has enough statistical and operational support for promotion
Execution Examiner Reviews whether a strategy can survive cash, liquidity, broker, and order-routing constraints

This is not “prompt engineering.”

It is governed machine reasoning for alpha research: agent outputs are traceable, reproducible, inspected, and connected to downstream evidence.

Text-Event-v2

LLM-native alpha, engineered for production.

The text-event-v2 family turns unstructured language into quantitative features that can be ranked, backtested, stress-tested, and governed like any other alpha source.

It supports:

Open Text-Event Capability Matrix — LLM-native features under governance
Capability Description
Event Extraction Converts raw text into structured market events, catalysts, risks, and directional hypotheses
Narrative Regime Detection Tracks shifts in market language, macro tone, risk appetite, and sector-level narratives
Filing Intelligence Extracts management tone, uncertainty, litigation risk, guidance changes, and operating pressure
Earnings Call Reasoning Identifies surprise, confidence, hedging language, and forward-looking signal changes
Manifest-Governed Features Every LLM-derived feature is tied to a schema, prompt contract, model version, and audit trail
Live Startup Assertions Production startup fails if text models, manifests, or required feature contracts are missing or inconsistent
Promotion Readiness Text-derived signals must survive the same evidence pipeline as price, fundamental, or microstructure features

The result is a language intelligence layer that behaves less like a chatbot and more like a governed research instrument.


Hybrid ML + Representation Learning Engine

Classical alpha, modern ML, and learned structure — in one research graph.

The research engine combines traditional quantitative signals with machine learning and learned representations across the full feature universe.

It integrates:

Open Feature Universe — explicit domain knowledge plus learned structure
Feature / Model Family Purpose
Price & Momentum Features Trend, reversal, volatility, distance-to-high, and cross-sectional ranking signals
Microstructure-v3 Liquidity, spread, volume pressure, intraday behavior, and trading-friction-aware features
Ownership-v1 Institutional behavior, positioning shifts, insider activity, and ownership structure
Estimates-v1 Analyst expectations, revisions, dispersion, and forward-looking earnings pressure
Options-v1 Volatility surface, skew, flow, and implied market expectation signals
Macro-v1 Rate, inflation, liquidity, sector, and broad regime context
Formulaic Alpha Mining Evolutionary search over WorldQuant-style grammars with auto-promotion constraints
Learned Representations-v1 PCA-style embeddings and compressed cross-family structure from existing features
Text-Event-v2 LLM-derived structured features from filings, calls, news, and narrative sources
XGBoost Ranking Pipelines Nonlinear interaction discovery, cross-sectional scoring, and feature attribution

This creates a research system that can discover alpha from both explicit domain knowledge and latent structure hidden across feature families.

The platform does not rely on one model, one signal, or one narrative. It builds an ensemble of evidence.


Autonomous Research Factory

Alpha discovery with memory, discipline, and promotion control.

Quant Platform is designed as an autonomous research factory: generate hypotheses, build features, run experiments, evaluate results, preserve artifacts, and promote only what survives.

The research loop is structured:

Hypothesis -> Feature Family -> Campaign -> Walk-Forward Validation -> Diagnostics -> Evidence Package -> Promotion Decision

Each experiment produces reproducible artifacts: feature manifests, model configs, fold results, IC diagnostics, bootstrap checks, turnover profiles, drawdown curves, and production-candidate reports.

Open Research Diagnostics — how attractive illusions are rejected
Research Capability Description
Campaign-Based Research Experiments are organized as campaigns instead of ad hoc notebooks
Walk-Forward Validation Strategies are tested across rolling out-of-sample folds
IC Diagnostics Signal quality is measured through rank correlation, stability, and negative-streak behavior
Bootstrap Evidence Statistical uncertainty is measured before promotion
Turnover Analysis Execution drag and capacity pressure are treated as first-class constraints
Regime Testing Performance is analyzed across risk-on, risk-off, volatile, and low-liquidity periods
Production-Candidate Reports Strategies are packaged with the evidence needed for governed deployment

The system is built to reject attractive illusions: high-return backtests, fragile edges, accidental leakage, unstable IC, and strategies that look good only before costs.


V2 Shared-Account Orchestrator

Many engines. One account. One coherent portfolio.

Real trading is not a collection of independent strategy demos. Multiple engines compete for the same cash, risk budget, exposure limits, and broker connection.

The V2 Shared-Account Orchestrator solves this by merging independent strategy proposals into a single account-level portfolio target.

Strategy engines can include:

Open Engine Rack — proposal sources for one account-level target
Engine Role
Cross-Sectional Equity Ranker Selects and weights securities based on alpha scores
ETF Macro Allocator Adjusts broad exposure based on macro and regime signals
Risk Overlay Engine Reduces exposure when drawdown, volatility, or signal quality deteriorates
Text-Event Engine Incorporates LLM-derived event signals into portfolio intent
Future Strategy Modules Additional engines can submit proposals through the same governed interface

The orchestrator applies cash constraints, exposure limits, stale-state checks, throttles, kill switches, reconciliation, and broker-readiness checks before any target becomes executable.

Deployment is staged:

Shadow Mode -> Paper Trading -> Paper Soak -> Live Preflight -> Controlled Live

Nothing becomes live by accident. Nothing bypasses governance. Nothing trades without a validated account-level view.


Production Execution Fortress

Designed to stop before it fails.

The execution layer is built around a conservative principle:

If the system cannot prove it is safe to act, it does not act.

Orders pass through a fail-closed execution path with broker abstraction, pre-trade validation, state reconciliation, event journaling, and promotion-state checks.

Open Safety Matrix — proof before action
Safeguard Purpose
Fail-Closed Execution Blocks action when state is stale, invalid, incomplete, or unverifiable
Durable Portfolio State Persists snapshots, parent-child provenance, and account state across restarts
Event Journal Records execution events, promotion events, and operational state transitions
Broker Reconciliation Compares intended portfolio state with broker/account reality
Kill Switches Stops trading when operational or risk conditions become unsafe
Order Throttling Prevents runaway order submission and broker abuse
Paper/Live Separation Keeps simulation, paper, and live paths explicitly separated
IBKR Gateway/TWS Integration Supports broker-backed paper and live execution through controlled adapters

The system is not optimized to appear active. It is optimized to be correct, observable, and controlled.


Governance & Observability Fabric

The control plane for AI-native trading.

Modern AI systems are expected to be observable, testable, and governable, especially when agents use tools and interact with external systems. OpenAI’s Agents, Agents SDK tracing, and human-in-the-loop documentation directly emphasize tools, integrations, observability, guardrails, and human review as core production concepts.

Quant Platform applies those ideas to systematic trading.

Open Governance Console — the control plane for AI-native trading
Governance Layer Function
Feature Registry Tracks all feature families and their schemas
Model Manifest System Records model versions, feature contracts, prompts, configs, and artifacts
Startup Assertions Prevents production startup when required contracts are missing
Readiness Reports Summarizes whether a strategy is eligible for paper or live progression
Promotion Gates Blocks strategies that fail evidence, risk, or operational requirements
Traceable Agent Actions Makes LLM/agent behavior auditable instead of opaque
Human Review Hooks Allows sensitive transitions to require explicit approval
Operator API Provides controlled visibility into system state and readiness

The platform does not trust AI output because it sounds intelligent. It trusts only what can be measured, replayed, audited, and governed.


Technology Stack

Open Stack Matrix — production components behind Alpha Forge
Area Stack
Language & Tooling Python 3.11, uv, pyproject.toml, pre-commit, ruff, mypy, pytest
ML / Representation Learning XGBoost, learned representations, PCA embeddings, feature attribution, rank modeling
LLM / Agentic AI Text-event-v2 pipelines, manifest-governed agents, tool-using research workflows, audit trails
Data & Research Storage Parquet, object-store-ready artifacts, campaign outputs, reproducible experiment persistence
State & Infrastructure PostgreSQL, Alembic, Redis/event-bus patterns, durable event journals
Execution Simulated backend, paper backend, IBKR ibapi, controlled live-adapter pathway
Deployment Docker, docker-compose, Makefile orchestration, FastAPI operator API
Verification make verify, import-boundary checks, module-size guards, type-debt enforcement, regression tests
Governance Readiness reports, production-candidate gates, startup assertions, promotion evidence, paper-soak validation

Operating Surface

The installed console script and module entrypoint are equivalent:

quant-platform --help
python -m quant_platform --help

Setup

Use Python 3.11 for project verification.

Quick start (recommended)

One command creates the venv, installs dependencies, installs ibapi (the IBKR TWS API -- not on PyPI, auto-downloaded from IBKR), and seeds .env from the example:

pwsh scripts/setup.ps1            # add -Extras for the ml + backtest research extras

On macOS/Linux/WSL:

bash scripts/setup.sh             # add --extras for ml + backtest

Then edit .env (set POSTGRES_PASSWORD, QP__API__OPERATOR_API_KEY, and your vendor/broker keys) and start the platform -- the broker-capable API natively (scripts/serve_api_native.ps1) or the full Docker stack (scripts/deploy.ps1). The manual equivalents of each step follow.

Manual setup

python -m venv .venv
.venv\Scripts\activate
python -m pip install --upgrade pip
python -m pip install -e ".[dev,api]"

On macOS/Linux/WSL:

python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[dev,api]"

Optional extras:

python -m pip install -e ".[ml]"       # XGBoost research
python -m pip install -e ".[backtest]" # vectorbt backtests

ibapi (the IBKR TWS API) is not on PyPI, so no pip extra can provide it. scripts/setup.ps1 installs it automatically; to (re)install it on its own:

pwsh scripts/install_ibapi.ps1            # pinned to API 10.46.1; -Version <ver> to override
bash scripts/install_ibapi.sh             # IBAPI_VERSION=<ver> to override

IBKR rotates hosted builds. If the pinned download 404s, pass a current version from https://interactivebrokers.github.io/ (for example -Version 1047.01).

Optional GPU acceleration for the learned-representations and sequence-ranker paths installs torch via the [dl] extra. PyPI ships CPU-only wheels; install a CUDA build matching your GPU afterward (example: CUDA 12.8 / cu128):

python -m pip install -e ".[dl]"
python -m pip install torch --index-url https://download.pytorch.org/whl/cu128

GPU is optional. Torch-dependent tests skip automatically when CUDA is absent.

Run With Docker

Durable paper/live state lives in Postgres and Redis, which run as Docker services. For a full stack in one command (build the image, start Postgres and Redis, apply migrations, serve the operator API and console) use the deploy script. It generates any missing secrets into .env on first run:

pwsh scripts/deploy.ps1            # add -Workers for the maintenance worker, -Paper for the paper engine, -Rebuild to force a rebuild

On macOS/Linux/WSL:

bash scripts/deploy.sh

To run the platform natively (in your venv) but keep state durable, start only the backing services and point .env at them. POSTGRES_PASSWORD must be set in .env first because compose reads it:

docker compose up -d postgres redis

Live IBKR Broker (Native API)

The Docker image deliberately omits ibapi, so the containerized API cannot reach IBKR. To pull live positions/NAV, run the operator API in your venv (where ibapi is installed) on 127.0.0.1 -- which the TWS API trusts by default, unlike a container's bridge address -- against the Dockerized Postgres + Redis:

pwsh scripts/serve_api_native.ps1
bash scripts/serve_api_native.sh

This ensures ibapi is present, starts Postgres + Redis, stops the Dockerized API to free the port, then serves natively. TWS or IB Gateway must be running and reachable per the .env QP__BROKER__* settings (paper TWS = 7497).

The script also exports QP__LIVE_IBKR__CONTRACTS_FILE from .env into the process environment so broker-sync can map account positions (it reads that var from os.environ directly, not pydantic settings) -- point it at your traded universe, e.g. infra/config/universe_300.json, or the console shows 0 positions. On startup the operator API hydrates cash/NAV from the latest persisted broker snapshot, so the console reflects the real account rather than the synthetic --initial-cash ledger.

Configuration

Settings are loaded by PlatformSettings from .env and environment variables with the QP__ prefix.

Start from:

copy infra\config\settings.example.env .env

Minimal local in-memory development:

set QP__STORAGE__POSTGRES_DSN=
set QP__STORAGE__REDIS_URL=
set QP__STORAGE__EVENT_BUS_BACKEND=in_memory
set QP__BROKER__PAPER_TRADING=true

Durable paper/live requires at least:

QP__STORAGE__POSTGRES_DSN=postgresql+psycopg://user:pass@host:5432/quant_platform
QP__STORAGE__REDIS_URL=redis://localhost:6379/0
QP__STORAGE__EVENT_BUS_BACKEND=redis_streams
QP__API__OPERATOR_API_KEY=<strong random key>

The operator console Strategy screen (strategy runs, engine budgets, source contributions) is populated by the V2 unified runtime. Enable it so single-engine supervise runs as the N=1 multi-engine case and writes those rows (ADR-014); also required by run-multi-engine:

QP__V2__ENABLED=true
QP__V2__ACCOUNT_ORCHESTRATOR_ENABLED=true

Common IBKR ports:

App Mode Port
TWS Paper 7497
TWS Live 7496
IB Gateway Paper 4002
IB Gateway Live 4001

Common Commands

Schema:

python -m quant_platform migrations-check
python -m quant_platform migrate
python -m quant_platform verify-schema

Single paper cycle:

python -m quant_platform run-cycle --initial-cash 50000

Bounded engine runs:

python -m quant_platform run-engine --mode shadow --cycles 5
python -m quant_platform run-engine --mode paper --execution-backend ib-paper --contracts-file infra/config/paper_contracts.json --cycles 1
python -m quant_platform run-engine --mode live --contracts-file ./contracts.json --cycles 1

V2 multi-engine proposal/orchestration path:

python -m quant_platform run-multi-engine ^
  --engines cross_sectional_equity,etf_macro_allocator ^
  --budgets-file ./budgets.json ^
  --mode paper ^
  --contracts-file infra/config/paper_contracts.json ^
  --cycles 1

Data and research:

python -m quant_platform ingest --start YYYY-MM-DD --end YYYY-MM-DD --contracts-file ./contracts.json
python -m quant_platform maintain --interval 900 --contracts-file ./contracts.json
python -m quant_platform compute-features --contracts-file ./contracts.json
python -m quant_platform features backfill --contracts-file ./contracts.json --start YYYY-MM-DDT00:00:00+00:00 --end YYYY-MM-DDT00:00:00+00:00 --feature-set-version paper-alpha-composite-v1 --date-policy nyse-sessions
python -m quant_platform boosting gpu-check
python -m quant_platform research-campaign run --help

Operator API:

python -c "import secrets; print(secrets.token_urlsafe(32))"
set QP__API__OPERATOR_API_KEY=<generated key>
python -m quant_platform serve-api --host 127.0.0.1 --port 8000
curl -H "X-API-Key: %QP__API__OPERATOR_API_KEY%" http://127.0.0.1:8000/health/ready

Verification

Fast local checks:

python scripts/check_import_boundaries.py
python scripts/check_service_coupling.py
python scripts/check_module_size.py
python scripts/check_type_debt.py
python scripts/check_lint_debt.py --skip-ruff-probe
python -m pytest -q tests/unit/test_engine_loop.py

Full offline gate:

make verify

Durable and live gates are opt-in:

set QP_VERIFY_DURABLE=1
set QP_VERIFY_LIVE_IBKR=1
set IBAPI_PACKAGE_PATH=<path-to-TWS-API/source/pythonclient>
make verify

Documentation Map

Start with USEME.md for operator commands and CONTEXT.md for project vocabulary.

Architecture:

Runbooks live under docs/runbooks. Use them for operations, incident response, paper promotion, backups, IBKR recovery, data recovery, and production readiness.

Releases

No releases published

Packages

 
 
 

Contributors