
spirituslab/causal-sentiment


Causal Sentiment Engine

A second brain for macro analysis — built with AI doing most of the heavy lifting.

Financial markets are deeply interconnected. A Fed rate decision ripples through Treasury yields, credit spreads, corporate bonds, and equity valuations. Most analysts hold this causal map in their heads — implicit, fragile, and impossible to stress-test. What if we made it explicit, visual, and interactive?

Causal Sentiment Engine is a 3D interactive causal graph with 52 macro nodes and 117 directed edges, powered by an AI agent that fetches real-time data, analyzes sentiment, and propagates impact through the network. Simulate "what if" shocks — drag oil sentiment to -0.8 and watch the cascade through energy, inflation, rates, and equities. Pin your own reasoning to nodes. The tool remembers what you think, not just what the data says.

This entire full-stack application — Python backend, Next.js frontend, 3D WebGL visualization, PostgreSQL with TimescaleDB, WebSocket real-time sync, a multi-phase AI agent with 12 tools — was built with the help of Claude Code. The velocity of what's possible with AI-assisted development is staggering.

Is it perfect? Not even close. The agent's reasoning could be sharper. The clustering and sentiment propagation models are simplistic. There's a whole world of scenario analysis and market stress testing that could be layered on. This is a prototype — a proof of concept showing how AI + good visualization can expand how we think about macro causation.

Disclaimer: This is a prototype built for fun to bring an idea to life. The sentiment analysis results are experimental, may contain errors and bugs, and should NOT be used for actual trading or investment decisions. Use at your own risk.

This project is a work in progress! If you have ideas for new features, better propagation models, or want to contribute — let's push the boundaries of what AI-augmented analysis tools can look like. Open an issue or submit a PR.

Technical Manual — deep dive into every algorithm, formula, and design decision.



Why This Exists

Traditional sentiment analysis tools treat assets in isolation — one stock, one headline, one data point at a time. But a change in CPI expectations propagates through monetary policy, into interest rates, through credit markets, and ultimately into equity indices. These cascading causal relationships are what macro investors like Bridgewater model systematically.

This project captures that interconnectedness in a directed causal graph and makes it tangible through 3D visualization. The causal model that most macro analysts keep in their heads becomes explicit, visual, and machine-augmented.

Shock propagation as a thinking tool

The graph isn't just a pretty visualization — it's a reasoning framework. When you shock a node (say, oil sentiment drops to -0.8), the impact propagates through the causal network:

  • Direct hits — energy stocks take an immediate hit (1 hop, high weight)
  • Indirect effects — inflation expectations shift → rate expectations → equities (3 hops, decayed)
  • Inverse beneficiaries — airlines and consumers gain from lower energy costs (negative causal edge)
  • Uncorrelated nodes — geopolitical risk is causally distant, impact decays to near-zero

This is exactly how a macro strategist thinks about positioning: "if X happens, what gets hurt, what benefits, and what's uncorrelated?" The graph makes that reasoning visible and testable.
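The cascade in the list above can be pictured as a breadth-first walk over signed, weighted edges. What follows is a minimal standard-library sketch, not the project's propagation.py (which, per the Tech Stack, is built on NetworkX). The toy node names, the edge weights, the per-hop decay of 0.7, the 4-hop cap, and the rule of summing impacts that arrive by different paths are all assumptions chosen for illustration.

```python
from collections import deque

# Toy causal graph (hypothetical): source -> [(target, signed edge weight)].
# Negative weights mean "when the source rises, the target falls".
EDGES: dict[str, list[tuple[str, float]]] = {
    "oil": [("energy_stocks", 0.9), ("inflation_expectations", 0.6), ("airlines", -0.7)],
    "inflation_expectations": [("rate_expectations", 0.8)],
    "rate_expectations": [("sp500", -0.5)],
    "geopolitical_risk": [("oil", 0.5)],  # points INTO oil, so an oil shock never reaches it
}


def propagate_shock(source: str, shock: float, decay: float = 0.7,
                    max_hops: int = 4, threshold: float = 1e-3) -> dict[str, float]:
    """Spread a sentiment shock breadth-first through signed, decaying edges.

    Each hop multiplies the incoming impact by the edge weight and by `decay`.
    Contributions smaller than `threshold` are pruned, impacts arriving via
    different paths are summed, and the totals are clipped to [-1, 1].
    """
    impact: dict[str, float] = {source: shock}
    queue = deque([(source, shock, 0)])
    while queue:
        node, value, hops = queue.popleft()
        if hops >= max_hops:
            continue
        for target, weight in EDGES.get(node, []):
            delta = value * weight * decay
            if abs(delta) < threshold:
                continue
            impact[target] = impact.get(target, 0.0) + delta
            queue.append((target, delta, hops + 1))
    return {node: max(-1.0, min(1.0, value)) for node, value in impact.items()}


if __name__ == "__main__":
    result = propagate_shock("oil", -0.8)
    for node, value in sorted(result.items(), key=lambda kv: -abs(kv[1])):
        print(f"{node:>24}: {value:+.3f}")
```

With a -0.8 oil shock, energy_stocks takes the largest downstream hit, airlines move the opposite way through the negative edge, sp500 ends up slightly positive via lower rate expectations, and geopolitical_risk is never reached because its only edge points into oil.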


Features

  • 52-node causal factor graph — macro, rates, commodities, equities, currencies, and more, connected by 117 directed causal edges with expert-defined + dynamically-adapted weights
  • 3D interactive visualization — WebGL-powered, sentiment-colored, with directional particles showing causal flow
  • What-if simulator — shock any node, watch the cascade, see the full impact report
  • AI agent — Claude or GPT with pre-fetched data package (FRED + yfinance + RSS injected before analysis), three-phase reasoning loop (Plan → Analyze → Validate), batch sentiment updates, and self-calibration. All 52 nodes have mapped data sources — no blind spots
  • RSS news pipeline — 27 curated financial RSS feeds (no API key needed): Fed, Bloomberg, CNBC, Google News topics. Enhanced keyword matching with word boundaries, exclusions, and confidence scoring. Source reliability tiers (T1 wire → T3 blog) inform agent reasoning
  • Morning brief — daily intelligence summary: overnight movers (>1σ), prediction scorecard, regime shifts, risk propagation paths, LLM-generated narrative
  • Automation toggles — runtime control of background scheduler and morning brief from the UI (no restart needed)
  • News trending detection — auto-triggers agent analysis when 3+ sources converge on the same topic
  • Analyst annotations — pin timestamped notes to any node, persisted across sessions
  • Regime narrator — LLM-generated macro narrative from bellwether indicators
  • Prediction tracking — agent records falsifiable predictions, system auto-resolves and tracks hit rate
  • Time travel — replay graph state over the past 7 days
  • Portfolio overlay — add your positions, see them highlighted on the graph
  • Scenario engine ("Macro Sim") — strategic foresight tool: click "Generate Scenario" to produce 2-3 probability-weighted branching scenarios with causal chains, historical calibration, and multi-node shocks. 4-phase multi-agent pipeline (Researcher → Historian → Strategist → Mapper) with economic calendar, options positioning data, and current market context. The agent thinks freely first (unconstrained by the graph), then maps impacts to nodes and suggests new nodes/edges for gaps. Non-linear shock model with stress multiplier for simultaneous shocks. Compare 2 branches side-by-side. Chain follow-up scenarios from branch outcomes. 12-domain news scan for topic diversity. Export as JSON.
  • Edge discovery — AI suggests new causal edges from correlation patterns
  • Causal discovery module — computationally discovers causal networks from data using PCMCI+, Granger, and RPCMCI algorithms, with DoWhy statistical validation (see Causal Discovery below)
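Two of the features above, keyword matching in the RSS pipeline (word boundaries, exclusions, confidence scoring) and news trending detection (3+ sources converging on one topic), can be sketched together. The topic rules, tier weights, and confidence formula below are hypothetical and only show the mechanics; they are not the project's actual matcher.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical topic rules: keywords must match as whole words; any exclusion
# phrase in the headline vetoes the topic (e.g. "olive oil" is not crude oil).
TOPIC_RULES: dict[str, dict[str, list[str]]] = {
    "oil": {"keywords": ["oil", "crude", "opec"], "exclude": ["olive oil", "oil painting"]},
    "fed": {"keywords": ["fed", "fomc", "powell"], "exclude": ["fed up"]},
}

# Hypothetical reliability weights per source tier (T1 wire ... T3 blog).
TIER_WEIGHT = {"T1": 1.0, "T2": 0.7, "T3": 0.4}


@dataclass
class Headline:
    source: str
    tier: str
    text: str


def match_topics(headline: Headline) -> dict[str, float]:
    """Return {topic: confidence} for every topic the headline matches."""
    text = headline.text.lower()
    matches: dict[str, float] = {}
    for topic, rule in TOPIC_RULES.items():
        if any(phrase in text for phrase in rule["exclude"]):
            continue
        # \b anchors stop "fed" from matching inside "federal".
        hits = sum(1 for kw in rule["keywords"] if re.search(rf"\b{re.escape(kw)}\b", text))
        if hits:
            base = min(1.0, 0.5 + 0.25 * (hits - 1))  # more distinct keywords, more confidence
            matches[topic] = base * TIER_WEIGHT.get(headline.tier, 0.4)
    return matches


def trending_topics(headlines: list[Headline], min_sources: int = 3) -> list[str]:
    """Topics matched by at least `min_sources` distinct sources."""
    sources_by_topic: dict[str, set[str]] = defaultdict(set)
    for headline in headlines:
        for topic in match_topics(headline):
            sources_by_topic[topic].add(headline.source)
    return sorted(t for t, sources in sources_by_topic.items() if len(sources) >= min_sources)
```

Counting distinct sources rather than raw headlines keeps one prolific feed from triggering an analysis on its own.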

For details on algorithms, formulas, and design rationale, see the Technical Manual.


Architecture

Browser (localhost:3000)
  ├── 3D Force-Directed Graph (Three.js/WebGL)
  ├── Node Detail Panel + What-If Simulator
  ├── Agent Audit Log + Predictions Panel
  └── Analyst Annotations + Regime Narrator
       │
       │ WebSocket (real-time push)
       ▼
FastAPI Backend (localhost:8000)
  ├── AI Agent (Claude/GPT, 3-phase loop, 12 tools)
  ├── Graph Engine (NetworkX, propagation, anomalies, regimes)
  ├── Data Pipeline (APScheduler, 10 jobs, disabled by default)
  └── PostgreSQL + TimescaleDB + Redis

External Data Sources:
  FRED · yfinance · 30 RSS feeds · NewsAPI · Reddit · SEC EDGAR

For detailed architecture, agent design, and concurrency model, see Technical Manual §2, §8, §17.

Tech Stack

Layer            | Technology
Backend          | Python 3.12, FastAPI, Uvicorn
AI Agent         | Anthropic SDK + OpenAI SDK (switchable)
Graph Engine     | NetworkX + NumPy/SciPy
Database         | PostgreSQL 16 + TimescaleDB
Cache            | Redis
Frontend         | Next.js 15, React 19
3D Visualization | react-force-graph-3d (Three.js/WebGL)
State            | Zustand
Charts           | TradingView Lightweight Charts
Deployment       | Docker Compose

Data Sources

Source    | Data | Frequency
FRED API  | Rates, CPI, GDP, unemployment, credit spreads, consumer confidence, wages (16 series) | Every 4h
yfinance  | Equities, ETFs, commodities, forex, volatility indices, bond ETFs (21 tickers) | Every 1h
RSS Feeds | 30 curated financial feeds (Fed, Bloomberg, CNBC, Google News topics); free, no API key | Every 2h
NewsAPI   | Headlines and articles (optional fallback) | On agent trigger
Reddit    | Social sentiment (r/wallstreetbets, r/economics, r/stocks) | Every 2h
SEC EDGAR | Earnings, financial filings | Daily
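The cadences in the table reduce to a simple interval check. The project schedules these jobs with APScheduler; the sketch below deliberately avoids APScheduler's API and only illustrates the timing logic with the standard library. `CADENCE` and `due_jobs` are hypothetical names, and gating on a `scheduler_enabled` flag is an assumption modeled on the SCHEDULER_ENABLED setting in .env.

```python
from datetime import datetime, timedelta

# Polling cadences from the Data Sources table. NewsAPI is omitted because it
# runs on agent trigger rather than on a timer.
CADENCE: dict[str, timedelta] = {
    "fred": timedelta(hours=4),
    "yfinance": timedelta(hours=1),
    "rss": timedelta(hours=2),
    "reddit": timedelta(hours=2),
    "edgar": timedelta(days=1),
}


def due_jobs(last_run: dict[str, datetime], now: datetime,
             scheduler_enabled: bool = False) -> list[str]:
    """Sources whose interval has elapsed since their last successful run.

    Returns nothing while the scheduler is disabled (the .env default), and
    treats a source that has never run as due immediately.
    """
    if not scheduler_enabled:
        return []
    return sorted(
        name for name, every in CADENCE.items()
        if name not in last_run or now - last_run[name] >= every
    )
```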

Quick Start

Prerequisites

  • Docker and Docker Compose
  • At least one LLM API key (Anthropic or OpenAI)

Setup

  1. Clone the repo:

    git clone https://github.com/zianwang123/causal-sentiment.git
    cd causal-sentiment
  2. Create your .env file:

    cp .env.example .env
  3. Add your API keys to .env:

    # At least one LLM key is required
    ANTHROPIC_API_KEY=sk-ant-...
    OPENAI_API_KEY=sk-proj-...
    
    # Choose default provider: "anthropic" or "openai"
    LLM_PROVIDER=openai
    
    # Scheduler (disabled by default to save API credits)
    SCHEDULER_ENABLED=false
    
    # Optional data sources (mock data used if missing)
    FRED_API_KEY=
    NEWSAPI_KEY=
    
    # Optional Reddit integration (free at reddit.com/prefs/apps)
    REDDIT_CLIENT_ID=
    REDDIT_CLIENT_SECRET=

    Note: News works out of the box via 30 curated RSS feeds (no API key needed). Without FRED_API_KEY, the agent uses mock data for the FRED macro series; add your own FRED key (free) to get real data. NEWSAPI_KEY is optional: RSS feeds are the primary news source, and NewsAPI is only a fallback.

  4. First-time install:

    ./setup.sh
  5. Start everything:

    ./start.sh
  6. Open http://localhost:3000 — everything runs locally. Click Run Full Analysis to trigger the first agent run.

  7. To shut down:

    ./stop.sh

    Alternatively, run the full stack via Docker:

    docker compose up --build
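The .env rules from step 3 (at least one LLM key, a default provider, an opt-in scheduler, mock FRED data when no key is set) can be summarized as a small resolver. This is a hypothetical sketch, not the project's backend/app/config.py: `Settings`, `load_settings`, the fallback to whichever provider has a key, and the `openai` default (taken from the example .env) are assumptions. It reads an already-populated environment mapping rather than parsing the .env file itself.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    llm_provider: str        # "anthropic" or "openai"
    scheduler_enabled: bool  # background jobs stay off unless explicitly enabled
    use_mock_fred: bool      # True when FRED_API_KEY is empty
    use_newsapi: bool        # NewsAPI is an optional fallback to RSS


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve the .env rules described in the setup steps (hypothetical helper)."""
    keys = {
        "anthropic": env.get("ANTHROPIC_API_KEY", "").strip(),
        "openai": env.get("OPENAI_API_KEY", "").strip(),
    }
    if not any(keys.values()):
        raise ValueError("set at least one of ANTHROPIC_API_KEY or OPENAI_API_KEY")

    provider = env.get("LLM_PROVIDER", "openai").strip().lower()
    if provider not in keys:
        raise ValueError(f"LLM_PROVIDER must be 'anthropic' or 'openai', got {provider!r}")
    if not keys[provider]:
        # Assumption: fall back to whichever provider actually has a key.
        provider = next(name for name, key in keys.items() if key)

    return Settings(
        llm_provider=provider,
        scheduler_enabled=env.get("SCHEDULER_ENABLED", "false").strip().lower() == "true",
        use_mock_fred=not env.get("FRED_API_KEY", "").strip(),
        use_newsapi=bool(env.get("NEWSAPI_KEY", "").strip()),
    )
```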

Usage

Action               | How
Simulate a shock     | Click a node → drag the "What-If Shock" slider → click "Simulate"
Add a note           | Click a node → scroll to "Analyst Notes" → type your reasoning → click "Add"
Regime narrative     | Click the regime badge (top-left) → click "Generate Narrative"
Run analysis         | Click "Run Full Analysis" (all 52 nodes)
Deep dive            | Click a node → click "Deep Dive" for focused single-node analysis
Find a node          | Open "Nodes" in the bottom toolbar; search, sort, click to fly
Morning brief        | Click "Morning Brief" in the bottom toolbar → click "Generate"
Toggle automations   | In the top-left panel under "Automations", flip scheduler and morning brief on/off
Switch LLM           | Toggle GPT/Claude in the top-left panel
Time travel          | Open "Time Travel" in the bottom toolbar
Portfolio            | Open "Portfolio" in the bottom toolbar to add positions
Scenario engine      | Click "Scenario Engine" (top-right) → "Generate Scenario" → view branching scenarios → "Apply to Graph" to see the cascade
Topology suggestions | Open "Evolve Graph" to see AI-suggested new causal edges

Project Structure

causal-sentiment/
├── backend/
│   ├── app/
│   │   ├── main.py
│   │   ├── config.py
│   │   ├── agent/
│   │   │   ├── orchestrator.py
│   │   │   ├── llm_client.py
│   │   │   ├── tools.py
│   │   │   ├── schemas.py
│   │   │   └── prompts.py
│   │   ├── graph_engine/
│   │   │   ├── topology.py
│   │   │   ├── propagation.py
│   │   │   ├── weights.py
│   │   │   ├── correlations.py
│   │   │   ├── anomalies.py
│   │   │   ├── regimes.py
│   │   │   ├── backtest.py
│   │   │   ├── predictions.py
│   │   │   └── topology_learning.py
│   │   ├── data_pipeline/
│   │   │   ├── fred.py
│   │   │   ├── market.py          # + options positioning (IV, put/call, term structure)
│   │   │   ├── calendar.py        # Economic calendar (FOMC, CPI, NFP, GDP + FRED API)
│   │   │   ├── reddit.py
│   │   │   ├── edgar.py
│   │   │   ├── retry.py
│   │   │   └── scheduler.py
│   │   ├── api/
│   │   │   ├── routes_graph.py
│   │   │   ├── routes_agent.py
│   │   │   ├── routes_portfolio.py
│   │   │   └── websocket.py
│   │   ├── db/
│   │   │   └── connection.py
│   │   └── models/
│   │       ├── graph.py
│   │       └── observations.py
│   ├── tests/
│   ├── requirements.txt
│   └── Dockerfile
├── frontend/
│   └── src/
│       ├── app/page.tsx
│       ├── components/
│       ├── hooks/
│       ├── lib/
│       └── types/
├── docs/
│   └── TECHNICAL_MANUAL.md
├── setup.sh
├── start.sh
├── stop.sh
├── docker-compose.yml
└── .env.example

For detailed file descriptions, see Technical Manual.

FAQ

Q: How accurate is the sentiment analysis? A: This is experimental. The agent's quality depends on the LLM, data freshness, and graph structure. The built-in prediction tracking measures accuracy over time. Treat it as a research tool, not a trading signal.
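One way to picture the prediction tracking mentioned above is as a directional call that is later resolved and scored. The Roadmap also mentions magnitude scoring. In this sketch the `Prediction` fields, the sign-agreement hit rule, and mean absolute error as the magnitude score are assumptions for illustration, not the project's predictions.py.

```python
import math
from dataclasses import dataclass
from typing import NamedTuple, Optional


@dataclass
class Prediction:
    node: str
    direction: int                          # +1 expects a rise, -1 expects a fall
    expected_move: float                    # predicted magnitude, e.g. 0.02 for 2%
    realized_move: Optional[float] = None   # signed move, filled in once the horizon passes


class Scorecard(NamedTuple):
    resolved: int
    hit_rate: float
    mean_abs_error: float


def score(predictions: list[Prediction]) -> Scorecard:
    """Directional hit rate plus mean magnitude error over resolved predictions."""
    resolved = [p for p in predictions if p.realized_move is not None]
    if not resolved:
        return Scorecard(0, math.nan, math.nan)
    hits = sum(1 for p in resolved if p.direction * p.realized_move > 0)
    error = sum(abs(p.direction * p.expected_move - p.realized_move) for p in resolved)
    return Scorecard(len(resolved), hits / len(resolved), error / len(resolved))
```

Unresolved predictions are excluded, so the hit rate only reflects calls whose horizon has passed.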

Q: Does this cost money to run? A: LLM API calls cost money. A full 52-node analysis typically uses ~15 tool calls and 20-40K tokens. Market data (yfinance, 21 tickers) is free. FRED requires a free API key (mock data fallback without it). Background jobs are disabled by default to prevent unexpected costs.

Q: Can I add my own nodes and edges? A: Edit backend/app/graph_engine/topology.py. The topology learning feature can also suggest new edges from correlation patterns.

Q: Why not use LangChain / CrewAI / other framework? A: Simplicity and transparency. The agent is ~200 lines in orchestrator.py. Every tool call is logged and visible in the audit log. No hidden abstractions.
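Without a framework, the core of an agent is mostly a tool registry plus a dispatcher that records every call. The sketch below is hypothetical: the `tool` decorator, `get_node_sentiment`, and `AUDIT_LOG` are illustrative names, not the project's 12 tools or its orchestrator.py, and returning errors to the model instead of raising is a design assumption.

```python
import json
from typing import Any, Callable

TOOLS: dict[str, Callable[..., Any]] = {}
AUDIT_LOG: list[dict[str, Any]] = []


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a plain function as a tool the LLM may call by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_node_sentiment(node_id: str) -> float:
    # Stand-in data; a real tool would read the graph state.
    return {"oil": -0.4, "vix": 0.2}.get(node_id, 0.0)


def call_tool(name: str, arguments: dict[str, Any]) -> Any:
    """Run one tool call requested by the model and append it to the audit log."""
    try:
        result = TOOLS[name](**arguments)
    except Exception as exc:  # unknown tool or bad arguments: report back, don't crash the run
        result = {"error": f"{type(exc).__name__}: {exc}"}
    AUDIT_LOG.append({"tool": name, "input": arguments, "output": result})
    return result


if __name__ == "__main__":
    call_tool("get_node_sentiment", {"node_id": "oil"})
    call_tool("delete_everything", {})
    print(json.dumps(AUDIT_LOG, indent=2))
```

In a full loop, each tool request from the LLM would be passed to `call_tool` and the returned value sent back to the model as the tool result, so the audit log ends up holding the complete input/output trail for the run.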

Q: Can I use only OpenAI / only Anthropic? A: Yes. You only need one API key. Set LLM_PROVIDER in .env. You can switch at runtime via the UI.

Q: Where can I learn more about how the algorithms work? A: The Technical Manual covers every algorithm, formula, constant, and design rationale in detail — propagation, regime detection, anomaly detection, dynamic weight learning, agent architecture, and more.


Roadmap

  • Three-phase agent reasoning (Plan → Analyze → Validate)
  • Self-critique + confidence decomposition
  • Prediction tracking with auto-resolution and magnitude scoring
  • Agent memory (cross-run context injection)
  • What-if shock simulator with cascade visualization
  • Analyst annotations + regime narrator
  • Dynamic weight learning + edge muting
  • Portfolio overlay + backtesting
  • LLM topology suggestions
  • CI/CD pipeline
  • Morning brief — daily summary of moves, predictions, regime changes
  • RSS news pipeline — 30 curated feeds (no API key), source reliability tiers (T1/T2/T3)
  • Pre-fetch data package — FRED + yfinance + RSS injected before agent starts
  • Evidence provenance — real/mock/inferred per data source per node, evidence history (20 entries)
  • Tool audit trail — full tool call inputs + outputs stored per agent run
  • Batch sentiment updates — atomic multi-node writes with single graph lock
  • Expanded data coverage — 21 yfinance tickers (forex, volatility indices, bond ETFs), 5-day trend context
  • Risk-aware color scheme — red = market-threatening, green = market-friendly (inverted for risk nodes)
  • Agent optimization — ~15 tool calls per run (batch-first, no re-fetching pre-fetched data)
  • 16 bug fixes — LLM error handling/retry/timeout, graph rollback, batch propagation cascade fix, concurrent scheduler guard
  • Scenario engine — strategic foresight agent with 4-phase multi-agent loop, "generate first, map second" design, multi-shock simulate, graph evolution, export
  • Scenario intelligence — economic calendar tool, options positioning, non-linear shock model, scenario comparison, scenario chaining, topic diversity (12 domains)
  • Historical backtesting dashboard with equity curves
  • User-defined custom graphs (bring your own nodes/edges)
  • Alerting (email/Slack when anomalies detected)

Causal Discovery Module

In addition to the hand-crafted expert graph (52 nodes, 117 edges), the project includes a computational causal discovery module that discovers the network structure directly from historical data — no manual edge definitions needed.

How it works

  1. Data pipeline fetches daily prices and macro indicators from 45 sources (yfinance, FRED, CFTC, GDELT, GPR Index) into a TimescaleDB hypertable
  2. Scoring transforms raw data into comparable signals: z-score (deviation from the 90-day average, measured in standard deviations), log returns (daily changes), or rolling volatility (20-day choppiness)
  3. Causal algorithms discover which factors statistically predict which others:
    • PCMCI+ — controls for confounders, best for time-series (primary)
    • Granger — pairwise tests, fast but more spurious edges
    • RPCMCI — detects market regime shifts and discovers different causal structures per regime
  4. DoWhy validation automatically tests every discovered edge with conditional independence tests — typically 74-84% of PCMCI+ edges pass
  5. Anchor propagation infers display polarity (green/red) from a small set of anchor nodes (e.g., S&P 500 = positive, VIX = negative) via BFS through causal edges
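The three scoring transforms in step 2 are standard and can be sketched with the standard library alone. Details the README does not specify are assumptions here: the z-score window includes the latest observation, sample standard deviation is used throughout, and volatility is the unannualized standard deviation of daily log returns.

```python
import math
import statistics


def zscore(series: list[float], window: int = 90) -> float:
    """Z-score of the latest value against the trailing `window` observations."""
    if len(series) < window:
        raise ValueError("not enough history")
    tail = series[-window:]
    mean = statistics.fmean(tail)
    std = statistics.stdev(tail)
    return (series[-1] - mean) / std if std > 0 else 0.0


def log_returns(prices: list[float]) -> list[float]:
    """Daily log returns ln(p_t / p_{t-1})."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def rolling_volatility(prices: list[float], window: int = 20) -> float:
    """Sample std of the last `window` daily log returns (not annualized)."""
    rets = log_returns(prices)
    if len(rets) < window:
        raise ValueError("not enough history")
    return statistics.stdev(rets[-window:])
```

All three outputs are unitless or in return units, which is what lets prices, yields, and index levels be compared in one causal model.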

Expert vs. Discovered

           | Expert Graph                      | Discovered Graph
Nodes      | 52 hand-picked                    | 34-42 from data, filtered by statistical significance
Edges      | 117 hand-drawn                    | 39-287 learned from algorithms
Weights    | Expert-defined                    | Data-driven
Scores     | LLM sentiment (~$0.04/node, ~30s) | Z-score (free, instant)
Validation | None                              | DoWhy statistical tests

The frontend lets you toggle between expert and discovered modes. Both support shock simulation, animation, and node inspection.
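The green/red polarity shown in discovered mode comes from anchor propagation (step 5 under How it works). A minimal sketch, assuming that a node inherits its neighbor's polarity multiplied by the sign of the connecting edge, that edges are walked in both directions, and that the closest anchor wins; `infer_polarity` and the toy edges are hypothetical.

```python
from collections import deque


def infer_polarity(edges: list[tuple[str, str, float]],
                   anchors: dict[str, int]) -> dict[str, int]:
    """BFS from anchor nodes to assign display polarity.

    +1 means "a rise is market-friendly" (green), -1 means "a rise is
    market-threatening" (red). A neighbor inherits the current node's polarity
    times the sign of the connecting edge weight. Nodes that no anchor can
    reach are left out of the result.
    """
    adjacency: dict[str, list[tuple[str, int]]] = {}
    for src, dst, weight in edges:
        if weight == 0:
            continue
        sign = 1 if weight > 0 else -1
        # Assumption: walk causal edges in both directions.
        adjacency.setdefault(src, []).append((dst, sign))
        adjacency.setdefault(dst, []).append((src, sign))

    polarity = dict(anchors)
    queue = deque(anchors)
    while queue:
        node = queue.popleft()
        for neighbor, sign in adjacency.get(node, []):
            if neighbor not in polarity:  # first (closest-anchor) assignment wins
                polarity[neighbor] = polarity[node] * sign
                queue.append(neighbor)
    return polarity
```

With S&P 500 = +1 and VIX = -1 as anchors, a high-yield spread node linked to the S&P 500 by a negative edge comes out red-on-rise, and anything downstream of it by positive edges inherits the same polarity, while an isolated cluster (say gold and silver) stays unassigned.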

For the full technical specification, see backend/app/causal_discovery/README.md.


Contributing

This is a prototype and a learning project. Contributions of all kinds are welcome:

  • Ideas — open an issue to discuss new features or improvements
  • Bug reports — if something breaks, let me know
  • Pull requests — code contributions are appreciated
  • Data sources — suggestions for new data integrations
  • Domain expertise — better causal relationships, node coverage, propagation models

License

MIT

Acknowledgments

About

Causal factor graph + agentic sentiment engine for quant finance. 52-node graph with Claude/GPT analysis, 3D visualization, regime detection, and signal propagation.
