A high-throughput, real-time financial article processing system that leverages LLMs to parse unstructured text into structured data at scale. Built for quantitative investment funds seeking a competitive edge in market intelligence.
Market Firehose processes 100+ articles per minute from multiple sources, using AI to extract:
- Publisher, author, and publication date
- Article title and body content
- Related market sectors and industries
- Stock tickers and company mentions
- Sentiment analysis
The system supports real-time pub/sub subscriptions, allowing downstream consumers to receive structured articles instantly.
```
┌──────────────────────────────────────────────────────────────────────────┐
│                          MARKET FIREHOSE SYSTEM                          │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   │
│  │  RSS Feeds  │   │  API Feeds  │   │  Webhooks   │   │  Scrapers   │   │
│  └──────┬──────┘   └──────┬──────┘   └──────┬──────┘   └──────┬──────┘   │
│         └─────────────────┴────────┬────────┴─────────────────┘          │
│                                    │                                     │
│                        ┌───────────▼───────────┐                         │
│                        │    INGESTION LAYER    │                         │
│                        │    (Feed Adapters)    │                         │
│                        └───────────┬───────────┘                         │
│                                    │                                     │
│                        ┌───────────▼───────────┐                         │
│                        │     MESSAGE QUEUE     │                         │
│                        │        (Redis)        │                         │
│                        └───────────┬───────────┘                         │
│                                    │                                     │
│           ┌────────────────────────┼────────────────────────┐            │
│           │                        │                        │            │
│  ┌────────▼────────┐      ┌────────▼────────┐      ┌────────▼────────┐   │
│  │   LLM WORKER    │      │   LLM WORKER    │      │   LLM WORKER    │   │
│  │  (GPT-4o-mini)  │      │  (GPT-4o-mini)  │      │  (GPT-4o-mini)  │   │
│  └────────┬────────┘      └────────┬────────┘      └────────┬────────┘   │
│           └────────────────────────┼────────────────────────┘            │
│                                    │                                     │
│                        ┌───────────▼───────────┐                         │
│                        │      POSTGRESQL       │                         │
│                        │   (Structured Data)   │                         │
│                        └───────────┬───────────┘                         │
│                                    │                                     │
│                        ┌───────────▼───────────┐                         │
│                        │    PUB/SUB ENGINE     │                         │
│                        │    (Redis Streams)    │                         │
│                        └───────────┬───────────┘                         │
│                                    │                                     │
│           ┌────────────────────────┼────────────────────────┐            │
│           │                        │                        │            │
│  ┌────────▼────────┐      ┌────────▼────────┐      ┌────────▼────────┐   │
│  │    REST API     │      │    WebSocket    │      │    Webhooks     │   │
│  │    (FastAPI)    │      │   Subscribers   │      │    Callbacks    │   │
│  └─────────────────┘      └─────────────────┘      └─────────────────┘   │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
```
- Multi-source support: RSS feeds, REST APIs, webhooks, web scrapers
- Extensible adapters: Easy to add new data sources
- Deduplication: Content fingerprinting to prevent duplicates
- Rate limiting: Configurable polling intervals per source
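The deduplication step can be sketched with a simple content fingerprint. This is a minimal in-memory sketch; the hash choice, the normalization, and the function names are illustrative assumptions, not the project's actual implementation, which would persist fingerprints in PostgreSQL or Redis:

```python
import hashlib

def content_fingerprint(title: str, body: str) -> str:
    """Derive a stable fingerprint from normalized article text.

    Lowercasing and collapsing whitespace makes the fingerprint robust
    to trivial formatting differences between feeds.
    """
    normalized = " ".join((title + " " + body).lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# In-memory store for illustration only.
seen: set[str] = set()

def is_duplicate(title: str, body: str) -> bool:
    """Record the article's fingerprint and report whether it was seen before."""
    fp = content_fingerprint(title, body)
    if fp in seen:
        return True
    seen.add(fp)
    return False
```

Two articles that differ only in case or whitespace produce the same fingerprint, so the second one is flagged as a duplicate.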
- LLM-powered parsing: GPT-4o-mini for intelligent text extraction
- Parallel processing: 10+ concurrent workers
- Batch processing: Efficient handling of article batches
- Retry logic: Automatic retry with exponential backoff
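The retry behavior described above can be sketched as a generic exponential-backoff helper with jitter; the names and parameters are illustrative, not the project's actual API:

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0):
    """Call fn(), retrying on any exception with exponential backoff.

    Delay doubles on each failed attempt (capped at max_delay), with a
    small random jitter to avoid thundering-herd retries across workers.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(base_delay * 2 ** attempt, max_delay)
            time.sleep(delay + random.uniform(0, delay * 0.1))
```

A worker would wrap its LLM call in `retry_with_backoff` so transient API errors are absorbed rather than failing the batch.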
| Field | Description |
|---|---|
| `publisher` | News source/publication name |
| `author` | Article author(s) |
| `published_date` | Publication timestamp |
| `title` | Article headline |
| `body` | Full article content |
| `summary` | AI-generated summary |
| `sectors` | Related market sectors (GICS-based) |
| `mentioned_tickers` | Stock symbols mentioned |
| `sentiment` | Positive/negative/neutral |
| `sentiment_score` | Score from -1.0 to 1.0 |
- WebSocket: Real-time streaming for low-latency clients
- Webhooks: Push notifications to external systems
- Polling: REST API with cursor-based pagination
- Filtering: Subscribe by sector, source, sentiment, or ticker
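Filtered delivery amounts to matching each processed article against every subscriber's filter set. A minimal sketch follows; the dictionary keys and matching semantics (empty filter matches everything, list filters match on any overlap) are assumptions based on the filter dimensions listed above:

```python
def matches_subscription(article: dict, sub: dict) -> bool:
    """Return True if an article satisfies a subscriber's filters.

    Missing or empty filters are treated as wildcards.
    """
    if sub.get("sectors") and not set(article.get("sectors", [])) & set(sub["sectors"]):
        return False
    if sub.get("tickers") and not set(article.get("mentioned_tickers", [])) & set(sub["tickers"]):
        return False
    if sub.get("sentiment") and article.get("sentiment") != sub["sentiment"]:
        return False
    if sub.get("source") and article.get("source") != sub["source"]:
        return False
    return True
```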
- Python 3.11+
- PostgreSQL 16+
- Redis 7+
- OpenAI API key
- Clone the repository

  ```bash
  git clone https://github.com/your-org/Market-Firehose-System.git
  cd Market-Firehose-System
  ```

- Set up environment

  ```bash
  # Copy environment template
  cp env.example .env

  # Edit .env with your configuration
  # IMPORTANT: Set your OPENAI_API_KEY
  ```

- Install dependencies

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  pip install -r requirements.txt
  ```

- Start services with Docker

  ```bash
  docker-compose up -d postgres redis
  ```

- Initialize the database

  ```bash
  # The schema will be auto-applied when PostgreSQL starts
  # Or run manually:
  docker exec -i firehose_postgres psql -U postgres -d market_firehose < src/db/schema.sql
  docker exec -i firehose_postgres psql -U postgres -d market_firehose < scripts/seed_sectors.sql
  ```

- Run the API server

  ```bash
  uvicorn src.main:app --reload --host 0.0.0.0 --port 8000
  ```

- Start the worker (in a separate terminal)

  ```bash
  python -m src.queue.worker
  ```

Alternatively, run the full stack with Docker Compose:

```bash
docker-compose up -d
```

This starts:
- PostgreSQL database (port 5432)
- Redis (port 6379)
- API server (port 8000)
- Background worker
POST /api/v1/articles

```json
{
  "articles": [
    {
      "raw_content": "Apple Inc. reported record quarterly earnings...",
      "external_url": "https://example.com/article/123",
      "source_id": "optional-uuid"
    }
  ],
  "priority": 0
}
```

Response:

```json
{
  "ingested": 1,
  "duplicates": 0,
  "article_ids": ["550e8400-e29b-41d4-a716-446655440000"]
}
```

GET /api/v1/articles

Query Parameters:

- `sector_id`: Filter by sector
- `source_id`: Filter by source
- `sentiment`: Filter by sentiment (positive/negative/neutral)
- `ticker`: Filter by mentioned ticker
- `from_date`: Start date (ISO format)
- `to_date`: End date (ISO format)
- `limit`: Page size (default: 50)
- `offset`: Pagination offset
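Offset pagination can be consumed with a small client loop. This sketch assumes the endpoint returns a plain JSON array of articles; the helper names and that response shape are assumptions, not part of the documented API:

```python
import json
import urllib.parse
import urllib.request

def build_page_url(base_url: str, offset: int, limit: int = 50, **filters) -> str:
    """Build a GET /api/v1/articles URL, skipping filters that are None."""
    params = {"limit": limit, "offset": offset,
              **{k: v for k, v in filters.items() if v is not None}}
    return f"{base_url}/api/v1/articles?" + urllib.parse.urlencode(params)

def iter_articles(base_url: str, limit: int = 50, **filters):
    """Yield articles page by page until a short (or empty) page is returned."""
    offset = 0
    while True:
        url = build_page_url(base_url, offset, limit, **filters)
        with urllib.request.urlopen(url) as resp:
            page = json.loads(resp.read())
        yield from page
        if len(page) < limit:
            return  # last page reached
        offset += limit
```

Usage would look like `for article in iter_articles("http://localhost:8000", sentiment="positive"): ...`.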
WS /api/v1/ws/subscribe

```javascript
const ws = new WebSocket("ws://localhost:8000/api/v1/ws/subscribe");

ws.send(
  JSON.stringify({
    sectors: ["technology", "healthcare"],
    min_confidence: 0.7,
  })
);

ws.onmessage = (event) => {
  const article = JSON.parse(event.data);
  console.log("New article:", article.title);
};
```

GET /health

```json
{
  "status": "healthy",
  "database": "connected",
  "redis": "connected",
  "queue_depth": 42,
  "articles_per_minute": 87.5
}
```

| Table | Description |
|---|---|
| `articles` | Processed articles with extracted fields |
| `sources` | Article feed configurations |
| `sectors` | Market sector taxonomy (GICS-based) |
| `article_sectors` | Article-sector relationships |
| `subscribers` | Pub/sub subscriber configurations |
| `delivery_log` | Article delivery audit trail |
- `idx_articles_status`: Fast pending-article lookups
- `idx_articles_published`: Chronological queries
- `idx_articles_tickers`: Ticker-based filtering (GIN)
- `idx_articles_title_search`: Full-text search
| Variable | Description | Default |
|---|---|---|
| `DATABASE_URL` | PostgreSQL connection string | `postgresql+asyncpg://...` |
| `REDIS_URL` | Redis connection string | `redis://localhost:6379/0` |
| `OPENAI_API_KEY` | OpenAI API key | required |
| `OPENAI_MODEL` | LLM model to use | `gpt-4o-mini` |
| `WORKER_CONCURRENCY` | Parallel workers | `10` |
| `BATCH_SIZE` | Articles per batch | `20` |
| `LOG_LEVEL` | Logging level | `INFO` |
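Loading this configuration might look like the following sketch. It is illustrative only; since the `DATABASE_URL` default is elided in the table above, the sketch treats it as required alongside `OPENAI_API_KEY`:

```python
import os

# Defaults mirror the configuration table above.
DEFAULTS = {
    "REDIS_URL": "redis://localhost:6379/0",
    "OPENAI_MODEL": "gpt-4o-mini",
    "WORKER_CONCURRENCY": "10",
    "BATCH_SIZE": "20",
    "LOG_LEVEL": "INFO",
}
REQUIRED = ("DATABASE_URL", "OPENAI_API_KEY")

def load_settings(env=None) -> dict:
    """Merge environment variables over defaults, failing fast on missing keys."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if key not in env]
    if missing:
        raise RuntimeError(f"missing required settings: {missing}")
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    settings.update({key: env[key] for key in REQUIRED})
    # Numeric knobs arrive as strings from the environment.
    settings["WORKER_CONCURRENCY"] = int(settings["WORKER_CONCURRENCY"])
    settings["BATCH_SIZE"] = int(settings["BATCH_SIZE"])
    return settings
```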
- Ingestion: 200+ articles/minute
- Processing: 100+ articles/minute (with 10 workers)
- Latency: <5s average per article
- API Response: <50ms for queries
- Horizontal scaling via additional workers
- Connection pooling for database efficiency
- Redis clustering for high availability
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test file
pytest tests/test_llm_parser.py -v
```

```
Market-Firehose-System/
├── src/
│   ├── api/              # FastAPI routes and WebSocket handlers
│   ├── db/               # Database layer and repositories
│   ├── models/           # Pydantic models
│   ├── services/         # Business logic (LLM, processing)
│   ├── ingestion/        # Feed adapters
│   ├── queue/            # Redis queue and workers
│   └── utils/            # Logging, metrics
├── tests/                # Test suite
├── scripts/              # Database scripts
├── docker-compose.yml    # Container orchestration
└── requirements.txt      # Python dependencies
```
- Phase 1: Foundation (Database, Models)
- Phase 2: LLM Processing Engine
- Phase 3: Feed Ingestion Adapters
- Phase 4: Message Queue & Workers
- Phase 5: REST API
- Phase 6: Pub/Sub Subscriptions
- Phase 7: Monitoring & Production
- Quantitative Analysis in Finance - Investopedia
- SEC Markets Data - Official SEC data
- FastAPI Documentation - API framework
- LLM Text Classification - Medium
MIT License - see LICENSE for details.
Built with ❤️ for Company X Quant Team