Statistical analysis of favorite win rates across sports prediction markets using hybrid API architecture.
This project analyzes sports prediction market efficiency on Polymarket. The primary research question: What is the empirical win rate of favorites across different professional sports?
The analysis processes 10,115 closed betting markets across 10,223 total events in seven sports (ATP, WTA, NBA, NFL, MLB, CFB, CBB), achieving 99% data completeness through a hybrid API integration approach.
| Sport | Favorite Win Rate | Favorite Wins / Events | Events Analyzed |
|---|---|---|---|
| College Basketball | 72.7% | 1,250/1,720 | 1,720 |
| College Football | 72.9% | 805/1,105 | 1,105 |
| ATP Tennis | 69.5% | 1,234/1,776 | 1,776 |
| NBA Basketball | 67.9% | 1,350/1,988 | 1,988 |
| WTA Tennis | 66.7% | 12/18 | 18 |
| NFL Football | 66.6% | 414/622 | 622 |
| MLB Baseball | 56.5% | 1,385/2,450 | 2,450 |
The system implements a multi-stage data pipeline integrating two Polymarket APIs:
┌──────────────────┐
│ Gamma API        │  Sport-based event filtering via tag IDs
│ (Event Catalog)  │  Fetches: event metadata, participants, market structure
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ CLOB API         │  Token-based pricing enrichment
│ (Order Book)     │  Fetches: closing prices, settlement data, volume
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ Analysis Engine  │  Win rate calculation and aggregation
│ (Pandas)         │  Logic: identify favorite → validate winner → compute rates
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ Excel Output     │  Actionable insights generation
│ (xlsxwriter)     │  Format: 7-tab workbook with analysis
└──────────────────┘
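The analysis step (identify favorite → validate winner → compute rates) can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the record fields `sport`, `closing_prices`, and `winner` are hypothetical, and it assumes the favorite is the outcome with the higher closing price in a two-outcome market.

```python
from collections import defaultdict

def favorite_win_rates(markets):
    """Return {sport: (favorite_wins, markets_counted, win_rate)}."""
    tally = defaultdict(lambda: [0, 0])  # sport -> [favorite wins, markets counted]
    for m in markets:
        prices = m["closing_prices"]  # hypothetical field: {outcome: closing price}
        winner = m.get("winner")      # hypothetical field: name of the settled outcome
        # Validate: two-outcome market with a known winner among its outcomes
        if len(prices) != 2 or winner not in prices:
            continue
        (a, price_a), (b, price_b) = prices.items()
        if price_a == price_b:
            continue  # no favorite when both sides close at the same price
        favorite = a if price_a > price_b else b
        tally[m["sport"]][1] += 1
        if favorite == winner:
            tally[m["sport"]][0] += 1
    return {sport: (wins, n, wins / n) for sport, (wins, n) in tally.items()}

# Made-up sample records, for illustration only
sample = [
    {"sport": "NBA", "closing_prices": {"Lakers": 0.62, "Celtics": 0.38}, "winner": "Lakers"},
    {"sport": "NBA", "closing_prices": {"Knicks": 0.45, "Heat": 0.55}, "winner": "Knicks"},
    {"sport": "MLB", "closing_prices": {"Yankees": 0.58, "Mets": 0.42}, "winner": "Yankees"},
    {"sport": "MLB", "closing_prices": {"Cubs": 0.50, "Reds": 0.50}, "winner": "Cubs"},
]
rates = favorite_win_rates(sample)
print(rates)
```

Markets with no winner or with tied prices are skipped rather than counted against the favorite, which is one possible validation rule among several.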
- Multi-sport support: ATP, WTA, NBA, NFL, MLB, CFB, CBB
- Hybrid API architecture: Combines Gamma API (events) + CLOB API (pricing)
- 99% data completeness: Token ID matching resolves missing price data
- Async pipeline: Concurrent fetching with aiohttp and rate limiting
- Error tracking: Comprehensive retry logic with exponential backoff
- Actionable Excel output: 7-tab workbook with betting insights and ROI analysis
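The async pipeline and retry behavior listed above could look roughly like the sketch below. It uses only `asyncio` so that it runs without network access; the `fetch` coroutine is injected, and in the real pipeline it would presumably wrap an aiohttp request. The helper names (`fetch_with_retry`, `fetch_all`) and the default parameters are assumptions for illustration, and the semaphore caps concurrency as a simple stand-in for rate limiting.

```python
import asyncio
import random

async def fetch_with_retry(fetch, key, *, retries=4, base_delay=0.5, sleep=asyncio.sleep):
    """Await fetch(key), retrying failures with exponential backoff plus jitter."""
    for attempt in range(retries + 1):
        try:
            return await fetch(key)
        except Exception:  # real code would catch only transient errors
            if attempt == retries:
                raise
            # Delay doubles each attempt: base, 2*base, 4*base, ... plus up to 10% jitter
            delay = base_delay * (2 ** attempt)
            await sleep(delay + random.uniform(0, delay * 0.1))

async def fetch_all(fetch, keys, *, max_concurrent=5, **retry_kwargs):
    """Fetch all keys concurrently, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(key):
        async with semaphore:
            return await fetch_with_retry(fetch, key, **retry_kwargs)

    # gather preserves the order of the input keys
    return await asyncio.gather(*(bounded(k) for k in keys))

async def _demo():
    calls = {}

    async def flaky_fetch(key):
        # Simulated endpoint: fails on the first call for each key, then succeeds
        calls[key] = calls.get(key, 0) + 1
        if calls[key] == 1:
            raise ConnectionError("simulated transient failure")
        return key.upper()

    async def no_sleep(_delay):
        return None  # skip real waiting in the demo

    results = await fetch_all(flaky_fetch, ["a", "b", "c"], sleep=no_sleep)
    return results, calls

results, calls = asyncio.run(_demo())
print(results, calls)
```

Injecting `fetch` and `sleep` keeps the retry logic testable without hitting either API.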
- Python 3.11 or higher
- pip package manager
# Clone repository
git clone https://github.com/yourusername/polymarket-sports-predictability.git
cd polymarket-sports-predictability
# Install dependencies
pip install -r requirements.txt
python src/fetch_sports.py
Output: data/fetch_sports.csv (~5 seconds)
python src/fetch_events.py
Output: data/fetch_events.csv (60-90 minutes for 10,223 events)
python src/generate_insights.py
Output: outputs/favourite_win_rates.xlsx (7 focused tabs with actionable betting insights)
The pipeline uses two Polymarket APIs:
- Gamma API (https://gamma-api.polymarket.com/events) - Event discovery via sport tags
- CLOB API (https://clob.polymarket.com) - Reliable pricing and settlement data
The hybrid approach recovers the 89% of price data that the Gamma API leaves missing by matching events to CLOB market data via condition_id.
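As a rough sketch of that matching step, the snippet below joins two in-memory record lists on `condition_id` and fills in prices only where the Gamma record lacks them. Apart from `condition_id`, every field name (`prices`, `title`) and both function names are hypothetical, and the real API responses are likely shaped differently.

```python
def enrich_with_clob_prices(gamma_events, clob_markets):
    """Fill missing prices on Gamma events from CLOB markets sharing a condition_id."""
    clob_by_condition = {m["condition_id"]: m for m in clob_markets}
    enriched = []
    for event in gamma_events:
        row = dict(event)  # copy so the input records are not mutated
        if row.get("prices") is None:
            match = clob_by_condition.get(row["condition_id"])
            if match is not None:
                row["prices"] = match["prices"]
        enriched.append(row)
    return enriched

def completeness(rows):
    """Fraction of rows that have price data."""
    return sum(r.get("prices") is not None for r in rows) / len(rows)

# Made-up records, for illustration only
gamma_events = [
    {"condition_id": "0xaaa", "title": "Team A vs Team B", "prices": [0.6, 0.4]},
    {"condition_id": "0xbbb", "title": "Team C vs Team D", "prices": None},
    {"condition_id": "0xccc", "title": "Team E vs Team F", "prices": None},
]
clob_markets = [
    {"condition_id": "0xbbb", "prices": [0.7, 0.3]},
]

enriched = enrich_with_clob_prices(gamma_events, clob_markets)
before = completeness(gamma_events)
after = completeness(enriched)
print(f"completeness: {before:.0%} -> {after:.0%}")
```

Events with no CLOB match keep their missing prices, so completeness after enrichment reflects how many events could actually be matched.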
data/
├── fetch_events.csv # 10,223 events with pricing and settlement
└── fetch_sports.csv # Sports metadata and tag mappings
outputs/
└── favourite_win_rates.xlsx # Excel workbook with 7 tabs:
├── Index # Overview + key takeaways
├── Quick Reference # Top actionable strategies
├── Sport Guide # Which sports to bet
├── Underdog Opportunities # ROI by sport/threshold
├── Reliable Favorites # Best teams when favored
├── Market Efficiency # Calibration analysis
└── Raw Data # Summary statistics
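For the "ROI by sport/threshold" tab, one plausible calculation is sketched below. It assumes a winning share pays out 1.0, that one share of each qualifying underdog is bought at its closing price, and that fees are ignored. The function name and input shape are hypothetical, and the workbook may compute ROI differently.

```python
def underdog_roi(markets, max_price):
    """ROI of buying one share of every underdog priced at or below max_price.

    Each market is a tuple (underdog_price, underdog_won).
    Returns None when no market qualifies.
    """
    staked = 0.0
    payout = 0.0
    for price, won in markets:
        if 0 < price <= max_price:
            staked += price                  # cost of one share
            payout += 1.0 if won else 0.0    # a winning share pays 1.0
    return None if staked == 0 else (payout - staked) / staked

# Made-up sample, for illustration only
sample = [(0.25, True), (0.25, False), (0.25, False), (0.40, False)]
roi_25 = underdog_roi(sample, 0.25)  # staked 0.75, payout 1.0
roi_40 = underdog_roi(sample, 0.40)  # staked 1.15, payout 1.0
print(roi_25, roi_40)
```

Here the 0.25 threshold yields a positive ROI while widening it to 0.40 turns it negative, which is the kind of threshold comparison such a tab would show.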
polymarket-sports-predictability/
├── README.md
├── LICENSE
├── requirements.txt
├── src/
│ ├── fetch_sports.py # Sports metadata fetcher
│ ├── fetch_events.py # Event data pipeline
│ └── generate_insights.py # Betting insights analysis
├── tests/
│ ├── test_fetch_sports.py
│ ├── test_generate_chart.py
│ └── test_integration.py
├── data/ # Generated datasets
└── outputs/ # Generated Excel workbook
This project is for educational and research purposes only. The analysis is based on historical market data and should not be construed as investment advice.
- Past performance does not guarantee future results
- Users should comply with all applicable laws and Polymarket's terms of service
This project is licensed under the MIT License - see the LICENSE file for details.