A framework for analyzing prediction market data, including the largest publicly available dataset of Polymarket and Kalshi market and trade data. Provides tools for data collection, storage, and running analysis scripts that generate figures and statistics.
This project enables research and analysis of prediction markets by providing:
- Pre-collected datasets from Polymarket and Kalshi
- Data collection indexers for gathering new data
- Analysis framework for generating figures and statistics
Currently supported features:
- Market metadata collection (Kalshi & Polymarket)
- Trade history collection via API and blockchain
- Parquet-based storage with automatic progress saving
- Extensible analysis script framework
Requires Python 3.9+. Install dependencies with uv:

    uv sync

Download and extract the pre-collected dataset (36 GiB compressed):

    make setup

This downloads data.tar.zst from Cloudflare R2 Storage and extracts it to data/.
Collect market and trade data from prediction market APIs:

    make index

This opens an interactive menu for selecting which indexer to run. Data is saved to the data/kalshi/ and data/polymarket/ directories. Progress is saved automatically, so you can interrupt and resume collection.
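The interrupt-and-resume behavior can be illustrated with a small checkpointing loop. This is a minimal sketch under stated assumptions, not the project's indexer code: `collect`, `load_state`, `save_state`, `fake_fetch`, and the `progress.json` file name are all hypothetical, and the real indexers store their data as Parquet rather than in memory.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, List, Optional, Tuple

# All names here are hypothetical; this is not the project's indexer API.
Page = Tuple[List[dict], Optional[str]]  # (records, next cursor; None when exhausted)


def load_state(state_file: Path) -> dict:
    """Return saved progress, or a fresh state if nothing was saved yet."""
    if not state_file.exists():
        return {"cursor": None, "done": False}
    return json.loads(state_file.read_text())


def save_state(state_file: Path, state: dict) -> None:
    """Write to a temp file, then rename, so an interrupt never leaves a partial file."""
    tmp = state_file.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, state_file)


def collect(fetch_page: Callable[[Optional[str]], Page],
            sink: Callable[[List[dict]], None],
            state_file: Path,
            max_pages: Optional[int] = None) -> int:
    """Fetch pages from the saved cursor onward; return how many pages were fetched."""
    state = load_state(state_file)
    pages = 0
    while not state["done"] and (max_pages is None or pages < max_pages):
        records, next_cursor = fetch_page(state["cursor"])
        sink(records)  # persist the batch first ...
        state = {"cursor": next_cursor, "done": next_cursor is None}
        save_state(state_file, state)  # ... then record the progress
        pages += 1
    return pages


def fake_fetch(cursor: Optional[str]) -> Page:
    """Stand-in for an API call: five pages holding one record each."""
    index = 0 if cursor is None else int(cursor)
    next_cursor = str(index + 1) if index + 1 < 5 else None
    return [{"id": index}], next_cursor


collected: List[dict] = []
with tempfile.TemporaryDirectory() as tmp_dir:
    state_path = Path(tmp_dir) / "progress.json"
    # Simulate an interrupted run by stopping after two pages.
    first_run = collect(fake_fetch, collected.extend, state_path, max_pages=2)
    # A later run picks up from the saved cursor instead of starting over.
    second_run = collect(fake_fetch, collected.extend, state_path)
```

Because the batch is written before the cursor is saved, a crash between the two steps would re-fetch one page on resume. A design like this therefore favors duplicated records over lost ones, and deduplication would be a separate step.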

    make analyze

This opens an interactive menu to select which analysis to run. You can run all analyses or select a specific one. Output files (PNG, PDF, CSV, JSON) are saved to output/.
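One way to picture the "run all or select one" flow is a name-to-function registry whose results are written to an output directory. The sketch below is illustrative only: the `register` decorator, the `ANALYSES` registry, the `run` function, and the `price` field on the toy trade records are assumptions, and the project's actual analysis interface is the one described in the Writing Analyses guide.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical registry; the project's real analysis interface may differ.
ANALYSES: Dict[str, Callable[[List[dict]], dict]] = {}


def register(name: str):
    """Decorator that adds an analysis function to the registry under `name`."""
    def decorator(func):
        ANALYSES[name] = func
        return func
    return decorator


@register("trade_count")
def trade_count(trades: List[dict]) -> dict:
    return {"trades": len(trades)}


@register("mean_price")
def mean_price(trades: List[dict]) -> dict:
    prices = [t["price"] for t in trades]  # "price" is an assumed field name
    return {"mean_price": sum(prices) / len(prices) if prices else None}


def run(selection: str, trades: List[dict], output_dir: Path) -> List[Path]:
    """Run one analysis by name, or every registered analysis when selection is 'all'."""
    names = sorted(ANALYSES) if selection == "all" else [selection]
    output_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for name in names:
        result = ANALYSES[name](trades)
        json_path = output_dir / f"{name}.json"
        json_path.write_text(json.dumps(result, indent=2))
        csv_path = output_dir / f"{name}.csv"
        with csv_path.open("w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(result))
            writer.writeheader()
            writer.writerow(result)
        written += [json_path, csv_path]
    return written


# Toy records for illustration only; they are not taken from the dataset.
sample_trades = [{"price": 0.25}, {"price": 0.75}]
with tempfile.TemporaryDirectory() as tmp_dir:
    written_names = [p.name for p in run("all", sample_trades, Path(tmp_dir) / "output")]
    mean_result = json.loads((Path(tmp_dir) / "output" / "mean_price.json").read_text())
```

Keeping each analysis as a plain function that returns a dictionary makes it easy to add a new one without touching the runner, which is the main appeal of a registry in this kind of framework.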
To compress the data directory for storage/distribution:

    make package

This creates a zstd-compressed tar archive (data.tar.zst) and removes the data/ directory.
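For readers who want to see what such a packaging step involves, here is a rough Python equivalent. It is a sketch under assumptions, not the Makefile's actual recipe (which is not shown in this document): `package_command` and `package` are hypothetical helpers, and the code assumes a `tar` binary that accepts `--zstd`, as recent GNU tar and bsdtar builds do.

```python
import shutil
import subprocess
from typing import List


def package_command(data_dir: str = "data", archive: str = "data.tar.zst") -> List[str]:
    """Build the tar invocation; assumes a tar build with zstd support."""
    return ["tar", "--zstd", "-cf", archive, data_dir]


def package(data_dir: str = "data", archive: str = "data.tar.zst") -> None:
    """Create the archive, then delete the directory only if tar succeeded."""
    # check=True raises CalledProcessError on failure, so rmtree is never
    # reached when the archive could not be written.
    subprocess.run(package_command(data_dir, archive), check=True)
    shutil.rmtree(data_dir)
```

Calling `package()` from the repository root would archive and then delete data/, so it is worth confirming the archive is readable before relying on it as the only copy.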
├── src/
│   ├── analysis/           # Analysis scripts
│   │   ├── kalshi/         # Kalshi-specific analyses
│   │   └── polymarket/     # Polymarket-specific analyses
│   ├── indexers/           # Data collection indexers
│   │   ├── kalshi/         # Kalshi API client and indexers
│   │   └── polymarket/     # Polymarket API/blockchain indexers
│   └── common/             # Shared utilities and interfaces
├── data/                   # Data directory (extracted from data.tar.zst)
│   ├── kalshi/
│   │   ├── markets/
│   │   └── trades/
│   └── polymarket/
│       ├── blocks/
│       ├── markets/
│       └── trades/
├── docs/                   # Documentation
└── output/                 # Analysis outputs (figures, CSVs)
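After extracting or collecting data, a quick check that the data directory matches the layout above can catch an incomplete download early. The sketch below is a hypothetical helper, not part of the project: `missing_dirs` and `EXPECTED_DIRS` are assumed names, and the directory list is simply copied from the tree.

```python
import tempfile
from pathlib import Path
from typing import List

# Subdirectories of data/ as listed in the project structure.
EXPECTED_DIRS = [
    "kalshi/markets",
    "kalshi/trades",
    "polymarket/blocks",
    "polymarket/markets",
    "polymarket/trades",
]


def missing_dirs(data_root: Path) -> List[str]:
    """Return the expected subdirectories that do not exist under data_root."""
    return [rel for rel in EXPECTED_DIRS if not (data_root / rel).is_dir()]


# Demonstration on a temporary directory holding only the Kalshi half.
with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir) / "data"
    (root / "kalshi" / "markets").mkdir(parents=True)
    (root / "kalshi" / "trades").mkdir(parents=True)
    missing = missing_dirs(root)
```

In a real checkout, `missing_dirs(Path("data"))` returning an empty list would indicate that every directory named in the tree is present; it says nothing about whether the Parquet files inside are complete.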
- Data Schemas - Parquet file schemas for markets and trades
- Writing Analyses - Guide for writing custom analysis scripts
If you'd like to contribute to this project, please open a pull request with your changes, along with a detailed description of what was changed, added, or improved.
For more information, see the contributing guide.
If you've found an issue or have a question, please open an issue here.
- Becker, J. (2026). The Microstructure of Wealth Transfer in Prediction Markets. Jbecker. https://jbecker.dev/research/prediction-market-microstructure
If you have used or plan to use this dataset in your research, please reach out via email or Twitter -- I'd love to hear about what you're using the data for! Additionally, feel free to open a PR and update this section with a link to your paper.