See CHANGELOG.md for release notes and version history.
The goal of sportsdataverse-py is to provide the community with a Python package for working with sports data as a companion to the cfbfastR, hoopR, and wehoop R packages. Beyond easing data aggregation and tidying, sportsdataverse-py also serves as a benchmark for open-source expected points and win probability metrics for American football.
| League | Module | Surfaces covered |
|---|---|---|
| NBA | sportsdataverse.nba | ESPN (Site v2 + Web v3 + Core v2) + Fox Sports (Bifrost) |
| WNBA | sportsdataverse.wnba | ESPN |
| MBB (NCAA M) | sportsdataverse.mbb | ESPN + NCAA-only (rankings, recruits) + Fox Sports (Bifrost) |
| WBB (NCAA W) | sportsdataverse.wbb | ESPN + NCAA-only |
| CFB | sportsdataverse.cfb | ESPN + NCAA + football-only (QBR) + Fox Sports (Bifrost) + Yahoo Sports |
| NFL | sportsdataverse.nfl | ESPN + NFL.com API (api.nfl.com "Shield") + nflverse loaders (nflreadpy parity) + football-only (QBR) |
| MLB | sportsdataverse.mlb | ESPN + MLB Stats API (statsapi.mlb.com) + Baseball Savant / Statcast (43-endpoint mlb_statcast_* surface) + Fox Sports (Bifrost) |
| NHL | sportsdataverse.nhl | api-web.nhle.com/v1/ (game-feed) + NHL EDGE (player tracking) + Stats REST + Records site + Fox Sports (Bifrost) |
Each league exports 150–340 public functions (ESPN wrappers + that league's
native-API wrappers + dataset loaders + parsers); ~1,600 in total. Fox Sports
adds fox_<league>_* Bifrost wrappers (pbp / boxscore / odds / roster / stats /
standings / leaders) for nba, mbb, cfb, mlb, nhl; Yahoo Sports adds
yahoo_cfb_* season-stats / scoreboard wrappers for college football.
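Because the Fox Sports and Yahoo Sports wrappers follow the fox_<league>_* and yahoo_cfb_* naming conventions, a league's surface can be listed by name prefix. The sketch below is a minimal, hypothetical helper (public_callables is not part of sportsdataverse) built on the standard-library inspect module; it is demonstrated against math so it runs without the package installed.

```python
import inspect
import math
from types import ModuleType
from typing import List


def public_callables(module: ModuleType, prefix: str = "") -> List[str]:
    """Return sorted names of public callables in `module` that start with `prefix`.

    Hypothetical helper: names beginning with an underscore are treated as
    private and skipped. Classes are callable too, so they are included.
    """
    names = []
    for name, _obj in inspect.getmembers(module, callable):
        if name.startswith("_"):
            continue  # private by convention
        if name.startswith(prefix):
            names.append(name)
    return sorted(names)


# Demonstrated on the stdlib `math` module so it runs anywhere:
print(public_callables(math, prefix="is"))
```

Against the package itself, a call such as public_callables(sportsdataverse.nba, "fox_nba_") should list the NBA Bifrost wrappers, assuming they are exposed at the module's top level under that prefix. Note that inspect.getmembers also returns names a module merely re-imports, so a raw count may differ from the per-league totals quoted above.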
Parser-backed wrappers return a tidy polars DataFrame by default
(0.0.54+). Pass return_parsed=False for the raw Dict, or
return_as_pandas=True for pandas. Wrappers without a registered
parser return the raw Dict.
from sportsdataverse.nba import espn_nba_team_roster
df = espn_nba_team_roster(team_id=13) # → polars (default)
raw = espn_nba_team_roster(team_id=13, return_parsed=False) # → Dict
pdf = espn_nba_team_roster(team_id=13, return_as_pandas=True)  # → pandas
For the NHL and MLB sibling-API wrappers, compose the wrapper with its parser:
from sportsdataverse.nhl import nhl_web_pbp, parse_nhl_web_pbp
df = parse_nhl_web_pbp(nhl_web_pbp(2023030417))  # 331-row polars frame
See py.sportsdataverse.org/docs/architecture/espn-cross-league and py.sportsdataverse.org/docs/parsers/index for the full architecture and parser registry.
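The return-type rule (parsed output when a parser is registered and return_parsed is left on, the raw Dict otherwise) can be pictured as a small registry lookup. This is a hypothetical sketch, not the package's internals: PARSERS, register_parser, finish and the toy_* names are illustrative, lists of dicts stand in for polars frames so only the stdlib is needed, and the return_as_pandas branch is omitted.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical registry: wrapper name -> parser turning raw JSON into rows.
# In the real package parsers produce polars frames; lists of dicts stand in here.
Parser = Callable[[Dict[str, Any]], List[Dict[str, Any]]]
PARSERS: Dict[str, Parser] = {}


def register_parser(wrapper_name: str) -> Callable[[Parser], Parser]:
    """Decorator that records the decorated function as the parser for `wrapper_name`."""
    def decorator(fn: Parser) -> Parser:
        PARSERS[wrapper_name] = fn
        return fn
    return decorator


@register_parser("toy_team_roster")
def parse_toy_team_roster(raw: Dict[str, Any]) -> List[Dict[str, Any]]:
    # Flatten the nested athlete list into one row per player.
    return [{"id": a["id"], "name": a["name"]} for a in raw.get("athletes", [])]


def finish(wrapper_name: str, raw: Dict[str, Any], return_parsed: bool = True) -> Any:
    """Return parsed rows when a parser is registered and requested, else the raw dict."""
    parser: Optional[Parser] = PARSERS.get(wrapper_name)
    if return_parsed and parser is not None:
        return parser(raw)
    return raw


raw = {"athletes": [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]}
print(finish("toy_team_roster", raw))                       # parsed rows
print(finish("toy_team_roster", raw, return_parsed=False))  # raw dict
print(finish("toy_no_parser", raw))                         # no parser registered: raw dict
```

Keeping parsers in a registry keyed by wrapper name means a wrapper gains a tidy default simply by registering a parser, while unregistered wrappers keep returning the raw payload unchanged.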
The package metadata lives entirely in pyproject.toml
(PEP 621 [project] table). There is no setup.py source-of-truth.
pip install sportsdataverse
With optional extras (defined in [project.optional-dependencies] in
pyproject.toml):
pip install "sportsdataverse[all]" # everything below
pip install "sportsdataverse[models]" # extra deps for the EPA / WP model code
pip install "sportsdataverse[tests]"    # adds pytest, mypy, ruff, etc.
uv is the fast, drop-in package manager we use day to day.
# Add to a uv-managed project:
uv add sportsdataverse
# With extras:
uv add "sportsdataverse[all]"
# Or install the latest dev snapshot from GitHub:
uv add "sportsdataverse @ git+https://github.com/sportsdataverse/sportsdataverse-py"
For contributing or running the test suite:
git clone https://github.com/sportsdataverse/sportsdataverse-py.git
cd sportsdataverse-py
# uv (recommended) — fully resolved editable install with every extra:
uv pip install -e ".[all]"
# Plain pip works too if uv isn't available:
pip install -e ".[all]"
Note: once we add a PEP 735 [dependency-groups] block (currently the repo only ships PEP 621 [project.optional-dependencies]), uv sync --all-extras --all-groups will become the one-shot dev incantation. Until then, uv pip install -e ".[all]" is the equivalent path.
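To confirm which extras an installed copy actually declares (the keys of [project.optional-dependencies], such as all, models and tests), the standard-library importlib.metadata module can read the Provides-Extra entries from the installed distribution's core metadata. A minimal sketch; declared_extras is a hypothetical helper, not part of sportsdataverse.

```python
from importlib.metadata import PackageNotFoundError, metadata
from typing import List, Optional


def declared_extras(dist_name: str) -> Optional[List[str]]:
    """Return the extras an installed distribution declares, or None if it is not installed."""
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        return None
    # The build backend writes one Provides-Extra field per optional-dependency group.
    return sorted(meta.get_all("Provides-Extra") or [])


print(declared_extras("sportsdataverse"))  # None when the package is not installed
```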
Run the test suite:
uv run pytest # offline tests only
SDV_PY_LIVE_TESTS=1 uv run pytest   # include live API tests (slower; hits ESPN / nflverse)
For deeper dev-environment detail (lint, mypy, dep-bumping workflow), see CONTRIBUTING.md.
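The SDV_PY_LIVE_TESTS=1 switch amounts to an environment-variable gate on network-backed tests. The sketch below shows that pattern with the stdlib unittest.skipUnless decorator so it has no extra dependencies; it is a hypothetical illustration, and the project's actual pytest wiring (markers, conftest hooks) may differ.

```python
import os
import unittest

# Hypothetical gate mirroring the README's convention: live API tests run only
# when SDV_PY_LIVE_TESTS=1 is set in the environment.
LIVE = os.environ.get("SDV_PY_LIVE_TESTS") == "1"
live_only = unittest.skipUnless(LIVE, "set SDV_PY_LIVE_TESTS=1 to hit live APIs")


class ExampleTests(unittest.TestCase):
    def test_offline(self) -> None:
        # Pure-Python check; always runs.
        self.assertEqual(sum([1, 2, 3]), 6)

    @live_only
    def test_live_endpoint(self) -> None:
        # A real suite would call a network-backed wrapper here.
        self.assertTrue(LIVE)
```

Run it with python -m unittest; without the variable set, test_live_endpoint is reported as skipped rather than failed.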
- Python target: 3.9–3.14.
- DataFrame engine: polars 1.x. Most loaders accept return_as_pandas=True if you prefer pandas.
- NFL caching: loaders cache to memory by default. Set SDV_PY_NFL_CACHE=filesystem for cross-session reuse, or SDV_PY_NFL_CACHE=off to disable. See sportsdataverse.nfl.config.update_config() for runtime control.
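The three documented cache modes (memory by default, filesystem, off) can be resolved from the environment with a small validator. This is a hypothetical sketch, not the package's config code: resolve_cache_mode is illustrative, and both the case-insensitive matching and the ValueError on unknown values are assumptions of the sketch rather than documented behavior.

```python
import os
from typing import Mapping, Optional

# Values documented in the README; "memory" is the default.
VALID_CACHE_MODES = ("memory", "filesystem", "off")


def resolve_cache_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Read SDV_PY_NFL_CACHE from `env` (default: os.environ) and validate it.

    Assumptions of this sketch: values are matched case-insensitively and
    unknown values raise ValueError.
    """
    source = os.environ if env is None else env
    mode = source.get("SDV_PY_NFL_CACHE", "memory").strip().lower()
    if mode not in VALID_CACHE_MODES:
        raise ValueError(f"SDV_PY_NFL_CACHE must be one of {VALID_CACHE_MODES}, got {mode!r}")
    return mode


print(resolve_cache_mode({}))                           # memory
print(resolve_cache_mode({"SDV_PY_NFL_CACHE": "off"}))  # off
```

Passing an explicit mapping keeps the resolver testable without mutating os.environ; for the real package, runtime changes go through sportsdataverse.nfl.config.update_config() as noted above.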
Every public function ships a runnable Example: block in its docstring
showing a quick-start call, common parameter combinations, and a one-line
pipeline next-step. Regenerate the API reference locally with
uv run python tools/codegen/generate.py --docs (then cd docs && yarn build
to preview the Docusaurus site) or browse the live docs at
py.sportsdataverse.org.
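A quick audit of the "every public function ships an Example: block" convention can be sketched with the stdlib inspect module. This is a hypothetical check, not the project's tooling (tools/codegen/generate.py may enforce the rule differently); it is run against a throwaway module so it works without the package installed, and it only tests for the substring "Example" in each docstring.

```python
import inspect
from types import ModuleType
from typing import List


def missing_examples(module: ModuleType) -> List[str]:
    """List public functions defined in `module` whose docstring lacks an 'Example' section."""
    missing = []
    for name, fn in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_") or fn.__module__ != module.__name__:
            continue  # skip private helpers and functions imported from elsewhere
        doc = inspect.getdoc(fn) or ""
        if "Example" not in doc:
            missing.append(name)
    return missing


# Toy module so the sketch runs without the package installed.
toy = ModuleType("toy")


def documented() -> None:
    """Fetch something.

    Example:
        >>> documented()
    """


def undocumented() -> None:
    """Fetch something else."""


for f in (documented, undocumented):
    f.__module__ = "toy"  # pretend the function was defined in `toy`
    setattr(toy, f.__name__, f)

print(missing_examples(toy))  # ['undocumented']
```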
For longer-form walkthroughs, see the intro/intermediate Jupyter notebooks
under examples/notebooks/:
| Notebook | Covers |
|---|---|
| 01_quickstart.ipynb | Cross-sport intro: package layout, polars vs pandas, the download() retry layer |
| 02_cfb_intro.ipynb | College football PBP, schedule, teams, espn_cfb_play_participants |
| 03_nfl_intro.ipynb | NFL: nflreadpy parity surface, caching layer, current-season helpers |
| 04_nba_intro.ipynb | NBA: PBP, schedule, teams, game rosters, shot distribution |
| 05_wbb_wnba_intro.ipynb | Women's basketball: NCAA + WNBA parallels, multi-table stats |
| 06_mbb_intro.ipynb | Men's college basketball: PBP, schedule, conference standings |
| 07_nhl_intro.ipynb | NHL: PBP, schedule, teams, shot-event filter |
sportsdataverse-py is one corner of the broader SportsDataverse
ecosystem. The R sister packages cover the same data sources with deeper
sport-specific coverage:
- wehoop — women's basketball (WNBA + NCAA)
- hoopR — men's basketball (NBA + NCAA)
- cfbfastR — college football
- baseballr — baseball (MLB + MiLB + NCAA)
- fastRhockey — hockey (NHL + WHL)
The NFL submodule is a near drop-in replacement for nflreadpy; the broader nflverse ecosystem is the upstream data source for many of those loaders.
To cite the sportsdataverse-py Python package in publications, use:
BibTeX citation:
@misc{gilani_sdvpy_2021,
author = {Gilani, Saiem},
title = {sportsdataverse-py: The SportsDataverse's Python Package for Sports Data.},
url = {https://py.sportsdataverse.org},
  year = {2021}
}