Seeding Statistics of Elimination Tournaments

This repository analyzes elimination tournaments listed on Wikipedia, and is the source for
Prescott, Timothy. 2025. "Seeding Statistics of Elimination Tournaments." Scatterplot 2 (1). https://doi.org/10.1080/29932955.2025.2523666.
If you're not as programmatically inclined, you can use a GUI with JavaScript.

I examined 60k games between ranked teams in 5k tournaments to see if there is a bias for or against a particular team, state, or conference. (I did not find any such bias.)

I've tried to be consistent in how I handle renaming and merging, but may not have entirely succeeded. It's also possible that a .txt file was downloaded under slightly different conditions than those in tourneys.json.

A first run of python analyze.py, from start to finish, takes about 60 minutes (a wild guess). It saves its results along the way, so you can interrupt and restart the process if you get bored. Once everything is downloaded, a full run takes just a few minutes (mostly depending on how many non-existent files it tries to download from Wikipedia).
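
The save-as-you-go behavior described above can be sketched as a small cache helper. This is only an illustration under stated assumptions, not the code in analyze.py: the names cached_text and fetch are hypothetical, and the actual download logic is passed in by the caller.

```python
from pathlib import Path
from typing import Callable


def cached_text(path: Path, fetch: Callable[[], str]) -> str:
    """Return the text stored at `path`, calling `fetch` only if it is missing.

    Hypothetical helper: `fetch` stands in for whatever downloads a page.
    """
    if path.exists():
        # Already saved by an earlier (possibly interrupted) run.
        return path.read_text(encoding="utf-8")
    text = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first, then rename, so an interrupted run
    # does not leave a half-written cache file behind.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(text, encoding="utf-8")
    tmp.replace(path)
    return text
```

With a helper like this, restarting after an interruption only repeats the work whose output file is not yet on disk, which is why later runs are much faster than the first.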

The documentation of the Python and JavaScript files can be generated by pdoc -o docs *.py and jsdoc -c jsdoc.conf.json, respectively.

There are a large number of files created along the way that are .gitignored:

{group}/{tournament}/{year}.txt: the content of that year's entry in Wikipedia
{group}/{tournament}/None.txt: same as above, but all the tournaments are on one page
[group/[tournament/]]state.csv: each (normalized) university and its state
[group/[tournament/]]winloss.csv: the matrix of counts of seed defeating seed
{group}/{tournament}/winlossplot.tex
{group}/{tournament}/winlossprobs.tex
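
As a rough illustration of what a "matrix of counts of seed defeating seed" can look like, here is a minimal sketch. The input format (pairs of winner seed and loser seed) and both function names are assumptions made for this example; the repository's own code and the layout of winloss.csv may differ.

```python
from collections import Counter
from typing import Iterable


def winloss_matrix(games: Iterable[tuple[int, int]], max_seed: int) -> list[list[int]]:
    """Build a matrix where matrix[w - 1][l - 1] counts seed w defeating seed l."""
    matrix = [[0] * max_seed for _ in range(max_seed)]
    for winner, loser in games:
        matrix[winner - 1][loser - 1] += 1
    return matrix


def upset_rate_by_gap(matrix: list[list[int]]) -> dict[int, float]:
    """Map each seed difference to the fraction of games won by the worse seed."""
    upsets, totals = Counter(), Counter()
    for w, row in enumerate(matrix):
        for l, count in enumerate(row):
            gap = abs(w - l)
            if gap == 0:
                continue  # equal seeds: neither outcome is an upset
            totals[gap] += count
            if w > l:  # the winner had the numerically larger (worse) seed
                upsets[gap] += count
    # Skip seed differences for which no games were recorded.
    return {gap: upsets[gap] / totals[gap] for gap in totals if totals[gap]}


# Hypothetical data: (winner_seed, loser_seed) pairs.
games = [(1, 4), (1, 4), (4, 1), (2, 3)]
matrix = winloss_matrix(games, max_seed=4)
rates = upset_rate_by_gap(matrix)
```

Empirical rates like these, indexed by seed difference, are the kind of data one could fit a logistic curve to, as the paper's abstract describes.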

Reseeding files:

[group/[tournament/]]reseed.csv: how much each university should be reseeded
[group/]reseed_filtered.csv: same as reseed, but trimmed down so that points not plotted by TeX don't appear (TeX had trouble with the file size)
[group/]reseed_approx.csv: same as reseed, but a linear approximation of its components
[group/[tournament/]]state_reseed.csv: same as reseed, but grouped by state
[group/[tournament/]]tz_reseed.csv: same as reseed, but grouped by timezone
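
To show how the grouped files relate to reseed.csv and state.csv, here is a minimal sketch of grouping by state. Everything specific in it is an assumption for illustration: the column names university, state, and reseed are hypothetical, and averaging is only one plausible way to aggregate; the repository may use different columns and a different statistic.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import TextIO


def reseed_by_state(reseed_csv: TextIO, state_csv: TextIO) -> dict[str, float]:
    """Average a per-university reseed value over each state.

    Assumes (hypothetically) columns `university,reseed` and `university,state`.
    """
    state_of = {row["university"]: row["state"] for row in csv.DictReader(state_csv)}
    by_state = defaultdict(list)
    for row in csv.DictReader(reseed_csv):
        state = state_of.get(row["university"])
        if state is not None:  # skip universities with no known state
            by_state[state].append(float(row["reseed"]))
    return {state: mean(values) for state, values in by_state.items()}


# Made-up example data, held in memory instead of files.
reseed_file = io.StringIO("university,reseed\nA,1.0\nB,-0.5\nC,2.0\nD,9\n")
state_file = io.StringIO("university,state\nA,NC\nB,NC\nC,KS\n")
result = reseed_by_state(reseed_file, state_file)
```

Grouping by timezone would follow the same pattern with a university-to-timezone mapping in place of the state mapping.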

To cite the paper, you can use

@article{Prescott31122025,
    author = {Timothy Prescott},
    title = {Seeding Statistics of Elimination Tournaments},
    journal = {Scatterplot},
    volume = {2},
    number = {1},
    pages = {2523666},
    year = {2025},
    publisher = {Taylor \& Francis},
    doi = {10.1080/29932955.2025.2523666},
    URL = {https://doi.org/10.1080/29932955.2025.2523666},
    eprint = {https://doi.org/10.1080/29932955.2025.2523666},
    abstract = {We develop and provide Python code and a website
        to statistically analyze seedings in elimination tournaments.
        We are able to apply this code to fifty-eight thousand games
        to estimate the probability of an upset solely as a logistic
        function of the difference in seeding.
        We are also able to examine how well or poorly a team
        performs compared to its seeding.
        We conclude that the only team that is consistently
        underrated is \textbackslash your\_favorite\_team,
        while the only team that is consistently overrated is
        \textbackslash your\_hated\_rival.}
}

Non-BibTeX citation styles are also available at the Scatterplot website.

You can also use the link at the top right of the page to "Cite this repository".
