Contributors: Andrew Argeros and Max Bolger 
This repo contains all of the code, data, and output from our entry in the 2021 MinneMUDAC Data Science Competition sponsored by MinneAnalytics.
This year's competition focused on predicting the outcome of the 2021 NCAA Men's Basketball Tournament, or March Madness.
The majority of the data used stems from the 2021 Kaggle March Mania Competition. The data are assembled into 20 .csv files, stored in the /Data/Kaggle Data folder.
These files can be downloaded using the Kaggle API:
kaggle competitions download -c ncaam-march-mania-2021 -p {PATH TO YOUR DIRECTORY}
Every year since 1977, McDonald's has hosted an all-star game for the top high school players in the country. These players typically go on to play Division I basketball and are among the top players in their incoming class. By scraping Wikipedia's pages on the all-star game rosters, we are able to account for the relative "Star Power" of a given team in a given season. For example, the 2018-19 Duke Blue Devils had four McDonald's All-Americans from the previous year's game, a group often regarded as one of the top recruiting classes in college basketball. The code to scrape this data is located in all_americans.py, and the resulting data is stored in /Back End Data/all_americans0320.csv.
Additionally, we pulled in location, ranking, and player data for each team listed on SportsReference. These files are also stored in /Data/Back End Data.
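As a rough illustration of how scraped rosters could be turned into a per-team "Star Power" feature, the Python sketch below counts All-Americans by college and season. The field names (`year`, `player`, `college`), the `window` parameter, and the sample rows are assumptions made for illustration; they are not the actual schema of all_americans0320.csv or the logic in all_americans.py.

```python
from collections import Counter


def star_power(rosters, window=1):
    """Count All-Americans per (college, season start year).

    A player selected in the spring of year Y is credited to his college for
    the seasons starting in Y, Y+1, ..., Y+window-1. The window is an assumed
    simplification; it does not track transfers or early departures.
    """
    counts = Counter()
    for row in rosters:
        for offset in range(window):
            counts[(row["college"], row["year"] + offset)] += 1
    return counts


# Hypothetical rows with placeholder player names. The four Duke rows mirror
# the 2018-19 example in the text; the Kansas row is made up.
rosters = [
    {"year": 2018, "player": "Player A", "college": "Duke"},
    {"year": 2018, "player": "Player B", "college": "Duke"},
    {"year": 2018, "player": "Player C", "college": "Duke"},
    {"year": 2018, "player": "Player D", "college": "Duke"},
    {"year": 2018, "player": "Player E", "college": "Kansas"},
]

counts = star_power(rosters)
print(counts[("Duke", 2018)])  # All-Americans on the 2018-19 Duke roster
```

A team-season missing from the counter simply has a star power of zero, which makes the result easy to join onto a table of all teams.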
To predict the outcomes of games, we used two models. The first was a random-forest-based model trained to predict the winner as either team_1 or team_2. Teams were reordered into these two slots to improve data quality: the team listed as team_1 was determined by the parity (odd or even) of w_team_id.
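One way to read the ordering rule: Kaggle's results files list the winner first, so some rule unrelated to the result's column position is needed to decide which side becomes team_1. The Python sketch below uses the parity of w_team_id. Which parity maps to which slot, the `l_team_id` field name, and the example IDs are assumptions, since the text only says that odd/evenness was used.

```python
def order_teams(w_team_id, l_team_id):
    """Return (team_1, team_2, label), where label names the winning slot.

    Assumed convention: an even winner ID puts the winner in the team_1 slot,
    an odd winner ID puts the winner in the team_2 slot.
    """
    if w_team_id % 2 == 0:
        return w_team_id, l_team_id, "team_1"
    return l_team_id, w_team_id, "team_2"


# Hypothetical team IDs
print(order_teams(1104, 1211))  # even winner ID: winner is team_1
print(order_teams(1211, 1104))  # odd winner ID: winner is team_2
```

Because the label no longer coincides with column position, the model has to learn from the team features rather than from where the winner happens to be stored.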
To predict, we used the predict.all = TRUE argument to obtain and bootstrap individual tree predictions, creating a more stochastic process that better simulates the randomness of the tournament. We additionally calculated a "probability" based on the voting breakdown of the model's trees.
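In R's randomForest, predict.all = TRUE returns each tree's individual prediction alongside the aggregate. The Python sketch below illustrates the idea with only the standard library: given the per-tree votes for one game, the vote share serves as the "probability", and a random draw from the votes gives a stochastic pick. The function names and the sample votes are illustrative assumptions, and the exact resampling scheme in the competition code may differ.

```python
import random


def vote_share(tree_votes, label="team_1"):
    """Fraction of trees voting for `label`, used as a rough win probability."""
    return sum(v == label for v in tree_votes) / len(tree_votes)


def simulate_pick(tree_votes, rng):
    """Draw one tree's vote uniformly at random instead of taking the majority."""
    return rng.choice(tree_votes)


# Hypothetical votes from a 500-tree forest for a single game
tree_votes = ["team_1"] * 320 + ["team_2"] * 180

rng = random.Random(2021)  # seeded so the simulation is reproducible
share = vote_share(tree_votes)
picks = [simulate_pick(tree_votes, rng) for _ in range(10_000)]
upset_rate = picks.count("team_2") / len(picks)
print(share, upset_rate)
```

A majority vote would pick team_1 every time in this example; sampling a single tree lets the underdog advance at roughly the rate the forest disagrees, which is what produces upsets in a simulated bracket.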
Our bracket finished 19th in the overall competition and in the 89th percentile of ESPN's Tournament Challenge, giving Andrew a 2nd-place finish in his family pool. One thing to note is the MinneMUDAC scoring system, which rewarded picking upsets.
- Sweet Sixteen: 6/16
- Elite Eight: 4/8
- Final Four: 3/4

