Exploratory data analysis of NFL play-by-play data using nfl-data-py
The first goal is to develop a playcalling engine based on real NFL playcalling data. Initially, the playcalling engine might be based purely on game context, such as:
- Score (point differential)
- Time
- Quarter
- Yard line
- Down & distance
- Timeouts remaining
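
As a rough sketch of what a purely context-driven engine could look like, the snippet below collapses a game situation into a coarse key and samples a play call from the empirical frequencies observed for that key. The `GameContext` fields, the bucketing thresholds, and the `EmpiricalPlaycaller` class are all illustrative assumptions rather than a settled design, and the toy observations stand in for real play-by-play rows.

```python
import random
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GameContext:
    """Game situation from the offense's point of view (hypothetical schema)."""
    score_diff: int          # offense score minus defense score
    seconds_remaining: int   # seconds left in the current quarter
    quarter: int             # 1-4 (overtime ignored in this sketch)
    yards_to_goal: int       # 1-99, distance to the opponent's end zone
    down: int                # 1-4
    distance: int            # yards to go for a first down
    timeouts: int            # offensive timeouts remaining, 0-3


def bucket(ctx):
    """Collapse a context into a coarse key so sparse situations share data.

    The thresholds are arbitrary placeholders chosen only for illustration,
    and this key ignores time, field position, and timeouts entirely.
    """
    if ctx.distance <= 3:
        dist = "short"
    elif ctx.distance <= 7:
        dist = "medium"
    else:
        dist = "long"
    if ctx.score_diff < 0:
        score = "trailing"
    elif ctx.score_diff > 0:
        score = "leading"
    else:
        score = "tied"
    return (ctx.down, dist, score)


class EmpiricalPlaycaller:
    """Samples play calls in proportion to how often they were observed."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, observations):
        # observations: iterable of (GameContext, play_call) pairs
        for ctx, call in observations:
            self.counts[bucket(ctx)][call] += 1
        return self

    def probabilities(self, ctx):
        counter = self.counts.get(bucket(ctx))
        if not counter:
            return {}  # unseen situation; a real engine would need a fallback
        total = sum(counter.values())
        return {call: n / total for call, n in counter.items()}

    def call(self, ctx, rng=random):
        probs = self.probabilities(ctx)
        if not probs:
            raise ValueError("no observations for this situation")
        calls = list(probs)
        return rng.choices(calls, weights=[probs[c] for c in calls], k=1)[0]


# Toy data: four observed plays on 3rd-and-9 while trailing by 3.
third_and_long = GameContext(-3, 400, 4, 55, 3, 9, 2)
toy = [(third_and_long, "pass")] * 3 + [(third_and_long, "run")]
engine = EmpiricalPlaycaller().fit(toy)
print(engine.probabilities(third_and_long))  # {'pass': 0.75, 'run': 0.25}
```

A frequency table like this is only a baseline. It makes the inputs and outputs of the engine concrete, and a fitted classifier could later replace `probabilities` without changing the calling code.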
In the future, we might extend this playcalling engine to accept additional team context, such as:
- Offensive playcalling style
- Passing game overall
- Pass blocking
- QB overall
- WR overall
- Rushing game overall
- Run blocking
- RB overall
The next goal is to develop a model (or models) which can represent the outcomes of individual plays. The inputs to this model should be:
- The play call from the playcalling engine
- Some measurement(s) of offensive skill
- Some measurement(s) of defensive skill
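
To make the three inputs concrete, here is a minimal sketch of how such a model might be wired together. Everything in it is an assumption made for illustration: the `PlayOutcome` type, the `simulate_play` signature, the use of yards per carry as the only skill measurement, and especially the Gaussian draw centered on the average of the offensive and defensive figures. Only run plays are handled; other play calls are deliberately left unimplemented.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayOutcome:
    """Result of a single simulated play (hypothetical schema)."""
    play_call: str
    yards_gained: int


def simulate_play(play_call, offense, defense, rng, spread=3.0):
    """Toy outcome model taking a play call plus offensive and defensive skill.

    offense / defense are dicts of skill measurements. For runs, this sketch
    ASSUMES yards gained are normally distributed around the midpoint of the
    offense's yards per carry and the defense's yards per carry allowed.
    That distribution is a placeholder, not something fitted to data.
    """
    if play_call == "run":
        expected = (offense["rushing"] + defense["rush_defense"]) / 2
        return PlayOutcome("run", round(rng.gauss(expected, spread)))
    raise NotImplementedError(f"no outcome model for play call {play_call!r}")


rng = random.Random(42)  # seeded so repeated runs give the same draws
outcome = simulate_play("run", {"rushing": 4.6}, {"rush_defense": 4.0}, rng)
print(outcome)
```

Passing the random generator in explicitly keeps simulations reproducible, which makes it easier to compare a placeholder model like this against a fitted one later.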
As a starting point, we will consider the following measurements of offensive skill:
- blocking: Likelihood of a TFL, sack, or QB hit against
- rushing: Yards per carry for
- passing: Passer rating for
- receiving: Incomplete passes and yards after catch for
- scrambling: Likelihood of a QB scramble and QB yards per carry
- turnovers: Likelihood of causing a turnover
- penalties: Likelihood of committing a penalty
As a starting point, we will consider the following measurements of defensive skill (which mirror the offensive skill measurements closely):
- blitzing: Likelihood of a TFL, sack, or QB hit for
- rush_defense: Yards per carry against
- pass_defense: Passer rating against
- coverage: Incomplete passes and yards after catch against
- turnovers: Likelihood of recovering a turnover
- penalties: Likelihood of committing a penalty
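
Two of these measurements, yards per carry and passer rating, can be computed directly from counting stats. The sketch below uses the standard NFL passer rating formula and shows how the same statistic yields a "for" value (plays where the team has the ball) and an "against" value (plays where the team is defending). The row keys `posteam`, `defteam`, `play_type`, and `yards_gained` are assumed to follow the play-by-play column naming; the helper names and toy rows are hypothetical.

```python
def yards_per_carry(rush_yards, carries):
    """Average yards gained per rushing attempt (0.0 when there are none)."""
    return rush_yards / carries if carries else 0.0


def passer_rating(completions, attempts, yards, touchdowns, interceptions):
    """NFL passer rating: four components, each clamped to [0, 2.375]."""
    if attempts == 0:
        return 0.0

    def clamp(x):
        return max(0.0, min(x, 2.375))

    a = clamp((completions / attempts - 0.3) * 5)     # completion percentage
    b = clamp((yards / attempts - 3) * 0.25)          # yards per attempt
    c = clamp(touchdowns / attempts * 20)             # touchdown rate
    d = clamp(2.375 - interceptions / attempts * 25)  # interception rate
    return (a + b + c + d) / 6 * 100


def ypc_for_and_against(plays, team):
    """Return (yards per carry for, yards per carry against) for one team."""
    runs = [p for p in plays if p["play_type"] == "run"]
    gained = [p["yards_gained"] for p in runs if p["posteam"] == team]
    allowed = [p["yards_gained"] for p in runs if p["defteam"] == team]
    return (yards_per_carry(sum(gained), len(gained)),
            yards_per_carry(sum(allowed), len(allowed)))


# Toy rows standing in for real play-by-play data.
plays = [
    {"posteam": "KC", "defteam": "BUF", "play_type": "run", "yards_gained": 5},
    {"posteam": "KC", "defteam": "BUF", "play_type": "run", "yards_gained": 3},
    {"posteam": "BUF", "defteam": "KC", "play_type": "run", "yards_gained": 2},
    {"posteam": "KC", "defteam": "BUF", "play_type": "pass", "yards_gained": 12},
]
print(ypc_for_and_against(plays, "KC"))  # (4.0, 2.0)
```

Computing both directions with one function keeps the offensive and defensive measurements on the same scale, which matters if a play outcome model is going to combine them.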