An end-to-end data analytics project focused on Formula 1 racing, integrating descriptive analytics, predictive modeling, and an interactive dashboard to deliver meaningful insights from motorsport data.
This project aims to analyze historical Formula 1 data and extract insights through:
- Descriptive Analytics to understand past trends and performance
- Predictive Analytics to forecast race-related outcomes
- Dashboard Integration for interactive data exploration
The repository follows clean Git practices: large and generated files are not committed, and the data can be regenerated from the scripts so that results are reproducible.
.
├── Descriptive/ # Descriptive analytics
│ ├── data/
│ │ ├── processed/
│ │ └── raw/
│ ├── notebooks/
│ │ ├── 01_extraction.ipynb
│ │ ├── 02_cleaning.ipynb
│ │ ├── 03_feature_eng.ipynb
│ │ └── 04_eda_vis.ipynb
│ ├── src/
│ │ ├── __init__.py
│ │ ├── analytics.py
│ │ ├── config.py
│ │ ├── utils.py
│ │ └── youtube_extractor.py
│ └── run_analytics.py
├── Predictive/ # Predictive modeling
│ ├── f1_dashboard.py
│ ├── main_script.py
│ ├── driver_rankings_2024.csv
│ ├── driver_performance_2024.csv
│ ├── 2025_predictions.csv
│ ├── 2025_champion_prediction.txt
│ ├── f1_cache/ # Cached intermediate files
│ └── f1_data_cache/ # Auto-generated datasets
├── 2025_champion_prediction.txt
├── .gitignore
├── README.md
└── requirements.txt
The Descriptive module focuses on understanding historical Formula 1 data through:
- Driver and constructor performance analysis
- Season-wise trends and comparisons
- Race result distributions
- Data visualization for insights
Technologies used:
- Pandas
- Matplotlib / Seaborn
- Scikit-learn
- Jupyter Notebook
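To illustrate the driver performance analysis described above, the sketch below uses pandas to aggregate per-race results into one row per driver and season. It assumes a results table with hypothetical columns `season`, `driver`, `position` and `points`. The project's notebooks and column names may differ, and the driver names in the demo are placeholders, not real data.

```python
import pandas as pd


def driver_season_summary(results: pd.DataFrame) -> pd.DataFrame:
    """Collapse per-race rows into one row per (season, driver).

    Assumes hypothetical columns: season, driver, position, points.
    """
    summary = (
        results.groupby(["season", "driver"])
        .agg(
            races=("position", "size"),                        # rows (races) in the table
            wins=("position", lambda s: int((s == 1).sum())),  # first-place finishes
            avg_finish=("position", "mean"),                   # mean finishing position
            points=("points", "sum"),                          # season points total
        )
        .reset_index()
        # Within each season, list the highest points total first
        .sort_values(["season", "points"], ascending=[True, False])
        .reset_index(drop=True)
    )
    return summary


if __name__ == "__main__":
    # Toy data with placeholder driver names, for illustration only
    results = pd.DataFrame(
        {
            "season": [2024, 2024, 2024, 2024],
            "driver": ["Driver A", "Driver B", "Driver A", "Driver B"],
            "position": [1, 2, 3, 1],
            "points": [25, 18, 15, 25],
        }
    )
    print(driver_season_summary(results))
```

The same grouped table can feed season-wise comparisons or be plotted with Matplotlib / Seaborn.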
The Predictive module applies machine learning techniques to:
- Perform feature engineering on historical race data
- Train predictive models
- Evaluate model performance
- Analyze patterns affecting race outcomes
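Feature engineering on historical race data has to avoid leaking a race's own result into its features. The following is a minimal sketch, not the project's actual pipeline. It assumes a hypothetical results table with `driver`, `season`, `round` and `position` columns and builds "form" features from each driver's earlier races only. `add_form_features` is a hypothetical helper name.

```python
import pandas as pd


def add_form_features(results: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Add leakage-safe form features (hypothetical helper).

    Assumes columns: driver, season, round, position.
    """
    # Put each driver's history in chronological order
    df = results.sort_values(["driver", "season", "round"]).copy()
    by_driver = df.groupby("driver")["position"]

    # shift(1): each row only sees races that happened before it
    df["prev_finish"] = by_driver.shift(1)
    # Rolling mean of the previous `window` finishes (NaN for a driver's first race)
    df["form_avg"] = by_driver.transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    return df


if __name__ == "__main__":
    races = pd.DataFrame(
        {
            "driver": ["A", "A", "A", "B", "B"],
            "season": [2024] * 5,
            "round": [1, 2, 3, 1, 2],
            "position": [1, 3, 2, 4, 2],
        }
    )
    print(add_form_features(races))
```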
Approaches include:
- Regression models
- Classification models
- Feature-based prediction pipelines
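A plain least-squares fit in NumPy can stand in for the regression models listed above. The sketch below is illustrative only. `chronological_split`, `fit_linear`, `predict` and `mean_absolute_error` are hypothetical helpers, and using grid position as the single feature is an assumption. It splits by season so that the model is evaluated on races it has not seen, and it reports the error in finishing positions.

```python
import numpy as np


def chronological_split(seasons, X, y, test_season):
    """Train on seasons before `test_season`, test on `test_season` itself."""
    seasons = np.asarray(seasons)
    train, test = seasons < test_season, seasons == test_season
    return X[train], y[train], X[test], y[test]


def fit_linear(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares; returns [intercept, coef_1, ..., coef_k]."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef


def predict(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    Xb = np.column_stack([np.ones(len(X)), X])
    return Xb @ coef


def mean_absolute_error(y_true, y_pred) -> float:
    """Average absolute error, measured in finishing positions."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


if __name__ == "__main__":
    # Toy data: one feature (grid position) -> finishing position
    seasons = [2022, 2022, 2023, 2023, 2024, 2024]
    X = np.array([[1.0], [4.0], [2.0], [5.0], [3.0], [6.0]])
    y = np.array([2.0, 5.0, 3.0, 6.0, 4.0, 7.0])
    X_tr, y_tr, X_te, y_te = chronological_split(seasons, X, y, test_season=2024)
    coef = fit_linear(X_tr, y_tr)
    print("coefficients:", coef)
    print("test MAE:", mean_absolute_error(y_te, predict(coef, X_te)))
```

For a classification variant, the target could be replaced with a binary label such as `position <= 3` (podium), with a classifier used in place of the linear fit. The chronological split stays the same.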
The dashboard serves as a unified interface for both the Descriptive and Predictive parts of the project. It:
- Integrates descriptive and predictive insights
- Enables interactive exploration
- Presents results in a user-friendly format
This allows both technical and non-technical users to explore the data effectively.
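One way the dashboard could place descriptive and predictive outputs side by side is to join them on the driver. The column names below (`points_2024`, `predicted_points_2025`) are hypothetical stand-ins for whatever `driver_rankings_2024.csv` and `2025_predictions.csv` actually contain, and `combine_views` is a hypothetical helper. The demo values are placeholders.

```python
import pandas as pd


def combine_views(rankings: pd.DataFrame, predictions: pd.DataFrame) -> pd.DataFrame:
    """Join 2024 rankings with 2025 predictions, one row per driver.

    Hypothetical columns: rankings has driver, points_2024;
    predictions has driver, predicted_points_2025.
    """
    # Outer join keeps drivers present in only one table (e.g. new entrants);
    # validate raises MergeError if a driver is duplicated on either side.
    table = rankings.merge(predictions, on="driver", how="outer", validate="one_to_one")
    # Highest predicted total first; drivers without a prediction go last
    return table.sort_values(
        "predicted_points_2025", ascending=False, na_position="last"
    ).reset_index(drop=True)


if __name__ == "__main__":
    rankings = pd.DataFrame(
        {"driver": ["Driver A", "Driver B"], "points_2024": [300, 250]}
    )
    predictions = pd.DataFrame(
        {
            "driver": ["Driver A", "Driver B", "Driver C"],
            "predicted_points_2025": [280, 310, 50],
        }
    )
    print(combine_views(rankings, predictions))
```

In a Streamlit page, the combined table could then be rendered with `st.dataframe(table)`.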
To set up the project locally:
git clone https://github.com/ahmedmusharaf31/dba-reddit-project.git
cd dba-reddit-project
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
Large database and cache files are intentionally excluded from version control.
The following patterns are listed in .gitignore:
*.db
f1_cache/
f1_data_cache/
These files are excluded because:
- .db files are large and auto-generated
- They are environment-specific
- Best practice is to regenerate data via scripts
✔ Clean Git history ✔ No GitHub file-size issues ✔ Reproducible workflows
To recreate data or results:
- Run the F1 dashboard from the Predictive/ directory with this command:
(The first run takes some time; the data is then stored in the cache, so later runs start faster.)
python -m streamlit run f1_dashboard.py
- Enjoy!
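The first-run delay described above comes from building the data once and reusing it afterwards. Below is a minimal, standard-library-only sketch of that pattern. The `cached()` helper, the key-named pickle files and the `slow_load` stand-in are all hypothetical; this is not the project's actual caching code.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def cached(cache_dir: Path, key: str, compute: Callable[[], Any]) -> Any:
    """Return the pickled value for `key`, computing and storing it on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{key}.pkl"
    if path.exists():
        # Cache hit: later runs skip the slow computation
        with path.open("rb") as fh:
            return pickle.load(fh)
    # Cache miss (e.g. the very first run): compute, then persist
    value = compute()
    tmp = path.with_suffix(".tmp")
    with tmp.open("wb") as fh:
        pickle.dump(value, fh)
    # Rename into place so an interrupted write never leaves a truncated .pkl
    tmp.replace(path)
    return value


def slow_load() -> dict:
    """Stand-in for a slow data download or processing step."""
    return {"season": 2024}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cache_dir = Path(d) / "f1_data_cache"
        print(cached(cache_dir, "season_2024", slow_load))  # computed and written
        print(cached(cache_dir, "season_2024", slow_load))  # read back from disk
```

Deleting the cache directory forces the data to be rebuilt on the next run. This is why the cache folders can stay out of version control.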
No committed binary or database files are required.
Future improvements:
- Advanced machine learning models
- Real-time data integration
- Enhanced dashboard interactivity
- Automated data pipelines
Contributors:
- Ahmed Musharaf
- Saaim Ali Khan
- Muhammad Arsal
This project is developed for academic and educational purposes.
This repository demonstrates a complete data analytics lifecycle, from raw data exploration to predictive insights, while following industry-standard Git practices.
If you find this project useful, feel free to ⭐ the repository!