Collection of projects completed during the Data Science Program at Turing College.
All projects include an intensive data cleaning step as well as EDA and statistical inference steps.
packages used:
- pandas, numpy (standard processing)
- sqlite3, duckdb (database processing)
- dask, fastparquet (memory-efficient processing and storage; see the sketch after this list)
- plotly, seaborn, matplotlib (visualization)
- pingouin (statistical analysis)
- scikit-learn, statsmodels, imblearn, lightgbm, xgboost, yellowbrick, pickle (modelling, performance evaluation, and model persistence)
- eli5, shap (explainability)
- fastapi, pydantic, uvicorn (deployment)
- torch, lightning (deep learning)
- albumentations (image transforms/augmentation)
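
A minimal sketch of the dask + fastparquet workflow mentioned above, for tables too large for RAM; the file and column names are placeholders:

```python
import dask.dataframe as dd

# Lazy, chunked read of a CSV that does not fit in memory
ddf = dd.read_csv("large_table.csv")

# Filters and transformations run per partition ("amount" is a placeholder column)
ddf = ddf[ddf["amount"] > 0]

# Store as compressed Parquet via fastparquet for fast, memory-friendly reloads
ddf.to_parquet("large_table.parquet", engine="fastparquet", compression="snappy")
```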
skills demonstrated:
- handling and aggregating multiple tables with SQL, duckdb and pandas (see the duckdb sketch after this list)
- handling tables > 5 GB (30 million rows, 160 features) with dask and pandas
- combining multiple tables, numerous (cross-table) aggregations, and domain-driven feature engineering with dask and pandas, resulting in over 300 features
- Looker dashboards and Python plots for data visualization
- statistical inference
- correlation strength and feature importances
- linear and logistic regression in statsmodels (see the statsmodels sketch below)
- model selection for classification and linear regression
- recursive feature elimination (see the RFE sketch below)
- model deployment (Docker, Google Cloud Platform; see the FastAPI sketch below)
- image data analysis and classification (computer vision)
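
A minimal sketch of cross-table aggregation with duckdb SQL over pandas; the `loans` table and its columns are made up for illustration:

```python
import duckdb
import pandas as pd

# Toy stand-in for a real loans table
loans = pd.DataFrame({"customer_id": [1, 1, 2],
                      "amount": [100.0, 50.0, 75.0]})

# duckdb can run SQL directly against DataFrames in local scope
agg = duckdb.sql("""
    SELECT customer_id,
           COUNT(*)    AS n_loans,
           SUM(amount) AS total_amount
    FROM loans
    GROUP BY customer_id
""").df()
```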
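A minimal statsmodels sketch for the regression work; `X` and `y` are random stand-ins for an engineered feature matrix and a binary default flag:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))       # placeholder features
y = rng.integers(0, 2, size=200)    # placeholder binary target

# Logistic regression with an intercept term
model = sm.Logit(y, sm.add_constant(X)).fit()
print(model.summary())              # coefficients, p-values, confidence intervals
```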
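A minimal recursive feature elimination sketch with scikit-learn; the estimator and target feature count are illustrative choices, not the projects' exact settings:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Iteratively drop the weakest features until 5 remain
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
selector.fit(X, y)
print(selector.support_)   # boolean mask of the retained features
```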
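A minimal sketch of the fastapi + pydantic serving pattern; the model file, feature names, and endpoint path are assumptions, not the projects' actual API:

```python
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

with open("model.pkl", "rb") as f:   # hypothetical pickled sklearn model
    model = pickle.load(f)

class LoanApplication(BaseModel):
    income: float
    credit_amount: float

@app.post("/predict")
def predict(application: LoanApplication):
    features = [[application.income, application.credit_amount]]
    return {"default_probability": float(model.predict_proba(features)[0][1])}
```

Served locally with `uvicorn main:app`, and containerized with Docker for deployment on Google Cloud Platform.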
projects:
- Loan default prediction:
  - aggregate and combine many auxiliary tables into a main table of customer features and performance (see the sketch below)
  - predict loan status (default)
  - deploy the model
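
A minimal sketch of folding an auxiliary table into the main customer table; all table and column names are placeholders:

```python
import pandas as pd

main = pd.DataFrame({"customer_id": [1, 2]})
payments = pd.DataFrame({"customer_id": [1, 1, 2],
                         "days_late": [0, 12, 3]})

# Aggregate the auxiliary table per customer, then merge into the main table
agg = payments.groupby("customer_id")["days_late"].agg(["mean", "max"])
agg.columns = [f"days_late_{c}" for c in agg.columns]
main = main.merge(agg, on="customer_id", how="left")
```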
- Loan acceptance, default, and interest rate prediction:
  - handle and aggregate tables > 5 GB (30 million rows, 150 features)
  - predict loan acceptance, loan status (default), and interest rate
  - deploy the models
- European Football Leagues Data:
  - aggregate/merge multiple tables with SQL
  - predict goal difference and win/loss
- Computer Vision (WORK IN PROGRESS): classification of mushrooms with PyTorch Lightning (see the Lightning sketch below)
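
A minimal sketch of the Lightning + albumentations setup; the backbone, class count, and transforms are illustrative, not the project's final model:

```python
import albumentations as A
import torch
from albumentations.pytorch import ToTensorV2
from torch import nn
import lightning.pytorch as pl

# Typical albumentations training pipeline for image inputs
train_tfms = A.Compose([A.Resize(224, 224), A.Normalize(), ToTensorV2()])

class MushroomClassifier(pl.LightningModule):
    def __init__(self, n_classes: int = 10):   # class count is a placeholder
        super().__init__()
        # Tiny stand-in backbone; the real project may use a pretrained network
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, n_classes),
        )
        self.loss = nn.CrossEntropyLoss()

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = self.loss(self.net(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```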