This project began as a showcase of several shallow ML projects, each covering the EDA, modeling and evaluation phases. You will see these in workbooks 1 and 2. However, after completing several of these small projects, I realized I was writing the same code over and over. The project therefore became about extracting the shared code into a framework that makes these projects quick and easy to repeat.
You can find the framework code in the src folder, split roughly into three phases of the machine learning lifecycle: EDA, feature engineering and modeling, and evaluation. The Template Workbook demonstrates how to use the framework and can be copied to start a new project.
Use this notebook to see how the framework is used, and copy it to start a new project.
Explore the data for identified fraudsters and other users. What are your preliminary observations? Using your findings and some creativity, create some features. Explain your reasoning behind the features. Create an ML model which identifies fraudsters. Assess the quality of your model and explain.
Please see the workbook for the data description, process and results.
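As a rough illustration of the feature-creation step in the brief above, the sketch below derives per-user features from transaction records using only the standard library. The records, the field names (`user_id`, `amount`) and the feature names are all hypothetical and chosen for illustration only; the real columns and the features actually used are described in the workbook.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transaction records (made-up values, not the project's data).
transactions = [
    {"user_id": "u1", "amount": 20.0},
    {"user_id": "u1", "amount": 980.0},
    {"user_id": "u2", "amount": 15.0},
    {"user_id": "u2", "amount": 25.0},
    {"user_id": "u2", "amount": 20.0},
]


def user_features(rows):
    """Aggregate transaction rows into one feature dict per user."""
    amounts = defaultdict(list)
    for row in rows:
        amounts[row["user_id"]].append(row["amount"])

    features = {}
    for user, vals in amounts.items():
        avg = mean(vals)
        features[user] = {
            "n_transactions": len(vals),
            "mean_amount": avg,
            # A single unusually large payment pushes this ratio up.
            "max_to_mean_ratio": max(vals) / avg if avg else 0.0,
        }
    return features


features = user_features(transactions)
for user, feats in features.items():
    print(user, feats)
```

The reasoning behind a ratio feature like this one is that it describes a user's behaviour relative to their own baseline rather than in absolute terms. Whether it separates fraudsters from other users is something only the EDA can confirm.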
You are provided with a sample dataset of a telecom company's customers and are expected to complete the following tasks:
- Perform exploratory analysis and extract insights from the dataset.
- Split the dataset into train/test sets and explain your reasoning.
- Build a predictive model to predict which customers are going to churn, and explain why you chose a particular algorithm.
- Establish metrics to evaluate model performance.
- Discuss the potential issues with deploying the model into production.
Please see the workbook for the data description, process and results.
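To make the split and metric tasks above more concrete, here is a minimal standard-library sketch of a stratified train/test split and of precision, recall and F1 for the churn class. The function names, the toy labels and the 20% test fraction are assumptions for illustration, not the framework's API or the workbook's actual choices. In practice, scikit-learn's `train_test_split` with its `stratify` argument covers the same ground.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class represented in
    roughly the same proportion in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Take at least one test example from every class.
        n_test = max(1, round(len(idxs) * test_fraction))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy labels (made-up): 20 churners (1) among 100 customers.
labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
print(len(train_idx), len(test_idx))
print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))
```

Stratifying matters when churners are a minority: a purely random split can leave the test set with too few positive cases for the metrics to be stable. For the same reason, precision and recall on the churn class are usually more informative than plain accuracy.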