Car Insurance Claim Prediction

This project was originally developed as a take-home interview exercise for a Data Scientist role. I had 2.25 hours to complete the project before sending it back to the interviewer.

No proprietary information or data from the company has been included. The company name has been redacted, and the company is not affiliated with and does not endorse this project.

This repository contains:

My original submission for the interview.
An updated version of the project, where I explored improvements and alternative approaches to continue learning and developing my data science skills.

Project Description:

Overview

On your first day at XXXX, you are handed a data set. Here are some quick facts about the data:

The data set has 28 columns and 189,552 rows.
The first column is an ID, which is meant to be unique.
The last column is called ‘loss’, which your colleague explains is the dollar value of an insurance claim that XXXX has paid.
The dataset consists of both numeric and categorical features, which may or may not be useful.

Your ultimate goal is to predict the size of a loss from the rest of the features you were provided. Answer the prompts below as best as you can.
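The quick facts above can be checked directly once the data is loaded. The following is a minimal sketch, not the code from the submission: the helper name `check_quick_facts`, the column names `id`, `age`, and `region`, and the toy values are all assumptions made for illustration. Only the `loss` column name comes from the prompt.

```python
import pandas as pd


def check_quick_facts(df: pd.DataFrame, id_col: str, target_col: str) -> dict:
    """Summarize the properties the overview describes for a loaded frame."""
    features = df.drop(columns=[id_col, target_col])
    numeric_cols = features.select_dtypes(include="number").columns.tolist()
    categorical_cols = [c for c in features.columns if c not in numeric_cols]
    return {
        "shape": df.shape,
        # The ID is "meant to be unique", so verify it rather than assume it.
        "id_is_unique": bool(df[id_col].is_unique),
        "n_duplicate_ids": int(df[id_col].duplicated().sum()),
        "numeric_cols": numeric_cols,
        "categorical_cols": categorical_cols,
        "missing_per_col": df.isna().sum().to_dict(),
        # Positive skew means a long right tail of large claims.
        "loss_skew": float(df[target_col].skew()),
    }


# Tiny made-up frame standing in for ds_case_data.csv (illustrative values only).
toy = pd.DataFrame(
    {
        "id": [1, 2, 3, 3],
        "age": [34, 51, None, 28],
        "region": ["north", "south", "south", "east"],
        "loss": [120.0, 0.0, 4300.0, 75.0],
    }
)
facts = check_quick_facts(toy, id_col="id", target_col="loss")
print(facts)
```

With the real file, the same helper could be called on `pd.read_csv("ds_case_data.csv")`, passing the first column name as `id_col`, since the prompt says only that the first column is an ID and does not give its name.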

Steps

Set up your workspace.
Load the data set “ds_case_data.csv” and perform a brief exploratory analysis. Comment on any features that may be problematic for modeling.
Describe the distribution of your target variable, “loss.”
Build a model to predict loss and evaluate how well it performs.
How would you describe your model’s performance to a non-technical stakeholder?
Explain your modeling process in writing, including the decisions and assumptions you made. Are you happy with the model? Why or why not? What would be your next steps?

Instructions

You will be graded on the code and insights you generate, and the process you follow, but not the accuracy of your model. If you are running out of time, please explain what you would have done and/or write pseudocode. We expect that the full exercise will take your entire allotted time to complete, so please look ahead now so you can decide which tasks to focus on.

Please DO NOT use an LLM to solve this problem. We expect all work to be original and a result of your own creation.

Environment

The packages below are the only ones you need to solve the case study. You are not required to use them, and you may include other packages at your own discretion (e.g. XGBoost instead of LightGBM, or seaborn to supplement matplotlib).

# Required packages
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import *
import lightgbm as lgb
from sklearn.metrics import *
