EDA and analysis of medical insurance data to identify key cost drivers using Python.

Aastharai821/Insurance-Data-Analysis

Insurance-Data-Analysis using Python

This project involves exploratory data analysis (EDA) and linear regression modeling on a medical insurance dataset to uncover key insights and predict insurance charges based on individual characteristics.

Dataset

  • Source: insurance.csv
  • Features:
    • age: Age of the policyholder
    • sex: Gender of the policyholder
    • bmi: Body Mass Index
    • children: Number of dependents
    • smoker: Smoker or non-smoker
    • region: Residential region
    • charges: Insurance cost
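
A minimal sketch of loading the data and checking that it matches the feature list above. The helper name `load_insurance`, the inline sample rows, and the category values (`yes`/`no`, region names) are assumptions made for illustration; with the real file, the call would be `load_insurance("insurance.csv")`.

```python
import io

import pandas as pd

# Column names taken from the feature list in this README.
EXPECTED_COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]

# Made-up rows, for illustration only. They are not taken from insurance.csv.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
25,female,22.0,0,no,southwest,3000.0
40,male,30.5,2,yes,southeast,32000.0
55,female,28.1,1,no,northeast,11000.0
"""


def load_insurance(source):
    """Read the CSV and fail early if a documented column is missing."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


# Hypothetical usage with the inline sample instead of the real file.
df = load_insurance(io.StringIO(SAMPLE_CSV))
print(df.dtypes)        # column types as inferred by pandas
print(df.isna().sum())  # count of missing values per column
```

Checking the schema up front keeps later steps (grouping by `smoker`, regressing on `age` and `bmi`) from failing with less obvious errors.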

Problem Statement

ABC Insurance aims to understand which factors influence medical insurance premiums. The goal is to analyze the dataset and build a model to predict charges using various customer attributes.

Objectives

  • Perform EDA to uncover patterns and relationships
  • Visualize insights using Python libraries (Matplotlib, Seaborn)
  • Build and evaluate a simple linear regression model
  • Provide business-level observations

Tools Used

  • Python
  • Jupyter Notebook
  • Pandas, NumPy
  • Matplotlib, Seaborn

Key Findings

  • Smokers pay significantly higher premiums than non-smokers.
  • Charges are positively correlated with both age and BMI.
  • The Southeast region showed relatively higher charges than the other regions.
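
The comparisons behind findings like these can be sketched with a few pandas calls. The rows below are invented purely to show the computation, so the numbers they produce say nothing about the real dataset; the variable names and the `yes`/`no` and region values are assumptions.

```python
import pandas as pd

# Made-up rows, for illustration only. They are not taken from insurance.csv.
df = pd.DataFrame(
    {
        "age": [22, 30, 41, 52, 60, 35],
        "bmi": [21.0, 24.5, 27.0, 31.0, 33.5, 29.0],
        "smoker": ["no", "no", "no", "yes", "yes", "yes"],
        "region": [
            "southwest", "southeast", "northwest",
            "southeast", "northeast", "southwest",
        ],
        "charges": [2500.0, 4000.0, 7000.0, 38000.0, 45000.0, 30000.0],
    }
)

# Average charges for smokers vs. non-smokers.
mean_by_smoker = df.groupby("smoker")["charges"].mean()

# Pearson correlation of age and BMI with charges.
corr_with_charges = (
    df[["age", "bmi", "charges"]].corr()["charges"].drop("charges")
)

# Average charges per region, highest first.
mean_by_region = (
    df.groupby("region")["charges"].mean().sort_values(ascending=False)
)

print(mean_by_smoker)
print(corr_with_charges)
print(mean_by_region)
```

On the real data, the same three summaries would be the natural inputs to bar plots or a correlation heatmap in Seaborn.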

Model Summary

A linear regression model was used to predict charges. The model was evaluated using Root Mean Square Error (RMSE), and its fit was visualized with predicted-vs-actual plots.
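
A rough sketch of fitting a linear regression and computing RMSE. This README does not say which library or which predictors the notebook used, so the choices here are assumptions: ordinary least squares via `numpy.linalg.lstsq`, predictors `age`, `bmi`, and a 0/1 smoker indicator, and RMSE measured on the same made-up rows used for fitting.

```python
import numpy as np
import pandas as pd

# Made-up rows, for illustration only. They are not taken from insurance.csv.
df = pd.DataFrame(
    {
        "age": [22, 30, 41, 52, 60, 35, 47, 28],
        "bmi": [21.0, 24.5, 27.0, 31.0, 33.5, 29.0, 26.0, 23.0],
        "smoker": ["no", "no", "no", "yes", "yes", "yes", "no", "yes"],
        "charges": [2500.0, 4000.0, 7000.0, 38000.0,
                    45000.0, 30000.0, 9000.0, 21000.0],
    }
)

# Design matrix: intercept, age, bmi, smoker encoded as 1.0 (yes) or 0.0 (no).
X = np.column_stack(
    [
        np.ones(len(df)),
        df["age"].to_numpy(dtype=float),
        df["bmi"].to_numpy(dtype=float),
        (df["smoker"] == "yes").to_numpy(dtype=float),
    ]
)
y = df["charges"].to_numpy(dtype=float)

# Ordinary least squares: coefficients minimizing the sum of squared errors.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
predicted = X @ coef

# RMSE: square root of the mean squared difference between actual and predicted.
rmse = float(np.sqrt(np.mean((y - predicted) ** 2)))
print(f"RMSE: {rmse:.2f}")
```

RMSE is in the same units as `charges`, which makes it easy to compare against typical premium amounts. Plotting `predicted` against `y` as a Matplotlib scatter would give the predicted-vs-actual view; evaluating on held-out rows rather than the fitting rows would give a more honest error estimate.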
