Data Analysis Project

Overview

This repository contains data analysis notebooks and modularized Python scripts for Exploratory Data Analysis (EDA). The notebooks in the notebooks/ folder provide step-by-step analysis, while the scripts/ folder contains reusable functions for EDA to keep the code modular and maintainable.

Project Structure

The repository is structured as follows:

├── .vscode/
│   └── settings.json
├── .github/
│   └── workflows/
│       └── unittests.yml
├── .gitignore
├── requirements.txt
├── README.md
├── src/
│   └── __init__.py
├── notebooks/
│   ├── EDA.ipynb
│   └── README.md
├── tests/
│   └── __init__.py
└── scripts/
    ├── __init__.py
    └── EDA_functions.py

Key Components

notebooks/: This folder contains the Jupyter notebooks used for data exploration and cleaning. The main file is EDA.ipynb, which includes the initial implementation of data import, cleaning, and outlier detection using the IQR method.
scripts/: This folder contains Python scripts that modularize the functions used in the notebooks. The EDA_functions.py file contains reusable functions for handling missing values, detecting outliers, and other EDA tasks.
tests/: This folder is reserved for unit tests that verify the code in the scripts/ directory.
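
As an illustration of what a test in tests/ could look like, here is a minimal sketch using unittest. The file name tests/test_eda_functions.py and the stand-in clean_data body are assumptions made so the example runs on its own; a real test would import the function from scripts/EDA_functions.py instead.

```python
# Hypothetical file: tests/test_eda_functions.py
import unittest

import pandas as pd


def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    # Stand-in (assumption) so this example is self-contained. It drops
    # duplicate rows and rows with missing values; the project's real
    # clean_data may behave differently.
    return df.drop_duplicates().dropna()


class TestCleanData(unittest.TestCase):
    def test_removes_duplicates_and_missing_rows(self):
        raw = pd.DataFrame({"x": [1.0, 1.0, None, 2.0]})
        result = clean_data(raw)
        # One duplicate and one missing row are gone.
        self.assertEqual(result["x"].tolist(), [1.0, 2.0])
        self.assertFalse(result.isna().any().any())
```

Tests written this way can be run from the repository root with `python -m unittest discover tests`.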

Running the Project

Install dependencies: With Python 3.x installed, install the required packages using:
```
pip install -r requirements.txt
```
Running the Jupyter Notebook:
- Navigate to the notebooks/ directory and open EDA.ipynb in Jupyter Notebook.
- Run the notebook cells sequentially for data cleaning and EDA.
Using the Modular Functions:
- The scripts/EDA_functions.py file contains reusable functions that were initially part of the notebook. You can import these functions in your Python code or notebooks as follows:
```
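# Requires the scripts/ folder to be on sys.path; explicit names avoid a wildcard import.
from EDA_functions import clean_data, detect_outliers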
```
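
When the import runs from a notebook inside notebooks/, Python may not find the scripts/ folder, since it sits one level up. The following is a minimal sketch of one common workaround, assuming the working directory is notebooks/. The helper name add_scripts_to_path is hypothetical and not part of this repository.

```python
import sys
from pathlib import Path


def add_scripts_to_path(start: Path) -> Path:
    """Add <parent of start>/scripts to sys.path and return that path.

    Assumption: `start` is the notebooks/ folder, so its parent is the
    repository root.
    """
    scripts_dir = start.resolve().parent / "scripts"
    if str(scripts_dir) not in sys.path:
        sys.path.insert(0, str(scripts_dir))
    return scripts_dir


# In a notebook, the working directory is usually the notebook's own folder.
scripts_dir = add_scripts_to_path(Path.cwd())
# After this, an import from EDA_functions should resolve, provided
# scripts/EDA_functions.py exists at that location.
```

Running the notebook from a different directory would require passing the correct starting folder instead of Path.cwd().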

Notebooks Breakdown

EDA.ipynb:
- Loads data using pandas.
- Performs data cleaning by handling missing values and removing duplicates.
- Detects outliers using the IQR method.
- Uses the modular functions defined in EDA_functions.py for better code reuse and clarity.
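
The loading and cleaning steps above can be sketched as follows. The sample data is made up, and dropping rows with missing values is an assumption: the notebook may impute them instead.

```python
import io

import pandas as pd

# Made-up sample data standing in for the project's real dataset.
csv_text = """id,amount
1,10
2,12
2,12
3,11
4,
5,13
"""

df = pd.read_csv(io.StringIO(csv_text))

# Inspect before cleaning: missing values per column and duplicate rows.
missing_per_column = df.isna().sum()
duplicate_rows = int(df.duplicated().sum())

# Clean: drop exact duplicate rows, then rows with any missing value.
cleaned = df.drop_duplicates().dropna()

print(missing_per_column)
print(f"duplicates: {duplicate_rows}, rows before: {len(df)}, after: {len(cleaned)}")
```

Checking the counts before and after cleaning makes it easy to see how many rows each step removes.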

Scripts Breakdown

EDA_functions.py:
- clean_data(df): Cleans the input DataFrame by handling missing values and duplicates.
- detect_outliers(df, column): Detects and removes outliers from a specified column using the IQR method.
- Additional functions for EDA tasks as needed.
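
This README lists the function signatures but not their bodies. The sketch below shows one way they might be written, under stated assumptions: missing values are dropped rather than imputed, and the IQR fences use the conventional 1.5 multiplier. The keyword argument k is hypothetical and not part of the documented signature.

```python
import pandas as pd


def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy without duplicate rows or rows with missing values.

    Assumption: missing values are dropped; the real function may impute.
    """
    return df.drop_duplicates().dropna().reset_index(drop=True)


def detect_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Return df without rows whose `column` value lies outside the IQR fences.

    The fences are Q1 - k * IQR and Q3 + k * IQR, where IQR = Q3 - Q1.
    `k` is a hypothetical parameter added for illustration.
    """
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    within_fences = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[within_fences]


# Usage on made-up data: 100 lies far above the upper fence and is removed.
raw = pd.DataFrame({"amount": [10, 12, 12, 11, None, 13, 100]})
cleaned = detect_outliers(clean_data(raw), "amount")
print(cleaned)
```

Returning a new DataFrame instead of modifying the input in place keeps the original data available for comparison.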

License

This project is open-source and available under the MIT License.

Repository: Atnabon/ACIS-insurance-solutions
